papi-5.3.0/ChangeLogP413.txt

2011-05-10

* src/Rules.pfm_pe: The --with-bitmode parameter was not being passed along to libpfm3, so it was not possible to build perf_event PAPI on non-default bitmodes. This change passes along the $(BITFLAGS) value to the libpfm3 make invocation.

* src/: papi_pfm_events.c, papi_pfm_events.h, perf_events.c: The perf_events code was using __u64 instead of uint64_t and this was causing a warning when compiling for 64-bit Power.

* src/libpfm-3.y/lib/amd64_events_fam15h.h: Added Robert Richter's patch with a few new events for AMD Family 15h.

2011-05-06

* INSTALL.txt: Load the 'gcc' module, not the 'gnu' module, for Cray.

* INSTALL.txt: Update the install instructions for Cray XT and XE systems.

* src/ctests/: multiattach.c, multiattach2.c: Make the multiattach and multiattach2 failures into warnings. I have a proposed fix that makes the failures go away, but it has not been tested much and also causes some new fcntl() error messages under perfctr. So temporarily make the tests only warn for the release and I'll work on a proper fix afterwards. The behavior in these tests has been broken for a long time so it is not a recent regression.

* src/papi_memory.c: Band-aid for the leak debugging statement in papi_memory.c on NO_VARARG_MACRO systems. (aix currently)

2011-05-05

* src/ctests/multiattach.c: Had the division backwards on the validation.

* src/ctests/multiattach.c: Update the multiattach test to fail if the results aren't in the proper ratio. This was failing on perf_event kernels but since the results weren't checked it was never reported as an error.

* delete_before_release.sh: delete cvs2cl.pl before release

* ChangeLogP413.txt: First cut change log for the 4.1.3 release. Nothing's frozen yet...

* cvs2cl.pl: Perl script to generate change logs. Keeping it with the project makes life easier.
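The two 2011-05-05 multiattach entries above describe a ratio check whose division was initially backwards. The following is only a sketch of that kind of validation, not the code in ctests/multiattach.c; the function name, the expected ratio, and the tolerance are all hypothetical.

```c
/*
 * Hypothetical ratio validation, loosely modeled on the multiattach
 * changelog entries. Names and thresholds are illustrative only.
 *
 * Returns 1 when counts_b / counts_a is within `tolerance`
 * (0.10 means 10%) of expected_ratio, and 0 otherwise.
 */
static int ratio_ok(long long counts_a, long long counts_b,
                    double expected_ratio, double tolerance)
{
    double ratio, error;

    if (counts_a <= 0 || expected_ratio <= 0.0)
        return 0;                       /* cannot form a meaningful ratio */

    /* Dividing the other way round is the "backwards" mistake. */
    ratio = (double) counts_b / (double) counts_a;

    /* Relative error against the expected ratio. */
    error = ratio / expected_ratio - 1.0;
    if (error < 0.0)
        error = -error;

    return error <= tolerance;
}
```

With a check of this shape, swapping the two counts makes a correct run look like a failure, which matches the symptom the entry describes.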
* INSTALL.txt: Change INSTALL to reflect that we support power7.

* src/Makefile.in, src/configure, src/configure.in, src/papi.h, doc/Doxyfile, doc/Doxyfile-everything, papi.spec: Modify version number for pending release: 4.1.3.0

2011-05-03

* src/: papi_internal.c, papi_internal.h, sys_perf_event_open.c, ctests/attach2.c: Clean up the _papi_hwi_cleanup_eventset() function in papi_internal.c. This function was re-using existing functionality to remove one event at a time before cleaning out the eventset. This is not strictly necessary and was breaking on perf_event eventsets that were attached to finished processes, as a call to update_control_state() would close/reopen the perf_event fd, failing when the finished process went away after the close. The new code removes all events from the eventset in one go before calling update_control_state. The change here also updates code comments as necessary, as some of the code in papi_internal.c can be a bit obscure. It also updates some of the comments in ctests/attach2.c to give better debugging info.

2011-04-28

* src/threads.c: Uncomment the actual signal passing functionality in _papi_hwi_broadcast_signal

* src/papi_debug.h: Include files added to papi_debug.h

* src/components/README: Added detailed instructions on how to build PAPI with the CUDA component

2011-04-27

* src/threads.c: Move an escape test to the outer loop in _papi_hwi_broadcast_signal. This cleans up an infinite loop where before we would only break out of the component loop, not the thread list walking loop.

* src/: papi.c, papi_internal.c, papi_internal.h, papi_protos.h: Clean up papi_internal.c so that functions not used outside are marked static.

* src/: papi_pfm_events.c, papi_preset.c, pmapi-ppc64_events.c: papi: Fix some memory leaks. Signed-off-by: Robert Richter

* src/perf_events.c: papi: Make functions and variables static in perf_events.c. None of these functions and variables are used outside perf_events.c. Making them static.
Signed-off-by: Robert Richter

* src/papi_pfm_events.c: papi: Fix crash in error handler for pfm_get_event_code_counter(). Signed-off-by: Robert Richter

* src/utils/native_avail.c: papi: Fix error check in native_avail.c. Signed-off-by: Robert Richter

2011-04-26

* src/libpfm-3.y/: include/perfmon/pfmlib_amd64.h, lib/pfmlib_amd64.c: AMD architectural PMU could not be detected for family 15h as there was a strict check for AMD family 10h. Enabling it now for all families from 10h. Signed-off-by: Robert Richter

* src/libpfm-3.y/lib/amd64_events_fam15h.h: There is no kernel support for AMD family 15h northbridge events, so disable them in libpfm3 to avoid reporting them as available native events. Patch from Robert Richter

* src/: configure, configure.in, linux-common.c: Add some extra debug messages for better tracking of the --with-assumed-kernel configure option.

2011-04-25

* src/: configure, configure.in, linux-common.c: Add a new configure option: --with-assumed-kernel= This allows you to specify a kernel revision to assume (instead of having it autodetected with uname) for perf_event workaround purposes. With this you can force PAPI to not use workarounds on kernels with backported versions of perf_event features.

2011-04-19

* src/: Makefile.inc, configure, configure.in, papi_debug.h, papi_internal.h, sys_perf_event_open.c: Add debugging to sys_perf_event_open.c to show exactly what values are being passed to the perf_event_open syscall.

2011-04-18

* src/: run_tests.sh, ctests/attach2.c, ctests/attach3.c: Fix for finding attach_target with execlp to search the path.

2011-04-14

* src/: Rules.pfm, configure, configure.in, linux-ia64-pfm.h, linux-ia64.c, linux-ia64.h, perfmon-ia64-pfm.h, perfmon-ia64.c, perfmon-ia64.h, perfmon.h: Rename the linux-ia64-* files to be called perfmon-ia64-*. This is a more descriptive name, and makes it more obvious what the files are for.
* src/libpfm-3.y/: include/perfmon/pfmlib_amd64.h, lib/pfmlib_amd64.c, lib/pfmlib_amd64_priv.h: Patch to have libpfm3 use 6 counters on Interlagos. Patch provided by Robert Richter

* src/linux-memory.c: Fix the POWER cache detection routines to work properly on POWER7. Patch provided by Corey Ashford

* src/: configure, configure.in: Have configure check for ifort if gfortran, etc., are not found. Patch by Gary Mohr

* src/ctests/johnmay2.c: Update the validation message on the ctests/johnmay2.c test to be less confusing. Also add some comments to the source code. Problem reported by Steve Kaufmann.

2011-04-13

* src/ctests/: multiattach2, multiattach2.c: Remove the accidentally added ctests/multiattach2 and add instead the proper ctests/multiattach2.c

* src/Makefile.inc: components_config.h is cleaned out with make clobber, not make clean; this should fix the build bot issues.

* src/ctests/: Makefile, attach3.c, multiattach.c, multiattach2, zero_attach.c: Minor typos in comments. Discovered another bug in attach code demonstrated by multiattach2. You cannot have an eventset running that is self counting as well as one that is attached. PAPI thinks that both are running and throws an error.

* src/perf_events.c: We must update the control state after attaching for perf_events; zero_attach now passes

* src/ctests/: Makefile, attach2.c, attach3.c, do_loops.c: This commit adds testing of attaching to fork/exec'd executables. zero_attach and multiattach just test forks. This also modifies do_loops.c to be able to generate a test driver when -DDUMMY_DRIVER is defined so we can use it to generate flops as a sub process. Attach2 and attach3 have one important difference. Attach3 does an 'assign component' before attaching and then adding events. Attach2 does not assign a component and thus should inherit the default component. The current bug in PAPI is that:
    * The default component is not assigned until you add an event.
    * However, attaching an eventset without events is perfectly valid, but we get an error.
A possible solution is to assign the default component at create time.

2011-04-12

* src/ctests/multiattach.c: Make sure the two processes compute different numbers of flops to test attach

2011-04-05

* src/power7_events.h: Turns out Maynard Johnson answered my questions about the native_name enum back in December. (This is a correct version of the events file.) As I found out, the AIX substrates do not use the native_name enum. But a hypothetical perfctr build would.

2011-04-04

* src/Makefile.inc: Clear out the components_config.h file on make clobber

* src/: aix.c, power7_events.h: Initial support for power7 aix, the events file is a copy of power6_events.h with the number of groups changed. The native_name enum is unchanged, but unused?

2011-04-01

* src/configure.in: Committed wrong configure.in

* src/: configure, configure.in: Clean up setting bitmode flags for non-gcc (xlc in this case) compilers.

* src/papi_events.csv: Change the Nehalem PAPI_FP_OPS event from FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION+FP_COMP_OPS_EXE:DOUBLE_PRECISION to FP_COMP_OPS_EXE:SSE+FP_COMP_OPS_EXE:X87. The new event gives the same results as the previous one, with the added benefit of also counting 32-bit compiled x87 fp ops properly. More detailed analysis can be found here: http://web.eecs.utk.edu/~vweaver1/projects/nehalem-fp_ops/

2011-03-28

* src/utils/multiplex_cost.c: Turns out that getopt_long isn't as standard as I had hoped. Convert multiplex_cost to use only getopt. -s disables software multiplexing; -k disables kernel multiplexing.

2011-03-25

* src/: configure, utils/Makefile, utils/multiplex_cost.c, configure.in: Multiplex_cost utility.

* src/utils/: Makefile, cost.c, cost_utils.c, cost_utils.h: Split off the statistics functions from cost.
2011-03-22

* src/: run_tests_exclude_cuda.txt, run_tests.sh: Exclude some fork/thread tests from fulltest that won't run with CUDA (reason: cannot invoke same GPU from different threads)

2011-03-21

* src/utils/cost.c: Add a test for DERIVED_[ADD | SUB ] events to papi_cost.

2011-03-18

* src/components/cuda/linux-cuda.c: all_native_events ctest failed when CUDA Component is used. Reason: removing cuda events from the eventset is currently not supported. According to the NVIDIA folks this is a bug in cuda 4.0rc and will be fixed in rc2. Note also, several fork and thread tests fail since it's illegal to invoke the same GPU device from different processes / threads. We need a mechanism that allows us to run tests for the CPU component only.

2011-03-15

* src/utils/cost.c: Add a test case to cost util, look for a derived-postfix event and if found, give timing information for read calls to it. This is just a first run at the test, Core2 and AMD have candidate events and the test runs, but that is the extent of my testing so far.

2011-03-11

* src/components/: README, cuda/Makefile.cuda.in, cuda/Rules.cuda, cuda/configure, cuda/configure.in, cuda/linux-cuda.c, cuda/linux-cuda.h: Added CUDA component, a hardware performance counter measurement technology for the NVIDIA CUDA platform which provides access to the hardware counters inside the GPU. PAPI CUDA is based on CUPTI support - shipped with CUDA 4.0rc - in the NVIDIA driver library. In any environment where the CUPTI-enabled driver is installed, the PAPI CUDA component can provide detailed performance counter information regarding the execution of GPU kernels.

* src/components/: coretemp/linux-coretemp.c, lustre/linux-lustre.c: Add some missing includes to components. Thanks to Will Cohen for reminding us warnings matter. :)

* src/: configure, configure.in, perf_events.c: The SYNC_READ workaround in perf_events.c was being handled at compile time, rather than at run-time like all of our other workarounds.
Change it to be like our other kernel-version related workarounds.

2011-03-09

* src/ctests/multiplex1_pthreads.c: Between 4.0.0 and 4.1.0 a pthread_exit() call was added to ctests/multiplex1_pthreads.c that caused the test to exit partway through and without reporting a proper PASS/FAIL result. This changeset backs out that change, though the original change was marked as a memory leak fix so a different fix may be needed. Reported by Steve Kaufmann

* src/linux-timer.c: Add missing header needed by --with-virtualtimer=times build. Reported by Steve Kaufmann

2011-03-01

* src/: papi_pfm_events.c, perf_events.c: Fix broken Linux/PPC build caused by my pfm_events code movement changes.

2011-02-25

* src/: papi_pfm_events.c, papi_pfm_events.h, perfctr-x86.h: My changes yesterday broke the perfctr build. This should fix it.

2011-02-24

* src/ctests/inherit.c: Make the inherit test respect TESTS_QUIET so that it does not print extra output during a run_tests.sh run

* src/ctests/overflow.c: Fix missing newline in the overflow output. Reported by Gary Mohr

* src/: papi_pfm_events.c, papi_pfm_events.h, perf_events.c: Move the libpfm3 specific functions from perf_events.c into papi_pfm_events.c

* src/perf_events.c: Separate the libpfm3-specific code from _papi_pe_init_substrate() and _papi_pe_update_control_state() into their own functions. This will allow eventual code sharing and also make the libpfm4 merge easier.

* src/perf_events.c: Some minor cleanups I found after reviewing the inherit merge.
    + Add missing "static inline" to the new kernel-version codes
    + Remove duplicated test for Pentium 4
    + Fix a warning only seen if --with-debug is enabled

* src/: papi.c, papi.h, papi_internal.h, perf_events.c, perf_events.h, ctests/Makefile, ctests/inherit.c, ctests/test_utils.c: Merging Gary Mohr's re-implementation of inherit into the code base. Thanks, Gary!
2011-02-23

* src/: any-null.h, freebsd.h, linux-bgp.h, linux-common.c, linux-common.h, linux-context.h, linux-ia64.c, linux-ia64.h, linux-lock.h, linux-memory.c, linux-ppc64.h, linux-timer.c, papi_internal.h, papi_pfm_events.c, perf_events.c, perf_events.h, perfctr-x86.h, perfctr.c, perfmon.h, solaris-niagara2.h, solaris-ultra.h, solaris.h, x86_cache_info.c: Move some more duplicated OS common code (in this case the locking code and the context accessing code) out of the various substrate include files and into a common location.

2011-02-22

* src/perf_events.c: Separate out the kernel-version dependent checks and group them together near the beginning of the code. This not only allows us to easily see which routines are kernel-version dependent, but it makes it easier to disable the checks one-by-one when debugging kernel-version related issues like those found with the inherit patches.

2011-02-21

* src/papi_internal.c: Extend _papi_hwi_cleanup_eventset to free memory and clean up better after ourselves.

2011-02-18

* src/papi.c: PAPI_assign_eventset_component changed; refuses to reassign components.

2011-02-17

* src/: papi_events.csv, libpfm-3.y/include/perfmon/pfmlib.h, libpfm-3.y/lib/amd64_events.h, libpfm-3.y/lib/amd64_events_fam10h.h, libpfm-3.y/lib/amd64_events_fam15h.h, libpfm-3.y/lib/pfmlib_amd64.c, libpfm-3.y/lib/pfmlib_amd64_priv.h: Add support for AMD Family 15h processors. Also adds support for Family 10h RevE. Patches provided by Robert Richter

* src/utils/native_avail.c: Modify papi_native_avail to properly handle event names with libpfm4-style "::" separators in them.

2011-02-15

* src/Makefile.inc: make install-doxyman will build/install the doxygen version of the manpages. Note that these pages are very rough right now, much work is needed to get them to be a drop in replacement for the current man pages. (mostly formatting related/use related issues, e.g. man PAPI_start will not work yet; the content is there.)
* doc/Makefile: Add install target for doxygen generated man pages.

2011-02-11

* src/: perfctr-x86.c, perfctr.c: perfctr-2.6.42 introduced PERFCTR_X86_INTEL_WSTMR, but PAPI added support for PERFCTR_X86_INTEL_WSMR (notice the missing T). Fix PAPI to use the proper define. This should fix Westmere support on perfctr kernels.

2011-02-09

* src/: papi_protos.h, papi_vector.c, papi_vector.h, papi_vector_redefine.h: Added function pointer destroy_eventset to the PAPI vector table. Needed for the CUDA Component to disable CUDA eventGroups, to destroy floating CUDA context, and to free perfmon hardware on the GPU. (Note: the CUDA Component cannot be released yet since we are still under NDA with NVIDIA. Stay tuned.)

2011-02-07

* src/x86_cache_info.c: The cpuid leaf2 code was printing a message to stderr if leaf4 was needed (only happens on Westmere currently). Change this to be a MEMDBG() debug message instead.

2011-02-03

* src/: papi_events.csv, perfctr-x86.c: perfctr-x86 was reporting "Core i7" instead of "Nehalem". i7 can mean Westmere or Sandy Bridge too, so change the code to properly report Nehalem.

2011-01-27

* src/ctests/all_native_events.c: Fix this ctest. It failed when the package was built with several components because the eventset was reused and failed to add events that were not from the first component. In order to fix it, I recreate & destroy the eventset when the current event does not belong to the previous component.

2011-01-26

* src/: configure, configure.in, linux-timer.c, perfmon.c: Fix Cray CLE build.

* src/: configure, configure.in: Putting -Wall in cflags now requires CC = gcc

* src/: aix.c, freebsd.c, linux-bgp.c, linux-common.c, linux-memory.c, linux-memory.h, papi.c, papi_protos.h, papi_vector.c, papi_vector.h, solaris-niagara2.c, solaris-ultra.c, windows-common.c, windows-memory.c: Change the parameters passed to update_shlib_info() to match better with those passed to get_system_info().
This only affects the substrates; outside users of PAPI will not notice this change.

2011-01-25

* src/: configure, configure.in: Make sure that aix gets -g.

* src/: configure, configure.in: Give everyone else -g when configuring with debug. To wit, we pass gcc -g3 but neglected platforms where CC!=gcc.

* src/aix.c: First run at supporting power7. NOTE: this code is only good for getting event listings, e.g. papi_native_avail; passing PM_GET_GROUPS causes our code to segfault later on, a buffer overflow I'm still tracking down.

* src/perfctr-x86.c: Accidentally converted a function to _perfctr_ that should have stayed _linux_.

* src/: perfctr-x86.c, perfctr.c: Rename the various perfctr functions to be _perfctr_ rather than _linux_. This way _linux_ is reserved for the common functions used by all.

* src/: linux-common.c, linux-memory.c, linux-timer.c, perf_events.c, windows-common.c, windows-memory.c, windows-timer.c: Split the WIN32 specific code out from the new linux common code. In most cases very little code was shared (it tended to be a big #ifdef block) and it is confusing to have windows-specific code in files named linux-*

2011-01-24

* src/linux-timer.c: Fix a compile error that only shows up on PPC.

* src/linux-timer.c: Fix compile warning if mmtimer is enabled.

* src/perfctr-x86.c: Missing comma in the perfctr code.

* src/: Makefile.inc, aix.c, configure, configure.in, hwinfo_linux.c, linux-bgp.c, linux-common.c, linux-common.h, linux-ia64.c, linux-timer.c, linux-timer.h, papi_vector.h, perf_events.c, perfctr-x86.c, perfctr.c, perfmon.c, solaris-niagara2.c, solaris-ultra.c: One last batch of consolidation changes. This one moves get_system_info and get_cpu_info into linux-common.c, plus moves some other routines from perf_events.c there that are shared by the future libpfm4 version.
Some non-linux substrates are touched here; these are just short fixes to make sure the get_system_info() function pointed to by the papi_vector has the same format on all substrates.

* src/: Makefile.inc, configure, configure.in, linux-memory.c, linux-memory.h, perf_events.c, perfctr-x86.c, perfctr.c, perfmon.c: Move the various Linux update_shlib_info() functions into a common place.

* src/: Makefile.inc, linux-timer.c, linux-timer.h, perf_events.c, perfctr-x86.c, perfctr.c, perfmon.c: Move the various timer-related functions to linux-timer.c. This gets rid of the duplicated code spread throughout the substrates.

2011-01-21

* delete_before_release.sh, release_procedure.txt: Updated the release docs with what I learned when making the 4.1.2.1 release.

* src/: configure, configure.in, freebsd-memory.c, linux-ia64-memory.c, linux-memory.c, linux-memory.h, linux-mx-memory.c, linux-ppc64-memory.c, perf_events.c, perfctr-x86.c, perfmon-memory.c, perfmon.c: Currently there are at least 3 identical copies of the linux memory detection code spread throughout the PAPI source code. This change puts them all in linux-memory.c, and then has all the individual substrates use the common code.

papi-5.3.0/ChangeLogP412.txt

2011-01-17

* src/configure: Ran autoconf to generate updated configure file.

2011-01-16

* src/components/README: Adding a component for the FreeBSD OS that reports the value of the thermal sensors available in the Intel Core processors. There are as many counters as cores, and the value reported by each counter is in Kelvin degrees.

* src/freebsd.c: Implemented missing _papi_freebsd_ntv_name_to_code.

* src/: Makefile.in, Makefile.inc, configure.in, ctests/Makefile: Fix dependency on -ldl. Now configure checks if dl* symbols are in the base system libraries (i.e., no -ldl needed). If so, avoid adding -ldl to shlib example.
If dl* symbols are not found in the base system libraries, then check for -ldl, and if it exists, pass it to ctests/Makefile through Makefile. If -ldl is not found, fail at configure time.

* src/ctests/multiattach.c: Fix to compile in FreeBSD.

* src/: freebsd-memory.c, freebsd.c: Code cleanup.

2011-01-14

* src/: perf_events.c, perfmon.c: [PATCH 18/18] papi: make _perfmon2_pfm_pmu_type variable static. In perf_events.c and perfmon.c the variable _perfmon2_pfm_pmu_type is used locally only, making it static. Signed-off-by: Robert Richter

* src/: linux-bgp.c, linux-ia64.c, perf_events.c, perfctr.c, perfmon.c: [PATCH 17/18] papi: remove inline_static macro in Linux only code. We better replace the macro with 'static inline'. Not sure if this works for all compilers, so doing it for Linux only files. Signed-off-by: Robert Richter

* src/x86_cache_info.c: [PATCH 16/18] papi: remove static inline function declaration. By moving the static inline function cpuid() to the beginning of the file we may remove its declaration. Signed-off-by: Robert Richter

* src/linux.h: [PATCH 15/18] papi: remove unused linux.h header file. This file is included nowhere, removing it. Signed-off-by: Robert Richter

* src/linux-ia64.c: [PATCH 14/18] papi: fix array out of bounds access. Fixing the following warning:
    linux-ia64.c: In function '_ia64_init_substrate':
    linux-ia64.c:1123:22: warning: array subscript is above array bounds
Signed-off-by: Robert Richter

* src/: configure, configure.in: [PATCH 13/18] papi: remove unnecessary checks in configure.in. The check is obsolete and covered by default. Signed-off-by: Robert Richter

* src/: papi_pfm_events.c, perf_events.c, perfmon.c, perfmon.h: [PATCH 12/18] papi: include perfmon header files only where necessary. This patch includes perfmon header files only where necessary. Declarations in perfmon/perfmon.h are never used, removing its inclusion. Itanium header files are needed only in perfmon.c and perf_events.c.
Signed-off-by: Robert Richter

* src/: papi_pfm_events.c, perfctr-x86.c: [PATCH 11/18] papi: make some functions in papi_pfm_events.c static. Functions _pfm_decode_native_event() and _pfm_convert_umask() are internally used only. Remove export declaration and make it static. Signed-off-by: Robert Richter

* src/: Rules.pfm, linux-ia64-pfm.h, linux-ia64.c, pfmwrap.h: [PATCH 10/18] papi: rename pfmwrap.h -> linux-ia64-pfm.h. pfmwrap.h actually only contains IA64 code included by linux-ia64.c. Rename it to linux-ia64-pfm.h. Signed-off-by: Robert Richter

* src/: linux-ia64.c, pfmwrap.h: [PATCH 09/18] papi, linux-ia64: make inline functions static. Inline functions should be static. Fixing it. Signed-off-by: Robert Richter

* src/: linux-ia64.c, papi_pfm_events.c: [PATCH 08/18] papi: fix _papi_pfm_ntv_name_to_code() function interface. The function is supposed to return a PAPI error code which is an integer. Make the function's return code an integer too. Signed-off-by: Robert Richter

* src/perfctr-ppc64.c: [PATCH 07/18] papi: fix spelling modifer -> modifier. Fix spelling: modifer -> modifier. Signed-off-by: Robert Richter

* src/: linux-ia64.c, papi_pfm_events.c, papi_pfm_events.h, perf_events.c, perfctr-x86.c, perfmon.c: [PATCH 06/18] papi: define function interface in papi_pfm_events.h. The header file should define the interface that papi_pfm_events.c provides. Declarations used internally only in papi_pfm_events.c are moved there. Now papi_pfm_events.h only contains the prototype functions. Remapping of definitions is removed too. This cleanup removes duplicate code and better defines the interface. Signed-off-by: Robert Richter

* src/: Rules.perfctr, Rules.perfctr-pfm, linux.c, multiplex.c, papi_vector.c, perfctr-x86.c, perfctr.c, ctests/test_utils.c: [PATCH 05/18] papi: rename linux.c -> perfctr.c. The name of linux.c is misleading, it only implements perfctr functionality. Thus renaming it to perfctr.c.
Signed-off-by: Robert Richter

* src/: papi_pfm_events.c, perfctr-x86.c: [PATCH 04/18] papi: make _papi_pfm_init() static by moving it to perfctr-x86.c. _papi_pfm_init() is only used in perfctr-x86.c but implemented in papi_pfm_events.c. Move it to perfctr-x86.c and make it static. Signed-off-by: Robert Richter

* src/perfmon.c: [PATCH 03/18] papi: make some functions static in perfmon.c. The functions are only used in perfmon.c, making it static. Signed-off-by: Robert Richter

* src/: Rules.pfm, Rules.pfm_pe: [PATCH 02/18] papi: do not compile libpfm examples to support cross compilation. Signed-off-by: Robert Richter

* src/Rules.pfm: To cross compile papi we need to pass the architecture to libpfm. Otherwise it gets confused and tries to build the host's make targets with the cross compiler, ending up in the following error:
    pfmlib_amd64.c: In function 'cpuid':
    pfmlib_amd64.c:166:3: error: impossible register constraint in 'asm'
    pfmlib_amd64.c:172:1: error: impossible register constraint in 'asm'
    make[2]: *** [pfmlib_amd64.o] Error 1
Signed-off-by: Robert Richter

* src/ctests/Makefile: Temporarily back out the FreeBSD makefile change that breaks the build so that I can properly test some other changes.

* src/papi_events.csv: Change the Core2 L1_TCM preset to be LLC_REFERENCES. The current event (L2_RQSTS:SELF:MESI) returns an event equivalent to LLC_REFERENCES on libpfm3, but in libpfm4 L2_RQSTS:SELF:MESI maps instead to L2_RQSTS:SELF:MESI:ALL which counts prefetches too. By moving to LLC_REFERENCES both libpfm3 and libpfm4 count the proper value. This also makes the "tenth" benchmark pass when using PAPI/libpfm4.

* src/configure: Update to match current configure.in

* src/ctests/Makefile: Fix the if / fi syntax of the last change.

2011-01-13

* src/: Makefile.inc, configure.in, freebsd-memory.c, freebsd.c, ctests/Makefile, ctests/zero_attach.c: Changes from Harald Servat for freebsd support.
Note that configure has not been regenerated from this version of configure.in.

* papi.spec, doc/Doxyfile, doc/Doxyfile-everything, src/Makefile.in, src/configure.in, src/papi.h: Change version numbers to 4.1.2 in preparation for a release.

2011-01-12

* src/ctests/code2name.c: The code2name test was assuming that the native events start right at PAPI_NATIVE_MASK. We specifically document elsewhere this might not be the case, and indeed for the libpfm4 code this fails. This fix changes the code to properly enumerate the native events for the test.

2011-01-06

* src/: papi.c, papi_internal.c: Fix a long-standing bug where we were walking off the end of the EventInfoArray in remap_event_position(). This was noticed by Richard Strong when instrumenting some of the PARSEC benchmarks. In papi_internal.c in the remap_event_position() function we have the loop
    for ( i = 0; i <= total_events; i++ ) {
It seems weird that we are doing a <= compare, and in fact this is why we walk off the end of the array sometimes. But why only sometimes? If I change that <= to a < then many of the regression tests fail. It turns out that the two calls to remap_event_position() in papi_internal.c are made with ESI->NumberOfEvents being one less than it should be, as it is incremented after the remap_event_position() call (though the new events are added before the call). This is why <= is used. However the call in PAPI_start() happens with ESI->NumberOfEvents at the right value. In this case < should be used. The fix I've come up with has a NumberOfEvents value passed in as a parameter to remap_event_position(). This way the value+1 can be passed in the former cases.

2010-12-20

* src/aix.c: Problem on POWER6 with AIX: pm_initialize() cannot be called multiple times with PM_CURRENT. Instead, use the actual proc type - here PM_POWER6 - and multiple invocations are no longer a problem. Ctests/multiplex1.c passes now.
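The 2011-01-06 entry above replaces an inclusive loop bound with a count passed in by the caller. The sketch below illustrates that pattern only; the struct, the capacity, and the function name are hypothetical stand-ins and not PAPI's real EventSetInfo_t or remap_event_position().

```c
#include <assert.h>

#define MAX_SLOTS 8   /* hypothetical capacity of the position array */

/* Hypothetical stand-in for an event set. */
struct event_set {
    int number_of_events;      /* committed count; may lag behind by one */
    int position[MAX_SLOTS];   /* slots, possibly filled before the count is bumped */
};

/*
 * Visit exactly `count` slots using a strict `<` bound. The caller
 * decides how many slots are valid, so the loop never needs `<=`.
 * Returns the number of slots visited.
 */
static int remap_positions(struct event_set *set, int count)
{
    int i;

    assert(count >= 0 && count <= MAX_SLOTS);   /* never walk off the array */
    for (i = 0; i < count; i++)
        set->position[i] = i;
    return i;
}
```

A caller that has added an event but not yet incremented the count would pass `set->number_of_events + 1`, while a caller such as a start routine would pass `set->number_of_events` unchanged.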
2010-12-15

* src/run_tests.sh: If we don't run any tests, get buildbot's attention.

2010-12-14

* src/aix.c: number_of_nodes var was set to zero in _aix_get_system_info. This caused the papi utilities to report that the number of total CPUs is zero. This also caused ctests/hwinfo to fail on POWER6 with AIX.

2010-12-13

* src/papi_internal.h: Slight re-ordering of the no_vararg_macro debug statements. (I actually tested the changes with --with-debug and without on aix)

2010-12-10

* src/run_tests.sh: Change the syntax on our find command to be more posix compliant. GNU is Not UNIX, cute acronym or massive compatibility conspiracy. I fall back to posix, you decide!

* src/: configure, configure.in: Update configure file to be aware of the existence of AIX-Power7. PAPI still won't build, but it gets further than before.

2010-12-09

* src/run_tests.sh: Make our grep invocation posix compliant. (--invert-match == -v & --regex == -e )

* src/ctests/overflow_allcounters.c: Separate 'indent' check-in so that the previous modifications are comprehensible :)

* src/ctests/overflow_allcounters.c: The overflow_allcounters test failed on Power6 with AIX (pmapi) but passes on Power6 with linux (perf_events | perfctr). Therefore detect if we're running on AIX, print a warning, but still pass the test.

* src/run_tests.sh: Move away from echo -n to the shell builtin printf (echo -n is not portable); non-argumented instances of echo are fine.

* src/run_tests_exclude.txt: Skip the non-test ctests/burn executable.

* src/Matlab/: PAPI_Matlab.c, PAPI_Matlab.readme: Change documentation for matlab integration to reflect the need to link to the libpapi.so library and not the static one.
Also listed me and the ptools-perfapi list as points of contact for future questions *gulp*

2010-12-08

* src/: configure, configure.in, run_tests.sh: Clean up (purge) references to libpfm-2.x in configure and run_tests.sh

* src/Matlab/PAPI_Matlab.c: MATLAB fixups: Calls to PAPI('stop') now stop counting even if we ignore the return values.

* src/Matlab/PAPI_Matlab.c: Fixup for papi matlab integration. Calls to PAPI('stop') don't cause errors now. If you call PAPI('stop') without capturing its return value, it does nothing.

* src/Matlab/PAPI_Matlab.c: mex does not like c++ style comments (double-slash)

2010-12-06

* src/solaris-ultra.c: Resolved a couple type cast warnings. Also initialized a variable and enabled GET_OVERFLOW_ADDRESS code in two places. The overflow test suite still has a number of failures and is disabled in configure.

2010-11-24

* src/papi_internal.h: That last commit was lacking in creativity... By having the debug function names still a macro, we get all the goodness of __FILE__ etc. being in the right place and still not using variadic macros.
    #define SUBDBG do{ if (_papi_hwi_debug & DEBUG_SUBSTRATE ) print_the_label; } while (0); _SUBDBG
was the clever line that eluded me yesterday.

2010-11-23

* src/papi_internal.h: Turns out that when DEBUG and NO_VARARG_MACRO are true, we didn't correctly implement component-level debug functions. This change uses variable argument lists (man stdarg) to correctly handle this case (papi_internal.h defines these). Note that debugging information is not completely useful; due to functions which use variable argument lists not being inlinable (the inline keyword is after all only a suggestion), all messages appear to come from papi_internal.h:PAPIDEBUG:525:22619 and I am not clever enough to get around that in general right now. Thanks to Maynard Johnson for reporting.

* src/papi_events.csv: Enable the PAPI_HW_INT event on Nehalem, as tests show the HW_INT:RCV event is the proper one to use here.
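The 2010-11-24 and 2010-11-23 entries above describe a debug macro that reports the caller's __FILE__ and __LINE__ without relying on variadic macros. The following is a minimal sketch of that trick under assumed names: debug_mask stands in for _papi_hwi_debug, and print_label() and subdbg_impl() are hypothetical helpers, not the definitions in papi_internal.h.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG_SUBSTRATE 0x1      /* hypothetical flag value */

static int debug_mask = 0;       /* stand-in for _papi_hwi_debug */
static int labels_printed = 0;   /* counters so the behavior can be observed */
static int messages_printed = 0;

/* Prints the "file:line" label using the location handed in by the macro. */
static void print_label(const char *file, int line)
{
    fprintf(stderr, "%s:%d: ", file, line);
    labels_printed++;
}

/* Ordinary variadic function built on stdarg; no variadic macro is needed. */
static void subdbg_impl(const char *format, ...)
{
    va_list args;

    if (!(debug_mask & DEBUG_SUBSTRATE))
        return;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    messages_printed++;
}

/*
 * SUBDBG("x=%d\n", x) expands to
 *     do { ...print_label(__FILE__, __LINE__)... } while (0); subdbg_impl("x=%d\n", x)
 * so the label carries the call site and the arguments go to a function.
 */
#define SUBDBG \
    do { if (debug_mask & DEBUG_SUBSTRATE) print_label(__FILE__, __LINE__); } while (0); \
    subdbg_impl
```

One consequence of this design is that the macro expands to two statements, so a call such as `if (cond) SUBDBG("...");` without braces would guard only the label; wrapping such calls in braces avoids the surprise.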
2010-11-22 * src/papi_events.csv: Update the preset events for Nehalem, as contributed by Michel Brown. 2010-11-19 * src/: perf_events.h, perf_events.c: Address problem with overflow handler continuing to count events. Add overflow status field to determine if an event set has any events enabled for overflow. Use IOC_REFRESH instead of IOC_ENABLE when overflowing. Implement IOC_REFRESH at end of overflow handler. None of this worked. Also implemented an IOC_DISABLE at top of overflow handler. That worked, even though it's suboptimal. 2010-11-17 * src/utils/command_line.c: test_fail_exit() substituted for test_fail(). This became necessary because PAPI_event_name_to_code now returns a PAPI_EATTR error if the base name matches but attribute names don't. This utility was producing an error message and then running the test. Perfctr implementations will happily add a base name with no umasks and then generate 0 counts. This fix prevents that behavior. * src/ctests/test_utils.c: Rewrite of test_fail_exit() to call test_fail(). It should be noted that test_fail_exit() behaves the way test_fail() used to behave, i.e. it exits after printing the fail message. However, test_fail no longer exits as that was causing problems with multi-threaded tests not freeing memory. In those cases where an exit is desired, calls to test_fail_exit() should be substituted for calls to test_fail(). * src/: papi.h, papi_data.c, papi_pfm_events.c, perfmon.c: Added 3 new error codes: PAPI_EATTR, PAPI_ECOUNT, and PAPI_ECOMBO. These map onto equivalent errors in libpfm and are provided to give more detail on failures in libpfm calls. A new error mapping function has been added to papi_pfm_events.c to map libpfm errors to PAPI errors, and this function is employed in the compute_kernel_args function in perfmon.c. It could also be deployed elsewhere, but so far is not. 2010-11-09 * src/x86_cache_info.c: The cpuid change yesterday broke compilation on a 32-bit Pentium 3. 
Fix the inline assembly to compile properly there too. 2010-11-08 * src/: configure, configure.in: Fix configure script to properly detect Pentium M machines. * src/x86_cache_info.c: Add cpuid leaf4 cache detection support. This has been available on Intel processors since late-model P4s and on all Core2 and newer. It returns cache info in a different way than the older leaf2 method. Currently we only use leaf4 data if the leaf2 results tell us to (apparently Westmere does that). Otherwise we use the old method. It might be interesting to use more of the leaf4 info. It can tell us things such as how many processors share a socket, how many processors share a cache, and info on the inclusivity of a cache. * src/: linux.c, perfctr-x86.c: Add perfctr Westmere support. * src/perfctr-2.6.x/: patches/aliases, usr.lib/Makefile: Fix conflicts from perfctr merge. 2010-11-06 * src/perf_events.c: Replace KERNEL_CHECKS_SCHEDUABILITY_UPON_OPEN with the proper dynamic kernel version number checking. This should be the last place in our perf_events code that was using a hard-coded rather than dynamic check for a kernel-version related bugfix. * src/perf_events.c: This patch allows PAPI to read multiple events at a time out of the kernel when the kernel is new enough (2.6.34 or newer). The previous code required setting a #define by hand to get this behavior; this new code picks the proper way to do things based on the kernel version number. The patch was supplied by Gary Mohr. 2010-11-04 * src/: linux.c, perfctr-x86.c: Replace occurrences of PERFCTR_X86_INTEL_COREI7 with PERFCTR_X86_INTEL_NHLM as the former has been documented as being deprecated as of perfctr 2.6.41. 2010-11-03 * src/cycle.h: Change "unicos" to "CLE" since "unicos" no longer exists. 2010-10-26 * src/examples/locks_pthreads.c: Add a call to PAPI_thread_init(). Thanks to Martin Schindewolf for pointing this out. 
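The 2010-11-06 entries above replace a hand-set #define with a run-time kernel version check. One way such a check could look is sketched below. This is an assumption for illustration, not the code in perf_events.c: KVERSION, parse_kernel_release, can_read_grouped and running_kernel_can_read_grouped are hypothetical names; only the 2.6.34 threshold comes from the entry itself.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Pack a version triple into one comparable integer. */
#define KVERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Parse a release string such as "2.6.34-rc1" or "3.10".
 * Returns the packed version, or -1 if the string has no major.minor. */
static int parse_kernel_release(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;
    if (patch > 255)      /* keep the patch level inside its 8-bit field */
        patch = 255;
    return KVERSION(major, minor, patch);
}

/* Decide from a release string whether grouped reads may be used
 * (2.6.34 or newer, per the changelog entry). */
static int can_read_grouped(const char *release)
{
    return parse_kernel_release(release) >= KVERSION(2, 6, 34);
}

/* Same decision for the kernel this process is running on. */
static int running_kernel_can_read_grouped(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;         /* unknown kernel: fall back to the old method */
    return can_read_grouped(u.release);
}
```

Keeping the parser separate from the uname() call means the decision can be exercised with fixed release strings, without depending on the machine the tests run on.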
2010-10-21 * src/: papi.c, components/lmsensors/linux-lmsensors.h: Fixup URLs that checkbot was finding in error. 2010-10-05 * src/ctests/: multiattach.c, zero_attach.c: The zero_attach and multiattach tests were forking off children before testing that PAPI is in fact available. Then when PAPI_init() failed the children weren't being cleaned up properly. This was confusing buildbot. This changeset moves the fork to after the check and does a fail_exit() on failure. * src/: configure, configure.in: Solaris build will fail if /usr/ccs/bin isn't in the path. Have it check there for "ar" on Solaris systems if it can't be found by normal methods. * src/: configure, configure.in: Only run the EAR tests on itanium systems. * src/: configure, configure.in: Pentium4-perfctr was skipping most of the CTESTS. Make sure they are all run. papi-5.3.0/doc/0000700003276200002170000000000012247131324012704 5ustar ralphundrgradpapi-5.3.0/doc/Doxyfile-man10000600003276200002170000000477112247131117015257 0ustar ralphundrgrad# This file describes the settings to be used by the documentation system # doxygen (www.doxygen.org) for PAPI utilities man-pages # The following overrides default values in Doxyfile-common # # All text after a hash (#) is considered a comment and will be ignored # The format is: # TAG = value [value, ...] # For lists items can also be appended using: # TAG += value [value, ...] # Values that contain spaces should be placed between quotes (" ") @INCLUDE = Doxyfile-common #--------------------------------------------------------------------------- # configuration options related to the input files #--------------------------------------------------------------------------- # The INPUT tag can be used to specify the files and/or directories that contain # documented source files. You may enter file names like "myfile.cpp" or # directories like "/usr/src/myproject". Separate the files or directories # with spaces. 
INPUT = ../src/utils/ FILE_PATTERNS = *.c # The RECURSIVE tag can be used to turn specify whether or not subdirectories # should be searched for input files as well. Possible values are YES and NO. # If left blank NO is used. RECURSIVE = YES # If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) # defined locally in source files will be included in the documentation. # If set to NO only classes defined in header files are included. EXTRACT_LOCAL_CLASSES = NO #--------------------------------------------------------------------------- # configuration options related to the man page output #--------------------------------------------------------------------------- # If the GENERATE_MAN tag is set to YES (the default) Doxygen will # generate man pages GENERATE_MAN = YES # The MAN_OUTPUT tag is used to specify where the man pages will be put. # If a relative path is entered the value of OUTPUT_DIRECTORY will be # put in front of it. If left blank `man' will be used as the default path. MAN_OUTPUT = man # The MAN_EXTENSION tag determines the extension that is added to # the generated man pages (default is the subroutine's section .3) MAN_EXTENSION = .1 # If the MAN_LINKS tag is set to YES and Doxygen generates man output, # then it will generate one additional man file for each entity # documented in the real man page(s). These additional files # only source the real man page, but without them the man command # would be unable to find the correct page. The default is NO. 
MAN_LINKS = NO papi-5.3.0/doc/Makefile0000600003276200002170000000151212247131117014345 0ustar ralphundrgrad.PHONY: clean install force_me all all: man @echo "Built PAPI user documentation" html: force_me doxygen Doxyfile-html man: man/man1 man/man3 man/man3: ../src/papi.h ../src/papi.c ../src/papi_hl.c ../src/papi_fwrappers.c doxygen Doxyfile-man3 man/man1: ../src/utils/avail.c ../src/utils/clockres.c ../src/utils/command_line.c ../src/utils/component.c ../src/utils/cost.c ../src/utils/decode.c ../src/utils/error_codes.c ../src/utils/event_chooser.c ../src/utils/event_info.c ../src/utils/mem_info.c ../src/utils/multiplex_cost.c ../src/utils/native_avail.c ../src/utils/version.c doxygen Doxyfile-man1 clean: rm -rf man html doxyerror install: man -rm -f man/man3/HighLevelInfo.3 -rm -f man/man3/papi_data_structures.3 -rm -r ../man/man1/*.1 ../man/man3/*.3 -cp -R man/man1/*.1 ../man/man1 -cp -R man/man3/*.3 ../man/man3 papi-5.3.0/doc/Doxyfile-html0000600003276200002170000004570312247131117015367 0ustar ralphundrgrad# Doxyfile 1.6.2 # This file describes the settings to be used by the documentation system # doxygen (www.doxygen.org) for a project # # All text after a hash (#) is considered a comment and will be ignored # The format is: # TAG = value [value, ...] # For lists items can also be appended using: # TAG += value [value, ...] # Values that contain spaces should be placed between quotes (" ") @INCLUDE = Doxyfile-common #--------------------------------------------------------------------------- # Configuration options related to the preprocessor #--------------------------------------------------------------------------- # If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will # evaluate all C-preprocessor directives found in the sources and include # files. ENABLE_PREPROCESSING = YES # If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro # names in the source code. 
If set to NO (the default) only conditional # compilation will be performed. Macro expansion can be done in a controlled # way by setting EXPAND_ONLY_PREDEF to YES. MACRO_EXPANSION = YES # If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES # then the macro expansion is limited to the macros specified with the # PREDEFINED and EXPAND_AS_DEFINED tags. EXPAND_ONLY_PREDEF = YES # The PREDEFINED tag can be used to specify one or more macro names that # are defined before the preprocessor is started (similar to the -D option of # gcc). The argument of the tag is a list of macros of the form: name # or name=definition (no spaces). If the definition and the = are # omitted =1 is assumed. To prevent a macro definition from being # undefined via #undef or recursively expanded use the := operator # instead of the = operator. PREDEFINED = DEBUG # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then # this tag can be used to specify a list of macro names that should be expanded. # The macro definition that is found in the sources will be used. # Use the PREDEFINED tag if you want to use a different macro definition that # overrules the definition found in the source code. EXPAND_AS_DEFINED = PAPIERROR LEAKDBG MEMDBG MPXDBG OVFDBG PAPIDEBUG SUBDBG PRFDBG INTDBG THRDBG APIDBG #--------------------------------------------------------------------------- # Build related configuration options #--------------------------------------------------------------------------- # If the CREATE_SUBDIRS tag is set to YES, then doxygen will create # 4096 sub-directories (in 2 levels) under the output directory of each output # format and will distribute the generated files over these directories. # Enabling this option can be useful when feeding doxygen a huge amount of # source files, where putting all generated files in the same directory would # otherwise cause performance problems for the file system. 
CREATE_SUBDIRS = YES # If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in # documentation are documented, even if no documentation was available. # Private class members and static file members will be hidden unless # the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES EXTRACT_ALL = YES # If the EXTRACT_STATIC tag is set to YES all static members of a file # will be included in the documentation. EXTRACT_STATIC = YES # The INTERNAL_DOCS tag determines if documentation # that is typed after a \internal command is included. If the tag is set # to NO (the default) then the documentation will be excluded. # Set it to YES to include the internal documentation. INTERNAL_DOCS = YES # If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate # file names in lower-case letters. If set to YES upper-case letters are also # allowed. This is useful if you have classes or files whose names only differ # in case and if your file system supports case sensitive file names. Windows # and Mac users are advised to set this option to NO. CASE_SENSE_NAMES = YES # The GENERATE_TODOLIST tag can be used to enable (YES) or # disable (NO) the todo list. This list is created by putting \todo # commands in the documentation. GENERATE_TODOLIST = YES # The GENERATE_TESTLIST tag can be used to enable (YES) or # disable (NO) the test list. This list is created by putting \test # commands in the documentation. GENERATE_TESTLIST = YES # The GENERATE_BUGLIST tag can be used to enable (YES) or # disable (NO) the bug list. This list is created by putting \bug # commands in the documentation. GENERATE_BUGLIST = YES # The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or # disable (NO) the deprecated list. This list is created by putting # \deprecated commands in the documentation. GENERATE_DEPRECATEDLIST= YES # Set the SHOW_USED_FILES tag to NO to disable the list of files generated # at the bottom of the documentation of classes and structs. 
If set to YES the # list will mention the files that were used to generate the documentation. SHOW_USED_FILES = YES # If the sources in your project are distributed over multiple directories # then setting the SHOW_DIRECTORIES tag to YES will show the directory hierarchy # in the documentation. The default is NO. SHOW_DIRECTORIES = NO # Set the SHOW_FILES tag to NO to disable the generation of the Files page. # This will remove the Files entry from the Quick Index and from the # Folder Tree View (if specified). The default is YES. SHOW_FILES = YES # Set the SHOW_NAMESPACES tag to NO to disable the generation of the # Namespaces page. # This will remove the Namespaces entry from the Quick Index # and from the Folder Tree View (if specified). The default is YES. SHOW_NAMESPACES = YES #--------------------------------------------------------------------------- # configuration options related to the input files #--------------------------------------------------------------------------- # The INPUT tag can be used to specify the files and/or directories that contain # documented source files. You may enter file names like "myfile.cpp" or # directories like "/usr/src/myproject". Separate the files or directories # with spaces. INPUT = ../src ../src/components/README # The RECURSIVE tag can be used to turn specify whether or not subdirectories # should be searched for input files as well. Possible values are YES and NO. # If left blank NO is used. RECURSIVE = YES #--------------------------------------------------------------------------- # configuration options related to source browsing #--------------------------------------------------------------------------- # If the SOURCE_BROWSER tag is set to YES then a list of source files will # be generated. Documented entities will be cross-referenced with these sources. # Note: To get rid of all source code in the generated output, make sure also # VERBATIM_HEADERS is set to NO. 
SOURCE_BROWSER = YES # Setting the INLINE_SOURCES tag to YES will include the body # of functions and classes directly in the documentation. INLINE_SOURCES = YES # Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct # doxygen to hide any special comment blocks from generated source code # fragments. Normal C and C++ comments will always remain visible. STRIP_CODE_COMMENTS = YES # If the REFERENCED_BY_RELATION tag is set to YES # then for each documented function all documented # functions referencing it will be listed. REFERENCED_BY_RELATION = NO # If the REFERENCES_RELATION tag is set to YES # then for each documented function all documented entities # called/used by that function will be listed. REFERENCES_RELATION = NO # If the REFERENCES_LINK_SOURCE tag is set to YES (the default) # and SOURCE_BROWSER tag is set to YES, then the hyperlinks from # functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will # link to the source code. # Otherwise they will link to the documentation. REFERENCES_LINK_SOURCE = YES #--------------------------------------------------------------------------- # configuration options related to the HTML output #--------------------------------------------------------------------------- # If the GENERATE_HTML tag is set to YES (the default) Doxygen will # generate HTML output. GENERATE_HTML = YES # The HTML_OUTPUT tag is used to specify where the HTML docs will be put. # If a relative path is entered the value of OUTPUT_DIRECTORY will be # put in front of it. If left blank `html' will be used as the default path. HTML_OUTPUT = html # The HTML_FILE_EXTENSION tag can be used to specify the file extension for # each generated HTML page (for example: .htm,.php,.asp). If it is left blank # doxygen will generate files with .html extension. HTML_FILE_EXTENSION = .html # The HTML_HEADER tag can be used to specify a personal HTML header for # each generated HTML page. 
If it is left blank doxygen will generate a # standard header. HTML_HEADER = # The HTML_FOOTER tag can be used to specify a personal HTML footer for # each generated HTML page. If it is left blank doxygen will generate a # standard footer. HTML_FOOTER = # The HTML_STYLESHEET tag can be used to specify a user-defined cascading # style sheet that is used by each HTML page. It can be used to # fine-tune the look of the HTML output. If the tag is left blank doxygen # will generate a default style sheet. Note that doxygen will try to copy # the style sheet file to the HTML output directory, so don't put your own # stylesheet in the HTML output directory as well, or it will be erased! HTML_STYLESHEET = # If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML # page will contain the date and time when the page was generated. Setting # this to NO can help when comparing the output of multiple runs. HTML_TIMESTAMP = YES # If the HTML_ALIGN_MEMBERS tag is set to YES, the members of classes, # files or namespaces will be aligned in HTML using tables. If set to # NO a bullet list will be used. HTML_ALIGN_MEMBERS = YES # If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML # documentation will contain sections that can be hidden and shown after the # page has loaded. For this to work a browser that supports # JavaScript and DHTML is required (for instance Mozilla 1.0+, Firefox # Netscape 6.0+, Internet explorer 5.0+, Konqueror, or Safari). HTML_DYNAMIC_SECTIONS = NO # This tag can be used to set the number of enum values (range [1..20]) # that doxygen will group on one line in the generated HTML documentation. ENUM_VALUES_PER_LINE = 4 # The GENERATE_TREEVIEW tag is used to specify whether a tree-like index # structure should be generated to display hierarchical information. # If the tag value is set to YES, a side panel will be generated # containing a tree-like index structure (just like the one that # is generated for HTML Help). 
For this to work a browser that supports # JavaScript, DHTML, CSS and frames is required (i.e. any modern browser). # Windows users are probably better off using the HTML help feature. GENERATE_TREEVIEW = YES # By enabling USE_INLINE_TREES, doxygen will generate the Groups, Directories, # and Class Hierarchy pages using a tree view instead of an ordered list. USE_INLINE_TREES = NO # If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be # used to set the initial width (in pixels) of the frame in which the tree # is shown. TREEVIEW_WIDTH = 250 # Use this tag to change the font size of Latex formulas included # as images in the HTML documentation. The default is 10. Note that # when you change the font size after a successful doxygen run you need # to manually remove any form_*.png images from the HTML output directory # to force them to be regenerated. FORMULA_FONTSIZE = 10 # When the SEARCHENGINE tag is enabled doxygen will generate a search box for the HTML output. The underlying search engine uses javascript # and DHTML and should work on any modern browser. Note that when using HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets (GENERATE_DOCSET) there is already a search function so this one should # typically be disabled. For large projects the javascript based search engine # can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution. SEARCHENGINE = YES # When the SERVER_BASED_SEARCH tag is enabled the search engine will be implemented using a PHP enabled web server instead of at the web client using Javascript. Doxygen will generate the search PHP script and index # file to put on the web server. The advantage of the server based approach is that it scales better to large projects and allows full text search. The disadvantage is that it is more difficult to set up # and does not have live searching capabilities. 
SERVER_BASED_SEARCH = NO #--------------------------------------------------------------------------- # Configuration options related to the dot tool #--------------------------------------------------------------------------- # If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will # generate a inheritance diagram (in HTML, RTF and LaTeX) for classes with base # or super classes. Setting the tag to NO turns the diagrams off. Note that # this option is superseded by the HAVE_DOT option below. This is only a # fallback. It is recommended to install and use dot, since it yields more # powerful graphs. CLASS_DIAGRAMS = YES # If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen # will generate a graph for each documented class showing the direct and # indirect inheritance relations. Setting this tag to YES will force the # the CLASS_DIAGRAMS tag to NO. CLASS_GRAPH = YES # If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen # will generate a graph for each documented class showing the direct and # indirect implementation dependencies (inheritance, containment, and # class references variables) of the class with other documented classes. COLLABORATION_GRAPH = YES # If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen # will generate a graph for groups, showing the direct groups dependencies GROUP_GRAPHS = YES # If the UML_LOOK tag is set to YES doxygen will generate inheritance and # collaboration diagrams in a style similar to the OMG's Unified Modeling # Language. UML_LOOK = NO # If set to YES, the inheritance and collaboration graphs will show the # relations between templates and their instances. TEMPLATE_RELATIONS = NO # If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT # tags are set to YES then doxygen will generate a graph for each documented # file showing the direct and indirect include dependencies of the file with # other documented files. 
INCLUDE_GRAPH = YES # If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and # HAVE_DOT tags are set to YES then doxygen will generate a graph for each # documented header file showing the documented files that directly or # indirectly include this file. INCLUDED_BY_GRAPH = YES # If the CALL_GRAPH and HAVE_DOT options are set to YES then # doxygen will generate a call dependency graph for every global function # or class method. Note that enabling this option will significantly increase # the time of a run. So in most cases it will be better to enable call graphs # for selected functions only using the \callgraph command. CALL_GRAPH = YES # If the CALLER_GRAPH and HAVE_DOT tags are set to YES then # doxygen will generate a caller dependency graph for every global function # or class method. Note that enabling this option will significantly increase # the time of a run. So in most cases it will be better to enable caller # graphs for selected functions only using the \callergraph command. CALLER_GRAPH = YES # If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen # will graphical hierarchy of all classes instead of a textual one. GRAPHICAL_HIERARCHY = YES # If the DIRECTORY_GRAPH, SHOW_DIRECTORIES and HAVE_DOT tags are set to YES # then doxygen will show the dependencies a directory has on other directories # in a graphical way. The dependency relations are determined by the #include # relations between the files in the directories. DIRECTORY_GRAPH = NO # The DOT_IMAGE_FORMAT tag can be used to set the image format of the images # generated by dot. Possible values are png, jpg, or gif # If left blank png will be used. DOT_IMAGE_FORMAT = png # The tag DOT_PATH can be used to specify the path where the dot tool can be # found. If left blank, it is assumed the dot tool can be found in the path. 
DOT_PATH = # The DOTFILE_DIRS tag can be used to specify one or more directories that # contain dot files that are included in the documentation (see the # \dotfile command). DOTFILE_DIRS = # The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of # nodes that will be shown in the graph. If the number of nodes in a graph # becomes larger than this value, doxygen will truncate the graph, which is # visualized by representing a node as a red box. Note that doxygen if the # number of direct children of the root node in a graph is already larger than # DOT_GRAPH_MAX_NODES then the graph will not be shown at all. Also note # that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH. DOT_GRAPH_MAX_NODES = 50 # The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the # graphs generated by dot. A depth value of 3 means that only nodes reachable # from the root by following a path via at most 3 edges will be shown. Nodes # that lay further from the root node will be omitted. Note that setting this # option to 1 or 2 may greatly reduce the computation time needed for large # code bases. Also note that the size of a graph can be further restricted by # DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction. MAX_DOT_GRAPH_DEPTH = 0 # Set the DOT_TRANSPARENT tag to YES to generate images with a transparent # background. This is disabled by default, because dot on Windows does not # seem to support this out of the box. Warning: Depending on the platform used, # enabling this option may lead to badly anti-aliased labels on the edges of # a graph (i.e. they become hard to read). DOT_TRANSPARENT = NO # Set the DOT_MULTI_TARGETS tag to YES allow dot to generate multiple output # files in one run (i.e. multiple -o and -T options on the command line). This # makes dot run faster, but since only newer versions of dot (>1.8.10) # support this, this feature is disabled by default. 
DOT_MULTI_TARGETS = NO # If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will # generate a legend page explaining the meaning of the various boxes and # arrows in the dot generated graphs. GENERATE_LEGEND = YES # If the DOT_CLEANUP tag is set to YES (the default) Doxygen will # remove the intermediate dot files that are used to generate # the various graphs. DOT_CLEANUP = YES papi-5.3.0/doc/Doxyfile-man30000600003276200002170000000451412247131117015254 0ustar ralphundrgrad# This file describes the settings to be used by the documentation system # doxygen (www.doxygen.org) for PAPI utilities man-pages # The following overrides default values in Doxyfile-common # # All text after a hash (#) is considered a comment and will be ignored # The format is: # TAG = value [value, ...] # For lists items can also be appended using: # TAG += value [value, ...] # Values that contain spaces should be placed between quotes (" ") @INCLUDE = Doxyfile-common #--------------------------------------------------------------------------- # configuration options related to the input files #--------------------------------------------------------------------------- # The INPUT tag can be used to specify the files and/or directories that contain # documented source files. You may enter file names like "myfile.cpp" or # directories like "/usr/src/myproject". Separate the files or directories # with spaces. INPUT = ../src/papi.h ../src/papi.c ../src/papi_hl.c \ ../src/papi_fwrappers.c FILE_PATTERNS = *.c *.h # The RECURSIVE tag can be used to turn specify whether or not subdirectories # should be searched for input files as well. Possible values are YES and NO. # If left blank NO is used. 
RECURSIVE = NO #--------------------------------------------------------------------------- # configuration options related to the man page output #--------------------------------------------------------------------------- # If the GENERATE_MAN tag is set to YES (the default) Doxygen will # generate man pages GENERATE_MAN = YES # The MAN_OUTPUT tag is used to specify where the man pages will be put. # If a relative path is entered the value of OUTPUT_DIRECTORY will be # put in front of it. If left blank `man' will be used as the default path. MAN_OUTPUT = man # The MAN_EXTENSION tag determines the extension that is added to # the generated man pages (default is the subroutine's section .3) MAN_EXTENSION = .3 # If the MAN_LINKS tag is set to YES and Doxygen generates man output, # then it will generate one additional man file for each entity # documented in the real man page(s). These additional files # only source the real man page, but without them the man command # would be unable to find the correct page. The default is NO. MAN_LINKS = NO papi-5.3.0/doc/Doxyfile-common0000600003276200002170000021553412247131117015714 0ustar ralphundrgrad# Doxyfile 1.7.4 # This file describes the settings to be used by the documentation system # doxygen (www.doxygen.org) for a project. # # All text after a hash (#) is considered a comment and will be ignored. # The format is: # TAG = value [value, ...] # For lists items can also be appended using: # TAG += value [value, ...] # Values that contain spaces should be placed between quotes (" "). #--------------------------------------------------------------------------- # Project related configuration options #--------------------------------------------------------------------------- # This tag specifies the encoding used for all characters in the config file # that follow. The default is UTF-8 which is also the encoding used for all # text before the first occurrence of this tag. 
Doxygen uses libiconv (or the # iconv built into libc) for the transcoding. See # http://www.gnu.org/software/libiconv for the list of possible encodings. DOXYFILE_ENCODING = UTF-8 # The PROJECT_NAME tag is a single word (or a sequence of words surrounded # by quotes) that should identify the project. PROJECT_NAME = PAPI # The PROJECT_NUMBER tag can be used to enter a project or revision number. # This could be handy for archiving the generated documentation or # if some version control system is used. PROJECT_NUMBER = 5.3.0.0 # Using the PROJECT_BRIEF tag one can provide an optional one line description # for a project that appears at the top of each page and should give viewer # a quick idea about the purpose of the project. Keep the description short. PROJECT_BRIEF = # With the PROJECT_LOGO tag one can specify an logo or icon that is # included in the documentation. The maximum height of the logo should not # exceed 55 pixels and the maximum width should not exceed 200 pixels. # Doxygen will copy the logo to the output directory. PROJECT_LOGO = # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) # base path where the generated documentation will be put. # If a relative path is entered, it will be relative to the location # where doxygen was started. If left blank the current directory will be used. OUTPUT_DIRECTORY = ./ # If the CREATE_SUBDIRS tag is set to YES, then doxygen will create # 4096 sub-directories (in 2 levels) under the output directory of each output # format and will distribute the generated files over these directories. # Enabling this option can be useful when feeding doxygen a huge amount of # source files, where putting all generated files in the same directory would # otherwise cause performance problems for the file system. CREATE_SUBDIRS = NO # The OUTPUT_LANGUAGE tag is used to specify the language in which all # documentation generated by doxygen is written. 
Doxygen will use this # information to generate all constant output in the proper language. # The default language is English, other supported languages are: # Afrikaans, Arabic, Brazilian, Catalan, Chinese, Chinese-Traditional, # Croatian, Czech, Danish, Dutch, Esperanto, Farsi, Finnish, French, German, # Greek, Hungarian, Italian, Japanese, Japanese-en (Japanese with English # messages), Korean, Korean-en, Lithuanian, Norwegian, Macedonian, Persian, # Polish, Portuguese, Romanian, Russian, Serbian, Serbian-Cyrillic, Slovak, # Slovene, Spanish, Swedish, Ukrainian, and Vietnamese. OUTPUT_LANGUAGE = English # If the BRIEF_MEMBER_DESC tag is set to YES (the default) Doxygen will # include brief member descriptions after the members that are listed in # the file and class documentation (similar to JavaDoc). # Set to NO to disable this. BRIEF_MEMBER_DESC = YES # If the REPEAT_BRIEF tag is set to YES (the default) Doxygen will prepend # the brief description of a member or function before the detailed description. # Note: if both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the # brief descriptions will be completely suppressed. REPEAT_BRIEF = NO # This tag implements a quasi-intelligent brief description abbreviator # that is used to form the text in various listings. Each string # in this list, if found as the leading text of the brief description, will be # stripped from the text and the result after processing the whole list, is # used as the annotated text. Otherwise, the brief description is used as-is. # If left blank, the following values are used ("$name" is automatically # replaced with the name of the entity): "The $name class" "The $name widget" # "The $name file" "is" "provides" "specifies" "contains" # "represents" "a" "an" "the" ABBREVIATE_BRIEF = # If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then # Doxygen will generate a detailed section even if there is only a brief # description. 
ALWAYS_DETAILED_SEC = NO # If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all # inherited members of a class in the documentation of that class as if those # members were ordinary class members. Constructors, destructors and assignment # operators of the base classes will not be shown. INLINE_INHERITED_MEMB = NO # If the FULL_PATH_NAMES tag is set to YES then Doxygen will prepend the full # path before file names in the file list and in the header files. If set # to NO the shortest path that makes the file name unique will be used. FULL_PATH_NAMES = NO # If the FULL_PATH_NAMES tag is set to YES then the STRIP_FROM_PATH tag # can be used to strip a user-defined part of the path. Stripping is # only done if one of the specified strings matches the left-hand part of # the path. The tag can be used to show relative paths in the file list. # If left blank the directory from which doxygen is run is used as the # path to strip. STRIP_FROM_PATH = # The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of # the path mentioned in the documentation of a class, which tells # the reader which header file to include in order to use a class. # If left blank only the name of the header file containing the class # definition is used. Otherwise one should specify the include paths that # are normally passed to the compiler using the -I flag. STRIP_FROM_INC_PATH = # If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter # (but less readable) file names. This can be useful if your file system # doesn't support long names like on DOS, Mac, or CD-ROM. SHORT_NAMES = NO # If the JAVADOC_AUTOBRIEF tag is set to YES then Doxygen # will interpret the first line (until the first dot) of a JavaDoc-style # comment as the brief description. If set to NO, the JavaDoc # comments will behave just like regular Qt-style comments # (thus requiring an explicit @brief command for a brief description.)
JAVADOC_AUTOBRIEF = NO # If the QT_AUTOBRIEF tag is set to YES then Doxygen will # interpret the first line (until the first dot) of a Qt-style # comment as the brief description. If set to NO, the comments # will behave just like regular Qt-style comments (thus requiring # an explicit \brief command for a brief description.) QT_AUTOBRIEF = NO # The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make Doxygen # treat a multi-line C++ special comment block (i.e. a block of //! or /// # comments) as a brief description. This used to be the default behaviour. # The new default is to treat a multi-line C++ comment block as a detailed # description. Set this tag to YES if you prefer the old behaviour instead. MULTILINE_CPP_IS_BRIEF = NO # If the INHERIT_DOCS tag is set to YES (the default) then an undocumented # member inherits the documentation from any documented member that it # re-implements. INHERIT_DOCS = YES # If the SEPARATE_MEMBER_PAGES tag is set to YES, then doxygen will produce # a new page for each member. If set to NO, the documentation of a member will # be part of the file/class/namespace that contains it. SEPARATE_MEMBER_PAGES = NO # The TAB_SIZE tag can be used to set the number of spaces in a tab. # Doxygen uses this value to replace tabs by spaces in code fragments. TAB_SIZE = 4 # This tag can be used to specify a number of aliases that acts # as commands in the documentation. An alias has the form "name=value". # For example adding "sideeffect=\par Side Effects:\n" will allow you to # put the command \sideeffect (or @sideeffect) in the documentation, which # will result in a user-defined paragraph with heading "Side Effects:". # You can put \n's in the value part of an alias to insert newlines. ALIASES = # Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C # sources only. Doxygen will then generate output that is more tailored for C. # For instance, some of the names that are used will be different. 
The list # of all members will be omitted, etc. OPTIMIZE_OUTPUT_FOR_C = YES # Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java # sources only. Doxygen will then generate output that is more tailored for # Java. For instance, namespaces will be presented as packages, qualified # scopes will look different, etc. OPTIMIZE_OUTPUT_JAVA = NO # Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran # sources only. Doxygen will then generate output that is more tailored for # Fortran. OPTIMIZE_FOR_FORTRAN = NO # Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL # sources. Doxygen will then generate output that is tailored for # VHDL. OPTIMIZE_OUTPUT_VHDL = NO # Doxygen selects the parser to use depending on the extension of the files it # parses. With this tag you can assign which parser to use for a given extension. # Doxygen has a built-in mapping, but you can override or extend it using this # tag. The format is ext=language, where ext is a file extension, and language # is one of the parsers supported by doxygen: IDL, Java, Javascript, CSharp, C, # C++, D, PHP, Objective-C, Python, Fortran, VHDL. For instance to make # doxygen treat .inc files as Fortran files (default is PHP), and .f files as C # (default is Fortran), use: inc=Fortran f=C. Note that for custom extensions # you also need to set FILE_PATTERNS otherwise the files are not read by doxygen. EXTENSION_MAPPING = # If you use STL classes (i.e. std::string, std::vector, etc.) but do not want # to include (a tag file for) the STL sources as input, then you should # set this tag to YES in order to let doxygen match function declarations and # definitions whose arguments contain STL classes (e.g. func(std::string); vs. # func(std::string) {}). This also makes the inheritance and collaboration # diagrams that involve STL classes more complete and accurate.
BUILTIN_STL_SUPPORT = NO # If you use Microsoft's C++/CLI language, you should set this option to YES to # enable parsing support. CPP_CLI_SUPPORT = NO # Set the SIP_SUPPORT tag to YES if your project consists of sip sources only. # Doxygen will parse them like normal C++ but will assume all classes use public # instead of private inheritance when no explicit protection keyword is present. SIP_SUPPORT = NO # For Microsoft's IDL there are propget and propput attributes to indicate getter # and setter methods for a property. Setting this option to YES (the default) # will make doxygen replace the get and set methods by a property in the # documentation. This will only work if the methods are indeed getting or # setting a simple type. If this is not the case, or you want to show the # methods anyway, you should set this option to NO. IDL_PROPERTY_SUPPORT = YES # If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC # tag is set to YES, then doxygen will reuse the documentation of the first # member in the group (if any) for the other members of the group. By default # all members of a group must be documented explicitly. DISTRIBUTE_GROUP_DOC = NO # Set the SUBGROUPING tag to YES (the default) to allow class member groups of # the same type (for instance a group of public functions) to be put as a # subgroup of that type (e.g. under the Public Functions section). Set it to # NO to prevent subgrouping. Alternatively, this can be done per class using # the \nosubgrouping command. SUBGROUPING = YES # When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and # unions are shown inside the group in which they are included (e.g. using # @ingroup) instead of on a separate page (for HTML and Man pages) or # section (for LaTeX and RTF). INLINE_GROUPED_CLASSES = NO # When TYPEDEF_HIDES_STRUCT is enabled, a typedef of a struct, union, or enum # is documented as struct, union, or enum with the name of the typedef. 
So # typedef struct TypeS {} TypeT, will appear in the documentation as a struct # with name TypeT. When disabled the typedef will appear as a member of a file, # namespace, or class. And the struct will be named TypeS. This can typically # be useful for C code in case the coding convention dictates that all compound # types are typedef'ed and only the typedef is referenced, never the tag name. TYPEDEF_HIDES_STRUCT = YES # The SYMBOL_CACHE_SIZE determines the size of the internal cache used to # determine which symbols to keep in memory and which to flush to disk. # When the cache is full, less often used symbols will be written to disk. # For small to medium size projects (<1000 input files) the default value is # probably good enough. For larger projects a too small cache size can cause # doxygen to be busy swapping symbols to and from disk most of the time # causing a significant performance penalty. # If the system has enough physical memory increasing the cache will improve the # performance by keeping more symbols in memory. Note that the value works on # a logarithmic scale so increasing the size by one will roughly double the # memory usage. The cache size is given by this formula: # 2^(16+SYMBOL_CACHE_SIZE). The valid range is 0..9, the default is 0, # corresponding to a cache size of 2^16 = 65536 symbols SYMBOL_CACHE_SIZE = 0 #--------------------------------------------------------------------------- # Build related configuration options #--------------------------------------------------------------------------- # If the EXTRACT_ALL tag is set to YES doxygen will assume all entities in # documentation are documented, even if no documentation was available. # Private class members and static file members will be hidden unless # the EXTRACT_PRIVATE and EXTRACT_STATIC tags are set to YES EXTRACT_ALL = NO # If the EXTRACT_PRIVATE tag is set to YES all private members of a class # will be included in the documentation.
EXTRACT_PRIVATE = NO # If the EXTRACT_STATIC tag is set to YES all static members of a file # will be included in the documentation. EXTRACT_STATIC = YES # If the EXTRACT_LOCAL_CLASSES tag is set to YES classes (and structs) # defined locally in source files will be included in the documentation. # If set to NO only classes defined in header files are included. EXTRACT_LOCAL_CLASSES = YES # This flag is only useful for Objective-C code. When set to YES local # methods, which are defined in the implementation section but not in # the interface are included in the documentation. # If set to NO (the default) only methods in the interface are included. EXTRACT_LOCAL_METHODS = NO # If this flag is set to YES, the members of anonymous namespaces will be # extracted and appear in the documentation as a namespace called # 'anonymous_namespace{file}', where file will be replaced with the base # name of the file that contains the anonymous namespace. By default # anonymous namespaces are hidden. EXTRACT_ANON_NSPACES = NO # If the HIDE_UNDOC_MEMBERS tag is set to YES, Doxygen will hide all # undocumented members of documented classes, files or namespaces. # If set to NO (the default) these members will be included in the # various overviews, but no documentation section is generated. # This option has no effect if EXTRACT_ALL is enabled. HIDE_UNDOC_MEMBERS = NO # If the HIDE_UNDOC_CLASSES tag is set to YES, Doxygen will hide all # undocumented classes that are normally visible in the class hierarchy. # If set to NO (the default) these classes will be included in the various # overviews. This option has no effect if EXTRACT_ALL is enabled. HIDE_UNDOC_CLASSES = NO # If the HIDE_FRIEND_COMPOUNDS tag is set to YES, Doxygen will hide all # friend (class|struct|union) declarations. # If set to NO (the default) these declarations will be included in the # documentation. 
HIDE_FRIEND_COMPOUNDS = NO # If the HIDE_IN_BODY_DOCS tag is set to YES, Doxygen will hide any # documentation blocks found inside the body of a function. # If set to NO (the default) these blocks will be appended to the # function's detailed documentation block. HIDE_IN_BODY_DOCS = NO # The INTERNAL_DOCS tag determines if documentation # that is typed after a \internal command is included. If the tag is set # to NO (the default) then the documentation will be excluded. # Set it to YES to include the internal documentation. INTERNAL_DOCS = NO # If the CASE_SENSE_NAMES tag is set to NO then Doxygen will only generate # file names in lower-case letters. If set to YES upper-case letters are also # allowed. This is useful if you have classes or files whose names only differ # in case and if your file system supports case sensitive file names. Windows # and Mac users are advised to set this option to NO. CASE_SENSE_NAMES = YES # If the HIDE_SCOPE_NAMES tag is set to NO (the default) then Doxygen # will show members with their full class and namespace scopes in the # documentation. If set to YES the scope will be hidden. HIDE_SCOPE_NAMES = NO # If the SHOW_INCLUDE_FILES tag is set to YES (the default) then Doxygen # will put a list of the files that are included by a file in the documentation # of that file. SHOW_INCLUDE_FILES = NO # If the FORCE_LOCAL_INCLUDES tag is set to YES then Doxygen # will list include files with double quotes in the documentation # rather than with sharp brackets. FORCE_LOCAL_INCLUDES = NO # If the INLINE_INFO tag is set to YES (the default) then a tag [inline] # is inserted in the documentation for inline members. INLINE_INFO = YES # If the SORT_MEMBER_DOCS tag is set to YES (the default) then doxygen # will sort the (detailed) documentation of file and class members # alphabetically by member name. If set to NO the members will appear in # declaration order. 
SORT_MEMBER_DOCS = YES # If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the # brief documentation of file, namespace and class members alphabetically # by member name. If set to NO (the default) the members will appear in # declaration order. SORT_BRIEF_DOCS = NO # If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen # will sort the (brief and detailed) documentation of class members so that # constructors and destructors are listed first. If set to NO (the default) # the constructors will appear in the respective orders defined by # SORT_MEMBER_DOCS and SORT_BRIEF_DOCS. # This tag will be ignored for brief docs if SORT_BRIEF_DOCS is set to NO # and ignored for detailed docs if SORT_MEMBER_DOCS is set to NO. SORT_MEMBERS_CTORS_1ST = NO # If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the # hierarchy of group names into alphabetical order. If set to NO (the default) # the group names will appear in their defined order. SORT_GROUP_NAMES = NO # If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be # sorted by fully-qualified names, including namespaces. If set to # NO (the default), the class list will be sorted only by class name, # not including the namespace part. # Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. # Note: This option applies only to the class list, not to the # alphabetical list. SORT_BY_SCOPE_NAME = NO # If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to # do proper type resolution of all parameters of a function it will reject a # match between the prototype and the implementation of a member function even # if there is only one candidate or it is obvious which candidate to choose # by doing a simple string match. By disabling STRICT_PROTO_MATCHING doxygen # will still accept a match between prototype and implementation in such cases. STRICT_PROTO_MATCHING = NO # The GENERATE_TODOLIST tag can be used to enable (YES) or # disable (NO) the todo list. 
This list is created by putting \todo # commands in the documentation. GENERATE_TODOLIST = NO # The GENERATE_TESTLIST tag can be used to enable (YES) or # disable (NO) the test list. This list is created by putting \test # commands in the documentation. GENERATE_TESTLIST = NO # The GENERATE_BUGLIST tag can be used to enable (YES) or # disable (NO) the bug list. This list is created by putting \bug # commands in the documentation. GENERATE_BUGLIST = NO # The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or # disable (NO) the deprecated list. This list is created by putting # \deprecated commands in the documentation. GENERATE_DEPRECATEDLIST= NO # The ENABLED_SECTIONS tag can be used to enable conditional # documentation sections, marked by \if sectionname ... \endif. ENABLED_SECTIONS = # The MAX_INITIALIZER_LINES tag determines the maximum number of lines # the initial value of a variable or macro consists of for it to appear in # the documentation. If the initializer consists of more lines than specified # here it will be hidden. Use a value of 0 to hide initializers completely. # The appearance of the initializer of individual variables and macros in the # documentation can be controlled using \showinitializer or \hideinitializer # command in the documentation regardless of this setting. MAX_INITIALIZER_LINES = 30 # Set the SHOW_USED_FILES tag to NO to disable the list of files generated # at the bottom of the documentation of classes and structs. If set to YES the # list will mention the files that were used to generate the documentation. SHOW_USED_FILES = NO # If the sources in your project are distributed over multiple directories # then setting the SHOW_DIRECTORIES tag to YES will show the directory hierarchy # in the documentation. The default is NO. SHOW_DIRECTORIES = NO # Set the SHOW_FILES tag to NO to disable the generation of the Files page. # This will remove the Files entry from the Quick Index and from the # Folder Tree View (if specified). 
The default is YES. SHOW_FILES = NO # Set the SHOW_NAMESPACES tag to NO to disable the generation of the # Namespaces page. # This will remove the Namespaces entry from the Quick Index # and from the Folder Tree View (if specified). The default is YES. SHOW_NAMESPACES = NO # The FILE_VERSION_FILTER tag can be used to specify a program or script that # doxygen should invoke to get the current version for each file (typically from # the version control system). Doxygen will invoke the program by executing (via # popen()) the command <command> <input-file>, where <command> is the value of # the FILE_VERSION_FILTER tag, and <input-file> is the name of an input file # provided by doxygen. Whatever the program writes to standard output # is used as the file version. See the manual for examples. FILE_VERSION_FILTER = # The LAYOUT_FILE tag can be used to specify a layout file which will be parsed # by doxygen. The layout file controls the global structure of the generated # output files in an output format independent way. To create the layout file # that represents doxygen's defaults, run doxygen with the -l option. # You can optionally specify a file name after the option, if omitted # DoxygenLayout.xml will be used as the name of the layout file. LAYOUT_FILE = #--------------------------------------------------------------------------- # configuration options related to warning and progress messages #--------------------------------------------------------------------------- # The QUIET tag can be used to turn on/off the messages that are generated # by doxygen. Possible values are YES and NO. If left blank NO is used. QUIET = NO # The WARNINGS tag can be used to turn on/off the warning messages that are # generated by doxygen. Possible values are YES and NO. If left blank # NO is used. WARNINGS = YES # If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings # for undocumented members. If EXTRACT_ALL is set to YES then this flag will # automatically be disabled.
WARN_IF_UNDOCUMENTED = NO # If WARN_IF_DOC_ERROR is set to YES, doxygen will generate warnings for # potential errors in the documentation, such as not documenting some # parameters in a documented function, or documenting parameters that # don't exist or using markup commands wrongly. WARN_IF_DOC_ERROR = YES # The WARN_NO_PARAMDOC option can be enabled to get warnings for # functions that are documented, but have no documentation for their parameters # or return value. If set to NO (the default) doxygen will only warn about # wrong or incomplete parameter documentation, but not about the absence of # documentation. WARN_NO_PARAMDOC = NO # The WARN_FORMAT tag determines the format of the warning messages that # doxygen can produce. The string should contain the $file, $line, and $text # tags, which will be replaced by the file and line number from which the # warning originated and the warning text. Optionally the format may contain # $version, which will be replaced by the version of the file (if it could # be obtained via FILE_VERSION_FILTER) WARN_FORMAT = "$file:$line: $text" # The WARN_LOGFILE tag can be used to specify a file to which warning # and error messages should be written. If left blank the output is written # to stderr. WARN_LOGFILE = doxyerror #--------------------------------------------------------------------------- # configuration options related to the input files #--------------------------------------------------------------------------- # The INPUT tag can be used to specify the files and/or directories that contain # documented source files. You may enter file names like "myfile.cpp" or # directories like "/usr/src/myproject". Separate the files or directories # with spaces. INPUT = # This tag can be used to specify the character encoding of the source files # that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is # also the default input encoding. 
Doxygen uses libiconv (or the iconv built # into libc) for the transcoding. See http://www.gnu.org/software/libiconv for # the list of possible encodings. INPUT_ENCODING = UTF-8 # If the value of the INPUT tag contains directories, you can use the # FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp # and *.h) to filter out the source-files in the directories. If left # blank the following patterns are tested: # *.c *.cc *.cxx *.cpp *.c++ *.d *.java *.ii *.ixx *.ipp *.i++ *.inl *.h *.hh # *.hxx *.hpp *.h++ *.idl *.odl *.cs *.php *.php3 *.inc *.m *.mm *.dox *.py # *.f90 *.f *.for *.vhd *.vhdl FILE_PATTERNS = *.c *.h # The RECURSIVE tag can be used to specify whether or not subdirectories # should be searched for input files as well. Possible values are YES and NO. # If left blank NO is used. RECURSIVE = NO # The EXCLUDE tag can be used to specify files and/or directories that should be # excluded from the INPUT source files. This way you can easily exclude a # subdirectory from a directory tree whose root is specified with the INPUT tag. EXCLUDE = # The EXCLUDE_SYMLINKS tag can be used to select whether or not files or # directories that are symbolic links (a Unix file system feature) are excluded # from the input. EXCLUDE_SYMLINKS = NO # If the value of the INPUT tag contains directories, you can use the # EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude # certain files from those directories. Note that the wildcards are matched # against the file with absolute path, so to exclude all test directories # for example use the pattern */test/* EXCLUDE_PATTERNS = */Matlab/* */CVS/* */libpfm-2.x/* */libpfm-3.x/* \ */libpfm-3.y/* */libpfm4/* */perfctr-1.6.1/* */perfctr-2.3.12/* \ */perfctr-2.4.1/* */perfctr-2.4.5/* */perfctr-2.4.x/* \ */perfctr-2.6.x/* */perfctr-2.6.x.old/* */perfctr-2.7.x/* \ */linux-bgp.c # The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names # (namespaces, classes, functions, etc.)
that should be excluded from the # output. The symbol name can be a fully qualified name, a word, or if the # wildcard * is used, a substring. Examples: ANamespace, AClass, # AClass::ANamespace, ANamespace::*Test EXCLUDE_SYMBOLS = # The EXAMPLE_PATH tag can be used to specify one or more files or # directories that contain example code fragments that are included (see # the \include command). EXAMPLE_PATH = # If the value of the EXAMPLE_PATH tag contains directories, you can use the # EXAMPLE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp # and *.h) to filter out the source-files in the directories. If left # blank all files are included. EXAMPLE_PATTERNS = # If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be # searched for input files to be used with the \include or \dontinclude # commands irrespective of the value of the RECURSIVE tag. # Possible values are YES and NO. If left blank NO is used. EXAMPLE_RECURSIVE = NO # The IMAGE_PATH tag can be used to specify one or more files or # directories that contain images that are included in the documentation (see # the \image command). IMAGE_PATH = # The INPUT_FILTER tag can be used to specify a program that doxygen should # invoke to filter for each input file. Doxygen will invoke the filter program # by executing (via popen()) the command <filter> <input-file>, where <filter> # is the value of the INPUT_FILTER tag, and <input-file> is the name of an # input file. Doxygen will then use the output that the filter program writes # to standard output. # If FILTER_PATTERNS is specified, this tag will be # ignored. INPUT_FILTER = # The FILTER_PATTERNS tag can be used to specify filters on a per file pattern # basis. # Doxygen will compare the file name with each pattern and apply the # filter if there is a match. # The filters are a list of the form: # pattern=filter (like *.cpp=my_cpp_filter). See INPUT_FILTER for further # info on how filters are used.
If FILTER_PATTERNS is empty or if # none of the patterns match the file name, INPUT_FILTER is applied. FILTER_PATTERNS = # If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using # INPUT_FILTER) will be used to filter the input files when producing source # files to browse (i.e. when SOURCE_BROWSER is set to YES). FILTER_SOURCE_FILES = NO # The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file # pattern. A pattern will override the setting for FILTER_PATTERNS (if any) # and it is also possible to disable source filtering for a specific pattern # using *.ext= (so without naming a filter). This option only has effect when # FILTER_SOURCE_FILES is enabled. FILTER_SOURCE_PATTERNS = #--------------------------------------------------------------------------- # configuration options related to source browsing #--------------------------------------------------------------------------- # If the SOURCE_BROWSER tag is set to YES then a list of source files will # be generated. Documented entities will be cross-referenced with these sources. # Note: To get rid of all source code in the generated output, make sure also # VERBATIM_HEADERS is set to NO. SOURCE_BROWSER = NO # Setting the INLINE_SOURCES tag to YES will include the body # of functions and classes directly in the documentation. INLINE_SOURCES = NO # Setting the STRIP_CODE_COMMENTS tag to YES (the default) will instruct # doxygen to hide any special comment blocks from generated source code # fragments. Normal C and C++ comments will always remain visible. STRIP_CODE_COMMENTS = YES # If the REFERENCED_BY_RELATION tag is set to YES # then for each documented function all documented # functions referencing it will be listed. REFERENCED_BY_RELATION = NO # If the REFERENCES_RELATION tag is set to YES # then for each documented function all documented entities # called/used by that function will be listed.
REFERENCES_RELATION = NO # If the REFERENCES_LINK_SOURCE tag is set to YES (the default) # and SOURCE_BROWSER tag is set to YES, then the hyperlinks from # functions in REFERENCES_RELATION and REFERENCED_BY_RELATION lists will # link to the source code. # Otherwise they will link to the documentation. REFERENCES_LINK_SOURCE = YES # If the USE_HTAGS tag is set to YES then the references to source code # will point to the HTML generated by the htags(1) tool instead of doxygen # built-in source browser. The htags tool is part of GNU's global source # tagging system (see http://www.gnu.org/software/global/global.html). You # will need version 4.8.6 or higher. USE_HTAGS = NO # If the VERBATIM_HEADERS tag is set to YES (the default) then Doxygen # will generate a verbatim copy of the header file for each class for # which an include is specified. Set to NO to disable this. VERBATIM_HEADERS = YES #--------------------------------------------------------------------------- # configuration options related to the alphabetical class index #--------------------------------------------------------------------------- # If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index # of all compounds will be generated. Enable this if the project # contains a lot of classes, structs, unions or interfaces. ALPHABETICAL_INDEX = YES # If the alphabetical index is enabled (see ALPHABETICAL_INDEX) then # the COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns # in which this list will be split (can be a number in the range [1..20]) COLS_IN_ALPHA_INDEX = 5 # In case all classes in a project start with a common prefix, all # classes will be put under the same header in the alphabetical index. # The IGNORE_PREFIX tag can be used to specify one or more prefixes that # should be ignored while generating the index headers. 
IGNORE_PREFIX = #--------------------------------------------------------------------------- # configuration options related to the HTML output #--------------------------------------------------------------------------- # If the GENERATE_HTML tag is set to YES (the default) Doxygen will # generate HTML output. GENERATE_HTML = NO # The HTML_OUTPUT tag is used to specify where the HTML docs will be put. # If a relative path is entered the value of OUTPUT_DIRECTORY will be # put in front of it. If left blank `html' will be used as the default path. HTML_OUTPUT = html # The HTML_FILE_EXTENSION tag can be used to specify the file extension for # each generated HTML page (for example: .htm,.php,.asp). If it is left blank # doxygen will generate files with .html extension. HTML_FILE_EXTENSION = .html # The HTML_HEADER tag can be used to specify a personal HTML header for # each generated HTML page. If it is left blank doxygen will generate a # standard header. Note that when using a custom header you are responsible # for the proper inclusion of any scripts and style sheets that doxygen # needs, which is dependent on the configuration options used. # It is advised to generate a default header using "doxygen -w html # header.html footer.html stylesheet.css YourConfigFile" and then modify # that header. Note that the header is subject to change so you typically # have to redo this when upgrading to a newer version of doxygen or when changing the value of configuration settings such as GENERATE_TREEVIEW! HTML_HEADER = # The HTML_FOOTER tag can be used to specify a personal HTML footer for # each generated HTML page. If it is left blank doxygen will generate a # standard footer. HTML_FOOTER = # The HTML_STYLESHEET tag can be used to specify a user-defined cascading # style sheet that is used by each HTML page. It can be used to # fine-tune the look of the HTML output. If the tag is left blank doxygen # will generate a default style sheet.
Note that doxygen will try to copy # the style sheet file to the HTML output directory, so don't put your own # stylesheet in the HTML output directory as well, or it will be erased! HTML_STYLESHEET = # The HTML_EXTRA_FILES tag can be used to specify one or more extra images or # other source files which should be copied to the HTML output directory. Note # that these files will be copied to the base HTML output directory. Use the # $relpath$ marker in the HTML_HEADER and/or HTML_FOOTER files to load these # files. In the HTML_STYLESHEET file, use the file name only. Also note that # the files will be copied as-is; there are no commands or markers available. HTML_EXTRA_FILES = # The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. # Doxygen will adjust the colors in the stylesheet and background images # according to this color. Hue is specified as an angle on a colorwheel, # see http://en.wikipedia.org/wiki/Hue for more information. # For instance the value 0 represents red, 60 is yellow, 120 is green, # 180 is cyan, 240 is blue, 300 purple, and 360 is red again. # The allowed range is 0 to 359. HTML_COLORSTYLE_HUE = 220 # The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of # the colors in the HTML output. For a value of 0 the output will use # grayscales only. A value of 255 will produce the most vivid colors. HTML_COLORSTYLE_SAT = 100 # The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to # the luminance component of the colors in the HTML output. Values below # 100 gradually make the output lighter, whereas values above 100 make # the output darker. The value divided by 100 is the actual gamma applied, # so 80 represents a gamma of 0.8, The value 220 represents a gamma of 2.2, # and 100 does not change the gamma. HTML_COLORSTYLE_GAMMA = 80 # If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML # page will contain the date and time when the page was generated. 
Setting # this to NO can help when comparing the output of multiple runs. HTML_TIMESTAMP = YES # If the HTML_ALIGN_MEMBERS tag is set to YES, the members of classes, # files or namespaces will be aligned in HTML using tables. If set to # NO a bullet list will be used. HTML_ALIGN_MEMBERS = YES # If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML # documentation will contain sections that can be hidden and shown after the # page has loaded. For this to work a browser that supports # JavaScript and DHTML is required (for instance Mozilla 1.0+, Firefox # Netscape 6.0+, Internet explorer 5.0+, Konqueror, or Safari). HTML_DYNAMIC_SECTIONS = NO # If the GENERATE_DOCSET tag is set to YES, additional index files # will be generated that can be used as input for Apple's Xcode 3 # integrated development environment, introduced with OSX 10.5 (Leopard). # To create a documentation set, doxygen will generate a Makefile in the # HTML output directory. Running make will produce the docset in that # directory and running "make install" will install the docset in # ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find # it at startup. # See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html # for more information. GENERATE_DOCSET = NO # When GENERATE_DOCSET tag is set to YES, this tag determines the name of the # feed. A documentation feed provides an umbrella under which multiple # documentation sets from a single provider (such as a company or product suite) # can be grouped. DOCSET_FEEDNAME = "Doxygen generated docs" # When GENERATE_DOCSET tag is set to YES, this tag specifies a string that # should uniquely identify the documentation set bundle. This should be a # reverse domain-name style string, e.g. com.mycompany.MyDocSet. Doxygen # will append .docset to the name. DOCSET_BUNDLE_ID = org.doxygen.Project # When GENERATE_PUBLISHER_ID tag specifies a string that should uniquely identify # the documentation publisher. 
This should be a reverse domain-name style # string, e.g. com.mycompany.MyDocSet.documentation. DOCSET_PUBLISHER_ID = org.doxygen.Publisher # The GENERATE_PUBLISHER_NAME tag identifies the documentation publisher. DOCSET_PUBLISHER_NAME = Publisher # If the GENERATE_HTMLHELP tag is set to YES, additional index files # will be generated that can be used as input for tools like the # Microsoft HTML help workshop to generate a compiled HTML help file (.chm) # of the generated HTML documentation. GENERATE_HTMLHELP = NO # If the GENERATE_HTMLHELP tag is set to YES, the CHM_FILE tag can # be used to specify the file name of the resulting .chm file. You # can add a path in front of the file if the result should not be # written to the html output directory. CHM_FILE = # If the GENERATE_HTMLHELP tag is set to YES, the HHC_LOCATION tag can # be used to specify the location (absolute path including file name) of # the HTML help compiler (hhc.exe). If non-empty doxygen will try to run # the HTML help compiler on the generated index.hhp. HHC_LOCATION = # If the GENERATE_HTMLHELP tag is set to YES, the GENERATE_CHI flag # controls if a separate .chi index file is generated (YES) or that # it should be included in the master .chm file (NO). GENERATE_CHI = NO # If the GENERATE_HTMLHELP tag is set to YES, the CHM_INDEX_ENCODING # is used to encode HtmlHelp index (hhk), content (hhc) and project file # content. CHM_INDEX_ENCODING = # If the GENERATE_HTMLHELP tag is set to YES, the BINARY_TOC flag # controls whether a binary table of contents is generated (YES) or a # normal table of contents (NO) in the .chm file. BINARY_TOC = NO # The TOC_EXPAND flag can be set to YES to add extra items for group members # to the contents of the HTML help documentation and to the tree view. 
TOC_EXPAND = NO

# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and
# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated
# that can be used as input for Qt's qhelpgenerator to generate a
# Qt Compressed Help (.qch) of the generated HTML documentation.

GENERATE_QHP = NO

# If the QHG_LOCATION tag is specified, the QCH_FILE tag can
# be used to specify the file name of the resulting .qch file.
# The path specified is relative to the HTML output folder.

QCH_FILE =

# The QHP_NAMESPACE tag specifies the namespace to use when generating
# Qt Help Project output. For more information please see
# http://doc.trolltech.com/qthelpproject.html#namespace

QHP_NAMESPACE =

# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating
# Qt Help Project output. For more information please see
# http://doc.trolltech.com/qthelpproject.html#virtual-folders

QHP_VIRTUAL_FOLDER = doc

# If QHP_CUST_FILTER_NAME is set, it specifies the name of a custom filter to
# add. For more information please see
# http://doc.trolltech.com/qthelpproject.html#custom-filters

QHP_CUST_FILTER_NAME =

# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the
# custom filter to add. For more information please see
# Qt Help Project / Custom Filters.

QHP_CUST_FILTER_ATTRS =

# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this
# project's filter section matches. See
# Qt Help Project / Filter Attributes.

QHP_SECT_FILTER_ATTRS =

# If the GENERATE_QHP tag is set to YES, the QHG_LOCATION tag can
# be used to specify the location of Qt's qhelpgenerator.
# If non-empty doxygen will try to run qhelpgenerator on the generated
# .qhp file.

QHG_LOCATION =

# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files
# will be generated, which together with the HTML files, form an Eclipse help
# plugin. To install this plugin and make it available under the help contents
# menu in Eclipse, the contents of the directory containing the HTML and XML
# files needs to be copied into the plugins directory of eclipse. The name of
# the directory within the plugins directory should be the same as
# the ECLIPSE_DOC_ID value. After copying Eclipse needs to be restarted before
# the help appears.

GENERATE_ECLIPSEHELP = NO

# A unique identifier for the eclipse help plugin. When installing the plugin
# the directory name containing the HTML and XML files should also have
# this name.

ECLIPSE_DOC_ID = org.doxygen.Project

# The DISABLE_INDEX tag can be used to turn on/off the condensed index at
# top of each HTML page. The value NO (the default) enables the index and
# the value YES disables it.

DISABLE_INDEX = NO

# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values
# (range [0,1..20]) that doxygen will group on one line in the generated HTML
# documentation. Note that a value of 0 will completely suppress the enum
# values from appearing in the overview section.

ENUM_VALUES_PER_LINE = 4

# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index
# structure should be generated to display hierarchical information.
# If the tag value is set to YES, a side panel will be generated
# containing a tree-like index structure (just like the one that
# is generated for HTML Help). For this to work a browser that supports
# JavaScript, DHTML, CSS and frames is required (i.e. any modern browser).
# Windows users are probably better off using the HTML help feature.

GENERATE_TREEVIEW = NO

# By enabling USE_INLINE_TREES, doxygen will generate the Groups, Directories,
# and Class Hierarchy pages using a tree view instead of an ordered list.

USE_INLINE_TREES = NO

# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be
# used to set the initial width (in pixels) of the frame in which the tree
# is shown.
TREEVIEW_WIDTH = 250

# When the EXT_LINKS_IN_WINDOW option is set to YES doxygen will open
# links to external symbols imported via tag files in a separate window.

EXT_LINKS_IN_WINDOW = NO

# Use this tag to change the font size of Latex formulas included
# as images in the HTML documentation. The default is 10. Note that
# when you change the font size after a successful doxygen run you need
# to manually remove any form_*.png images from the HTML output directory
# to force them to be regenerated.

FORMULA_FONTSIZE = 10

# Use the FORMULA_TRANSPARENT tag to determine whether or not the images
# generated for formulas are transparent PNGs. Transparent PNGs are
# not supported properly for IE 6.0, but are supported on all modern browsers.
# Note that when changing this option you need to delete any form_*.png files
# in the HTML output before the changes have effect.

FORMULA_TRANSPARENT = YES

# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax
# (see http://www.mathjax.org) which uses client side Javascript for the
# rendering instead of using prerendered bitmaps. Use this if you do not
# have LaTeX installed or if you want the formulas to look prettier in the HTML
# output. When enabled you also need to install MathJax separately and
# configure the path to it using the MATHJAX_RELPATH option.

USE_MATHJAX = NO

# When MathJax is enabled you need to specify the location relative to the
# HTML output directory using the MATHJAX_RELPATH option. The destination
# directory should contain the MathJax.js script. For instance, if the mathjax
# directory is located at the same level as the HTML output directory, then
# MATHJAX_RELPATH should be ../mathjax. The default value points to the
# mathjax.org site, so you can quickly see the result without installing
# MathJax, but it is strongly recommended to install a local copy of MathJax
# before deployment.
MATHJAX_RELPATH = http://www.mathjax.org/mathjax

# When the SEARCHENGINE tag is enabled doxygen will generate a search box
# for the HTML output. The underlying search engine uses javascript
# and DHTML and should work on any modern browser. Note that when using
# HTML help (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets
# (GENERATE_DOCSET) there is already a search function so this one should
# typically be disabled. For large projects the javascript based search engine
# can be slow, then enabling SERVER_BASED_SEARCH may provide a better solution.

SEARCHENGINE = YES

# When the SERVER_BASED_SEARCH tag is enabled the search engine will be
# implemented using a PHP enabled web server instead of at the web client
# using Javascript. Doxygen will generate the search PHP script and index
# file to put on the web server. The advantage of the server
# based approach is that it scales better to large projects and allows
# full text search. The disadvantages are that it is more difficult to setup
# and does not have live searching capabilities.

SERVER_BASED_SEARCH = NO

#---------------------------------------------------------------------------
# configuration options related to the LaTeX output
#---------------------------------------------------------------------------

# If the GENERATE_LATEX tag is set to YES (the default) Doxygen will
# generate Latex output.

GENERATE_LATEX = NO

# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `latex' will be used as the default path.

LATEX_OUTPUT = latex

# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be
# invoked. If left blank `latex' will be used as the default command name.
# Note that when enabling USE_PDFLATEX this option is only used for
# generating bitmaps for formulas in the HTML output, but not in the
# Makefile that is written to the output directory.
LATEX_CMD_NAME = latex

# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to
# generate index for LaTeX. If left blank `makeindex' will be used as the
# default command name.

MAKEINDEX_CMD_NAME = makeindex

# If the COMPACT_LATEX tag is set to YES Doxygen generates more compact
# LaTeX documents. This may be useful for small projects and may help to
# save some trees in general.

COMPACT_LATEX = NO

# The PAPER_TYPE tag can be used to set the paper type that is used
# by the printer. Possible values are: a4, letter, legal and
# executive. If left blank a4wide will be used.

PAPER_TYPE = a4

# The EXTRA_PACKAGES tag can be used to specify one or more names of LaTeX
# packages that should be included in the LaTeX output.

EXTRA_PACKAGES =

# The LATEX_HEADER tag can be used to specify a personal LaTeX header for
# the generated latex document. The header should contain everything until
# the first chapter. If it is left blank doxygen will generate a
# standard header. Notice: only use this tag if you know what you are doing!

LATEX_HEADER =

# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for
# the generated latex document. The footer should contain everything after
# the last chapter. If it is left blank doxygen will generate a
# standard footer. Notice: only use this tag if you know what you are doing!

LATEX_FOOTER =

# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated
# is prepared for conversion to pdf (using ps2pdf). The pdf file will
# contain links (just like the HTML output) instead of page references.
# This makes the output suitable for online browsing using a pdf viewer.

PDF_HYPERLINKS = YES

# If the USE_PDFLATEX tag is set to YES, pdflatex will be used instead of
# plain latex in the generated Makefile. Set this option to YES to get a
# higher quality PDF documentation.

USE_PDFLATEX = YES

# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \batchmode
# command to the generated LaTeX files. This will instruct LaTeX to keep
# running if errors occur, instead of asking the user for help.
# This option is also used when generating formulas in HTML.

LATEX_BATCHMODE = NO

# If LATEX_HIDE_INDICES is set to YES then doxygen will not
# include the index chapters (such as File Index, Compound Index, etc.)
# in the output.

LATEX_HIDE_INDICES = NO

# If LATEX_SOURCE_CODE is set to YES then doxygen will include
# source code with syntax highlighting in the LaTeX output.
# Note that which sources are shown also depends on other settings
# such as SOURCE_BROWSER.

LATEX_SOURCE_CODE = NO

#---------------------------------------------------------------------------
# configuration options related to the RTF output
#---------------------------------------------------------------------------

# If the GENERATE_RTF tag is set to YES Doxygen will generate RTF output.
# The RTF output is optimized for Word 97 and may not look very pretty with
# other RTF readers or editors.

GENERATE_RTF = NO

# The RTF_OUTPUT tag is used to specify where the RTF docs will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `rtf' will be used as the default path.

RTF_OUTPUT = rtf

# If the COMPACT_RTF tag is set to YES Doxygen generates more compact
# RTF documents. This may be useful for small projects and may help to
# save some trees in general.

COMPACT_RTF = NO

# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated
# will contain hyperlink fields. The RTF file will
# contain links (just like the HTML output) instead of page references.
# This makes the output suitable for online browsing using WORD or other
# programs which support those fields.
# Note: wordpad (write) and others do not support links.

RTF_HYPERLINKS = NO

# Load stylesheet definitions from file. Syntax is similar to doxygen's
# config file, i.e. a series of assignments. You only have to provide
# replacements, missing definitions are set to their default value.

RTF_STYLESHEET_FILE =

# Set optional variables used in the generation of an rtf document.
# Syntax is similar to doxygen's config file.

RTF_EXTENSIONS_FILE =

#---------------------------------------------------------------------------
# configuration options related to the man page output
#---------------------------------------------------------------------------

# If the GENERATE_MAN tag is set to YES (the default) Doxygen will
# generate man pages

GENERATE_MAN = NO

# The MAN_OUTPUT tag is used to specify where the man pages will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `man' will be used as the default path.

MAN_OUTPUT = man

# The MAN_EXTENSION tag determines the extension that is added to
# the generated man pages (default is the subroutine's section .3)

MAN_EXTENSION = .3

# If the MAN_LINKS tag is set to YES and Doxygen generates man output,
# then it will generate one additional man file for each entity
# documented in the real man page(s). These additional files
# only source the real man page, but without them the man command
# would be unable to find the correct page. The default is NO.

MAN_LINKS = NO

#---------------------------------------------------------------------------
# configuration options related to the XML output
#---------------------------------------------------------------------------

# If the GENERATE_XML tag is set to YES Doxygen will
# generate an XML file that captures the structure of
# the code including all documentation.

GENERATE_XML = NO

# The XML_OUTPUT tag is used to specify where the XML pages will be put.
# If a relative path is entered the value of OUTPUT_DIRECTORY will be
# put in front of it. If left blank `xml' will be used as the default path.
XML_OUTPUT = xml

# The XML_SCHEMA tag can be used to specify an XML schema,
# which can be used by a validating XML parser to check the
# syntax of the XML files.

XML_SCHEMA =

# The XML_DTD tag can be used to specify an XML DTD,
# which can be used by a validating XML parser to check the
# syntax of the XML files.

XML_DTD =

# If the XML_PROGRAMLISTING tag is set to YES Doxygen will
# dump the program listings (including syntax highlighting
# and cross-referencing information) to the XML output. Note that
# enabling this will significantly increase the size of the XML output.

XML_PROGRAMLISTING = YES

#---------------------------------------------------------------------------
# configuration options for the AutoGen Definitions output
#---------------------------------------------------------------------------

# If the GENERATE_AUTOGEN_DEF tag is set to YES Doxygen will
# generate an AutoGen Definitions (see autogen.sf.net) file
# that captures the structure of the code including all
# documentation. Note that this feature is still experimental
# and incomplete at the moment.

GENERATE_AUTOGEN_DEF = NO

#---------------------------------------------------------------------------
# configuration options related to the Perl module output
#---------------------------------------------------------------------------

# If the GENERATE_PERLMOD tag is set to YES Doxygen will
# generate a Perl module file that captures the structure of
# the code including all documentation. Note that this
# feature is still experimental and incomplete at the
# moment.

GENERATE_PERLMOD = NO

# If the PERLMOD_LATEX tag is set to YES Doxygen will generate
# the necessary Makefile rules, Perl scripts and LaTeX code to be able
# to generate PDF and DVI output from the Perl module output.

PERLMOD_LATEX = NO

# If the PERLMOD_PRETTY tag is set to YES the Perl module output will be
# nicely formatted so it can be parsed by a human reader.
# This is useful
# if you want to understand what is going on.
# On the other hand, if this
# tag is set to NO the size of the Perl module output will be much smaller
# and Perl will parse it just the same.

PERLMOD_PRETTY = YES

# The names of the make variables in the generated doxyrules.make file
# are prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX.
# This is useful so different doxyrules.make files included by the same
# Makefile don't overwrite each other's variables.

PERLMOD_MAKEVAR_PREFIX =

#---------------------------------------------------------------------------
# Configuration options related to the preprocessor
#---------------------------------------------------------------------------

# If the ENABLE_PREPROCESSING tag is set to YES (the default) Doxygen will
# evaluate all C-preprocessor directives found in the sources and include
# files.

ENABLE_PREPROCESSING = YES

# If the MACRO_EXPANSION tag is set to YES Doxygen will expand all macro
# names in the source code. If set to NO (the default) only conditional
# compilation will be performed. Macro expansion can be done in a controlled
# way by setting EXPAND_ONLY_PREDEF to YES.

MACRO_EXPANSION = YES

# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES
# then the macro expansion is limited to the macros specified with the
# PREDEFINED and EXPAND_AS_DEFINED tags.

EXPAND_ONLY_PREDEF = NO

# If the SEARCH_INCLUDES tag is set to YES (the default) the include files
# pointed to by INCLUDE_PATH will be searched when a #include is found.

SEARCH_INCLUDES = YES

# The INCLUDE_PATH tag can be used to specify one or more directories that
# contain include files that are not input files but should be processed by
# the preprocessor.

INCLUDE_PATH =

# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard
# patterns (like *.h and *.hpp) to filter out the header-files in the
# directories. If left blank, the patterns specified with FILE_PATTERNS will
# be used.
INCLUDE_FILE_PATTERNS =

# The PREDEFINED tag can be used to specify one or more macro names that
# are defined before the preprocessor is started (similar to the -D option of
# gcc). The argument of the tag is a list of macros of the form: name
# or name=definition (no spaces). If the definition and the = are
# omitted =1 is assumed. To prevent a macro definition from being
# undefined via #undef or recursively expanded use the := operator
# instead of the = operator.

PREDEFINED =

# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then
# this tag can be used to specify a list of macro names that should be expanded.
# The macro definition that is found in the sources will be used.
# Use the PREDEFINED tag if you want to use a different macro definition that
# overrules the definition found in the source code.

EXPAND_AS_DEFINED =

# If the SKIP_FUNCTION_MACROS tag is set to YES (the default) then
# doxygen's preprocessor will remove all references to function-like macros
# that are alone on a line, have an all uppercase name, and do not end with a
# semicolon, because these will confuse the parser if not removed.

SKIP_FUNCTION_MACROS = YES

#---------------------------------------------------------------------------
# Configuration::additions related to external references
#---------------------------------------------------------------------------

# The TAGFILES option can be used to specify one or more tagfiles.
# Optionally an initial location of the external documentation
# can be added for each tagfile. The format of a tag file without
# this location is as follows:
#
#   TAGFILES = file1 file2 ...
# Adding location for the tag files is done as follows:
#
#   TAGFILES = file1=loc1 "file2 = loc2" ...
# where "loc1" and "loc2" can be relative or absolute paths or
# URLs. If a location is present for each tag, the installdox tool
# does not have to be run to correct the links.
# Note that each tag file must have a unique name
# (where the name does NOT include the path)
# If a tag file is not located in the directory in which doxygen
# is run, you must also specify the path to the tagfile here.

TAGFILES =

# When a file name is specified after GENERATE_TAGFILE, doxygen will create
# a tag file that is based on the input files it reads.

GENERATE_TAGFILE =

# If the ALLEXTERNALS tag is set to YES all external classes will be listed
# in the class index. If set to NO only the inherited external classes
# will be listed.

ALLEXTERNALS = NO

# If the EXTERNAL_GROUPS tag is set to YES all external groups will be listed
# in the modules index. If set to NO, only the current project's groups will
# be listed.

EXTERNAL_GROUPS = YES

# The PERL_PATH should be the absolute path and name of the perl script
# interpreter (i.e. the result of `which perl').

PERL_PATH = /usr/bin/perl

#---------------------------------------------------------------------------
# Configuration options related to the dot tool
#---------------------------------------------------------------------------

# If the CLASS_DIAGRAMS tag is set to YES (the default) Doxygen will
# generate an inheritance diagram (in HTML, RTF and LaTeX) for classes with base
# or super classes. Setting the tag to NO turns the diagrams off. Note that
# this option also works with HAVE_DOT disabled, but it is recommended to
# install and use dot, since it yields more powerful graphs.

CLASS_DIAGRAMS = YES

# You can define message sequence charts within doxygen comments using the \msc
# command. Doxygen will then run the mscgen tool (see
# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the
# documentation. The MSCGEN_PATH tag allows you to specify the directory where
# the mscgen tool resides. If left empty the tool is assumed to be found in the
# default search path.
MSCGEN_PATH =

# If set to YES, the inheritance and collaboration graphs will hide
# inheritance and usage relations if the target is undocumented
# or is not a class.

HIDE_UNDOC_RELATIONS = YES

# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is
# available from the path. This tool is part of Graphviz, a graph visualization
# toolkit from AT&T and Lucent Bell Labs. The other options in this section
# have no effect if this option is set to NO (the default)

HAVE_DOT = YES

# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is
# allowed to run in parallel. When set to 0 (the default) doxygen will
# base this on the number of processors available in the system. You can set it
# explicitly to a value larger than 0 to get control over the balance
# between CPU load and processing speed.

DOT_NUM_THREADS = 0

# By default doxygen will write a font called Helvetica to the output
# directory and reference it in all dot files that doxygen generates.
# When you want a differently looking font you can specify the font name
# using DOT_FONTNAME. You need to make sure dot is able to find the font,
# which can be done by putting it in a standard location or by setting the
# DOTFONTPATH environment variable or by setting DOT_FONTPATH to the directory
# containing the font.

DOT_FONTNAME = Helvetica

# The DOT_FONTSIZE tag can be used to set the size of the font of dot graphs.
# The default size is 10pt.

DOT_FONTSIZE = 10

# By default doxygen will tell dot to use the output directory to look for the
# FreeSans.ttf font (which doxygen will put there itself). If you specify a
# different font using DOT_FONTNAME you can set the path where dot
# can find it using this tag.

DOT_FONTPATH =

# If the CLASS_GRAPH and HAVE_DOT tags are set to YES then doxygen
# will generate a graph for each documented class showing the direct and
# indirect inheritance relations. Setting this tag to YES will force
# the CLASS_DIAGRAMS tag to NO.
CLASS_GRAPH = YES

# If the COLLABORATION_GRAPH and HAVE_DOT tags are set to YES then doxygen
# will generate a graph for each documented class showing the direct and
# indirect implementation dependencies (inheritance, containment, and
# class references variables) of the class with other documented classes.

COLLABORATION_GRAPH = YES

# If the GROUP_GRAPHS and HAVE_DOT tags are set to YES then doxygen
# will generate a graph for groups, showing the direct group dependencies.

GROUP_GRAPHS = YES

# If the UML_LOOK tag is set to YES doxygen will generate inheritance and
# collaboration diagrams in a style similar to the OMG's Unified Modeling
# Language.

UML_LOOK = NO

# If set to YES, the inheritance and collaboration graphs will show the
# relations between templates and their instances.

TEMPLATE_RELATIONS = NO

# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDE_GRAPH, and HAVE_DOT
# tags are set to YES then doxygen will generate a graph for each documented
# file showing the direct and indirect include dependencies of the file with
# other documented files.

INCLUDE_GRAPH = YES

# If the ENABLE_PREPROCESSING, SEARCH_INCLUDES, INCLUDED_BY_GRAPH, and
# HAVE_DOT tags are set to YES then doxygen will generate a graph for each
# documented header file showing the documented files that directly or
# indirectly include this file.

INCLUDED_BY_GRAPH = YES

# If the CALL_GRAPH and HAVE_DOT options are set to YES then
# doxygen will generate a call dependency graph for every global function
# or class method. Note that enabling this option will significantly increase
# the time of a run. So in most cases it will be better to enable call graphs
# for selected functions only using the \callgraph command.

CALL_GRAPH = NO

# If the CALLER_GRAPH and HAVE_DOT tags are set to YES then
# doxygen will generate a caller dependency graph for every global function
# or class method. Note that enabling this option will significantly increase
# the time of a run. So in most cases it will be better to enable caller
# graphs for selected functions only using the \callergraph command.

CALLER_GRAPH = NO

# If the GRAPHICAL_HIERARCHY and HAVE_DOT tags are set to YES then doxygen
# will generate a graphical hierarchy of all classes instead of a textual one.

GRAPHICAL_HIERARCHY = YES

# If the DIRECTORY_GRAPH, SHOW_DIRECTORIES and HAVE_DOT tags are set to YES
# then doxygen will show the dependencies a directory has on other directories
# in a graphical way. The dependency relations are determined by the #include
# relations between the files in the directories.

DIRECTORY_GRAPH = YES

# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images
# generated by dot. Possible values are svg, png, jpg, or gif.
# If left blank png will be used.

DOT_IMAGE_FORMAT = png

# The tag DOT_PATH can be used to specify the path where the dot tool can be
# found. If left blank, it is assumed the dot tool can be found in the path.

DOT_PATH =

# The DOTFILE_DIRS tag can be used to specify one or more directories that
# contain dot files that are included in the documentation (see the
# \dotfile command).

DOTFILE_DIRS =

# The MSCFILE_DIRS tag can be used to specify one or more directories that
# contain msc files that are included in the documentation (see the
# \mscfile command).

MSCFILE_DIRS =

# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of
# nodes that will be shown in the graph. If the number of nodes in a graph
# becomes larger than this value, doxygen will truncate the graph, which is
# visualized by representing a node as a red box. Note that if the
# number of direct children of the root node in a graph is already larger than
# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. Also note
# that the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH.

DOT_GRAPH_MAX_NODES = 50

# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the
# graphs generated by dot. A depth value of 3 means that only nodes reachable
# from the root by following a path via at most 3 edges will be shown. Nodes
# that lie further from the root node will be omitted. Note that setting this
# option to 1 or 2 may greatly reduce the computation time needed for large
# code bases. Also note that the size of a graph can be further restricted by
# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction.

MAX_DOT_GRAPH_DEPTH = 0

# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent
# background. This is disabled by default, because dot on Windows does not
# seem to support this out of the box. Warning: Depending on the platform used,
# enabling this option may lead to badly anti-aliased labels on the edges of
# a graph (i.e. they become hard to read).

DOT_TRANSPARENT = NO

# Set the DOT_MULTI_TARGETS tag to YES to allow dot to generate multiple output
# files in one run (i.e. multiple -o and -T options on the command line). This
# makes dot run faster, but since only newer versions of dot (>1.8.10)
# support this, this feature is disabled by default.

DOT_MULTI_TARGETS = NO

# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will
# generate a legend page explaining the meaning of the various boxes and
# arrows in the dot generated graphs.

GENERATE_LEGEND = NO

# If the DOT_CLEANUP tag is set to YES (the default) Doxygen will
# remove the intermediate dot files that are used to generate
# the various graphs.
DOT_CLEANUP            = YES

papi-5.3.0/doc/doxygen_procedure.txt

********************************************************************************
Check the version of doxygen you're using; there is a bug with older
versions ( < 1.7.4 )
********************************************************************************

USAGE
=======================
To invoke doxygen,
cd $(papi_dir)/doc
make
(alternatively, doxygen Doxyfile-{html,man1,man3})

This command produces documentation for the PAPI user-exposed API and
data structures.
Several different configuration files are present:
Doxyfile-html - generates documentation for everything under src.
	This will take a long time to run, and generates north of 600 megs of files.
	Requires the program dot, for dependency graphs.
Doxyfile-man1 - generates man-pages for the utilities.
Doxyfile-man3 - generates man-pages for the API, see papi.h

Commenting the Code
=======================
To get doxygen's attention, in general, use a special comment block
/** */
thing_to_be_commented

Doxygen responds to several special commands, denoted by @command
(if you're feeling texy, \command)

As an artifact of how doxygen started life, we call our API functions 'classes'
to get doxygen to generate man-pages for the function.

/**	@class MY_FUNCTION
	@brief gives a brief overview of what the function does, limited to 1 line or 1 sentence if you need the space.
	@param arg1 describes a parameter to the function
	@return describes the function's return value
	@retval allows you to enumerate return values

	Down here we have more detailed information about the function
	Which can span many lines

	And paragraphs (feeling texy now?)

	@par Examples:
	@code
	This is the way to get examples to format nicely
	code goes here....
	@endcode

	@bug
	Here you get a section of freeform text to describe bugs you encounter.
*/

@internal keeps comment blocks marked as such out of the documentation
(unless the INTERNAL_DOCS flag is set in the config file)

In several places /**< */ appears; this means that the comment pertains to the
previous element.
int foo; /**< This comment is about foo */

TODO
=======================
Doxygen provides options for [ab]using the preprocessor.
Do we need to look into this?
	Probably not more than we already do -J

Document the ctests?

See http://www.stack.nl/~dimitri/doxygen/docblocks.html
for more detail on doxygen.

papi-5.3.0/ChangeLogP520.txt

2013-08-02

  * 6b62d586 man/man1/papi_avail.1 man/man1/papi_clockres.1 man/man1/papi_command_line.1...: Update the manpages for a pending 5.2 release.
    New pages for PAPI[F]_epc and papi_version.

  * 1ae08835 src/linux-common.c: try to properly detect number of sockets
    Use totalcpus rather than ncpu in the calculation. This change fixes things on a Sandybridge-EP machine. We should maybe find a more robust way to detect this.

  * 79c37fbf .../perf_event_uncore/tests/perf_event_uncore.c .../tests/perf_event_uncore_multiple.c: perf_event_uncore: have tests skip if component disabled rather than fail

  * 638ccf6b .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: change order of uncore detection logic
    This way it will report an error of "no uncore found" before it reports "not enough permissions". That way a user won't waste time getting permissions only to find out they didn't have an uncore anyway.

  * 30582773 src/components/perf_event/pe_libpfm4_events.c: perf_event: fix papi_native_avail output
    A recent change of mine that added stricter error checking for libpfm4 event lookup broke event enumeration on perf_event, specifically papi_native_avail output. libpfm4 will return an error on some events if no UMASK or improper UMASK is supplied, but papi_native_avail always wants to print the root event and umasks separately.
    This temporary fix just ignores libpfm4 umask errors; we might in the future want to properly indicate which events are only valid when certain umasks are present.

  * c7612326 src/utils/native_avail.c: papi_native_avail: fix empty component case
    If a component had no events, papi_native_avail would ignore the error returned by PAPI_enum_cmp_event( PAPI_ENUM_FIRST ); and try to print a first event anyway.

  * e1b064eb .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: disable component if no events found
    This can happen on older (pre 3.6) kernels with the new libpfm4 that does proper uncore detection.

2013-08-01

  * 9a54633a src/components/host_micpower/linux-host_micpower.c src/components/infiniband/linux-infiniband.c src/components/nvml/linux-nvml.c...: Components: Use the cuda dlopen fix in all cases.
    See 4cb76a9b for details; the short version is that if you call dlopen when you have been statically linked to libc, it gets ugly.

2013-07-31

  * dbc44ed1 src/components/perf_event/pe_libpfm4_events.c .../perf_event_uncore/perf_event_uncore.c .../perf_event_uncore/peu_libpfm4_events.c: perf_event libpfm4 events -- correctly handle invalid events
    It was possible for event names to be obtained from libpfm4 during enumeration that were not valid events. This usually happens with uncore events, where the uncore is listed as available based on cpuid but when libpfm4 tries to get the uncore type from the kernel it finds out it is unsupported. This change makes this properly fail, instead of just returning "0" for all the event parameters (which is a valid event on x86). Also make this change in the regular perf_event component, even though it is less likely to happen in practice.

  * 4720890a .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: remove check_permissions() test
    It was trying to see if an EventSet was runnable by using the current permissions and adding the PERF_HW_INSTRUCTIONS event. That doesn't really make sense on uncore.
The perf_event component uses this test to try to give errors early, at set_opt() time rather than at the first run time, although in practice now we can probably make intelligent guesses based on the current permission levels. * 113d35f7 .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: remove unused kernel workarounds uncore only works on Linux 3.6 or newer so all of the pre-2.6.35 workarounds aren't necessary. If someone has backported the uncore support to kernels that old, hopefully they've also backported all the other bugfixes too. 2013-07-25 * 4cb76a9b src/components/cuda/linux-cuda.c: Trial fix for the cuda component static libc linking issue. Weak link against _dl_non_dynamic_init, this appears in my limited testing to be in gnu libc.a and not in the so. For background, it was reported by Steve Kaufmann that statically linking tools with a PAPI library configured with the CUDA component segfaulted. It appears that calling any of the dynamic linker functions from a static executable is asking for pain. See Trac bug 182 https://icl.cs.utk.edu/trac/papi/ticket/182 2013-07-24 * ad47cfb9 src/configure src/configure.in: Add linux-pfm-ia64 to configure I'm not sure if this is enough to fix itanium support but it's a start. * 098294c5 src/components/example/tests/example_basic.c .../example/tests/example_multiple_components.c: Fixed tests for example component. Both tests failed due to incorrect check of the components PAPI has been configured with. 2013-07-23 * c0c4caf4 src/linux-memory.c src/papi_events.csv: Add initial support for IBM POWER8 processor Add initial support for IBM POWER8 processor The IBM POWER8 processor (to be publicly announced at some future date) has some preliminary support in libpfm with a subset of native events. These POWER8-related libpfm changes were pulled into PAPI on July 3, so further updates in PAPI were required to support this new processor. This patch adds that required support. 
    NOTE: Due to the fact that only a subset of native events have been publicised at this point (and pushed into libpfm), not all of the usual PAPI preset events have corresponding native events. The rest of the POWER8 native events will be pushed upstream once they are verified, and then we can flesh out the PAPI preset events. With this initial POWER8 support patch, 5 of the ctests and ftests fail, compared to 3 when PAPI is run on a POWER7. At least one of the failing testcases is due to testing being done on an early POWER8 processor with some known hardware problems. We presume the number of failing tests will decrease once we have GA-level hardware to test on.

2013-07-22

  * 6c231d1a src/configure: Rerun autoconf for f4ec143e
    Correct versioning of libpapi.so

  * f4ec143e src/configure.in: Correct versioning of libpapi.so
    The configure for linux always set the soname to libpapi.so. This causes problems when /sbin/ldconfig tries to update the library information on linux. The shared library is installed as /lib{64}/libpapi.so.$VERSION, but the shared library has the soname of libpapi.so. ldconfig makes a symbolic link from /lib/libpapi.so to the actual versioned shared library, /lib{64}/libpapi.so.$VERSION. The configure should get the soname correct to avoid creating this symbolic link. This patch only addresses the issues for some of the possible platforms and similar patches may be needed for other platforms.

2013-07-19

  * 92356bbd src/papi.c src/threads.c src/threads.h: Attempt to fix a memory leak in fork2 test.
    Fork2 does the following:

        PAPI_library_init()
        fork();
         /    \
        parent    child
        wait()    PAPI_shutdown()
                    -> _papi_hwi_shutdown_global_threads()
                      -> foreach(threadinfo we allocated):
                           _papi_hwi_shutdown_thread()
                  PAPI_library_init()

    _papi_hwi_shutdown_thread checks who allocated a ThreadInfo entry in the global list, and will only free it if our thread did the allocation.
    When threading is not initialized, we fall back to getpid(). Now, in the child process, the one ThreadInfo item on the list was allocated by our parent, so at shutdown time we don't free this, and thus leak it. Solution is to add a parameter to _papi_hwi_shutdown_thread to force shutdown even if we didn't allocate it. At _papi_hwi_shutdown_global_threads() time, who cares, it's closing time.

  * c04d908e src/cpus.c: Fix a deadlock in _papi_hwi_lookup_cpu().
    If cpu_num is not found by _papi_hwi_lookup_cpu(), _papi_hwi_initialize_cpu() calls insert_cpu(), which locks CPUS_LOCK, which was already held by _papi_hwi_lookup_cpu().

  * efac24c4 src/components/micpower/linux-micpower.c: micpower: fix return value check
    Also add a time check at stop time.

2013-07-16

  * b9fd9dd1 src/configure src/configure.in: configure: Fix AIX build
    perfctr_ppc was not the only system that relied on ppc64_events.h, power*.h, and friends. First run at a fix is -Icomponents/perfctr_ppc for the C and F flags...

  * 46042e68 src/components/micpower/linux-micpower.c: micpower: update some indexing code

2013-07-15

  * 5220e7d2 INSTALL.txt: INSTALL.txt: typo
    --with-arch=, not --arch=; Thanks to Karl Schulz for catching this.

  * 207e0ee0 src/papi_libpfm_events.h: papi_libpfm_events: needs include files for types.
    Include papi.h and papi_vector.h for papi_vector_t and PAPI_component_info_t

  * d96c01c7 src/components/perfctr/perfctr.c: perfctr: cleanup a warning
    Include papi_libpfm_events.h for _papi_libpfm_init() decl.

  * 367e1b38 src/components/perfctr/perfctr-x86.c src/components/perfctr/perfctr.c: perfctr: refactor out setup_x86_presets
    The setup_presets function served only to call _papi_libpfm_init, so we go the rest of the way and completely remove the function, calling _papi_libpfm_init directly from _perfctr_init_component.

  * 1ba38ce5 src/components/perfctr/perfctr-x86.c: perfctr: cleanup unused parameter warning.
    The perfctr code was refactored to only call into the table loading code one time.
    This had the side effect of removing most of what setup_x86_presets does.

  * 02710ced src/configure src/configure.in: configure: remove debugging message
    The compiler detection code had a stray AC_MSG_RESULT.

2013-07-12

  * 028ce29d src/components/lustre/linux-lustre.c: lustre: use whole directory name as event
    Gary Mohr reported that on a trial system he was seeing many events of the form fs3-* which were all chopped to fs3, not helpful. I've not actually been able to figure out exactly how lustre names things, I've seen it described as - But have no clue what uid promises.

2013-07-15

  * 129d4587 src/papi.c: allow more than one EventSet to attach to a CPU at a time
    This is necessary for perf_event_uncore support, as multiple uncores will want to attach to a CPU. It looks like this change won't break anything, and the tests pass on my test machines. I am a bit concerned about cpu->running_eventset, though no one seems to use that value...

  * bcda5ddd src/components/perf_event_uncore/tests/Makefile .../tests/perf_event_uncore_nogran.c: perf_event_uncore: remove perf_event_uncore_nogran test
    It is unnecessary after recent changes to the uncore component.

  * b1b9f654 src/components/perf_event_uncore/tests/Makefile .../tests/perf_event_uncore_cbox.c: perf_event_uncore: add perf_event_uncore_cbox test
    This adds a non-trivial test of the CBOX uncores. It turned up various bugs in the PAPI uncore implementation.

  * df1b6453 src/linux-common.c: linux: properly set hwinfo->socket value
    It was being derived from hwinfo->ncpu but being calculated before hwinfo->ncpu was set.

2013-07-13 * ee537448 .../perf_event_uncore/perf_event_uncore.c .../perf_event_uncore/peu_libpfm4_events.c .../perf_event_uncore/peu_libpfm4_events.h: perf_event_uncore: properly report number of total counters available * 7eb93917 src/components/perf_event/Rules.perf_event src/components/perf_event/pe_libpfm4_events.c src/components/perf_event/pe_libpfm4_events.h...: perf_event/perf_event_uncore/libpfm4 -- rearrange files Give perf_event and perf_event_uncore copies of papi_libpfm4_events to work with, as they will have different needs for the code. Get rid of the perf_event_lib stuff. It was a hack to begin with and in the end not much code will be shared. Maybe we can re-share things once uncore support is complete. 2013-07-12 * 6810af2a src/components/perf_event/perf_event.c .../perf_event_uncore/perf_event_uncore.c src/papi_libpfm4_events.c...: papi_libpfm4: properly call pfm_terminate() in papi_libpfm4_shutdown * 010497f4 src/components/perf_event/perf_event.c .../perf_event_uncore/perf_event_uncore.c src/papi_libpfm4_events.c...: split papi_libpfm4_init() split this function because the perf_event_uncore() component is going to want to initialize things differently than plain perf_event * d9023411 src/components/perf_event/perf_event.c: perf_event: on old kernels if SW Multiplex enabled, then report proper number of MPX counters available it may be different than the amount HW supports * 7595a840 src/components/perf_event/perf_event_lib.c: perf_event: use PERF_IOC_FLAG_GROUP when resetting events This ioctl argument specifies to reset all events in a group, so we don't have to iterate. This argument dates back to the introduction of perf_event and it makes the code a bit cleaner. * f220fd19 src/ctests/Makefile src/ctests/reset_multiplex.c: Add reset_multiplex.c PAPI_reset() potentially exercises different paths when resetting normal and multiplexed eventsets, so make sure we test both. 
  * f784a489 src/components/lustre/linux-lustre.c: lustre: botched a conflict resolution
    properly do error checking on addCounter()

  * c1350fc8 src/components/perf_event/perf_event.c src/components/perf_event/perf_event_lib.c src/components/perf_event/perf_event_lib.h: perf_event: move overflow and profile code out of common lib
    the perf_event_uncore component doesn't need it

  * 8dde03fc .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: remove profiling and overflow code
    perf_event doesn't support sampling or overflow on uncore

  * 30d23636 src/components/lustre/linux-lustre.c: lustre component: Several fixes
    1. create a dynamic native events table
       in pathological cases, lustre can have lots of events.
    2. resolve some warnings
       change signature of init_component
       properly error check addCounter
    3. Add a preprocessor flag to fake interface
       Set LIBCFLAGS="-DFAKE_LUSTRE"

  * 7ef51566 .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: remove dispatch timer call
    perf_event doesn't support sampling on uncore events

  * 667661c6 src/components/perf_event/perf_event.c src/components/perf_event/perf_event_lib.c src/components/perf_event/perf_event_lib.h: perf_event: move rdpmc detection back into perf_event.c
    It was in the perf_event_lib but uncore won't use the feature.

  * d46f01e1 .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: check the paranoid file
    Disable the component if paranoid isn't 0 or lower, and we're not running as root.

  * e4ec67d1 src/components/perf_event/perf_event.c: perf_event and paranoid level 2
    If paranoid level 2 (no kernel events) was set we were removing PAPI_DOM_KERNEL from the allowable domains. We were doing this even if the user was root. This code checks for uid 0 and overrides the restriction.

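The paranoid-level entry above (e4ec67d1) describes a small decision rule: drop the kernel domain at paranoid level 2 unless the caller is root. A minimal sketch of that rule follows. The function name and the SKETCH_DOM_* constants are hypothetical stand-ins chosen for illustration; they are not PAPI's internals, and the real component may read and combine these values differently.

```c
#include <sys/types.h>

/* Hypothetical domain bits for illustration only; PAPI defines its own
 * PAPI_DOM_* values in papi.h. */
#define SKETCH_DOM_USER   0x1
#define SKETCH_DOM_KERNEL 0x2

/* Return the domains a caller may measure, given the value read from
 * /proc/sys/kernel/perf_event_paranoid and the caller's uid.
 * Assumption: level 2 or higher forbids kernel measurement for
 * unprivileged users, and uid 0 is exempt from that restriction. */
int sketch_allowed_domains(int paranoid, uid_t uid)
{
    int domains = SKETCH_DOM_USER | SKETCH_DOM_KERNEL;

    /* Only strip the kernel domain for non-root callers; stripping it
     * for root as well is the behavior the entry says was fixed. */
    if (paranoid >= 2 && uid != 0) {
        domains &= ~SKETCH_DOM_KERNEL;
    }
    return domains;
}
```

With this rule an ordinary user at level 2 is limited to the user domain, while root keeps both domains at any level.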
  * c5501081 src/components/perf_event/perf_event_lib.c src/components/perf_event/perf_event_lib.h: rename sys_perf_event_open2() call back to sys_perf_event_open()
    This was changed when merging code to avoid a conflict but wasn't renamed back when the conflict was fixed.

2013-07-11

  * e263ea60 src/configure src/configure.in: configure: libpfm selection logic rework
    If configure detected perfctr it would force libpfm3 to be used, even with --with-perf_events; now force libpfm4 if perf_events is requested.

2013-07-10

  * 7a3ce030 .../host_micpower/Makefile.host_micpower.in src/components/host_micpower/Rules.host_micpower src/components/host_micpower/configure...: Component: host_micpower
    This is a component that exports power information for Intel Xeon Phi cards (MIC). The component makes use of the MicAccessAPI distributed with the Intel Manycore Platform Software Stack. k-mpss)

  * 9d9bd9c2 src/ctests/shlib.c: Fwd: Re: [Ptools-perfapi] ctests/shlib FAILED
    Should have sent this to the papi devel list. -Will

    -------- Original Message --------
    Subject: Re: [Ptools-perfapi] ctests/shlib FAILED
    Date: Tue, 09 Jul 2013 23:20:10 -0400
    From: William Cohen
    To: ptools-perfapi@eecs.utk.edu

    On 03/09/2012 03:40 PM, William Cohen wrote:
    > I was looking through the test results and found that ctests/shlib FAILED on all the machines I tested on because libm shared library is already linked in. There is no difference in the number of shared libraries before and after the dlopen. The test ctests/shlib fails as a result of this.
    >
    > -Will
    > _______________________________________________
    > Ptools-perfapi mailing list
    > Ptools-perfapi@eecs.utk.edu
    > http://lists.eecs.utk.edu/mailman/listinfo/ptools-perfapi
    >

    I did some more investigation of this problem today. I found that the lmsensor component implicitly pulls in the libm. As an alternative, I wrote the attached patch that uses setkey() and encrypt() in libcrypt.so instead.
    It works on various linux machines, but I do not know whether it is going to work on other OS.

    -Will

    >From c53c97e1de2d1c7dc0bca64d1906287ff73343c6 Mon Sep 17 00:00:00 2001
    From: William Cohen
    Date: Tue, 9 Jul 2013 22:37:27 -0400
    Subject: [PATCH] Avoid using libm.so for ctests/shlib because of implicit use in some components

    The lmsensors component can implicitly pull in libm.so into the executable. Unfortunately, the ctests/shlib test expects that libm.so is not loaded and will fail because there is no change in the count of shared libraries. The patch uses libcrypt.so library setkey and encrypt functions to test PAPI_get_shared_lib_info( ) instead of libm.so library pow function.

2013-07-09

  * bdc9b34b .../tests/perf_event_amd_northbridge.c: Perf_event_amd_northbridge_test: Use buffer event_name instead of uncore_event
    The variable uncore_event is initialized to NULL and is never changed during execution of the test. PAPI_add_named_event fails and the event set cannot be started. The correct event name is stored in event_name; replacing all occurrences of uncore_event with event_name therefore fixes the problem mentioned above.

2013-07-08

  * a1678388 src/components/micpower/linux-micpower.c: micpower: Fix output in native_avail and component_avail.
    It uses cmp_info.name, not .short_name?

        Native Events in Component: mic-power
        Name: mic-power
        Component for reading power on Intel Xeon Phi (MIC)

    Should both match what is prepended to event names, so change .name from mic-power to micpower.

  * e0582f2d src/components/micpower/linux-micpower.c: Micpower: fix a typo
    subsystem, not sybsystem...

  * c7b357ec INSTALL.txt: INSTALL.txt: update instructions for MIC.

  * 34a1124e src/components/perf_event_uncore/tests/Makefile .../tests/perf_event_amd_northbridge.c: Add perf_event_amd_northbridge test
    The test should show how to write a program using AMD fam15h NB with a 3.9 kernel.
    Once libpfm4 gets updated we can see if it's possible to also have the test properly run on 3.10 kernels (in that case the regular perf_event_uncore test should work w/o changes)

  * 41b6507c .../perf_event_uncore/tests/perf_event_uncore.c .../tests/perf_event_uncore_multiple.c: Make perf_event_uncore tests use PAPI_get_component_index()
    They were open-coding the component name search for no good reason.

2013-07-05

  * abf38945 src/papi_libpfm4_events.c: avoid having a "default" PMU for the uncore component
    On the main CPU component we have a "default" PMU where you can leave out the PMU part of the event name. This is unnecessary and sometimes confusing on uncore, so always print the full event name if it's an uncore PMU.

  * b9fe5c3e .../perf_event_uncore/tests/perf_event_uncore.c .../tests/perf_event_uncore_multiple.c: Update perf_event_uncore tests to properly fail if they don't have enough permissions

  * 32ae1686 .../perf_event_uncore/tests/perf_event_uncore.c: perf_event_uncore_test : properly use uncore component
    The sample code was still hardcoding to component "0" which shouldn't have worked. Thanks to Claris Castillo for pointing out this problem.

  * 59e73b51 src/papi_libpfm4_events.c: have _papi_libpfm4_ntv_name_to_code properly check pmu_type
    With the existing code, uncore events were being found by the perf_event component even when that component has uncore events disabled.

2013-07-03

  * a01394eb .../tests/perf_event_uncore_lib.c: perf_event_uncore: fix ivb event in uncore test
    Now that libpfm4 officially supports plain ivb uncore, make sure the test event we were using matches what libpfm4 supports.

2013-07-01

  * f10342a8 src/utils/cost.c: Clean up option handling in papi_cost
    papi_cost used strstr to search for the substring that matched the option. This is pretty inexact. Made sure that the options matched exactly and the option arguments for -b and -t were greater than 0.
    Also make papi_cost print out the help if there was an option that it didn't understand.

  * b5adc561 src/utils/native_avail.c: Clean up option handling for papi_native_avail
    Corrected the help to reflect the name of the option "--noumasks". Print error message if the "-i", "-e", and "-x" option arguments are invalid. Avoid using strstr() for "-h", use strcmp instead. Also check for "--help" option.

  * 8933be9b src/utils/decode.c: Clean up option handling in papi_decode
    papi_decode used strstr() to match options; this can lead to inexact matches. The code should use strcmp instead. Make sure command name is not processed as an option. Also print help information if some argument is not understood.

  * d94ac43a src/utils/component.c: Improve option matching in papi_component and add "--help" option

  * bb63fe5c src/utils/command_line.c: Add options to papi_command_line man page and improve opt handling
    Add options mentioned in the -h output to the man page. Also improve the matching of the options.

  * 09059c82 doc/Makefile src/utils/version.c: Add information for papi_version to be complete

  * 4f2eee8c src/configure src/configure.in: add a --disable-perf-event-uncore option to configure

2013-06-29

  * 901c5cc2 src/components/perf_event/perf_event.c src/components/perf_event/perf_event_lib.c .../perf_event_uncore/perf_event_uncore.c...: remove syscalls.h
    it's no longer needed

  * 4d7e3666 src/Rules.perfmon2 src/components/perfmon2/Rules.perfmon2 src/components/perfmon2/perfmon.c...: move perfmon modules to their own component directory

  * a7e9c5f1 src/Rules.perfctr src/Rules.perfctr-pfm src/components/perfctr/Rules.perfctr...: move perfctr files to components/perfctr directory
    verified that perfctr-x86 still builds and works. perfctr_ppc has all the files to build, but it doesn't work. It looks like no one has tried to build perfctr-ppc for a very very long time.

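The 2013-07-01 option-handling entries above (papi_cost, papi_native_avail, papi_decode) all replace substring matching with exact matching. The sketch below shows why the two differ. The helper names and the program name "./papi-hl" are hypothetical, picked only so that the command name happens to contain "-h"; the utilities' actual parsing code is not reproduced here.

```c
#include <string.h>

/* Loose matching in the style the entries describe as inexact:
 * true whenever opt occurs anywhere inside arg. */
int sketch_loose_match(const char *arg, const char *opt)
{
    return strstr(arg, opt) != NULL;
}

/* Exact matching: true only when arg and opt are the same string. */
int sketch_exact_match(const char *arg, const char *opt)
{
    return strcmp(arg, opt) == 0;
}
```

Under loose matching, a command name such as "./papi-hl" or a longer option such as "--help" would both be taken for "-h"; exact matching accepts only "-h" itself, so "--help" has to be checked for separately, as the papi_native_avail entry notes.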
2013-06-27

  * e9dec1fd src/ctests/hl_rates.c src/papi.h src/papi_fwrappers.c...: debugged versions of these files

  * e282034e src/utils/native_avail.c: native_avail: Fix parse_unit_mask code
    Reported by Steve Kaufmann
    --------------------------
    I noticed while developing a new component that the output from papi_native_avail was incorrectly presented for the component. I believe this is because the ":::" prefix is not being taken into account, so the base event name is interpreted as a unit mask and is prepended with a : before each legitimate unit mask associated with the event. I think this is just now happening because mine is the first component that has unit masks. I have included a fix below. The output of the unit masks by papi_native_avail now appears correctly for my component.
    Thanks, Steve

2013-06-26

  * ff096786 src/ctests/fork2.c: fork2: Return fork2 test to its old functionality
    Once upon a time fork2 did:

        PAPI_library_init()
        …
        if ( fork() == 0)
            PAPI_shutdown()
            PAPI_library_init()
        …

2013-06-25

  * 978d0d3d src/examples/PAPI_add_remove_event.c src/papi.c: Modify PAPI_list_events functionality to match documentation.
    You can now pass in a NULL event array and a zero count to get back the valid number of events. This can then be used to allocate the array and retrieve the exact number of events. Thanks to Nils Smeds and Alain Miniussi for pointing this out.

  * 13c52402 src/examples/PAPI_add_remove_event.c src/papi.c: Modify PAPI_list_events functionality to match documentation.
    You can now pass in a NULL event array and a zero count to get back the valid number of events. This can then be used to allocate the array and retrieve the exact number of events. Thanks to Nils Smeds and Alain Miniussi for pointing this out.

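The PAPI_list_events entries above describe a query-then-fetch calling pattern: ask for the count with a NULL array and a zero count, allocate, then call again. The sketch below imitates that pattern with a stand-in function so it can run without PAPI installed. sketch_list_events and sketch_fetch_all are hypothetical names, and the plain int array stands in for an event set; none of this is the library's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an event-set listing call; not the PAPI
 * function. 'set' and 'set_len' play the role of the event set.
 * Returns 0 on success and -1 on bad arguments. */
int sketch_list_events(const int *set, int set_len, int *events, int *number)
{
    if (number == NULL || *number < 0) {
        return -1;
    }
    if (events == NULL) {
        if (*number != 0) {
            return -1;
        }
        *number = set_len;      /* query mode: report the count only */
        return 0;
    }
    /* Fetch mode: copy at most *number entries and report how many. */
    int n = (*number < set_len) ? *number : set_len;
    if (n > 0) {
        memcpy(events, set, (size_t)n * sizeof *events);
    }
    *number = n;
    return 0;
}

/* Two-call pattern: ask for the count, allocate, then fetch.
 * The caller owns the returned buffer and must free() it. */
int *sketch_fetch_all(const int *set, int set_len, int *count_out)
{
    int count = 0;
    if (sketch_list_events(set, set_len, NULL, &count) != 0) {
        return NULL;
    }
    int *events = malloc((size_t)(count > 0 ? count : 1) * sizeof *events);
    if (events == NULL) {
        return NULL;
    }
    if (sketch_list_events(set, set_len, events, &count) != 0) {
        free(events);
        return NULL;
    }
    *count_out = count;
    return events;
}
```

The first call costs nothing but tells the caller exactly how much to allocate, which avoids guessing an upper bound for the array size.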
* 656e703e src/ctests/zero_fork.c: zero_fork ctest : make documentation match code * 96aad0c7 src/ctests/forkexec.c: forkexec ctest : make comments match code * b7c70953 src/ctests/forkexec4.c: forkexec4 ctest : make comments match the code * 7ffb0245 src/ctests/forkexec3.c: forkexec3 ctest : make documentation match code * 55ea846c src/ctests/forkexec2.c: forkexec2 ctest: have comments match what source does * 7a601e2a src/ctests/Makefile src/ctests/fork2.c: fork2 ctest: remove; was an exact duplicate of fork * 9deff49b src/ctests/fork.c: fork ctest: make comments match what file actually does 2013-06-24 * 2770d2c5 src/components/perf_event/perf_event_lib.c: perf_event: fix failure on ARM due to domain settings forgot to git add the perf_event_lib.c file :( * bf7c4c50 src/components/perf_event/perf_event.c src/components/perf_event/perf_event_lib.h: perf_event: fix failure on ARM due to domain settings On Cortex A8 and A9 it's not possible to set exclude_kernel (hardware does not support it). Make sure the rdpmc detection code doesn't try to set exclude_kernel. 2013-06-18 * 2b1433d8 src/ctests/all_native_events.c src/ctests/get_event_component.c: ctests: Skip calling into disabled components. This patch fixes a problem that was causing two test cases to abort when they were run on a system which has disabled components. Code was added to check if the component is disabled and just go to the next component in the list when the check is true. This prevents calls to code in components which may abort because the component was unable to initialize itself correctly. Thanks to Gary Mohr and Chuck LaCasse from Bull for reporting. 2013-06-14 * 1872453c src/testlib/do_loops.c: testlib: don't change the iter count The first argument to do_misses is an iteration count, for some reason the code was dividing this in half before doing work. Most places that call do_misses call it as do_misses ( 1, ...) 
        void do_misses( int n, int bytes ) {
            {...}
            n = n / 2;
            for ( j = 0; j < n; j++ ) {

    1/2 == 0; so our do_misses call usually did nothing. Thanks Nils Smeds for reporting.

2013-06-12

  * c113e5b6 src/components/infiniband/Makefile.infiniband.in src/components/infiniband/Rules.infiniband src/components/infiniband/configure...: Infiniband component: switch over to weak linking
    Thanks to Gary Mohr for the patch.
    ----------------------------------
    The infiniband component needs include files and libraries from both the infiniband ibmad and ibumad packages. When these packages are installed on a system, both packages normally install their files in the same place (includes in /usr/include/infiniband and libraries in /usr/lib64). The current component configure script allows you to provide a single include path and a single library path which gets used to access files from both packages. If these two packages have different install prefixes (or you are trying to build from install images of each package which are not located under the same directory) then the configure script fails because it can not find all the files it needs. These changes modify the configure script to replace the include and library dir's with an ibmad_dir and ibumad_dir and then use the correct package's directory when looking for includes and libraries from that package. This makes it work like the cuda and nvml components with respect to configuring how to find files from a package the component depends on. There are also changes in this patch file to remove an unneeded variable in the dlopen code to resolve some defects reported by coverity.

2013-06-11

  * d5be5643 src/components/rapl/tests/rapl_basic.c: rapl tests: make the error messages a little more verbose

  * 0c9f1a8c src/run_tests_exclude.txt src/run_tests_exclude_cuda.txt: run_tests_exclude files: Exclude a template file
    -------------------------------------------
    It also adds the cpi.pbs file to the list of files to be excluded when the tests are run.
    This file is just a template and attempts to run it hang the run_tests script on our systems.
    -------------------------------------------

  * 0a063619 src/run_tests.sh: run_tests.sh: fix exclude check.
    The script failed to remove .cu files; this patch fixes the check. Thanks Gary Mohr for reporting/patching.

2013-06-10

  * 87399477 src/components/cuda/linux-cuda.c: cuda component: Address a coverity issue
    The library linking code saved return values in a local var but never used them. Thanks to Gary Mohr for submitting this patch.

  * 99b5b685 src/components/coretemp/tests/coretemp_basic.c: coretemp_basic: update test to properly enumerate events
    The code was old and was searching the entire native event list for ones that started with "hwmon". This updates the test to first find the coretemp component, then enumerate all events contained within.

  * b5c0795b src/components/rapl/tests/rapl_overflow.c: rapl component: address potential looping issue in test.
    A rapl component test has a do/while which only exited when PAPI_add_named_event returned 0 (and only 0; the PAPI_E* error codes would not terminate a while( retval ) loop). This felt fragile; minimal checks are now in place.

  * 4e9484a5 src/components/rapl/tests/rapl_overflow.c: rapl components: coverity fixes
    Reported/patched by Gary Mohr
    -----------------------------
    The rapl component also has 1 defect in a test case. The complaint is that there is code that can never be executed. But this one is not as clear, it says that you can not exit the do/while loop that precedes a test of retval until retval=0 which means the test can never be true. The patch I am providing is to again remove the if test and its contents. But I am concerned that the do/while loop preceding the test could result in a hard loop that would hang the test case forever. It seems to me like something should also be done to ensure the loop will exit at some point.
Here is a patch that provides at least part of the fix: ----------------------------- * 0a533810 src/components/net/tests/net_values_by_name.c: net components: coverity fixes Reported/patched by Gary Mohr ----------------------------- The net component has one defect in one of the test cases. The complaint is that there is code that can never be executed. There is a test to see if event_count == 0 which can never be true at that place in the code. So I removed the if statement and its contents. Here is the patch: ----------------------------- 2013-06-07 * b784b063 src/components/nvml/Rules.nvml src/components/nvml/configure src/components/nvml/configure.in...: nvml: Apply Gary Mohr's dlopen patch. Move the nvml component over to using the dlopen and weak linking infrastructure of the cuda component. Thanks, Gary. * d6505b76 src/components/rapl/utils/rapl_plot.c: rapl: update the rapl_plot utility Get the event names by enumerating the ones available with the RAPL component rather than having a hard-coded list. * 2094c5b1 src/components/rapl/linux-rapl.c: rapl: add better error messages on component init failure * d0e668fb src/ctests/Makefile src/ctests/high-level.c src/ctests/hl_rates.c...: First round of changes to implement a PAPI high level event per cycle call. Untested. 2013-06-05 * 63074f82 src/components/rapl/linux-rapl.c: rapl: Add Ivb-EP support The Intel docs are spotty on what is actually supported. They state: 14.7.2 RAPL Domains and Platform Specificity The specific RAPL domains available in a platform varies across product segments. Platforms targeting client segment support the following RAPL domain hierarchy: * Package * Two power planes: PP0 and PP1 (PP1 may reflect to uncore devices) Platforms targeting server segment support the following RAPL domain hierarchy: * Package * Power plane: PP0 * DRAM 2013-05-31 * 31b4702d src/cpus.c: cpus.c: Don't run init_thread/shutdown_thread for disabled components. 
2013-05-29 * c48087d2 ChangeLogP511.txt RELEASENOTES.txt: Grab the updated ChangeLog from 5.1.1 Create a ChangeLog and update RELEASENOTES for a 5.1.1 release. 2013-05-24 * d1c8769e src/components/perf_event/tests/Makefile src/components/perf_event/tests/event_name_lib.c .../perf_event/tests/perf_event_user_kernel.c: Add perf_event user/kernel domain test This will be useful if/when we start handling domains properly. * 89e1aeba src/components/perf_event/tests/Makefile src/components/perf_event/tests/event_name_lib.c src/components/perf_event/tests/event_name_lib.h...: Add perf_event offcore response test Does a quick check to see if offcore response events are working. * bda86616 .../perf_event_uncore/perf_event_uncore.c src/ctests/get_event_component.c src/papi_internal.c: Some more ctest fixes involving disabled components. We enforce disabled components sometimes in the PAPI routines and sometimes in the components themselves. A bit confusing. It is tough with perf_event and perf_event_uncore because both share libpfm4, so the naming library for perf_event_uncore will be active even if the component is disabled, which can cause some confusing results if your test code ignores PAPI_ENOCMP error messages and accesses a disabled component anyway. This at least fixes our test cases, we might have to revisit this later. * b596621e doc/Doxyfile-common papi.spec src/Makefile.in...: Bump version numbers Call this 5.2.0.0 simply because it's greater than (and some components are completely incompatible with) 5.1.1 * eb77a91e .../perf_event_uncore/perf_event_uncore.c src/papi.c: Disallow enumerating events on disabled components. This was causing segfaults on tests where enumeration was trying to enumerate uncore events on machines w/o uncores.
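The guard described in eb77a91e above can be pictured as an early return before any component-specific state is touched. This is only a sketch under stated assumptions: `struct component`, `enum_events`, and `ERR_NO_COMPONENT` are hypothetical names for illustration, not PAPI's internal structures or its PAPI_ENOCMP value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error code standing in for a "component unavailable" result. */
#define ERR_NO_COMPONENT (-1)

/* Hypothetical, reduced view of a component record. */
struct component {
    const char *name;
    int disabled;      /* nonzero if init failed or the hardware is absent */
    int num_events;    /* only meaningful when the component is enabled */
};

/* Return the number of events to enumerate, or ERR_NO_COMPONENT when the
 * component is disabled, so callers never walk an uninitialized event list. */
int enum_events(const struct component *c)
{
    assert(c != NULL);
    if (c->disabled)
        return ERR_NO_COMPONENT;
    return c->num_events;
}
```

A caller would test the return value for ERR_NO_COMPONENT before iterating, which is the behaviour the test fixes in bda86616 rely on.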
* 4e991a8a .../perf_event/tests/perf_event_system_wide.c: perf_event_system_wide: SKIP instead of FAIL if we don't have proper permissions * 7654bb1f src/Makefile.inc src/components/perf_event/tests/Makefile .../perf_event/tests/perf_event_system_wide.c...: move the perf_event specific tests to be with their component This means the perf_event tests will only be run if perf_event is enabled * d82e343f src/ctests/perf_event_uncore_multiple.c: ctests/perf_event_uncore_multiple: Improve this test a bit * b1a594bf src/perf_events.c src/sys_perf_event_open.c: Remove the no-longer needed perf_events files Now we use the versions in the components/perf_event directory * a9a277f3 src/Makefile.in src/Makefile.inc src/configure...: Split up CPUCOMPONENT configure variable Now it is CPUCOMPONENT_NAME CPUCOMPONENT_C CPUCOMPONENT_OBJ This allows having setups with no CPUCOMPONENT set (perf_event used as a component) while keeping backward compatibility with non-component CPU components. This has been tested on perf_event and perfctr. It might break other architectures, so test if you can. * 69e29526 src/configure src/configure.in: configure: have --with-components append components to existing value This allows configure to earlier set the components value to include "perf_event" if detected and then later append the values passed in with --with-components * 9d28df4c src/components/perf_event/Rules.perf_event src/components/perf_event/perf_event.c src/components/perf_event/perf_event_lib.c...: add perf_event and perf_event_uncore components This adds perf_event as a standalone component. Currently it is not compiled or built, some changes need to be made to the build system before this will work.
2013-05-21 * ea996661 src/components/cuda/linux-cuda.c: eliminate warnings of unused vars * 691bf114 src/components/cuda/linux-cuda.c: eliminate warnings of unused vars * 221bfdab src/components/cuda/linux-cuda.c src/components/cuda/tests/HelloWorld.cu: Problem with cleanup_eventset(): after destroying the CUDA eventset, update_control_state() is called again which operates on the already destroyed eventset. 2013-05-17 * 84925f50 src/components/cuda/linux-cuda.c: When adding multiple CUDA events to an event set, PAPI_add_event() error 14 (CUPTI_ERROR_NOT_COMPATIBLE) is being raised from the CUPTI library. Turns out that the CUDA update control state wasn't cleaning the event set up properly before adding new events. It's fixed now. * 2337aa3a src/perf_events.c: perf_event: allow running with perf_event_paranoid is 2 perf_event_paranoid set to 2 means allow user monitoring only (no kernel domain). The code before this mistakenly disabled all events in this case. Also set the allowed domains to exclude PAPI_DOM_KERNEL. 2013-05-16 * 617d9fbb src/papi_events.csv: papi_events.csv Revert a little mishap in adding ivbep support Somehow the contents of papi_hl.c ended up in the events file. * 2aff4596 src/papi_events.csv: Add identifier for ivb_ep * 1810ddf9 src/papi_libpfm4_events.c src/papi_libpfm4_events.h src/perf_events.c: papi_libpfm4_events: allow specifying core/uncore/os_generic PMUs This allows you to specify you only want your perf_event/libpfm4 based component to only export the PMU types you want. Now we can have an uncore-only component. * 6554f3f0 src/papi_libpfm4_events.c: papi_libpfm4_events.c: only enable presets for component 0 If we have multiple events using libpfm4, we only want to load the presets if it is component 0. * 6a4a4594 src/papi.c: PAPI_get_component_index() was matching names improperly For example, it was matching perf_event and perf_event_uncore as the same component. 
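The perf_event / perf_event_uncore mismatch in 6a4a4594 above is the kind of bug that appears when a lookup compares only a prefix of the name. The changelog does not say what the original comparison was, so the following is a sketch that assumes a prefix comparison was the cause; `names`, `find_component_prefix`, and `find_component_exact` are hypothetical and are not PAPI code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical component table; the two names come from the entry above. */
static const char *const names[] = { "perf_event", "perf_event_uncore" };
static const int num_names = (int)(sizeof names / sizeof names[0]);

/* Assumed buggy variant: compares only strlen(names[i]) characters, so a
 * request for "perf_event_uncore" stops at the shorter "perf_event". */
int find_component_prefix(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < num_names; i++) {
        if (strncmp(names[i], name, strlen(names[i])) == 0)
            return i;
    }
    return -1;
}

/* Fixed variant: strcmp() only succeeds when the whole string matches. */
int find_component_exact(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < num_names; i++) {
        if (strcmp(names[i], name) == 0)
            return i;
    }
    return -1;
}
```

With this table, the prefix variant resolves "perf_event_uncore" to index 0 while the exact variant resolves it to index 1, which is the distinction the fix needs to preserve.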
* 1b94e157 src/papi_hl.c: papi_hl.c : fix IPC calculation I broke it a while back while trying to clear out use of MHz. The code was uncommented and very confusing. It is slightly better now. * 92d4552e src/papi_libpfm4_events.c src/papi_libpfm4_events.h src/perf_events.c: papi_libpfm4_events: code changes to allow multiple component access the PAPI libpfm4 code has been modified to allow multiple users at once. This will allow multiple components to use libpfm4, for example a CPU component and an uncore component. * 7902b30e src/cpus.c: cpus: fix debug compile I always forget to compile with --with-debug and miss changes in the DEBUG statements. 2013-05-15 * 7ddc05ff src/cpus.c src/cpus.h: cpus.c: Add reference count to cpu structure It is possible to have multiple eventsets all attached to the same CPU, as long as only one eventset is running at a time. At EventSet cleanup, PAPI would free the CpuInfo_t structure even if other EventSets were still using it. This patch adds a reference count to the structure and only frees it after the last user is cleaned up. I also fixed a few locking bugs, hopefully I didn't introduce any new ones. * 6a61f9a2 src/cpus.c: more cleanup of the cpus.c file mostly formatting and added comments. * 710d269f src/cpus.c src/cpus.h src/papi.c...: cleanup cpus.h It had a lot of extraneous stuff in it. Also make sure it only gets included in files that need it. * 422226c9 src/papi.c: papi.c: add some extra debug messages * b1297058 src/cpus.c: Clean up cpus.c a bit Tracking down a segfault in the cpu attach cleanup code. * 7b6023cf src/ctests/perf_event_system_wide.c: ctests/perf_event_system_wide: much improved output It segfaults at the end though, unclear if this is a bug in the test or a bug in PAPI. Will investigate. 
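The reference count added to the CPU structure in 7ddc05ff above follows a common pattern: each attach increments a counter, and the structure is freed only when the last user detaches. A minimal sketch, assuming a single shared record per CPU; `struct cpu_info`, `cpu_acquire`, and `cpu_release` are hypothetical stand-ins for PAPI's CpuInfo_t handling, and the locking that the real change also touched is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CpuInfo_t; only the fields needed here. */
struct cpu_info {
    int cpu_num;     /* CPU this record describes */
    int ref_count;   /* number of EventSets currently attached */
};

/* Attach: reuse the existing record if there is one, otherwise allocate it.
 * Returns NULL only if allocation fails. */
struct cpu_info *cpu_acquire(struct cpu_info **slot, int cpu_num)
{
    assert(slot != NULL);
    if (*slot == NULL) {
        *slot = calloc(1, sizeof **slot);
        if (*slot == NULL)
            return NULL;
        (*slot)->cpu_num = cpu_num;
    }
    (*slot)->ref_count++;
    return *slot;
}

/* Detach: free the record only when the last user lets go.
 * Returns the number of users still attached. */
int cpu_release(struct cpu_info **slot)
{
    assert(slot != NULL && *slot != NULL && (*slot)->ref_count > 0);
    int remaining = --(*slot)->ref_count;
    if (remaining == 0) {
        free(*slot);
        *slot = NULL;
    }
    return remaining;
}
```

Two EventSets attached to the same CPU share one record; cleaning up the first leaves the record valid for the second, which is the situation the original code got wrong.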
* 38397aa3 src/components/cuda/configure src/components/cuda/configure.in src/components/cuda/linux-cuda.c...: Cuda component: Update library search path From Gary Mohr: It turns out that with the changes I gave you the path to the libcuda.so library is still hard coded to /usr/lib64. This assumes that the NVIDIA-Linux package is installed on the system where the build is being done. In Bull's case (and probably other users also) this is not always the case. To add the flexibility we need, I have added a new configure argument to the cuda configure script. The new argument is "--with-cudrv_dir" and it allows the user to specify where the cuda driver package (ie: NVIDIA-Linux) to be used for the build can be found. This new argument is optional and if not provided a value of "/usr" will be used. This allows existing configure calls to continue to work like before. * f8873d1c src/ctests/perf_event_system_wide.c: ctests/perf_event_system_wide: clean up the output a lot Still working on understanding it. * ebf20589 src/ctests/perf_event_system_wide.c: perf_event_system_wide: testing various DOMAIN and GRANULARITY settings pushing the limits of PAPI/perf_event trying to see why system-wide measurement doesn't work. 2013-05-14 * 0c1ef3f5 src/components/cuda/linux-cuda.c: CUDA component: Update description field Also removes a strcpy in the init code, which overwrote the name field. Thanks to Gary Mohr * 474fc00e src/ctests/perf_event_uncore_lib.c: Add AMD fam15h northbridge event to ctests/perf_event_uncore_lib.c 2013-05-13 * cf56cdac src/perf_events.c: perf_event component: update error returns This passes more error return values back to PAPI. Before this change a lot of places were hardcoded to PAPI_EPERM even if sys_perf_event_open() was reporting a different error. * c824471b src/ctests/Makefile src/ctests/perf_event_system_wide.c src/ctests/perf_event_uncore.c...: Update the perf_event specific tests. 
This adds a few more uncore tests, which are currently showing some bugs in the implementation. The tests all need root permissions to run, so should default to "SKIPPED" for most users. 2013-05-08 * e0204914 src/configure src/configure.in: Force the use of pthread_mutexes on ARM This lets the system libraries worry about the best way to define mutexes, rather than trying to hand-code in assembly around all of the various issues there are with atomic instructions in the ARM architecture. It might make sense to enable this for *all* Linux architectures, but for now just do it for ARM. * f21b1b27 src/linux-lock.h: Commit 59d3d7584b2925bd05b4b5d0f4fe89666eb8494a removed the definition of mb(). mb() was defined as rmb(). This just corrects it back. (Note from VMW -- this fixes some things, but ARM still won't build on a Cortex A9 pandaboard due to the use of the "swp" instruction. Proper fix is probably to enforce posix-mutexes on ARM) 2013-05-06 * 913f0795 src/components/nvml/configure src/components/nvml/configure.in: NVML: Update wording for configure options. Thanks for pointing out the ambiguous wording, Heike. * 81a86c2b src/components/infiniband/Rules.infiniband src/components/infiniband/linux-infiniband.c src/components/infiniband/tests/Makefile: Infiniband component: use dlopen/dlsym for symbols Apply Gary Mohr's patch to switch the infiniband component over to dl* with the same motivations as the cuda component. 2013-05-02 * 2e6bcb2a src/utils/native_avail.c: Add two command line switches: -i EVENTSTR includes only events whose names contain EVENTSTR; -x EVENTSTR excludes all events whose names contain EVENTSTR. These two switches can be combined, but only one string per switch can be used.
This allows you to, for example, filter events by component name, or eliminate all uncore events on Sandy Bridge… 2013-05-01 * 3163cc83 src/ctests/perf_event_uncore.c: ctests/perf_event_uncore: add IvyBridge support this needs an updated libpfm4 to work 2013-04-30 * 55c89673 src/examples/add_event/Papi_add_env_event.c src/examples/overflow_pthreads.c: Examples: Missed two instances of %x printf formatting. 2013-04-29 * b3c5bd47 src/components/appio/tests/appio_list_events.c src/components/appio/tests/appio_values_by_code.c src/components/appio/tests/appio_values_by_name.c...: Address TRAC 174: Let printf do the formatting https://icl.cs.utk.edu/trac/papi/ticket/174 174: PAPI's debugging/info output should use %# conversions for octal and hex ------------------------+-------------------- Reporter: sbk@… | Owner: Type: enhancement | Status: new Priority: normal | Component: All Version: HEAD | Severity: normal Keywords: | ------------------------+-------------------- Email sent to James Ralph: Seeing your latest change reminded me: Anytime there is a value issued in hex or octal the "%#" conversion should be used so the value is always preceded with a "0" for octal or a "0x" for hex. Otherwise when a value is printed one can not tell the base it is in (one shouldn't have to rely on internal knowledge of the code or the context to tell). For variables that are pointers the "%p" conversion can be used (this will always use a hex syntax). It would be nice to apply this to all PAPI print statements in their entirety. 2013-04-25 * 87ec9286 src/components/vmware/Rules.vmware: Rules.vmware: Use $(LDL) not -ldl Minor cleanup, but configure sets it, so why not use it. 2013-04-26 * 8dddd587 src/papi_hl.c: papi_hl: Use PAPI_get_virt_usec() for process time The code was using cycles / MHz which is not guaranteed to work on modern machines. It also was sometimes using (instructions / estimated IPC) / MHz which hopefully isn't necessary for any machine PAPI currently supports.
Instead use PAPI_get_virt_usec() which should give the right value. 2013-04-25 * 9dd36088 src/ctests/perf_event_uncore.c: ctests/perf_event_uncore: make more modular Cleans up the code to make it easier to add tests for architectures other than SandyBridge-EP. I was doing this so I could add support for IvyBridge but it turns out neither Linux nor libpfm4 supports uncore on IvyBridge yet. hmmm. * 52ff0293 src/components/cuda/Rules.cuda: Rules.cuda: The cuda component now depend on the dynamic linking loader and on some systems one has to explicitly link to it. Add $(LDL) to LD_FLAGS, configure sets it if we need it. * 97a4a5ea src/components/cuda/Rules.cuda src/components/cuda/linux-cuda.c src/components/cuda/tests/Makefile: Cuda component enhancement. ---------------- From Gary's submission--------------------------------- The current packaging of the cuda component in PAPI has a fairly unfriendly side effect. When PAPI is built with the cuda component, then that copy of PAPI can only be used on systems where the cuda libraries are installed. If it is installed on a system without these libraries then all PAPI services fail because they have references to libraries which can not be found. Even papi_avail which you would think has nothing to do with cuda reports the error. This issue significantly complicates the delivery and install of the PAPI package on large clusters where some of the nodes have NVIDIA GPU's (and the cuda libraries to talk to them) and other nodes do not have GPU's (and therefore no software to access them). I have been working with the help of Phil Mucci to eliminate this dependency so that a copy of PAPI built with a cuda component could be installed on all nodes in the cluster and if the node had NVIDIA GPU's (and libraries available) then the cuda component would get enabled and could be used. 
If the node did not have the hardware or the access libraries were not available, then the cuda component would just disable itself at component initialization so it could not be used (but all other PAPI services would still work). Phil has provided some gentle prodding and lots of valuable suggestions to assist this effort. I now think that I have a working version of this capability and am ready to share it with the community. ----------------------------------------------------------------------- Many thanks to Gary Mohr and Phil Mucci for this much needed functionality. 2013-04-23 * 99c8e352 src/papi_internal.c: papi_internal.c: Print an eventcode in hex vs decimal. Thanks, Gary Mohr. 2013-04-22 * 1fc5dae2 src/run_tests.sh: The test for determining whether to run valgrind was backwards. Correcting that allow the run_test.sh script to stay the same and one just needs to define "VALGRIND=yes" (or any non-null string) to make run_test.sh use valgrind. --- src/run_tests.sh | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/run_tests.sh b/src/run_tests.sh index d1ce205..9337ff2 100755 --- a/src/run_tests.sh +++ b/src/run_tests.sh @@ -19,10 +19,8 @@ else export TESTS_QUIET fi -if [ "x$VALGRIND" = "x" ]; then -# Uncomment the following line to run tests using Valgrind -# VALGRIND="valgrind --leak-check=full"; - VALGRIND=""; +if [ "x$VALGRIND" != "x" ]; then + VALGRIND="valgrind --leak-check=full"; fi #CTESTS=`find ctests -maxdepth 1 -perm -u+x -type f`; -- 2013-04-19 * 4cf16234 src/components/README src/components/bgpm/README src/components/coretemp_freebsd/README...: Restructure README files for components so that the file in the components directory doesn't document individual component details. Add README files to each component directory that requires further installation detail. Update RAPL instructions to capture how to enable reading the MSRs. 
These files are supposedly configured with Doxygen markup, but I don't think the master README ever got built. It probably should. 2013-04-17 * bf75d226 src/components/cuda/tests/HelloWorld.cu: cuda/tests/HelloWorld.cu: workaround a segfault. Report from Gary Mohr I was running the Cuda test case on a system which did not actually have any NVIDIA GPU's installed on it (but the cuda software was installed and papi was built with the cuda component). I modified the test case to put a real cuda event in the source (as suggested in the source). When I run the test case the cuda component gets disabled in PAPI_library_init (because detectDevice function can not find any GPU's) which is the correct behavior. The test case then calls PAPI_event_name_to_code which failed because the cuda component was disabled. The test case then created an event set and called PAPI_add_events with an empty list of events to be added. This led to a segfault somewhere inside libpfm4. The attached patch makes some minor changes to protect against this problem. I noticed this test case does not use the PAPI test framework utilities (test_xxxx functions) so I did not modify the test to use them. 2013-04-15 * 457bfd74 src/components/cuda/linux-cuda.c: When creating two event sets - one for the CUDA and one for the CPU component - the order of event set creation appears crucial. When the CPU event set has been created before the CUDA event set then PAPI_start() for the CUDA event set works fine. However, if the CUDA event set has been created before the CPU event set, then PAPI_start(CUDA_event_set) forces the CUDA control state to be updated one more time, even if the CUDA event set has not been modified. The CUDA control state function did not properly handle this case and hence caused PAPI_start() to fail. This has been fixed.
* 807120b6 src/components/cuda/linux-cuda.h: linux-cuda.c 2013-03-28 * 7b0eec7a src/run_tests.sh: run_tests.sh: further refine component test find Exclude *.cu when looking for component tests. 2013-03-25 * 6a40c8ba src/run_tests.sh: run_tests.sh: File mode changes. run_tests.sh is now expected to run from the install location in addition to src. The script tried to remove execute from *.[c|h], now it just excludes *.[c|h] from the find commands. 2013-03-18 * 2ba9f473 src/perfctr-x86.c: perfctr: don't read in event table multiple times papi_libpfm3_events.c now reads in the predefined events, we don't also need to do this in perfctr setup_x86_presets() * 326401b1 src/perfctr.c: Fix segfault in perfctr.c The preset lookup uses the cidx index, but in perfctr.c we weren't passing a cidx value (it was being left off). The old perfctr code plays games with defining extern functions so the compiler wasn't giving us a warning. 2013-03-14 * 50130c6f src/components/bgpm/L2unit/linux-L2unit.c src/linux-bgq.c: If a counter is not set to overflow (threshold==0; happens when PAPI_shutdown is called) then we do not want to rebuild the BGPM event set, even if the event set has been used previously and hence "applied or attached". Usually if an event set has been applied or attached prior to setting overflow, the BGPM event set needs to be deleted and recreated (which implies malloc() from within BGPM). Not so, though, if threshold is 0 which is the case when PAPI_shutdown is called. Note, this only applies to Punit and L2unit, not IOunit since an IOunit event set is not applied or attached. 2013-03-13 * 1a143003 src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/IOunit/linux-IOunit.h src/components/bgpm/L2unit/linux-L2unit.c...: Overflow issue on BG/Q resolved. Overflow with multiple components worked; overflow with multiple components and multiple events did not work as intended. * 42741a40 src/components/cuda/Rules.cuda: Added one more library to linker command.
2013-03-12 * 1431eb3f src/components/nvml/Makefile.nvml.in src/components/nvml/Rules.nvml src/components/nvml/configure...: NVML component: build system work Adopt the cuda component's method for specifying library location. 2013-03-11 * ce66feac src/components/mx/linux-mx.c: mx component: Modernize init routine. Add component index to _mx_component_init()'s signature and set the bit in component info. * 1c1bc177 src/components/cuda/Makefile.cuda.in src/components/cuda/Rules.cuda src/components/cuda/configure...: Resolve configure issues for CUDA component. 2013-03-07 * f3572537 src/linux-common.c src/linux-memory.c: Fix the build on Linux-SPARC I dug out an old SPARC machine and fixed the PAPI build on it. * 2c7f102c src/perf_events.c: More comprehensive sys_perf_open to PAPI error mappings This tries to cover more of the errors returned by sys_perf_open and map them to better results. EINVAL is a problem because it can mean Conflict as well as Event not found and many other things, so it's unclear what to do with it. * 299070ef src/perf_events.c src/sys_perf_event_open.c: Return proper error codes for sys_perf_event_open For some reason on x86 and x86_64 we were trying to set errno manually and thus over-writing the proper errno value, causing all errors to look like PAPI_EPERM This removes that code, as well as adds code to report ENOENT as PAPI_ENOEVENT. With this change, on IVY this happens which looks more correct. ./utils/papi_command_line perf::L1-ICACHE-PREFETCHES Failed adding: perf::L1-ICACHE-PREFETCHES because: Event does not exist command_line.c PASSED 2013-03-06 * baa557ca src/papi_libpfm4_events.c src/papi_user_events.c: Coverity fixes: Coverity pointed out that there was a case where load_user_event_table() could leak memory. The change in the location of the papi_free(foo) ensures that the allocated memory is freed.
Coverity pointed out one path through the code in _papi_libpfm4_ntv_code_to_descr() that did not free up memory allocated in the function. Added a free on the path to free up that memory. Thanks Will Cohen. 2013-02-14 * 395b7bc7 src/Makefile.inc src/components/README src/components/appio/tests/Makefile...: Add component tests to the install-[all|tests] target. Thanks to Gary Mohr. ------------------- This makes a fairly small change to src/Makefile.inc to add logic that adds a new install-comp_tests target which calls the install target for each component being built. This new target is listed as a dependency on the install-tests target so it will happen when the 'install-all', 'install-tests', or 'install-comp_tests' targets are used. A note about this change, I am not real familiar with the auto make and auto conf tools. This change was enough to make it work for me but if there is another file that should also be changed for this modification, please help me out here. The patch also adds install targets to the Makefiles for all of the components which have 'tests' directories and updates the README file which talks about how to create component tests. Another note, I only compile with a couple of components (ours, rapl, and example) so if I fat fingered something in one of the other components Makefiles I would not have noticed. Please keep me honest and make sure you compile with them all enabled. Thanks for adding this capability for us. Gary --------------------------- Makefile.inc: Add run_tests and friends to install-tests target. Component test Makefiles get their install location to mirror what runtests expects. 2013-03-04 * 448d21ab src/components/rapl/linux-rapl.c: Remove a stray debug statement. Thanks to Harald Servat for catching this. 2013-03-01 * df1a75cc src/utils/command_line.c: Wrestled some horribly convoluted indexing into shape. The -u and -x options now print as expected (I think).
2013-01-31 * b0f5f4d6 src/components/nvml/linux-nvml.c: linux-nvml.c: Fix type warning. CUDA and NVML have a signed vs unsigned thing going on in their returned device counts, cast away the warning. 2013-01-29 * 8490b4ee src/papi.c: General doxygen cleanup: remove all "No known bugs" messages; correct and cleanup examples for PAPI_code_to_name and PAPI_name_to_code 2013-01-23 * 89e45a9b src/linux-memory.c src/linux-timer.c: ia64 fixes. Thanks to Tony Jones for patches. 2013-01-16 * 23e0ba2d src/components/nvml/linux-nvml.c: nvml component: cleanup a memory leak We did not free a buffer at shutdown time. 2013-01-15 * f3db85fc src/papi.h: papi.h bump version number. * dfa80287 src/buildbot_configure_with_components.sh: Buildbot configure script. Add cuda and nvml components, if configured, to the buildbot coverage test. Note: Script now checks for existence of Makefile.cuda and then Makefile.nvml to see if it can build the cuda component and then if it can build the nvml component. * cf416e27 src/threads.c: Cleaned up compiler warning (gcc version 4.4.6) * 59cbc8fc src/components/bgpm/CNKunit/linux-CNKunit.c src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/L2unit/linux-L2unit.c...: Cleaned up compiler warnings on BG/Q (gcc version 4.4.6 (BGQ-V1R1M2-120920)) 2013-01-14 * 3af71658 .../build/lib.linux-x86_64-2.7/perfmon/__init__.py .../lib.linux-x86_64-2.7/perfmon/perfmon_int.py .../build/lib.linux-x86_64-2.7/perfmon/pmu.py...: libpfm4: remove extraneous build artifacts. Steve Kaufmann reported differences between the libpfm4 I imported into PAPI and the libpfm4 that can be attained with a git clone git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4 Self: Do libpfm4 imports from a fresh clone of libpfm4.
/*
* File:    INSTALL.txt
* CVS:     $Id$
* Author:  Kevin London
*          london@cs.utk.edu
* Mods:    Dan Terpstra
*          terpstra@cs.utk.edu
* Mods:    Philip Mucci
*          mucci@cs.utk.edu
* Mods:
*
*/

*****************************************************************************
HOW TO INSTALL PAPI ONTO YOUR SYSTEM
*****************************************************************************

On some of the systems that PAPI supports, you can install PAPI right
out of the box without any additional setup. Others require drivers or
patches to be installed first.

The general installation steps are below, but first find your particular
Operating System's section for any additional steps that may be necessary.
NOTE: the configure and make files are located in the papi/src directory.

General Installation

1.  % ./configure
    % make

2.  Check for errors.

    a) Run a simple test case: (This will run ctests/zero)

    % make test

    If you get good counts, you can optionally run all the test programs
    with the included test harness. This will run the tests in quiet mode,
    which will print PASSED, FAILED, or SKIPPED. Tests are SKIPPED if the
    functionality being tested is not supported by that platform.

    % make fulltest (This will run ./run_tests.sh)

    To run the tests in verbose mode:

    % ./run_tests.sh -v

3.  Create a PAPI binary distribution or install PAPI directly.
    a) To install PAPI libraries and header files from the build tree:

    % make install

    b) To install PAPI manual pages from the build tree:

    % make install-man

    c) To install PAPI test programs from the build tree:

    % make install-tests

    d) To install all of the above in one step from the build tree:

    % make install-all

    e) To create a binary kit, papi-.tgz:

    % make dist

*****************************************************************************
MORE ABOUT CONFIGURE OPTIONS
*****************************************************************************

There is an extensive array of options available from the configure
command-line. These can differ significantly from version to version of
PAPI. For complete details on the command-line options, use:

    % ./configure --help

*****************************************************************************
DOCUMENTATION BY DOXYGEN
*****************************************************************************

PAPI now ships with documentation generated by doxygen.
Documentation for the public apis can be created by running doxygen
from the doc directory.
More complete documentation of all internal apis and structures can be
generated with:

    % doxygen Doxyfile-everything

Doxygen documentation for the currently released version of PAPI is also
available on the website.

*****************************************************************************
Operating System Specific Installation Steps (In Alphabetical Order by OS)
*****************************************************************************

AIX - IBM POWER5 and POWER6 and POWER7
*****************************************************************************
PAPI is supported on AIX 5.x for POWER5 and POWER6.
PAPI is also tested on AIX 6.1 for POWER7.
Use ./configure to select the desired make options for your system,
specifying --with-bitmode=32 or --with-bitmode=64 to select the word length.
32 bits is the default.

1.  On AIX 5.x, the bos.pmapi is a product level fileset (part of the OS).
    However, it is not installed by default. Consult your sysadmin to
    make sure it is installed.
2.  Follow the general instructions for installing PAPI.

WARNING: PAPI requires XLC version 6 or greater.
Your version can be determined by running 'lslpp -a -l | grep -i xlc'.

BG/P
*****************************************************************************
BG/P is a cross-compiled environment. The machine on which PAPI is compiled
is not the machine on which PAPI runs. To compile PAPI on BG/P, specify the
BG/P environment as shown below:

    % ./configure --with-OS=bgp
    % make

NOTE: ./configure might fail if the cross compiler is not in your path.
If that is the case, just add it to your path and everything should work:

    % export PATH=$PATH:/bgsys/drivers/ppcfloor/gnu-linux/bin

By default this will make a subset of tests in the ctests directory and all
tests in the ftests directory. There is an additional C test program provided
for the BG/P environment that exercises the specific BG/P events and
demonstrates how to intermix the PAPI and BG/P UPC native calls. This test
program is built with the normal make sequence and can be found in the
ctests/bgp directory.

The testing targets in the make file will not work in the BG/P environment.
Since BG/P supports multiple queuing systems, you must manually execute
individual programs in the ctests and ftests directories to check for
successful library creation. You can also manually edit the run_tests.sh
script to automate testing for your installation.

Most papi utilities work for BGP, including papi_avail, papi_native_avail,
and papi_command_line. Many ctests pass for BGP, but many others produce
errors due to the non-traditional architecture of BGP. In particular,
PAPI_TOT_CYC always seems to produce 0 counts, although papi_get_virt_usec
and papi_get_real_usec appear to work.

The IBM RedPaper: http://www.redbooks.ibm.com/abstracts/redp4256.html
provides further discussion about PAPI on BGP along with other performance
issues.
BG/Q
*****************************************************************************

Five new components have been added to PAPI to support hardware performance
monitoring for the BG/Q platform; in particular the BG/Q network, the I/O
system, and the Compute Node Kernel, in addition to the processing core.
There are no specific component configure scripts for L2unit, IOunit,
NWunit, CNKunit. In order to configure PAPI for BG/Q, use the following
configure options at the papi/src level:

        % ./configure --prefix=< your_choice > \
                --with-OS=bgq \
                --with-bgpm_installdir=/bgsys/drivers/ppcfloor \
                CC=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc \
                F77=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gfortran \
                --with-components="bgpm/L2unit bgpm/CNKunit bgpm/IOunit bgpm/NWunit"

CLE - Cray XT and XE Opteron
*****************************************************************************

The Cray XT/XE is a cross-compiled environment. You must specify the perfmon
version to configure as shown below.

Before running configure to create the makefile that supports a Cray XT/XE
CLE build of PAPI, execute the following module commands:

        % module purge
        % module load gcc

Note: do not load the programming environment module (e.g. PrgEnv-gnu) but
the compiler module (e.g. gcc) as shown above.
Check the CLE compute nodes for the version of perfmon2 that they support:

        % aprun -b -a xt cat /sys/kernel/perfmon/version

and use this version when configuring PAPI for a perfmon2 substrate:

        % configure CFLAGS="-D__crayxt" \
                --with-perfmon=2.82 --prefix= \
                --with-virtualtimer=times --with-tls=__thread \
                --with-walltimer=cycle --with-ffsll --with-shared-lib=no \
                --with-static-tools

Configure PAPI for a perf events substrate:

        % configure CFLAGS="-D__crayxt" \
                --with-perf-events --with-pe-incdir= \
                --with-assumed-kernel=2.6.34 --prefix= \
                --with-virtualtimer=times --with-tls=__thread \
                --with-walltimer=cycle --with-ffsll --with-shared-lib=no \
                --with-static-tools

Invoke make accordingly:

        % make CONFIG_PFMLIB_ARCH_CRAYXT=y CONFIG_PFMLIB_SHARED=n
        % make CONFIG_PFMLIB_ARCH_CRAYXT=y CONFIG_PFMLIB_SHARED=n install

The testing targets in the makefile will not work in the XT/XE CLE
environment. It is necessary to log into an interactive session and run the
tests manually through the job submission system. For example, instead of:

        % make test
use:
        % aprun -n1 ctests/zero

and instead of:

        % make fulltest
use:
        % ./run_cat_tests.sh

after substituting "aprun -n1" for "yod -sz 1" in run_cat_tests.sh.

FreeBSD - i386 & amd64
*****************************************************************************

PAPI requires FreeBSD 6 or higher to work.

The kernel needs some modifications to give PAPI access to the performance
monitoring counters. Simply add "options HWPMC_HOOKS" and "device hwpmc" to
the kernel configuration file. For i386 systems, also add "device apic".
(You can obtain more information in hwpmc(4); see NOTE 1 for the supported
hardware.)

After this step, just recompile the kernel and boot it.

FreeBSD 7 (or greater) does not ship with a Fortran compiler. To compile the
Fortran tests you will need to install a Fortran compiler first (e.g.
installing it from /usr/ports/lang/gcc42), and set the F77 environment
variable to the compiler you want to use (e.g. gfortran42).
Fortran compilers may issue errors due to "Integer too big for its kind *".
Add a compiler option to the FFLAGS environment variable to use int*8 by
default (in gfortran42 it is -fdefault-integer-8).

Follow the "General Installation" steps.

NOTE 1:
--
The HWPMC driver supports the following processors: Intel Pentium 2,
Intel Pentium Pro, Intel Pentium 3, Intel Pentium M, Intel Celeron,
Intel Pentium 4, AMD K7 (AMD Athlon) and AMD K8 (AMD Athlon64 / Opteron).

FreeBSD 8 also adds support for Core/Core2/Core-i[357]/Atom processors.
There is also a patch for FreeBSD 7/7.1 in http://wiki.freebsd.org/PmcTools

Linux - Xeon Phi [MIC, KNC, Knights Corner]
*****************************************************************************

Full PAPI support of the MIC card requires MPSS Gold Update 2 or above, and
a cross-compilation toolchain from Intel. The Intel C compiler is also
supported.

The compiler
-----------------------------------------------------------------------------
* Download one of the MPSS full source bundles at
  [http://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss]
* Untar the download.
* Extract gpl/package-cross-k1om.tar.bz2

Building PAPI - gcc cross compiler
-----------------------------------------------------------------------------
* Add usr/linux-k1om-4.7/bin or equivalent to your PATH so PAPI can find the
  cross-build utils. (See above for instructions on acquiring the cross
  compilation toolchain.)
* You will need to invoke configure with options:
  > ./configure --with-mic --host=x86_64-k1om-linux --with-arch=k1om
  This sets up cross-compilation and sets options needed by PAPI.
* Run make to build the library.

Building PAPI - icc
-----------------------------------------------------------------------------
If icc is in your path,
  > ./configure --with-mic
builds a mic-native version of the library.

Offload Code
------------
To use PAPI in MIC offload code, build a mic-native version of PAPI using
icc, as detailed above.
Then in your program, wrap the papi.h header as follows:

        #pragma offload_attribute (push,target(mic))
        #include "papi.h"
        #pragma offload_attribute (pop)

Make PAPI calls from offload code as normal.
Finally add -offload-option,mic,ld,$(path_to_papi)/libpapi.a to your compile
incantation.

Linux - Itanium II & Montecito
*****************************************************************************

PAPI on Itanium Linux links to the perfmon library. The library version and
the Itanium version are automatically determined by configure.
If you wish to override the defaults, a number of pfm options are available
to configure. Use:
        % ./configure --help
to learn more about these options.
Follow the general installation instructions to complete your installation.

PLATFORM NOTES:
The earprofile test fails under perfmon for Itanium II. It has been
reconfigured to work on the upcoming perfmon2 interface.

Linux - PPC64 (POWER5, POWER5+, POWER6 and PowerPC970)
*****************************************************************************

Linux/PPC64 requires that the kernel be patched and recompiled with the
PerfCtr patch if the kernel is version 2.6.30 or older. The required patches
and complete installation instructions are provided in the
papi/src/perfctr-2.7.x directory. PPC64 is the ONLY platform that REQUIRES
use of PerfCtr 2.7.x.

*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*

WARNING: You should always use a PerfCtr distribution that has been
distributed with a version of PAPI or your build will fail. The reason for
this is that PAPI builds a shared library of the PerfCtr runtime, on which
libpapi.so depends. PAPI also depends on the .a file, which it decomposes
into component object files and includes in the libpapi.a file for
convenience. If you install a new PerfCtr, even a shared library, YOU MUST
REBUILD PAPI to get a proper, working libpapi.a.

There are several options in configure to allow you to specify your perfctr
version and location.
Use:
        % ./configure --help
to learn more about these options.

Follow the general installation instructions to complete your installation.

Linux Perf Events ( with kernel 2.6.32 and newer )
*****************************************************************************

Performance counter support has been merged as the "Perf Events" subsystem
as of Linux 2.6.32. This means that PAPI can be built without patching the
kernel on new enough systems.

Perf Events support is new, and certain functionality does not work. If you
need any of the functionality listed below, we recommend you install the
PerfCtr patchset and use that in conjunction with PAPI.

+ PAPI requires at least Linux kernel 2.6.32, as the earlier 2.6.31 version
  had some significant API changes.
+ Kernels before 2.6.33 have extra overhead when determining whether events
  conflict or not.
+ Counter multiplexing is handled by PAPI (rather than perf_events) on
  kernels before 2.6.33 due to a bug in the kernel perf_events code.
+ Nehalem EX support requires kernel 2.6.34 or newer.
+ Pentium 4 support requires kernel 2.6.35 or newer.

The PAPI configure script should auto-detect the availability of Perf Events
on new enough distributions (this mainly requires that perf_event.h be
available in /usr/include/linux).

On older distributions (even ones that include the 2.6.32 kernel) the
perf_event.h file might not be there. One fix is to install your
distribution's Linux kernel headers package, which is often an optional
package not installed by default.

If you cannot install the kernel headers, you can obtain the perf_event.h
file from your kernel and run configure as follows:
        ./configure --with-pe-incdir=INCDIR
replacing INCDIR with the directory that perf_event.h is in.

Linux PerfCtr (requires patching the kernel)
*****************************************************************************

When using Linux kernels before 2.6.32 the kernel must be patched with the
PerfCtr patch set.
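Which side of the 2.6.32 boundary a machine falls on can be checked before
running configure. The following is a minimal sketch, not part of PAPI: the
function names version_ge and perf_events_hint are hypothetical, and the
header location is the /usr/include/linux path mentioned in the Perf Events
section.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A is >= B, comparing up to
# three numeric fields. A suffix such as "-generic" is ignored.
version_ge() {
    va=${1%%[!0-9.]*}
    vb=${2%%[!0-9.]*}
    old_ifs=$IFS
    IFS=.
    set -- $va 0 0          # pad so that missing fields compare as 0
    a1=$1 a2=$2 a3=$3
    set -- $vb 0 0
    b1=$1 b2=$2 b3=$3
    IFS=$old_ifs
    if [ "$a1" -ne "$b1" ]; then [ "$a1" -gt "$b1" ]; return $?; fi
    if [ "$a2" -ne "$b2" ]; then [ "$a2" -gt "$b2" ]; return $?; fi
    [ "$a3" -ge "$b3" ]
}

# perf_events_hint [KERNEL_VERSION] [HEADER_PATH]
# Print a one-line hint; defaults to the running kernel and the usual
# header location.
perf_events_hint() {
    kernel=${1:-$(uname -r)}
    header=${2:-/usr/include/linux/perf_event.h}
    if ! version_ge "$kernel" 2.6.32; then
        echo "kernel $kernel: older than 2.6.32, PerfCtr patch needed"
    elif [ -f "$header" ]; then
        echo "kernel $kernel: Perf Events available, header found at $header"
    else
        echo "kernel $kernel: Perf Events available, but $header is missing"
    fi
}

perf_events_hint
```

If the header is reported missing, either install the kernel headers package
or point configure at a copy with --with-pe-incdir.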
(This patchset can also be used on more recent kernels if the support
provided by Perf Events is not enough for your workload.) The required
patches and complete installation instructions are provided in the
papi/src/perfctr-x.y directory. Please see the INSTALL file in that
directory.

Do not forget, you also need to build your kernel with APIC support in order
for hardware overflow to work. This is very important for accurate
statistical profiling a la gprof via the hardware counters.

So, when you configure your kernel to build with PERFCTR as above, make sure
you turn on APIC support in the "Processor type and features" section. This
should be enabled by default if you are on an SMP system, but it is disabled
by default on a UP (uniprocessor) system.

In our 2.4.x kernels:
> grep PIC /usr/src/linux/.config
/usr/src/linux/.config:CONFIG_X86_GOOD_APIC=y
/usr/src/linux/.config:CONFIG_X86_UP_APIC=y
/usr/src/linux/.config:CONFIG_X86_UP_IOAPIC=y
/usr/src/linux/.config:CONFIG_X86_LOCAL_APIC=y
/usr/src/linux/.config:CONFIG_X86_IO_APIC=y

You can verify the APIC is working after rebooting with the new kernel by
running the 'perfex -i' command found in the perfctr/examples/perfex
directory.

PAPI on x86 assumes PerfCtr 2.6.x. NOTE: THE VERSIONS OF PERFCTR DO NOT
CORRESPOND TO LINUX KERNEL VERSIONS.

*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*

WARNING: You should always use a PerfCtr distribution that has been
distributed with a version of PAPI or your build may fail. Newer versions
with backward compatibility may also work. PAPI builds a shared library of
the PerfCtr runtime, on which libpapi.so depends. PAPI also depends on the
.a file, which it decomposes into component object files and includes in the
libpapi.a file for convenience. If you install a new PerfCtr, even a shared
library, YOU MUST REBUILD PAPI to get a proper, working libpapi.a.

There are several options in configure to allow you to specify your perfctr
version and location.
Use:
        % ./configure --help
to learn more about these options.

Follow the general installation instructions to complete your installation.

*- IF PERFCTR IS INSTALLED BUT PAPI FAILS TO INITIALIZE -*

You may be running udev, which is not smart enough to know the permissions
of dynamically created devices. To fix this, find your udev/devices
directory, often /lib/udev/devices or /etc/udev/devices, and perform the
following actions:

        mknod perfctr c 10 182
        chmod 644 perfctr

On Ubuntu 6.06 (and probably other Debian distros), add a line to
/etc/udev/rules.d/40-permissions.rules like this:

        KERNEL=="perfctr", MODE="0666"

On SuSE, you may need to add something like the following to
/etc/udev/rules.d/50-udev-default.rules
(SuSE does not have the 40-permissions.rules file in it):

        # cpu devices
        KERNEL=="cpu[0-9]*", NAME="cpu/%n/cpuid"
        KERNEL=="msr[0-9]*", NAME="cpu/%n/msr"
        KERNEL=="microcode", NAME="cpu/microcode", MODE="0600"
        KERNEL=="perfctr", NAME="perfctr", MODE="0644"

These lines tell udev to always create the device file with the appropriate
permissions. Use 'perfex -i' from the perfctr distribution to test this fix.

PLATFORM NOTES:
Opteron fails the matrix-hl test because the default definition of
PAPI_FP_OPS overcounts speculative floating point operations.

Solaris 8 - Ultrasparc
*****************************************************************************

The only requirement for Solaris is that you must be running version 2.8 or
newer. As long as that requirement is met, no additional steps are required
to install PAPI and you can follow the general installation guide.

Solaris 10 - UltraSPARC T2/Niagara 2
*****************************************************************************

PAPI supports the Niagara 2 on Solaris 10. The substrate offers support for
common basic operations like adding/reading/etc and the advanced features
multiplexing (see below), overflow handling and profiling.
The implementation for Solaris 10 is based on libcpc 2, which offers access
to the underlying performance counters. Performance counters for the
UltraSPARC architecture are described in general in the UltraSPARC
architecture manual, with detailed descriptions in the actual processor
manual. For this substrate the documentation for performance counters can
be found at:

        - http://www.opensparc.net/publications/specifications/

In order to install PAPI on this platform make sure the packages SUNWcpc and
SUNWcpcu are installed. Sun Studio 12 was used for compilation while the
substrate was developed. GNU GCC has not been tested and would require
modifying the makefiles Makefile.solaris-niagara2 (32 bit) and
Makefile.solaris-niagara2-64bit (64 bit).

The steps required for installation are as follows:

        ./configure --with-bitmode=[32|64] --prefix=/is/optional

If no --with-bitmode parameter is present a default of 32 bit is assumed.
If no --prefix is used, a default of /usr/local is assumed.

        make
        make install

If you want to link your application against your installation you should
make sure to include at least the following linker options:

        -lpapi -lcpc

PLEASE NOTE: This is the first revision of Niagara 2/libcpc 2/Solaris 10
support and needs further testing! Contributions, especially for the preset
definitions, would be much appreciated.

MULTIPLEXING: As the Niagara 2 offers no native event to count the cycles
elapsed, a "synthetic event" was created offering access to the cycle count.
This event is not as accurate as the native events, nor should it be used
for anything other than the multiplexing mode, which needs the cycle count
in order to work. Therefore multiplexing and the preset PAPI_TOT_CYC should
be used only with caution. BEWARE OF WRONG COUNTER RESULTS!

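The Solaris 10 installation steps above can be sketched as a small helper
that applies the documented defaults (32 bit, /usr/local). This is only an
illustration: the function name papi_configure_args and the /opt/papi prefix
are hypothetical, and the include and library subdirectories in the final
comment are assumptions about the install layout.

```shell
#!/bin/sh
# papi_configure_args [BITMODE] [PREFIX]
# Print configure arguments, falling back to the documented defaults of
# 32-bit mode and a /usr/local prefix. Rejects any other bit mode.
papi_configure_args() {
    bitmode=${1:-32}
    prefix=${2:-/usr/local}
    case $bitmode in
        32|64) ;;
        *) echo "bitmode must be 32 or 64" >&2; return 1 ;;
    esac
    echo "--with-bitmode=$bitmode --prefix=$prefix"
}

# Example use from the PAPI src directory (not executed here):
#   ./configure $(papi_configure_args 64 /opt/papi) && make && make install
#
# Linking an application against that installation; the include and lib
# subdirectories are assumed, adjust them to your layout:
#   cc -I/opt/papi/include app.c -L/opt/papi/lib -lpapi -lcpc -o app
```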
Windows XP/2000/Server 2003 - Intel Pentium III or AMD Athlon / Opteron
*****************************************************************************

Please use PAPI 3.7
(http://icl.cs.utk.edu/projects/papi/downloads/papi-3.7.2.tar.gz)

The Windows source tree comes with Microsoft Visual Studio Version 8
projects to build a graphical shell application, the PAPI library as a DLL,
a kernel driver to provide access to the counters, and a collection of C
test programs.

The WinPMC driver must be installed with administrator privileges. See the
winpmc.html file in the papi/win2k/winpmc directory for details on building
and installing this driver.

The general installation instructions are irrelevant for Windows.

Other Platforms
*****************************************************************************

PAPI can be compiled and installed on most platforms that have GNU compilers
regardless of operating system or hardware. This includes, for example,
Macintosh systems running recent versions of OSX. However, PAPI can only
provide access to the CPU hardware counters on platforms that are directly
supported. Unsupported platforms will run, but only provide basic timing
functions and potential access to some non-cpu components.

*****************************************************************************
CREATING AND RUNNING COMPONENTS
*****************************************************************************

Basic instructions on how to create a new component can be found in
src/components/README. The components directory contains several components
developed by the PAPI team along with a simple yet functional "example"
component which can be used as a guide to aid third-party developers.
Assuming components are developed according to the specified guidelines,
they will function within the PAPI framework without requiring any changes
to PAPI source code.

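To see which components a source tree contains, the components directory can
be listed with a short script. This is a sketch rather than a PAPI tool: the
function name list_components is hypothetical, and it simply treats every
subdirectory as a component and notes those that contain a file named
configure.

```shell
#!/bin/sh
# list_components DIR
# Print each subdirectory of DIR, marking those that ship their own
# configure script. (Hypothetical helper, not part of PAPI.)
list_components() {
    for d in "$1"/*/; do
        # An unmatched glob stays literal; skip anything not a directory.
        if [ ! -d "$d" ]; then
            continue
        fi
        name=$(basename "$d")
        if [ -f "$d/configure" ]; then
            echo "$name (has configure script)"
        else
            echo "$name"
        fi
    done
}

# Example use from the top of the PAPI source tree (not executed here):
#   list_components src/components
```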
Before running any component that requires configuration, the configure
script for that component must be executed in order to generate the Makefile
which contains the configuration settings. Normally, the script will only
need to be executed once. Depending on the component, configure may require
that one or more configuration settings be specified by the user.

The components to be added to PAPI are specified during the configuration of
PAPI by adding the --with-components= command line option to configure. For
example, to add the acpi, lustre, and net components, the option would be:

        % ./configure --with-components="acpi lustre net"

Attempting to add a component to PAPI which requires configuration and has
not been configured will result in a compilation error because the PAPI
build environment will be unable to find the Makefile for that component.

papi-5.3.0/ChangeLogP421.txt

2012-02-13

        * src/components/net/linux-net.c: Repairing more coverity warnings.

2012-02-11

        * src/windows-common.c: Missed an instance of CPUs yesterday.

        * src/: papi_internal.c, threads.c: This change fixes two race
        conditions that are probably the cause of the pthrtough double-free
        error. When freeing a thread, we remove and free all eventsets
        belonging to that thread. This could race with the thread itself
        removing the eventset, causing some ESI fields to be freed twice.
        The problem was found by using the Valgrind 3.8 Helgrind tool
        valgrind --tool=helgrind --free-is-write=yes ctests/pthrtough
        In order for Helgrind to work, I had to temporarily modify PAPI to
        use POSIX pthread mutexes for locking. Is there any reason we don't
        use these all the time?

2012-02-10

        * src/utils/: avail.c, component.c, event_chooser.c, native_avail.c:
        Fix one more case of "CPU's" in the print header code. Also remove
        the extraneous "The following correspond to fields in the
        PAPI_event_info_t structure."
message * src/: testlib/papi_test.h, testlib/test_utils.c, ctests/all_native_events.c, ctests/calibrate.c, ctests/code2name.c, ctests/hwinfo.c: Fix one more case of "CPU's" in the print header code. Also remove the extraneous The following correspond to fields in the PAPI_event_info_t structure. message * src/buildbot_configure_with_components.sh: take infiniband out of the buildbot test. * src/: x86_cache_info.c, components/coretemp/linux-coretemp.c, components/lmsensors/linux-lmsensors.c, components/lustre/linux-lustre.c, components/net/linux-net.c, utils/event_chooser.c: Fix coverity errors reported by Will Cohen. * src/: aix.c, any-proc-null.c, linux-common.c, papi.c, papi.h, papivi.h, solaris-niagara2.c, solaris-ultra.c, ctests/clockres_pthreads.c: Address Redhat bug 785975. The plural of CPU appears to be CPUs * src/Makefile.inc: Patch to cleanup dependencies, allowing for parallel makes. Patch due to Will Cohen from redhat 2012-02-09 * src/buildbot_configure_with_components.sh: Add infiniband and mx component to buildbot component tests. * src/components/net/tests/: net_values_by_code.c, net_values_by_name.c: Apply patch suggested by Will Cohen to check for system return values. * src/components/lmsensors/linux-lmsensors.h: Added missing string header 2012-02-08 * man/... : update man pages one more time for 4.2.1 release * release_procedure.txt: Make sure generated html has papi group id. 2012-02-07 * src/multiplex.c: Fix the @file matching multiple files warning. * src/components/README: Cleanup doxygen errors. * doc/Doxyfile-html: Typo introduced by the last commit. * doc/Doxyfile-html: Exclude linux-bgp.c from doxygen. * doc/Doxyfile-html: Make sure the component README file gets included in doxygen. * src/components/coretemp_freebsd/coretemp_freebsd.c: Cleanup doxygen warnings in freebsd coretemp component. * src/papi.h: Cleanup some doxygen warnings related to the groupings. 
        * src/components/example/example.c: fix doxygen warning in the
        example component

        * doc/Doxyfile-html: Remove some cruft from doxygen config file.
        This addresses the warning about dot not found at /sw/bin/dot .

        * src/components/: infiniband/linux-infiniband.c,
        infiniband/linux-infiniband.h, cuda/linux-cuda.c, cuda/linux-cuda.h:
        Cleaned up some doxygen issues

        * src/components/lmsensors/linux-lmsensors.c: Removed long forgotten
        debug outputs

        * src/papi_libpfm4_events.c: Fix minor doxygen typos.

        * src/components/vmware/vmware.c: Add params for doxygen

        * man/... : update man pages

2012-02-06

        * doc/Doxyfile-man1: Fix a typo in a doxygen config file.

2012-02-03

        * release_procedure.txt, doc/Doxyfile, doc/Doxyfile-everything,
        doc/Doxyfile-html, doc/Doxyfile.utils, doc/Doxyfile-man1,
        doc/Doxyfile-man3, doc/Makefile, doc/doxygen_procedure.txt: Rework
        the doxygen configuration files.

        * RELEASENOTES.txt: Update for the impending release.

        * ChangeLogP421.txt, RELEASENOTES.txt: Updates for the impending
        release.

2012-02-02

        * src/: papi.c, papi.h: Minor tweaks for doxygen errors

2012-02-01

        * src/components/lmsensors/: Rules.lmsensors, configure.in: Fixed
        configure error message and rules link error for shared object
        linking. Thanks Will Cohen.

        * src/components/appio/Rules.appio: Correct pathing

        * src/ctests/api.c: One minor tiny fix to check for PAPI_ENOEVNT
        when testing PAPI_flops. If PAPI_FP_OPS does not exist on the
        processor (like many of em), then this test fails.

2012-01-31

        * src/ctests/multiattach.c: Increase acceptance criteria for cycles.

        * src/Makefile.in, src/configure, src/configure.in, src/papi.h,
        doc/Doxyfile, doc/Doxyfile-everything, doc/Doxyfile.utils,
        papi.spec: Update version number to 4.2.1 in preparation for
        release.

        * src/ctests/prof_utils.c: Correct a warning on 32bit builds about
        casting caddr_t to (long long). Specifically:
        prof_utils.c:234: warning: cast from pointer to integer of different size
        prof_utils.c:248: warning: cast from pointer to integer of different size
        prof_utils.c:262: warning: cast from pointer to integer of different size
        We first cast to unsigned long and then on to long long. (This may
        be overkill, but it's for a printf format string.)

2012-01-30

        * release_procedure.txt: Add the correct path for doxygen on ICL
        machines.

        * src/papi_events.csv: Modify Intel Sandybridge PAPI_FP_OPS and
        PAPI_FP_INS events to not count x87 fp instructions. The problem is
        that the current predefines were made by adding 5 events. With the
        NMI watchdog stealing an event and/or hyperthreading reducing the
        number of available counters by half, we just couldn't fit. This
        now raises the potential for people using x87-compiled floating
        point on Sandybridge and getting 0 FP_OPS. This is only likely if
        running a 32-bit kernel and *not* compiling your code with -msse.
        A long-term solution might be trying to find a better set of FP
        predefines for sandybridge.

        * src/components/: lustre/linux-lustre.c, mx/linux-mx.c: Some really
        minor cleanups to the lustre and mx components.

2012-01-28

        * src/components/example/: example.c, tests/example_basic.c: Update
        example component. Cleans up code, adds some more documentation,
        adds counter write support.

2012-01-27

        * src/papi_user_events.c: Minor cleanups for user events.

        * src/libpfm4/: README, include/perfmon/pfmlib.h, lib/Makefile,
        lib/pfmlib_amd64.c, lib/pfmlib_common.c, lib/pfmlib_priv.h: Fix
        "conflicts" in git import of libpfm4.

        * src/libpfm4/lib/: pfmlib_amd64_fam11h.c,
        events/amd64_events_fam11h.h: Initial revision

2012-01-26

        * src/papi_fwrappers.c: Escape the include directives in the
        documentation.
(Cleans up doxygen ) * src/components/README: Adding vmware to component README * src/components/vmware/: Makefile.vmware.in, PAPI-VMwareComponentDocument.pdf, Rules.vmware, VMwareComponentDocument.txt, configure, configure.in, vmware.c, vmware.h: merge vmware branch to head * src/perf_events.c: Set fast_counter_read back to 0 on x86/x86_64 perf_events, as currently rdpmc counter access is not supported. There are patches floating around that enable this (although performance is still a long way from perfctr) but they will not likely be merged for a while now, and the perf_events substrate will require a lot of extra code to support it once it does make it into a shipping kernel. * src/buildbot_configure_with_components.sh: Remove acpi from the buildbot configure script. 2012-01-25 * src/components/mx/: Makefile.mx.in, Rules.mx, configure, configure.in, linux-mx.c, linux-mx.h, tests/Makefile, tests/mx_basic.c, tests/mx_elapsed.c, utils/fake_mx_counters.c, utils/sample_output: Re-write of the MX component + Add tests + Modernize code + Remove the need to run ./configure in the mx directory + Add fake mx_counters program that lets you test component on machine without myrinet installed * src/components/: README, acpi/Rules.acpi, acpi/linux-acpi-memory.c, acpi/linux-acpi.c, acpi/linux-acpi.h: Remove the ACPI component. It was one of the oldest components and needed a lot of cleanup work, and it turns out that the main useful event it provided (temperature) isn't available on modern machines/kernels (coretemp should be used instead). 2012-01-23 * src/perf_events.c: Restored Phil's changes that I inadvertently clobbered with my last commit :( * src/perf_events.c: Remove a warning about an uninitialized variable. * src/utils/: component.c, event_info.c, native_avail.c: Update the Doxygen comments on these utilities to have the command line options listed in a list like the other utils. * src/perf_events.c: More improvements to the read path for multiplexed counters. 
Now the case for bad kernel behavior is built in, and is not required with a #define. Basically, there are situations when either enabled or running is zero but not both. This could result in a divide by 0 in the worst case, as was observed by Tushar Mohan in papiex. You could trigger it by doing a read immediately after doing a start with perf events and use a FORMAT_SCALE argument. Now the logic goes, assuming mpxing. 1) if (running=enabled) return raw counter 2) if (running && enabled) scale counter by ratio 3) else warn in debug mode return raw counter Apparently we need a test case that does a read immediately after a start. That's a hole. Tested on brutus, core2 2.6.36 Here's the original report. ------------------- Model string and code : Intel(R) Pentium(R) M processor 1600MHz (9) Linux thinkpad 2.6.38-02063808-generic #201106040910 SMP Sat Jun 4 10:51:30 UTC 2011 i686 GNU/Linux PAPI Version: 4.2.0.0 I think I ran into a bug similar to what we ran with MIPS. With the latest PAPI (from CVS), on an x86 (32-bit machine), when using papiex with multiplex with anything more than two events, I get a floating point exception in PAPI during the PAPI_read call. 
On enabling debugging in the substrate, I think the problem is the same (namely a division by zero, because some event had a zero time of running): libpapiex debug: 24625,0x0,papiex_thread_init_routine Starting counters with PAPI_start SUBSTRATE:perf_events.c:pe_enable_counters:953:24625 ioctl(enable): ctx: 0x96a4bc8, fd: 3 SUBSTRATE:perf_events.c:pe_enable_counters:953:24625 ioctl(enable): ctx: 0x96a4bc8, fd: 5 libpapiex debug: 24625,0x0,papiex_thread_init_routine Calling PAPI_lock before critical section libpapiex debug: 24625,0x0,papiex_thread_init_routine Released PAPI lock libpapiex debug: 24625,0x0,papiex_start START POINT 0 LABEL libpapiex debug: 24625,0x0,papiex_start Reading counters (PAPI_read) to get initial counts SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd: 3, tid: 0, cpu: -1, ret: 56 SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 2 1341021 1341021 SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[3] 33405 * tot_time_enabled 1341021) / tot_time_running 1341021 SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[5] 44552 * tot_time_enabled 1341021) / tot_time_running 1341021 SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd: 5, tid: 0, cpu: -1, ret: 40 SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 1 214777 0 SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[3] 0 * tot_time_enabled 214777) / tot_time_running 0 The above debug log is for three events: PAPI_TOT_CYC, PAPI_TOT_INS and PAPI_L1_DCM. Multiplexing works with two events. Adding the third (any event), gives this error. Basically, the floating point exception kills the program, and PAPI_read never returns. I think I know why papiex always hits this bug: It's because right after starting the counters with PAPI_start, papiex does a PAPI_read to store the initial values of the counters in a tmp variable. These are then subtracted from the final counter values. Should we put a deliberate delay? 
Of course, the real bug should be fixed in PAPI. ---- * src/utils/event_info.c: Major re-write of the papi_xml_event_info program. + Remove event code numbers, as they are not stable run-to-run + Add some Doxygen comments + Remove some wrong assumptions that could cause potential buffer overflows + Improve usage information 2012-01-20 * src/components/lustre/: Rules.lustre, linux-lustre.c, linux-lustre.h, fake_proc/fs/lustre/llite/hpcdata-ffff81022a732800/read_ahead_stats, fake_proc/fs/lustre/llite/hpcdata-ffff81022a732800/stats, tests/Makefile, tests/lustre_basic.c: Finish the re-write of the lustre component. It would be nice if someone with access to a machine with a lustre filesystem could test this for us. * src/: papi_internal.c, components/lustre/linux-lustre.c: Update the component initialization code so that it can handle a PAPI ERROR return gracefully. Previously there was no way to indicate initialization failure besides just setting num_native_events to 0. 2012-01-19 * src/components/lustre/: linux-lustre.c, linux-lustre.h: First pass at cleaning up the lustre component. It should now properly report no events when no lustre filesystems are available. 2012-01-11 * src/papi_events.csv: Add AMD fam12h support to the events file. Right now it is just an alias to the similar fam10h event list; this can be split out if necessary once we find a tester with the hardware. * src/libpfm4/: README, docs/man3/pfm_get_event_next.3, docs/man3/pfm_get_pmu_info.3, include/perfmon/perf_event.h, include/perfmon/pfmlib.h, lib/Makefile, lib/pfmlib_amd64.c, lib/pfmlib_amd64_priv.h, lib/pfmlib_common.c, lib/pfmlib_perf_event.c, lib/pfmlib_priv.h, lib/events/intel_coreduo_events.h, lib/events/perf_events.h, perf_examples/Makefile, perf_examples/perf_util.c, perf_examples/perf_util.h, perf_examples/self.c, perf_examples/task_smpl.c, perf_examples/x86/bts_smpl.c: Fix "merge" conflicts with libpfm4 merge. 
        * src/libpfm4/lib/: pfmlib_amd64_fam12h.c,
        events/amd64_events_fam12h.h: Initial revision

        * src/papi_libpfm4_events.c: Properly use the pfm_get_event_next()
        iterator to find next event. Without this, on AMD Fam10h some
        events are missed. Some events are still missed due to a libpfm4
        bug; this will be fixed once I update the libpfm4 tree included
        with PAPI. Note, enumeration fixes like this often break things, so
        please test if possible.

        * src/papi_events.csv: Update the coreduo (not core2) events. Most
        notably the FP events were wrong. This, along with a forthcoming
        libpfm4 update, makes all the CTESTS pass on an old Yonah coreduo
        laptop I have.

2012-01-05

        * src/ctests/api.c: Make the api test actually test PAPI_flops() as
        it claims to do, rather than PAPI_flips(). Patch thanks to: Emilio
        De Camargo Francesquini

        * src/papi_hl.c: Fix some copy-and-paste documentation remnants in
        the papi_hl.c file, mostly where it said FLIPS where it meant FLOPS.

2012-01-04

        * src/utils/native_avail.c: Update papi_native_avail to *not* print
        the event codes, as these are not guaranteed to be stable from run
        to run. Also fix up the formatting and print some component info
        too. Please try and let me know if you don't like the new output.

        * src/: configure, configure.in: Respect a FORCED option in
        configure.

2011-12-22

        * src/Rules.pfm4_pe: Remove perfmon.h from MISCHDRS.

2011-12-20

        * src/: Rules.perfctr, Rules.perfctr-pfm, Rules.pfm, Rules.pfm4_pe,
        Rules.pfm_pe, linux-lock.h, mb.h: Merry Christmas ARM users. This
        patch fixes the SMP ARM issues reported by Harald Servat. Also,
        adds proper header dependency checking in the Rules files. People,
        please when you add headers, please add them to the dependency
        lines so everything gets rebuilt properly. The new implementation
        of SMP locks is very pedantic, that is, the locks are not the
        fastest, but they do use atomics and avoid kernel intervention.
        Passed on our 2 core ARM v7.
All pthreads tests now pass, except the ones that also fail in the single processor case usually due to a missing event. Samples: mucci@panda:~/papi.head/src$ uname -a Linux panda 3.0.0 #2 SMP Fri Jul 29 16:23:54 EDT 2011 armv7l GNU/Linux mucci@panda:~/papi.head/src$ hostname panda mucci@panda:~/papi.head/src$ cat /proc/cpuinfo Processor: ARMv7 Processor rev 2 (v7l) processor: 0 BogoMIPS: 2007.19 processor: 1 BogoMIPS: 1965.18 Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3 CPU implementer: 0x41 CPU architecture: 7 CPU variant: 0x1 CPU part: 0xc09 CPU revision: 2 Hardware: OMAP4 Panda board Revision: 0020 Serial: 0000000000000000 mucci@panda:~/papi.head/src$ ./ctests/locks_pthreads Creating 2 threads 10000 iterations took 13489 us. Running 44480 iterations Expected: 88960 Received: 88960 locks_pthreads.c PASSED mucci@panda:~/papi.head/src$ ./ctests/pthrtough Creating 2 threads for 1000 iterations each of: register create_eventset destroy_eventset unregister pthrtough.c PASSED mucci@panda:~/papi.head/src$ ./ctests/pthrtough2 Creating 2000 threads for 1 iterations each of: register create_eventset destroy_eventset unregister Failed to create thread: 238 Continuing test with 237 threads. 
pthrtough2.c PASSED mucci@panda:~/papi.head/src$ ./ctests/thrspecific Thread 0x40ae1470 started, specific data is at 0xbea9c6d4 Thread 0x40021000 started, specific data is at 0xbea9c6c4 Thread 0x4244d470 started, specific data is at 0xbea9c6c8 Thread 0x4138d470 started, specific data is at 0xbea9c6d0 Thread 0x41c4d470 started, specific data is at 0xbea9c6cc Entry 0, Thread 0x41c4d470, Data Pointer 0xbea9c6cc, Value 4000000 Entry 1, Thread 0x40021000, Data Pointer 0xbea9c6c4, Value 500000 Entry 2, Thread 0x40ae1470, Data Pointer 0xbea9c6d4, Value 1000000 Entry 3, Thread 0x4244d470, Data Pointer 0xbea9c6c8, Value 8000000 Entry 4, Thread 0x4138d470, Data Pointer 0xbea9c6d0, Value 2000000 thrspecific.c PASSED mucci@panda:~/papi.head/src$ ./ctests/krentel_pthreads program_time = 6, threshold = 20000000, num_threads = 3 launched timer in thread 0 launched timer in thread 1 launched timer in thread 3 launched timer in thread 2 [1] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter [2] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter [0] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter [3] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter [1] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter [0] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter [3] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter [2] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter [1] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter [2] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter [0] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter [3] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter [1] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter [0] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter [3] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter [2] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter [3] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter [0] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter [2] time = 5, count = 25, iter = 16, rate = 
1562.5/Kiter [1] time = 5, count = 26, iter = 17, rate = 1529.4/Kiter [2] time = 6, count = 25, iter = 16, rate = 1562.5/Kiter [0] time = 6, count = 27, iter = 17, rate = 1588.2/Kiter done krentel_pthreads.c PASSED 2011-12-15 * src/papi_libpfm_presets.c: Change PAPI_PERFMON_EVENT_FILE environment variable name to PAPI_CSV_EVENT_FILE since it's not just for perfmon anymore. * src/: configure, configure.in: Open mouth, insert foot; fix perfctr configure by not testing a library we have not built yet. 2011-12-14 * src/: configure, configure.in: Missed one more place where we tested perfctr != "no" * src/: configure, configure.in: Fix a typo in the perfctr section; it was causing a machine to default to perfctr when it had no performance interface. ( a centos vm image with a 2.6.18 kernel ) Also checks that we actually have perfctr if we specify --with-perfctr. 2011-12-08 * src/components/cuda/: Makefile.cuda.in, Rules.cuda, configure, configure.in, linux-cuda.c, linux-cuda.h: Added auto-detection of CUDA version to PAPI CUDA Component. Reason is, the interface has changed between CUDA/CUPTI 4.0 and 4.1. PAPI now supports both CUDA versions without any exposure to the users. Configure step is unchanged and no additional knowledge of which CUDA version is installed is required. 2011-12-03 * src/components/appio/: CHANGES, README, Rules.appio, appio.c, appio.h, tests/Makefile, tests/appio_list_events.c, tests/appio_values_by_code.c, tests/appio_values_by_name.c: [no log message] 2011-11-25 * src/linux-timer.c: Fix compilation warning if you specify --with-walltime=gettimeofday * src/linux-timer.c: Fix the build on Linux systems using mmtimer * src/linux-common.c: Update the linux MHz detection code to use bogoMIPS when there is no MHz field available in /proc/cpuinfo. This gives roughly correct MHz on ARM, and the MIPS workaround should also still work. 2011-11-23 * src/components/net/linux-net.c: Fix compile errors in a debug message. 
(pathname didn't exist but we are working on NET_PROC_FILE) 2011-11-22 * src/components/net/: linux-net.c, tests/net_values_by_code.c, tests/net_values_by_name.c: Change the ping command in the net tests to not use &> to redirect to NULL. This would work on a system with csh, but on systems with a bash shell this runs ping in the background instead, so the test finishes before ping can generate any packets. * src/components/net/linux-net.c: Fix slight bug in the net component, where a memset() had the wrong arguments. This made for weird results in the case where we start/stop quickly enough that we return the initial data. * src/components/net/: CHANGES, Makefile.net.in, README, Rules.net, configure, configure.in, linux-net.c, linux-net.h, tests/Makefile, tests/net_list_events.c, tests/net_values_by_code.c, tests/net_values_by_name.c: Replace net component with updated version written by Jose Pedro Oliveira * Dynamically detects the network interfaces (i.e. the ones listed in /proc/net/dev) * No longer needs to fork/exec the external ifconfig command and parse its output. It now reads the Linux kernel network statistics directly from /proc/net/dev. * Each network interface now has 16 events instead of 13 (all counters in /proc/net/dev). * Adds support for PAPI_event_name_to_code() * Adds a couple of small tests/examples 2011-11-16 * doc/Doxyfile-everything: Fix the exclude libpfm/perfctr config. 2011-11-10 * src/perf_events.c: Only scale when running != enabled. Now verified on ig, brutus and the malta * src/perf_events.c: Further tuneups for mpx'ing. Previous commit broke systems with valid return values from perf_events for running & enabled. My attempt at scaling in long long world caused an overflow which led to a negative number when passed up the chain. Also consolidated types... best way to avoid this stuff is to start as the type you are ending as. 
Now we use some better integer scaling...guaranteed within +-0.5% of the actual scaled value of enabled / running. New results on brutus: multiplex1 case1: Does PAPI_multiplex_init() not break regular operation? Added PAPI_TOT_CYC Added PAPI_FP_INS case1: PAPI_TOT_CYC PAPI_FP_INS case1: 2739865106 600002876 case2: Does setmpx/add work? Added PAPI_TOT_CYC Added PAPI_FP_INS case2: PAPI_TOT_CYC PAPI_FP_INS case2: 2739678237 600002258 case3: Does add/setmpx work? Added PAPI_TOT_CYC Added PAPI_FP_INS case3: PAPI_TOT_CYC PAPI_FP_INS case3: 2739847832 600002298 case4: Does add/setmpx/add work? Added PAPI_TOT_CYC Added PAPI_FP_INS case4: PAPI_TOT_CYC PAPI_FP_INS case4: 2737832980 600013404 case5: Does setmpx/add/add/start/read work? Added PAPI_TOT_CYC Added PAPI_FP_INS read @start counter[0]: 7106 read @stop counter[0]: 2740387017 difference counter[0]: 2740379911 read @start counter[1]: 0 read @stop counter[1]: 600017169 difference counter[1]: 600017169 multiplex1.c PASSED 2011-11-09 * src/components/cuda/linux-cuda.c: For the CUDA Component, PAPI_read() now accumulates event values. This has to be explicitly done in PAPI because CUPTI automatically resets all counter values to 0 after a read. (PAPI_start()/stop() continues to reset the values to 0) * src/perf_events.c: Last of the multiplex fixes to perf events. The root of all evil was this: counts[i] = ( uint64_t ) ( ( double ) buffer[count_idx] * ( double ) buffer[get_total_time_enabled_idx( )] / ( double ) buffer[get_total_time_running_idx( )] ) ; In addition to improper casting to uints... (papi returns int64s), using floating point arith is a no-no. Plus this resulted in divide by zeros... 
Before: SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6cba, 0x0, 0x0, ret: 24 SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x23, 0x0, 0x0, ret: 24 SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6de72b5d, 0x8ae0fa80, 0x8ae0fa80, ret: 24 SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b46b, 0x8ae0fa80, 0x8ae0fa80, ret: 24 So kernel is good, but errors in multiplexed scaling. case5: Does setmpx/add/add/start/read work? Added PAPI_TOT_CYC Added PAPI_FP_INS read @start counter[0]: 9223372034707292159 read @stop counter[0]: 1843791732 difference counter[0]: -9223372032863500427 multiplex1.c FAILED Line # 389 With fix: SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6782, 0x0, 0x0, ret: 24 SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x0, 0x0, 0x0, ret: 24 SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6de725dc, 0x8ae0fa80, 0x8ae0fa80, ret: 24 SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b400, 0x8ae0fa80, 0x8ae0fa80, ret: 24 read @start counter[0]: 26498 read @stop counter[0]: 1843865052 difference counter[0]: 1843838554 read @start counter[1]: 0 read @stop counter[1]: 80000000 difference counter[1]: 80000000 SUBSTRATE:perf_events.c:_papi_pe_update_control_state:1288:12821 Called with count == 0 SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_shutdown:1178:12821 shutdown multiplex1.c PASSED New code is vastly simpler and smaller and checks for bad kernel behavior: int64_t tot_time_running = papi_pe_buffer[get_total_time_running_idx( )]; int64_t tot_time_enabled = papi_pe_buffer[get_total_time_enabled_idx( )]; #ifdef BRAINDEAD_MULTIPLEXING if (tot_time_enabled == 0) tot_time_enabled = 1; if 
(tot_time_running == 0) tot_time_running = 1; #else /* If we are convinced this platform's kernel is fully operational, then this stuff will never happen. If it does, then BRAINDEAD_MULTIPLEXING needs to be enabled. */ if ((tot_time_running == 0) && (papi_pe_buffer[count_idx])) { PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time running is 0.\n",papi_pe_buffer[count_idx]); return PAPI_EBUG; } if ((tot_time_enabled == 0) && (papi_pe_buffer[count_idx])) { PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time enabled is 0.\n",papi_pe_buffer[count_idx]); return PAPI_EBUG; } #endif pe_ctl->counts[i] = (papi_pe_buffer[count_idx] * tot_time_enabled) / tot_time_running; Also, renamed all instances of 'buffer' to papi_pe_buffer because buffer is a global variable on MIPS/Linux/libc. Yikes! (gdb) whatis buffer type = struct utmp * * src/ctests/multiplex1.c: Made sure that PAPI_TOT_CYC is the first event added to multiplexing event set. This will demonstrate the bug in perf_event multiplexing arithmetic in case5 on MIPS and other perf_event subsystems that likely have some breakage in the kernels handling of multiplexing. The common bug is that the perf_event subsystem does not fill in the second and third elements of the 24 byte read that gets returned from the kernel. These values are time_enabled and time_running. MIPS as of 3.0.3 just fills this in after a HZ tick has happened. Workarounds are pretty simple in the low level layer... A buggy output looks like this (3.0.3 MIPS/Linux Big Endian) -bash-4.1$ ./ctests/multiplex1 case1: Does PAPI_multiplex_init() not break regular operation? Added PAPI_TOT_CYC Added PAPI_FP_INS case1: PAPI_TOT_CYC PAPI_FP_INS case1: 1843775252 80000000 case2: Does setmpx/add work? Added PAPI_TOT_CYC Added PAPI_FP_INS case2: PAPI_TOT_CYC PAPI_FP_INS case2: 1843773254 80000037 case3: Does add/setmpx work? 
Added PAPI_TOT_CYC Added PAPI_FP_INS case3: PAPI_TOT_CYC PAPI_FP_INS case3: 1843772919 80000037 case4: Does add/setmpx/add work? Added PAPI_TOT_CYC Added PAPI_FP_INS case4: PAPI_TOT_CYC PAPI_FP_INS case4: 1843773959 80000037 case5: Does setmpx/add/add/start/read work? Added PAPI_TOT_CYC Added PAPI_FP_INS read @start counter[0]: 9223372034707292159 read @stop counter[0]: 1843784577 difference counter[0]: -9223372032863507582 multiplex1.c FAILED Line # 389 Error: Difference in start and stop resulted in negative value! 2011-11-08 * src/components/cuda/: linux-cuda.c, linux-cuda.h: Updated CUDA component for CUPTI 4.1 (RC1). Note, SetCudaDevice() should now work with the latest CUDA 4.1 version. 2011-11-07 * src/components/coretemp/linux-coretemp.c: Update coretemp to better handle sparse numbering of the inputs. * doc/Doxyfile-everything: Exclude the libpfm* and perfctr-* directories from consideration when generating Doxygen docs. * src/: papi.h, components/acpi/linux-acpi.h, components/coretemp_freebsd/coretemp_freebsd.c, components/cuda/linux-cuda.h, components/infiniband/linux-infiniband.h, components/mx/linux-mx.h, components/net/linux-net.h: Place a space in < your name here > to cleanup doxygen warnings. * src/perf_events.c: Only perf event systems that have FAST counter reads and FAST hw timer access are x86... * src/linux-common.c: MIPS clock and Linux fixup code * src/components/example/example.c: A little more documentation on which of the component vector function pointers are relevant. * src/papi_vector.c: Tested the dummy get_{real,virt}_{cyc,usec} functions on zeus, they appear to work. * src/components/example/tests/example_multiple_components.c: Another fix to properly skip the multiple component case if CPU component not available. * src/components/example/tests/example_multiple_components.c: Skip the test if no CPU component enabled, rather than fail. 
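The integer scaling described in the 2011-11-09 src/perf_events.c entry above can be sketched in isolation as follows. This is only an illustration and not the code in perf_events.c: the function name scale_multiplexed_count() and the SCALE_EBUG value are hypothetical, the zero-time check mirrors the default (non-BRAINDEAD_MULTIPLEXING) branch quoted in that entry, and the sketch assumes the product count * time_enabled fits in 64 bits.

```c
#include <stdint.h>

/* Hypothetical error value for this sketch; PAPI itself returns PAPI_EBUG. */
#define SCALE_EBUG (-1)

/*
 * Scale a raw multiplexed count by time_enabled / time_running using
 * integer arithmetic only (no floating point).
 * On success, stores the result in *scaled and returns 0.
 * Returns SCALE_EBUG when a nonzero count arrives with a zero time
 * field, which the entry above treats as a kernel bug.
 * Assumption: raw * enabled does not overflow int64_t.
 */
int scale_multiplexed_count(int64_t raw, int64_t enabled, int64_t running,
                            int64_t *scaled)
{
    if (raw != 0 && (enabled == 0 || running == 0))
        return SCALE_EBUG;

    /* Only scale when running != enabled; this also avoids dividing by zero. */
    if (running == 0 || running == enabled) {
        *scaled = raw;
        return 0;
    }

    *scaled = (raw * enabled) / running;
    return 0;
}
```

For example, a count of 100 that was running for half of the enabled time (enabled = 200, running = 100) scales to 200.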
2011-11-04 * src/components/example/example.c: Free example_native_table with papi_free; glibc didn't like it if we just called free. (we allocate it with papi_calloc) * man/...: Version number bump. (since the pages are quantifiably different from those released in 4.2.0 ) * doc/: Doxyfile, Doxyfile-everything, Doxyfile.utils: Bump version number in the doxygen config files. * src/components/example/example.c: _papi_example_shutdown_substrate does not have any arguments. * src/components/net/linux-net.c: Include ctype.h for isspace(). * release_procedure.txt: release_procedure now reflects the correct version of doxygen to use. * src/buildbot_configure_with_components.sh: Do not always configure with no cpu counters; allow this to be passed in. Allows us to use one script for both types of builds we test. * delete_before_release.sh, src/buildbot_configure_with_components.sh: Create a script for buildbot to configure with several components. Buildbot runs all commandline arguments through a sanitization before passing them to sh. Thus --with-configure="a b c" => '--with-configure="a b c"' which is bad. delete_before_release.sh has been instructed to remove this file. * man/...: Rebuild the manpages with doxygen 1.7.4 to remove the 's at the end of sentences. The html output looks clean. 2011-11-03 * src/: multiplex.c, papi.c: Fix some gcc-4.6 compile warnings complaining that retval was being set but not used. * src/papi.c: Add some extra comments to the PAPI_num_cmp_hwctrs() code that describe its limitations a bit better. 2011-11-02 * src/: ctests/overflow_allcounters.c, testlib/test_utils.c: Add lots of debugging to make results of overflow_allcounters test a bit more clear. * src/components/coretemp/tests/coretemp_pretty.c: coretemp_pretty wasn't printing the description for fan inputs. 
The result on an apple MacBook Pro (running Linux) now looks like this: Trying all coretemp events Found coretemp component at cid 2 hwmon0.temp1_input value: 33.50 degrees C, applesmc module, label TB0T hwmon0.temp2_input value: 33.50 degrees C, applesmc module, label TB1T hwmon0.temp3_input value: 32.00 degrees C, applesmc module, label TB2T hwmon0.temp4_input value: 0.00 degrees C, applesmc module, label TB3T hwmon0.temp5_input value: 62.25 degrees C, applesmc module, label TC0D hwmon0.temp6_input value: 54.25 degrees C, applesmc module, label TC0F hwmon0.temp7_input value: 57.25 degrees C, applesmc module, label TC0P hwmon0.temp8_input value: 69.00 degrees C, applesmc module, label TG0D hwmon0.temp9_input value: 58.00 degrees C, applesmc module, label TG0F hwmon0.temp10_input value: 51.25 degrees C, applesmc module, label TG0H hwmon0.temp11_input value: 58.25 degrees C, applesmc module, label TG0P hwmon0.temp12_input value: 60.75 degrees C, applesmc module, label TG0T hwmon0.temp13_input value: 62.25 degrees C, applesmc module, label TN0D hwmon0.temp14_input value: 59.25 degrees C, applesmc module, label TN0P hwmon0.temp15_input value: 49.00 degrees C, applesmc module, label TTF0 hwmon0.temp16_input value: 54.00 degrees C, applesmc module, label Th2H hwmon0.temp17_input value: 58.75 degrees C, applesmc module, label Tm0P hwmon0.temp18_input value: 31.50 degrees C, applesmc module, label Ts0P hwmon0.temp19_input value: 44.25 degrees C, applesmc module, label Ts0S hwmon0.fan1_input value: 1999 RPM, applesmc module, label Left side hwmon0.fan2_input value: 2003 RPM, applesmc module, label Right side coretemp_pretty.c PASSED * src/components/coretemp/: linux-coretemp.c, linux-coretemp.h, tests/coretemp_pretty.c: Make the coretemp code a bit pickier about which events it supports. Add descriptions to the events. Also add support for Voltage (in*) events. 
On an amd14h machine I have access to, coretemp_pretty now prints: Trying all coretemp events Found coretemp component at cid 2 hwmon0.in1_input value: 1.31 V, it8721 module, label ? hwmon0.in2_input value: 2.22 V, it8721 module, label ? hwmon0.in3_input value: 3.34 V, it8721 module, label +3.3V hwmon0.in4_input value: 1.02 V, it8721 module, label ? hwmon0.in5_input value: 1.52 V, it8721 module, label ? hwmon0.in6_input value: 1.13 V, it8721 module, label ? hwmon0.in7_input value: 3.26 V, it8721 module, label 3VSB hwmon0.in8_input value: 3.17 V, it8721 module, label Vbat hwmon0.temp1_input value: 28.00 degrees C, it8721 module, label ? hwmon0.temp2_input value: -128.00 degrees C, it8721 module, label ? hwmon0.temp3_input value: -128.00 degrees C, it8721 module, label ? hwmon0.fan1_input value: 0 RPM hwmon0.fan2_input value: 1320 RPM hwmon1.temp1_input value: 33.00 degrees C, jc42 module, label ? hwmon2.temp1_input value: 31.75 degrees C, jc42 module, label ? hwmon3.temp1_input value: 53.00 degrees C, radeon module, label ? hwmon4.temp1_input value: 53.12 degrees C, k10temp module, label ? coretemp_pretty.c PASSED * src/components/coretemp/: linux-coretemp.c, tests/coretemp_pretty.c: Cut and paste error slipped in to that last commit. Fixes a build issue. * src/components/coretemp/: linux-coretemp.c, tests/Makefile, tests/coretemp_pretty.c: Clean up coretemp with same cleanups done in example component. Add a new test, "coretemp_pretty" that prints coretemp results in a more user-friendly way. * man/:... Rebuild the man pages with a newer version of doxygen. ( older versions of doxygen had a nasty bug in man output. ) Also reworked the utilities documentation to remove pages for the files. Thanks to Jose Pedre Oliveria for pointing this out. * src/components/example/tests/: Makefile, example_multiple_components.c: Add a test that makes sure you can have active EventSets on multiple components at the same time. 
* release_procedure.txt: Change PATH specification to include tcsh syntax; other minor syntax corrections. * src/components/example/example.c: More cleanups and documentation for the example component. 2011-11-01 * src/components/example/example.c: Some more major overhaul of the example component. A lot more documentation, plus make it behave a lot more like a real component would. * doc/Doxyfile.utils: Turn off undocumented warnings for the utils doxygen run. * src/utils/: avail.c, command_line.c, cost.c, event_chooser.c, multiplex_cost.c: Add spaces to the comments so doxygen doesn't think is an xml tag. 2011-10-31 * src/utils/: avail.c, clockres.c, command_line.c, component.c, cost.c, decode.c, error_codes.c, event_chooser.c, mem_info.c, multiplex_cost.c, native_avail.c: Remove the @file directive from the doxygen comment blocks for the utilities. This cleans up the generated man pages. ( we no longer build *.c.1 ) * src/components/example/: example.c, tests/example_basic.c: Clarify in the example component that ->reset only gets called if an eventset is currently running. Extend the example_basic test to test PAPI_reset() * release_procedure.txt: Fix a maketarget typo. * release_procedure.txt: We now have a good version of doxygen installed on most icl run machines. ( /mnt/scratch/sw/doxygen-1.7.5.1 ) * doc/doxygen_procedure.txt: [no log message] * release_procedure.txt: Update release_procedure to explain how to update the website documentation link. 2011-10-28 * RELEASENOTES.txt: Correct the RELEASENOTES for some things I missed when reviewing it. It's Offcore events that we don't support on Nehalem/Westmere/Sandybridge. Also the power6 libpfm4 bug that was listed as an outstanding bug was fixed a long time ago. * src/components/coretemp/linux-coretemp.c: Have coretemp set the num_native_events field. 
* src/components/example/tests/example_basic.c: Update example test to print num_native_events, to help debug issues with other components not updating the value. * src/components/coretemp/: linux-coretemp.c, linux-coretemp.h: Fix typo enent -> event Also remove residual LMSENSOR mentions from the coretemp header. * src/papi_libpfm4_events.c: Fix two memory leak locations. The attached patch reduces the number of lost memory blocks reported by valgrind from 234 to 39. It frees the memory allocated by the 4 strdups and the calloc functions in papi_libpfm4_events.c:allocate_native_event(). Patch by: José Pedro Oliveira * src/components/cuda/tests/Makefile: The change to pass the PAPI CC/CFLAGS to the component tests broke the nvidia test as it wants CC to be nvcc. So update that Makefile to use nvcc instead. 2011-10-27 * src/components/example/tests/example_basic.c: Improve the example_basic component test to be much more comprehensive. * src/components/example/: example.c, tests/HelloWorld.c, tests/Makefile, tests/example_basic.c: Cleanup the example test. Fix various mistakes in the comments as well as add better error checking. Also rename the "HelloWorld" test to "example_basic" * src/components/coretemp/tests/Makefile: The coretemp_test target was example_test due to cut-and-paste error. Patch from Jose Pedro Oliveira * src/Makefile.inc: Add a component_tests dependency so that the component_tests are made during a make -j build * src/Makefile.inc: Make sure the component test makefiles get passed the CC and CFLAGS definitions. * src/components/coretemp/: linux-coretemp.c, tests/Makefile, tests/coretemp_basic.c: Fix up the coretemp component some more. Make sure the enumerate function returns PAPI_ENOEVNT if no events are available. Update the Makefile so it has proper dependencies. Update the test so it prints the first event available. 
(The latter based on a patch from Jose Pedro Oliveira) * src/: solaris-ultra.c, ctests/all_native_events.c: The solaris-ultra substrate was still broken. This is because recent changes to component bind time explicitly used the ->set_domain() call, and this vector was not set up in solaris_ultra. Also made the all_native_events test report the returned error value to aid in debugging problems like this in the future. papi-5.3.0/README0000600003276200002170000000575112247131117013031 0ustar ralphundrgrad/* * File: README * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ This directory contains: README This file. LICENSE.txt The text of the PAPI license. INSTALL.txt Instructions for installing on all supported platforms. RELEASENOTES.txt Information about recent releases. ChangeLogPxxx.txt Detailed log of changes committed to the repository. doc/ User documentation and support files. See doc/README for details. man/ Stuff related to PAPI man pages. See man/README for details. src/ The PAPI library source files and support files. See src/README for more details. Getting Started --------------- If this is the first file you've opened in the PAPI tree, we'll try to give you a few tips on where to go from here. - Read the license found in LICENSE.txt. It's pretty short, and not very restrictive, but it'll give you an idea of what you can and can't do with the PAPI sources. - Visit the website at: There you can find late-breaking news that may be more current than in these files. You can also find documentation in a greater variety of formats than in the papi/doc/ directory. - Sign up for the PAPI mailing list(s). Instructions are on our home page. - Read the RELEASENOTES.txt file to get an idea of what's new in the current release. Installing PAPI --------------- To install PAPI on your system: - Find the section in INSTALL.txt that pertains to your hardware and operating system. 
- Follow the directions to install required components and build the PAPI libraries. - Run the test suite when you are finished to verify that everything went ok. NOTE: Although we make every attempt to get all tests to PASS or SKIP on all platforms, there are occasional instances of FAILures due to excessively tight compliance thresholds or platform idiosyncrasies. Don't panic if one or two tests FAIL. Contact us with complete output and we'll see what we can do. Using PAPI ---------- To use PAPI in your own programs: - Read the PAPI Overview found at: http://icl.cs.utk.edu/projects/papi/wiki/Main_Page. - Try out the utility programs in /utils to see what's in your system. - Try a test program. Source for a number of tests in both C and FORTRAN is available in the src/tests/ and src/ftests/ directories. Find a program that's similar to what you want to do. Make sure you can build it and run it. - Write a test program of your own, exercising the PAPI events and features of interest to you. - Go for broke. Fold PAPI calls into your sources and see what you can learn. Bugs and Questions ------------------ - Visit our FAQ at: or read a snapshot of the FAQ in papi/PAPI_FAQ.html - Subscribe to the PAPI mailing list at: - Read historical postings to the list. - Post questions to the list. - Post bugs to the TRAC bug tracker at: papi-5.3.0/ChangeLogP530.txt0000600003276200002170000004651212247131117015111 0ustar ralphundrgrad2013-11-25 * a40c96c5 src/components/nvml/linux-nvml.c: nvml component: Add missing } * 166971ba src/components/nvml/linux-nvml.c: nvml component: modify api checks To check if nvmlDeviceGetEccMode and nvmlDeviceGetPowerUsage are supported, we just call the functions and see if nvml thinks its supported by the card. 
2013-11-21 * 78192de9 delete_before_release.sh: Kill the .gitignore files in delete_before_release * 60fb1dd4 src/utils/command_line.c: command_line utility: Initialize a variable Initialize data_type to PAPI_DATATYPE_INT64 Addresses a coverity error Error: COMPILER_WARNING: [#def19] papi-5.2.0/src/utils/command_line.c:133:4: warning: 'data_type' may be used uninitialized in this function [-Wmaybe-uninitialized] switch (data_type) { ^ 2013-11-20 * da2925f6 src/ctests/data_range.c: Make data_range test use prginfo Coverity complained about prginfo being an unused variable for data_range.c. The code is modified to be stylistically like the code for hw_info in the preceding lines which also is not used elsewhere in the test. This is more to reduce the amount of output in the Coverity scan than to fix this minor issue. * 3386953d src/ctests/data_range.c: Check the return values of PAPI_start() and PAPI_stop() for the data_range test The ia64 data_range test did not check the return values of PAPI_start() or PAPI_stop(). There are probably few people running this test on ia64 machines, but this is more to eliminate a couple errors noted by a Coverity scan and reduce the clutter in the Coverity scan. 2013-11-19 * e704e8f1 src/configure src/configure.in: configure: Build fpapi.h and co for mic When building for mic, set the cross_compiling var in configure to use a native c compiler to build genpapif. 2013-11-18 * d32b1dae man/man1/papi_avail.1 man/man1/papi_clockres.1 man/man1/papi_command_line.1...: Rebuild the man pages for a 5.3 release * 4e735d11 doc/Doxyfile-common papi.spec src/Makefile.in...: Bump version numbers for a pending 5.3 * efe026cd src/Makefile.inc: Makefile.inc: Pass LINKLIB, not SHLIB to the comp_tests * f0598acb src/ctests/Makefile.target.in: ctests/Makefile.target.in: Properly catch LINKLIB LINKLIB=$(SHLIB) or $(LIBRARY), so we have to have configure fill in those as well. 
* 1744c23e src/ctests/Makefile.target.in: ctests/Makefile.target.in: Respect static-tools the --with-static-tools configure flag sets STATIC, not LDFLAGS. This gets passed to the tests' make subprocesses via LDFLAGS="$(LDFLAGS) $(STATIC)" We mimic this in the installed Makefile. * e9347373 src/ctests/Makefile: ctests/Makefile: Don't clobber value of LIBRARY TOOD: write a better message * 237219d1 src/Makefile.inc: Makefile.inc: Add enviro vars to fulltest recipe The fulltest target didn't set LD_LIBRARY_PATH and as a result, several tests wouldn't find libpfm and fail to run. The fix is to call our SETPATH command first (as all of the other testing targets do) See ------------------------------------------------------------------ icc -diag-disable 188,869,271 -g -g -DSTATIC_PAPI_EVENTS_TABLE -DPEINCLUDE="libpfm4/include/perfmon/perf_event.h" -D_REENTRANT -D_GNU_SOURCE -DUSE_COMPILER_TLS -Ilibpfm4/include -I../../../testlib -I../../.. -I. -o perf_event_offcore_response perf_event_offcore_response.o event_name_lib.o ../../../testlib/libtestlib.a ../../../libpapi.so.5.2.0.0 ld: warning: libpfm.so.4, needed by ../../../libpapi.so.5.2.0.0, not found (try using -rpath or -rpath-link) ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_get_event_attr_info' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_initialize' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_get_pmu_info' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_get_version' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_get_os_event_encoding' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_get_event_next' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_get_event_info' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_strerror' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_find_event' ../../../libpapi.so.5.2.0.0: undefined reference to `pfm_terminate' make[2]: *** [perf_event_offcore_response] Error 1 
------------------------------------------------------------------ 2013-11-17 * a7f642d2 src/Makefile.inc src/configure src/configure.in: Switch LINKLIB to not have relative pathing 2013-11-15 * 91a6fa54 src/components/lustre/tests/Makefile: Fix a typo in the lustre tests' Makefile 2013-11-13 * 9a5f9ad4 src/papi_preset.c: papi_preset.c: Fix _papi_load_preset_table func Patch by Gleb Smirnoff ---------------------- The _papi_load_preset_table() loses last entry from a static table. The code in get_event_line() returns value of a char next to the line we are returning. Obviously, for the last entry the char is '\0', so function returns false value and _papi_load_preset_table() ignores the last line. Patch attached. The most important part of my patch is only: - ret = **tmp_perfmon_events_table; + return i; This actually fixes the lost last line. However, I decided to make the entire get_event_line() more robust, protected from bad input, and easier to read. ---------------------- 2013-11-12 * 579139a6 src/utils/hybrid_native_avail.c: more doxygen xml tag cleanup * 952bb621 src/components/bgpm/CNKunit/linux-CNKunit.c src/components/bgpm/CNKunit/linux-CNKunit.h src/components/bgpm/IOunit/linux-IOunit.c...: Fix doxygen Unsupported xml/html tag warnings * 0c161015 src/components/micpower/linux-micpower.h: micpower: fix doxygen warning * b187f065 src/components/host_micpower/README: host_micpower: update docs 2013-11-11 * 4d379c6f src/ctests/p4_lst_ins.c: ctests/p4_lst_ins: Narrow scope of test This test attempted to ensure that it was running on a P4, the test missed for all non intel systems. 2013-11-10 * ee1c7967 .../host_micpower/utils/host_micpower_plot.c: Added energy consumption to host_micpower utility. 2013-11-08 * eee49912 src/ctests/shlib.c: shlib.c: Check for NULL Thanks to Will Cohen for reporting. Coverity picked up an instance of a value that could be NULL and strlen would barf on it. 
  Error: FORWARD_NULL (CWE-476):
  papi-5.2.0/src/ctests/shlib.c:70: var_compare_op: Comparing "shinfo->map" to null implies that "shinfo->map" might be null.
  papi-5.2.0/src/ctests/shlib.c:74: var_deref_model: Passing "shinfo" to function "print_shlib_info_map(PAPI_shlib_info_t const *)", which dereferences null "shinfo->map".
  papi-5.2.0/src/ctests/shlib.c:13:26: var_assign_parm: Assigning: "map" = "shinfo->map".
  papi-5.2.0/src/ctests/shlib.c:24:3: deref_var_in_call: Function "strlen(char const *)" dereferences an offset off "map" (which is a copy of "shinfo->map").

* 83c31e25 src/components/perf_event/perf_event.c: perf_event.c: Check return value of ioctl
  Thanks to Will Cohen for reporting based upon output of Coverity.

* e5b33574 src/utils/multiplex_cost.c: multiplex_cost: check return value on PAPI_set_opt
  Thanks to Will Cohen for reporting based upon output of Coverity.

* 04f95b14 src/components/.gitignore: Ignore component target makefile

* cbf7c1a8 src/components/rapl/linux-rapl.c src/components/rapl/tests/Makefile src/components/rapl/tests/rapl_basic.c: Modify linux-rapl to support one wrap-around of the 32-bit registers for reading energy. This ensures availability of the full 32-bit dynamic range. However, it does not protect against two wrap-arounds. Care must be taken not to exceed the expected dynamic range, or to check reasonableness of results on completion. Modifications were also made to report rapl events as unscaled binary values in order to compute dynamic ranges. Modify rapl_basic to add a test (rapl_wraparound) to estimate maximum measurement time for a naive gemm. With a -w option, measurement for this amount of time will be performed. The gemm can be replaced with a user kernel for more accurate time estimates. Makefile was modified to support the new test case.
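The single wrap-around handling described in the cbf7c1a8 entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in linux-rapl.c: the function names rapl_delta32 and rapl_seconds_until_wrap and their parameters are hypothetical. It assumes two raw 32-bit counter samples taken before and after a measurement, and an energy unit and average power supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not taken from linux-rapl.c): number of counts
 * accumulated between two raw samples of a 32-bit energy register.
 * Exactly one wrap-around between the samples is tolerated; two or more
 * wraps cannot be detected from the samples alone. */
uint64_t rapl_delta32(uint32_t before, uint32_t after)
{
    if (after >= before) {
        /* No wrap observed between the two samples. */
        return (uint64_t)after - before;
    }
    /* The counter passed 0xFFFFFFFF once and restarted from zero. */
    return (UINT64_C(1) << 32) - before + after;
}

/* Hypothetical helper: rough upper bound, in seconds, on how long a
 * measurement can run before a 32-bit counter wraps, given the energy
 * represented by one count and an assumed average power draw. */
double rapl_seconds_until_wrap(double joules_per_count, double avg_watts)
{
    assert(joules_per_count > 0.0 && avg_watts > 0.0);
    return 4294967296.0 * joules_per_count / avg_watts;
}
```

For example, rapl_delta32(0xFFFFFFF0u, 0x10u) yields 0x20 counts. With an illustrative unit of 1/65536 J per count and a 64 W load, rapl_seconds_until_wrap gives 1024 seconds, which is the kind of bound the rapl_wraparound test is described as estimating.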
2013-11-07

* 7784de21 src/ctests/data_range.c src/ctests/zero_shmem.c: Modernize some ctests
  Add tests_quiet check to data_range and zero_shmem

2013-11-06

* 7c953490 src/configure src/configure.in: More MPICC checking
  Have configure check for mpcc on AIX, in addition to mpicc.

* 5c8d2ce0 src/ctests/zero_shmem.c: zero_shmem.c: Fix compiler warning
  The worker threads in the test print an ID; the test was set up to call pthread_self(), which is problematic. Since each thread is started with a unique work load, use this to label threads.

* 993a6e96 src/ctests/Makefile.recipies src/ctests/Makefile.target.in: ctests/Makefile.recipies: conditionally build the MPI test

* b29d5f56 src/Makefile.inc src/configure src/configure.in: Check for mpicc at configure time
  configure[.in]: look for mpicc
  Makefile.inc: Pass MPICC to ctests' make

2013-11-05

* b2d643df src/papi_events.csv: Add floating point events for IvyBridge
  Now that Intel has documented them and libpfm4 supports them, PAPI can use them. We just use the same events as on sandybridge. Tested on an ivybridge system.

2013-11-01

* c5be5e26 src/components/micpower/linux-micpower.c: micpower: check return of fopen before use
  Issue reported by Will Cohen from results of Coverity run.

* 5c1405ab src/components/host_micpower/utils/Makefile src/components/host_micpower/utils/README .../host_micpower/utils/host_micpower_plot.c: Add host_micpower utility to gather power (and voltage) measurements on Intel Xeon Phi chips using MicAccessAPI.

* 46b9bdf5 src/components/host_micpower/linux-host_micpower.c: Added more detailed event description and correct units to host mic power events.
* b97c0126 src/components/host_micpower/linux-host_micpower.c: host_micpower: Better error reporting
  grab output of dlerror on library load failure

2013-10-31

* 84da7fd3 src/components/host_micpower/Rules.host_micpower src/components/host_micpower/tests/Makefile: host_micpower: Fix some makefile bits
  tests/Makefile needed to define a target to work with the Make_comp_tests install machinery. Rules.host_micpower had a typo

2013-10-30

* 14f3e4c4 src/components/host_micpower/linux-host_micpower.c: host_micpower: fix function signature
  shutdown_thread took wrong arguments.

2013-10-28

* a4cc1113 release_procedure.txt: Update release_procedure.txt
  A bug in the version of doxygen we were using to produce the documentation led to some of the Fortran functions being left out in the cold. We now prescribe 1.8.5

* a1d6ae34 src/components/host_micpower/README: host_micpower: Add a README file.

2013-10-25

* 859dbc2c src/Makefile.inc src/components/Makefile_comp_tests src/components/Makefile_comp_tests.target.in...: Make the testsuite as a stand-alone copy-able directory of code
  These changes to the Makefiles allow the testsuite to be compiled separately from the papi sources. This is useful for people wanting to experiment with the tests and verify that the existing installation of papi works. We put absolute paths to the installed library and include files into the installed makefile for the tests.

* c307ad18 src/ctests/Makefile src/ctests/attach_target.c src/testlib/do_loops.c: Refactor the driver in do_loops.c into its own file. (ctests/Makefile, ctests/attach_target.c testlib/do_loops.c)

2013-10-23

* ace71699 src/components/bgpm/CNKunit/linux-CNKunit.c src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/L2unit/linux-L2unit.c...: Passing BGPM errors up to PAPI.
2013-10-22

* 2ee090ec src/components/bgpm/NWunit/linux-NWunit.c src/components/bgpm/NWunit/linux-NWunit.h: Fixed the behavior in BGQ's NWunit component after attaching an event set to a specific thread that owns the target resource counting

* 8ab071ee src/components/cuda/linux-cuda.c: CUDA component: Set the number of native events
  Patch by Steve Kaufmann
  When running papi_component_avail I notice that the number of CUDA events was always zero when the component was available. The following change correctly sets the number of native events for the component:

2013-10-11

* 071943b6 src/configure src/configure.in src/linux-context.h...: add preliminary aarch64 (arm64) support
  There has been some work to build fedora 19 on 64-bit arm armv8 machines (aarch64). I took a look at why the papi build was failing. The attached is a set of minimal patches to get papi to build. The patch is just a step toward getting aarch64 support for papi. Things are not all there for papi to work in that environment. Still need libpfm to support aarch64 and papi_events.csv describing mappings to machine specific events.

2013-10-01

* 096eb7fc src/ctests/zero_shmem.c: zero_shmem: cleanup compiler warnings
  Remove unused variables.

* d9669053 src/ctests/earprofile.c: ctests/earprofile.c: Fix compiler warning
  Both PAPI_get_hardware_info and PAPI_get_executable_info expect const pointers (get_executable_info is called by prof_init in profile_utils).

2013-09-30

* 87e7e387 src/ctests/p4_lst_ins.c: ctests/p4_lst_ins.c: Fix the P4 load test.
  This test relied upon a removed symbol to decide if it should run. The symbol unsigned char PENTIUM4 was removed in 2011, update the logic.
* 737d91ff src/ctests/zero_shmem.c: ctests/zero_shmem: Update the test
  * add_test_events expects another argument, update the zero_shmem test's invocation
  * Protect[Hide] OpenSHMEM calls with ifdefs

2013-09-27

* 86c11829 src/ctests/zero_shmem.c: zero_shmem: Include pthread.h

* 2d0e666c src/ctests/zero_smp.c: zero_smp: Change a compile time error to a test_skip
  In 8d1f2c1, we changed the default assumption to be that all ctests are built. This change allows the test to gracefully skip if it does not have 'native SMP' threads.

2013-09-26

* 8d1f2c16 src/ctests/Makefile: ctests/Makefile: Default to building everything
  Set target all to depend upon ALL

* ffd051cf src/ftests/Makefile src/testlib/Makefile: testlib, ftests Makefiles: cleanup ifort generated files
  ifort produces mod and f90 intermediate files which clean does not cleanup

* c720bb59 src/components/coretemp/tests/coretemp_basic.c src/components/coretemp/tests/coretemp_pretty.c: Coretemp tests: Fix skipping logic
  The coretemp_basic test was failing if coretemp was disabled, skip seems more appropriate. Add this logic to the coretemp_pretty test.

* af7f7508 src/configure src/configure.in: configure: refactor CTEST_TARGETS
  Problem: The set of ctests to build is determined at configure time, in CTEST_TARGETS. This is set in each OS detection section and suffers from neglect.
  Solution: Try to push the decisions about which tests to build out of configure, ask for them all. Ideally the tests will be written in such a way as to fail/skip gracefully if they lack functionality, teething problems are expected initially.

* 14421695 src/testlib/Makefile: testlib: Fix the Makefile variable assignment
  Consider:
  src=a.c b.c c.F
  obj=$(src:.c=.o) c.o
  After this substitution, obj is {a.o b.o c.F c.o}, not quite the nut. Change the logic to correct that.

2013-09-17

* 05a4e17b .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore: cleanup a compiler warning
  _peu_read does not use the hwd_context argument.
* f2056857 src/papi_events.csv: papi_events.csv: Add PAPI_L1_ICM for Haswell
  Thanks to Maurice Marks of Unisys for the contribution
  -------------
  I've continued testing on Haswell. By comparison with Vtune and Emon on Haswell I found that we can use the counter L2_RQSTS:ALL_CODE_RD for PAPI_L1_ICM, which is a very useful measure. Attached is my current version of papi_events.csv with Haswell fixes.
  -------------

2013-08-28

* efe3533d src/Makefile.inc src/components/Makefile_comp_tests src/ctests/Makefile...: testlib: library-ify testlib
  * Move ftests_util to testlib
  * Naively create libtestlib.a
  * utils link to the testlib library
  * [c|f]tests Switch the tests over to linking libtestlib.a
  * Component tests link libtestlib.a

2013-08-26

* d2a76dde src/configure src/configure.in src/utils/hybrid_native_avail.c: Gabrial's mic with icc changes to configure.
  Specify --with-mic at configure time and upon finding icc as the C compiler, it adds --mmic

* 4c0349c0 src/papi_events.csv: papi_events.csv: First draft preset events on HSW
  Contributed by Nils Smeds
  -------------------------
  Here is a suggestion for addition to Hsw counters. These are not rigorously tested. It compiles and loads. I'm rather uncertain on many of the events so I am hoping that adding events like this will get some useful feedback from the community so that we can improve.
  -------------------------

2013-08-20

* 1b8ff589 src/utils/command_line.c: command_line util: Fix skipping event bug.
  The command line utility had an extraneous index increment which resulted in skipping the reporting of event counts. Remove the increment.
  Reported by Steve Kaufmann
  --------------------------
  I am getting some funny results when I use papi_command_line with the RAPL events.
  If I request them all:
  $ papi_command_line THERMAL_SPEC:PACKAGE0 MINIMUM_POWER:PACKAGE0 MAXIMUM_POWER:PACKAGE0 MAXIMUM_TIME_WINDOW:PACKAGE0 PACKAGE_ENERGY:PACKAGE0 DRAM_ENERGY:PACKAGE0 PP0_ENERGY:PACKAGE0
  Successfully added: THERMAL_SPEC:PACKAGE0
  Successfully added: MINIMUM_POWER:PACKAGE0
  Successfully added: MAXIMUM_POWER:PACKAGE0
  Successfully added: MAXIMUM_TIME_WINDOW:PACKAGE0
  Successfully added: PACKAGE_ENERGY:PACKAGE0
  Successfully added: DRAM_ENERGY:PACKAGE0
  Successfully added: PP0_ENERGY:PACKAGE0
  THERMAL_SPEC:PACKAGE0 : 115.000 W
  <<<<< MINIMUM_POWER:PACKAGE0 ??
  MAXIMUM_POWER:PACKAGE0 : 180.000 W
  PACKAGE_ENERGY:PACKAGE0 : 2003784180(u) nJ
  DRAM_ENERGY:PACKAGE0 : 438751220(u) nJ
  PP0_ENERGY:PACKAGE0 : 1248748779(u) nJ
  ----------------------------------
  Verification: Checks for valid event name. This utility lets you add events from the command line interface to see if they work.
  command_line.c PASSED
  Note that a value for MINIMUM_POWER:PACKAGE0 is not displayed even though it was successfully added to the event set. In fact, if combined with other events, the value for this event is never displayed. If you specify it on its own it is displayed:
  ------------------------------------

2013-08-16

* 0cb63d6e src/components/lustre/linux-lustre.c: lustre component: fix memory leak

2013-08-13

* c810cd0d src/components/micpower/linux-micpower.c src/linux-memory.c src/papi_preset.c: Close resource leaks
  User dcb reported several resource leaks in trac bug #184.
  --------------------
  I just ran the static analysis checker "cppcheck" over the source code of papi-5.2.0
  It said
  1. [linux-memory.c:711]: (error) Resource leak: sys_cpu
  2. [papi_preset.c:735]: (error) Resource leak: fp
  3. [components/micpower/linux-micpower.c:166]: (error) Resource leak: fp
  I've checked them all and they all look like resource leaks to me. Suggest code rework.
  ----------------------------------

2013-08-07

* 8d479895 doc/Makefile: Doxygen makefile: update dependencies
  The manpages are generated from comments in papi.h, papi.c, papi_hl.c and papi_fwrappers.c; update the make dependencies to reflect this.

papi-5.3.0/ChangeLogP501.txt

2012-09-20

* 708d173a man/man1/papi_avail.1 man/man1/papi_clockres.1 man/man1/papi_command_line.1...: Rebuild the manpages for a 5.0.1 release.

2012-09-19

* 29cdd839 doc/Doxyfile-common papi.spec src/Makefile.in...: Bump the version number for a 5.0.1 release.

* bb7727f6 src/libpfm4/examples/fo src/libpfm4/examples/injectevt.c .../bin/usr/local/include/perfmon/perf_event.h...: Cleanup a botched libpfm4 update.
  As Steve Kaufmann noted, I botched an update of libpfm4.

2012-09-18

* dc117410 src/configure src/configure.in: Remove a trailing slash in libpfm4 pathing.
  Addresses an issue in rpmbuild when using bundled libpfm4. Reported and patched by William Cohen

2012-09-17

* e196b89b src/components/cuda/configure src/components/cuda/configure.in: Minor changes to CUDA configure necessary to get it running smoothly on the Kepler architecture.

2012-09-11

* 866bd51c src/papi_internal.c src/papi_preset.c: Fix preset bug
  The preset code was only initializing the first element of the preset code[] array. Thus any event with more than one subevent was not terminated at all, and the preset code would use random garbage as presets. This exposed another problem; half our code assumed a 0 terminated code[] array, the rest was looking for PAPI_NULL (-1). This standardizes on PAPI_NULL, with comments. Hopefully this might fix PAPI bug #150. This is a serious bug and should be included in the next stable release.

2012-08-29

* b978a744 src/configure src/configure.in: configure: fix autodetect perfmon case
  The fixes I made yesterday to libpfm include finding broke on perfmon2 PAPI if you were letting the library be autodetected.
  This change should fix things. Tested on an actual 2.6.30 perfmon2 system.

* 4386e6e5 src/libpfm4/Makefile src/libpfm4/README src/libpfm4/config.mk...: Update libpfm4 included with papi to 4.3

2012-08-28

* 729a8721 src/configure src/configure.in: configure: don't check for libpfm if incdir specified
  When various --with-pfm values are passed, extra checks are done against the libpfm library. This was being done even if only the include path was specified, which probably shouldn't be necessary. This broke things because a recent change I made had the libpfm include path be always valid.

* bc9ddffc src/configure src/configure.in: Fix compiling with separate libpfm4
  The problem was if you used any of the --with-pfm-incdir type directives to configure, it would then assume you wanted a perfmon2 build. This removes that assumption. I did check this with perfmon2, perfctr, and perf_event builds so hopefully I didn't break anything.

2012-08-27

* 3b737198 src/papi.c src/papi_libpfm4_events.c src/papi_preset.c...: Hack around debugging macros.
  Under NO_VARARG_MACROS configs the debug printing guys become two expression statements. This is bad for code expecting eg SUBDBG(); to be one statement.
  --ie--
  if ( foo ) SUBDBG("Danger Will Robinson");
  ------
  In order to keep the useful file and line number expansions without variadic macro support, we split SUBDBG into two parts; A call to DEBUGLABEL() and friends and then a call to a function to capture the actual informative message. So
  if(foo) stmt();
  becomes
  if (foo) print_the_debug_label(); print_your_message(...);
  And your message is always printed. See papi_debug.h for what actually happens. I'm not clever enough to work around this any other way, so I exhaustively put { }s around every case of the above I found.
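The hazard described in the 3b737198 entry can be reproduced in a few lines. The sketch below is illustrative only: DBG_TWO_PART, print_label and print_message are hypothetical stand-ins, not the actual macros and functions from papi_debug.h. It shows why a macro that expands to two statements escapes an unbraced if, and why braces at the call site restore the intended behavior.

```c
#include <assert.h>
#include <stdio.h>

int label_calls = 0;    /* how many times the file/line label was printed */
int message_calls = 0;  /* how many times the message itself was printed  */

void print_label(const char *file, int line)
{
    label_calls++;
    fprintf(stderr, "%s:%d: ", file, line);
}

void print_message(const char *msg)
{
    assert(msg != NULL);
    message_calls++;
    fprintf(stderr, "%s\n", msg);
}

/* Hypothetical two-part debug macro: without variadic macros, the label
 * and the message are emitted by two separate statements. */
#define DBG_TWO_PART print_label(__FILE__, __LINE__); print_message

void unbraced(int cond)
{
    if (cond)
        DBG_TWO_PART("unbraced");  /* only print_label() is guarded by the if */
}

void braced(int cond)
{
    if (cond) {
        DBG_TWO_PART("braced");    /* both statements are guarded */
    }
}
```

Calling unbraced(0) still prints the message, because only the label call belongs to the if; braced(0) prints nothing. That difference is why the commit wraps each such call site in braces.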
  (I only searched on 'DBG' so it's possible I missed some)

papi-5.3.0/ChangeLogP440.txt

2012-04-17

* 8782daed cvs2cl.pl delete_before_release.sh gitlog2changelog.py...: Update the release machinery for git.
  gitlog2changelog.py takes the output of git log and parses it to something like a changelog.

* 80ff04a9 doc/Doxyfile-html: Cover up an instance of doxygen using full paths.
  Doxygen ( up to 1.8.0, the most recent at this writing ) would use full paths in directory dependencies ignoring the use relative paths config option.

* c556dad1 doc/Doxyfile-common papi.spec src/Makefile.in...: Bump the version for the PAPI 4.4.0 release.

2012-04-14

* 27174c0b src/components/bgpm/CNKunit/CVS/Entries src/components/bgpm/CNKunit/CVS/Repository src/components/bgpm/CNKunit/CVS/Root...: Removed CVS stuff from Q code.

* 970a2d50 src/configure src/configure.in src/linux-bgq.c...: Removed papi_events.csv parsing from Q code. (CVS stuff still needs to be taken care of.)

2012-04-13

* 853d6c74 src/libpfm-3.y/lib/intel_corei7_events.h src/libpfm-3.y/lib/intel_wsm_events.h src/libpfm-3.y/lib/pfmlib_intel_nhm.c: Add missing update to libpfm3
  Somehow during all of the troubles we had with importing libpfm3 into CVS, we lost some Nehalem/Westmere updates. Tested on a Nehalem machine to make sure this doesn't break anything.

2012-04-12

* 07e4fcd6 INSTALL.txt: Updated INSTALL notes for Q

* 2a0f919e src/Makefile.in src/Makefile.inc src/components/README...: Added missing files for Q merge.

* 0b0f1863 src/Rules.bgpm src/components/bgpm/CNKunit/CVS/Entries src/components/bgpm/CNKunit/CVS/Repository...: Added PAPI support for Blue Gene/Q.
2012-02-17 * 147a4969 src/perfctr-2.6.x/usr.lib/event_set_centaur.o src/perfctr-2.6.x/usr.lib/event_set_p5.o src/perfctr-2.6.x/usr.lib/event_set_p6.o: Remove a few binary files in perfctr-2.6.x 2012-02-23 * 955bd899 src/perfctr-2.6.x/usr.lib/event_set_centaur.os src/perfctr-2.6.x/usr.lib/event_set_p5.os src/perfctr-2.6.x/usr.lib/event_set_p6.os: Removes the last of the binary files from perfctr2.6.x Some binary files were left out in the cold after a mishap trying to configure perfctr for the build test. 2012-02-17 * 5fe239c8 src/perfctr-2.6.x/CHANGES src/perfctr-2.6.x/INSTALL src/perfctr-2.6.x/Makefile...: More cleanups from the migration, latest version of libpfm-3.y perfctr-2.[6,7] Version numbers got really confused in cvs and the git cvsimport didn't know that eg 1.1.1.28 > 1.1 ( see perfctr-2.6.x/CHANGES revision 1.1.1.28.6.1 :~) 2012-03-13 * e7173952 src/libpfm-3.y/examples_v2.x/multiplex.c src/libpfm-3.y/examples_v2.x/pfmsetup.c src/libpfm-3.y/examples_v2.x/rtop.c...: Fix some libpfm3 warnings. libpfm3 is not maintained anymore, so applied these changes locally. libpfm3 is compiled with -Werror so they broke the build with newer gcc even though they are just warnings in example programs. 2012-04-09 * 10528517 src/libpfm-3.y/Makefile src/libpfm-3.y/README src/libpfm-3.y/docs/Makefile...: Copy over libpfm-3.y from cvs. libpfm3 was another one of our skeletons in CVS. Thanks to Steve Kaufmann for keeping us honest. 2012-02-17 * ec8c879e src/aix.c src/components/coretemp/linux-coretemp.c src/components/coretemp_freebsd/coretemp_freebsd.c...: The git conversion reset all of the CVS $Id$ lines to just $Id$ Since we depend on the $Id$ lines for the component names, I had to go back and fix all of them to be the component names again. 
2012-03-09

* 71a2ae4f src/components/lmsensors/linux-lmsensors.c: Fix buffer overrun in lmsensors component
  Conflicts: src/components/lmsensors/linux-lmsensors.c

* ec0e1e9a src/libpfm4/config.mk src/libpfm4/docs/man3/pfm_get_os_event_encoding.3 src/libpfm4/examples/showevtinfo.c...: Update to current git libpfm4 snapshot

2012-02-15

* 1312923e src/libpfm4/debian/changelog src/libpfm4/debian/control src/libpfm4/debian/rules...: The git cvsimport didn't get the latest version of the libpfm4 import. This should be the versions as were in cvs now.

2012-02-24

* 81847628 src/papi_events.csv: Fix broken Pentium 4 Prescott support
  We were missing the netburst_p declaration in papi_events.csv

2012-03-01

* 917afc7f src/papi_internal.c: Add some locking in _papi_hwi_shutdown_global_internal
  This caused a glibc double-free warning, and was caught by the Valgrind helgrind tool in krentel_pthreads. There are some other potential locking issues in PAPI_shutdown, especially when debug is enabled.

* f85c092f src/papi.c: Fix possible race in _papi_hwi_gather_all_thrspec_data
  The valgrind helgrind tool noticed this with the thrspecific test

2012-03-09

* 912311ed src/multiplex.c src/papi_internal.c src/papi_libpfm4_events.c...: Fix issue when using more than 31 multiplexed events on perf_event
  On perf_event we were setting num_mpx_cntrs to 64. This broke, as the MPX_EventSet struct only allocates room for PAPI_MPX_DEF_DEG events, which is 32. This patch makes perf_event use a value of 32 for num_mpx_cntrs, especially as 64 was arbitrarily chosen at some point (the actual value perf_event can support is static, but I'm pretty sure it is higher than 64).
  Conflicts: src/papi_libpfm4_events.c

papi-5.3.0/ChangeLogP414.txt

2011-08-29

* src/configure: Rebuild from configure.in with version number bump to 4.1.4 in advance of pending internal vendor release for Cray.
2011-08-26 * release_procedure.txt: Update rel procedure to mention building the man pages before a release. * man/: man1/avail.c.1, man1/clockres.c.1, man1/command_flags_t.1, man1/command_line.c.1, man1/component.c.1, man1/cost.c.1, man1/decode.c.1, man1/error_codes.c.1, man1/event_chooser.c.1, man1/mem_info.c.1, man1/native_avail.c.1, man1/options_t.1, man1/papi_avail.1, man1/papi_clockres.1, man1/papi_command_line.1, man1/papi_component_avail.1, man1/papi_cost.1, man1/papi_decode.1, man1/papi_error_codes.1, man1/papi_event_chooser.1, man1/papi_mem_info.1, man1/papi_multiplex_cost.1, man1/papi_native_avail.1, man3/CDI.3, man3/HighLevelInfo.3, man3/PAPIF.3, man3/PAPIF_accum.3, man3/PAPIF_add_event.3, man3/PAPIF_add_events.3, man3/PAPIF_assign_eventset_component.3, man3/PAPIF_cleanup_eventset.3, man3/PAPIF_create_eventset.3, man3/PAPIF_destroy_eventset.3, man3/PAPIF_get_dmem_info.3, man3/PAPIF_get_exe_info.3, man3/PAPIF_get_hardware_info.3, man3/PAPIF_num_hwctrs.3, man3/PAPI_accum.3, man3/PAPI_accum_counters.3, man3/PAPI_add_event.3, man3/PAPI_add_events.3, man3/PAPI_addr_range_option_t.3, man3/PAPI_address_map_t.3, man3/PAPI_all_thr_spec_t.3, man3/PAPI_assign_eventset_component.3, man3/PAPI_attach.3, man3/PAPI_attach_option_t.3, man3/PAPI_cleanup_eventset.3, man3/PAPI_component_info_t.3, man3/PAPI_cpu_option_t.3, man3/PAPI_create_eventset.3, man3/PAPI_debug_option_t.3, man3/PAPI_descr_error.3, man3/PAPI_destroy_eventset.3, man3/PAPI_detach.3, man3/PAPI_dmem_info_t.3, man3/PAPI_domain_option_t.3, man3/PAPI_enum_event.3, man3/PAPI_event_code_to_name.3, man3/PAPI_event_info_t.3, man3/PAPI_event_name_to_code.3, man3/PAPI_exe_info_t.3, man3/PAPI_flips.3, man3/PAPI_flops.3, man3/PAPI_get_cmp_opt.3, man3/PAPI_get_component_info.3, man3/PAPI_get_dmem_info.3, man3/PAPI_get_event_info.3, man3/PAPI_get_executable_info.3, man3/PAPI_get_hardware_info.3, man3/PAPI_get_multiplex.3, man3/PAPI_get_opt.3, man3/PAPI_get_overflow_event_index.3, man3/PAPI_get_real_cyc.3, 
man3/PAPI_get_real_nsec.3, man3/PAPI_get_real_usec.3, man3/PAPI_get_shared_lib_info.3, man3/PAPI_get_thr_specific.3, man3/PAPI_get_virt_cyc.3, man3/PAPI_get_virt_nsec.3, man3/PAPI_get_virt_usec.3, man3/PAPI_granularity_option_t.3, man3/PAPI_hw_info_t.3, man3/PAPI_inherit_option_t.3, man3/PAPI_ipc.3, man3/PAPI_is_initialized.3, man3/PAPI_itimer_option_t.3, man3/PAPI_library_init.3, man3/PAPI_list_events.3, man3/PAPI_list_threads.3, man3/PAPI_lock.3, man3/PAPI_mh_cache_info_t.3, man3/PAPI_mh_info_t.3, man3/PAPI_mh_level_t.3, man3/PAPI_mh_tlb_info_t.3, man3/PAPI_mpx_info_t.3, man3/PAPI_multiplex_init.3, man3/PAPI_multiplex_option_t.3, man3/PAPI_num_cmp_hwctrs.3, man3/PAPI_num_components.3, man3/PAPI_num_counters.3, man3/PAPI_num_events.3, man3/PAPI_num_hwctrs.3, man3/PAPI_option_t.3, man3/PAPI_overflow.3, man3/PAPI_perror.3, man3/PAPI_preload_info_t.3, man3/PAPI_profil.3, man3/PAPI_query_event.3, man3/PAPI_read.3, man3/PAPI_read_counters.3, man3/PAPI_read_ts.3, man3/PAPI_register_thread.3, man3/PAPI_remove_event.3, man3/PAPI_remove_events.3, man3/PAPI_reset.3, man3/PAPI_set_cmp_domain.3, man3/PAPI_set_cmp_granularity.3, man3/PAPI_set_debug.3, man3/PAPI_set_domain.3, man3/PAPI_set_granularity.3, man3/PAPI_set_multiplex.3, man3/PAPI_set_opt.3, man3/PAPI_set_thr_specific.3, man3/PAPI_shlib_info_t.3, man3/PAPI_shutdown.3, man3/PAPI_sprofil.3, man3/PAPI_sprofil_t.3, man3/PAPI_start.3, man3/PAPI_start_counters.3, man3/PAPI_state.3, man3/PAPI_stop.3, man3/PAPI_stop_counters.3, man3/PAPI_strerror.3, man3/PAPI_thread_id.3, man3/PAPI_thread_init.3, man3/PAPI_unlock.3, man3/PAPI_unregister_thread.3, man3/PAPI_write.3, man3/high_api.3, man3/low_api.3, man3/papi_data_structures.3, man3/papi_vector_t.3, man3/ret_codes.3: Switch over to doxygen generated man pages. 
* man/: man1/papi_avail.1, man1/papi_clockres.1, man1/papi_command_line.1, man1/papi_cost.1, man1/papi_decode.1, man1/papi_event_chooser.1, man1/papi_mem_info.1, man1/papi_native_avail.1, man3/PAPI.3, man3/PAPIF.3, man3/PAPIF_get_clockrate.3, man3/PAPIF_get_domain.3, man3/PAPIF_get_exe_info.3, man3/PAPIF_get_granularity.3, man3/PAPIF_get_preload.3, man3/PAPIF_set_event_domain.3, man3/PAPI_accum.3, man3/PAPI_accum_counters.3, man3/PAPI_add_event.3, man3/PAPI_add_events.3, man3/PAPI_assign_eventset_component.3, man3/PAPI_attach.3, man3/PAPI_cleanup_eventset.3, man3/PAPI_create_eventset.3, man3/PAPI_destroy_eventset.3, man3/PAPI_detach.3, man3/PAPI_encode_events.3, man3/PAPI_enum_event.3, man3/PAPI_event_code_to_name.3, man3/PAPI_event_name_to_code.3, man3/PAPI_flips.3, man3/PAPI_flops.3, man3/PAPI_get_cmp_opt.3, man3/PAPI_get_component_info.3, man3/PAPI_get_dmem_info.3, man3/PAPI_get_event_info.3, man3/PAPI_get_executable_info.3, man3/PAPI_get_hardware_info.3, man3/PAPI_get_multiplex.3, man3/PAPI_get_opt.3, man3/PAPI_get_overflow_event_index.3, man3/PAPI_get_real_cyc.3, man3/PAPI_get_real_usec.3, man3/PAPI_get_shared_lib_info.3, man3/PAPI_get_substrate_info.3, man3/PAPI_get_thr_specific.3, man3/PAPI_get_virt_cyc.3, man3/PAPI_get_virt_usec.3, man3/PAPI_help.3, man3/PAPI_ipc.3, man3/PAPI_is_initialized.3, man3/PAPI_library_init.3, man3/PAPI_list_events.3, man3/PAPI_list_threads.3, man3/PAPI_lock.3, man3/PAPI_multiplex_init.3, man3/PAPI_native.3, man3/PAPI_num_cmp_hwctrs.3, man3/PAPI_num_components.3, man3/PAPI_num_counters.3, man3/PAPI_num_events.3, man3/PAPI_num_hwctrs.3, man3/PAPI_overflow.3, man3/PAPI_perror.3, man3/PAPI_presets.3, man3/PAPI_profil.3, man3/PAPI_query_event.3, man3/PAPI_read.3, man3/PAPI_read_counters.3, man3/PAPI_register_thread.3, man3/PAPI_remove_event.3, man3/PAPI_remove_events.3, man3/PAPI_reset.3, man3/PAPI_set_cmp_domain.3, man3/PAPI_set_cmp_granularity.3, man3/PAPI_set_debug.3, man3/PAPI_set_domain.3, man3/PAPI_set_event_info.3, 
man3/PAPI_set_granularity.3, man3/PAPI_set_multiplex.3, man3/PAPI_set_opt.3, man3/PAPI_set_thr_specific.3, man3/PAPI_shutdown.3, man3/PAPI_sprofil.3, man3/PAPI_start.3, man3/PAPI_start_counters.3, man3/PAPI_state.3, man3/PAPI_stop.3, man3/PAPI_stop_counters.3, man3/PAPI_strerror.3, man3/PAPI_thread_id.3, man3/PAPI_thread_init.3, man3/PAPI_unlock.3, man3/PAPI_unregister_thread.3, man3/PAPI_write.3: Remove the old manpages in preparation for defaulting to doxygen generated ones.

2011-08-25

* src/: perf_events.c, ctests/overflow_allcounters.c, ctests/papi_test.h, ctests/test_utils.c: Block all PERF_COUNT_SW events from overflow_allcounters test, as overflow on software counter can crash perf_event kernels pre 3.1

* src/libpfm4/: Makefile, config.mk, lib/Makefile, lib/pfmlib_common.c, lib/pfmlib_perf_event.c, lib/pfmlib_priv.h, perf_examples/perf_util.c, perf_examples/task_smpl.c: Fix the "conflicts" from the import

* papi.spec, doc/Doxyfile, doc/Doxyfile-everything, src/Makefile.in, src/configure.in, src/papi.h: Bump version number to 4.1.4 in advance of pending internal vendor release for Cray.

2011-08-23

* src/: papi.c, papi_hl.c: Removed all references to Fortran APIs. These are now all in papi_fwrappers.c. Also normalized syntax for many doxygen headers.

* src/papi_fwrappers.c: Added doxygen skeleton for all remaining Fortran functions in this file. Also added wrappers for four additional APIs: PAPI_get_real_nsec, PAPI_read_ts, PAPI_lock, PAPI_unlock

2011-08-19

* src/: papi.c, papi_fwrappers.c: Stubbed out doxygen pages for Fortran functions. About half way done!

* src/papi_libpfm4_events.c: Finish up the documentation/cleanup pass through the libpfm4 code.
2011-08-18

* src/papi_libpfm3_events.c: Fix code so we no longer get warnings that 'setup_preset_term' and '_pfm_get_counter_info' are defined but not used

* src/: papi_libpfm3_events.c, papi_libpfm4_events.c, papi_libpfm_events.h, perf_events.c, perfctr-x86.c: Consolidate use of _papi_libpfm_init() and pass in MY_VECTOR when necessary.

* src/papi_libpfm4_events.c: Dynamically allocate the libpfm4 native events, rather than having a fixed array allocated at init time.

* src/papi_libpfm4_events.c: Some more minor cleanups and documentation in the libpfm4 code.

* src/components/coretemp/linux-coretemp.c: Fixup for linux coretemp component, it pays to check cvs status once in a while...

2011-08-16

* src/papi.c: Update the PAPI_enum_event() Doxygen comments to reflect modern values for the "modifier" parameter.

* src/papi_libpfm4_events.c: Clean up code and add documentation for all the functions involved in libpfm4's _papi_libpfm_ntv_enum_events() function.

2011-08-15

* src/mb.h: Update the rmb() barrier for ARM.

* src/papi_events.csv: Update SandyBridge EP support to match that of mainline libpfm4

* src/papi_libpfm4_events.c: Cleanup libpfm4 code, and add more comments to code.

* src/perf_events.c: Fix bug where umask support was disabled.

* src/Rules.perfctr-pfm: Make the perfctr code use the merged preset event code.

* src/: Rules.pfm_pe, papi_libpfm3_events.c, papi_libpfm_presets.c: Have libpfm3 use the merged preset code.

* src/: Rules.pfm4_pe, papi_libpfm4_events.c, papi_libpfm_presets.c: Move the libpfm presets code to its own file, and modify the libpfm4 code to use it.

* src/papi_libpfm3_events.c: Make the libpfm3 predefined events parser identical to the libpfm4 one, in preparation for a merge.

* src/: papi_libpfm3_events.c, papi_libpfm4_events.c, papi_libpfm_events.h, perf_events.c: Move vendor fixups into the substrate and out of the naming library code.
* src/: Rules.perfctr-pfm, Rules.pfm4_pe, Rules.pfm_pe, papi_libpfm3_events.c, papi_libpfm4_events.c, papi_libpfm_events.h, papi_pfm4_events.c, papi_pfm_events.c, papi_pfm_events.h, perf_events.c, perfctr-x86.c, perfmon.c: Rename papi_pfm_events.c to papi_libpfm3_events.c to make it more clear what is in the file. Also rename papi_pfm4_events.c to papi_libpfm4_events.c and papi_pfm_events.h to papi_libpfm_events.h * src/perfmon.c: Fixup perfmon2 case for the libpfm renaming * src/perfctr-x86.c: Fix perfctr breakage from the libpfm rename. * src/: papi_pfm4_events.c, papi_pfm_events.c, papi_pfm_events.h, perf_events.c, perfctr-x86.c, perfmon-ia64.c, perfmon.c: The PAPI code uses _pfm_ in function names to mean *both* perfmon2 code and libpfm3/4 code. This can cause a lot of confusion. Rename libpfm specific function names to use _libpfm_ instead. * src/: papi_pfm_events.c, papi_pfm_events.h, perf_events.c: Fix build error on perfmon2 due to movement of the _papi_pfm_shutdown() 2011-08-05 * src/: Makefile.in, Makefile.inc, configure, configure.in, components/Makefile_comp_tests, components/cuda/tests/HelloWorld.cu, components/cuda/tests/Makefile, components/example/tests/HelloWorld.c, components/example/tests/Makefile, components/README: Added generic implementation that makes it possible to add tests to components without modifying any PAPI-specific code (other than adding the tests and a makefile to the component directory). All component tests will be compiled together with PAPI when typing 'make' (as well as cleaned up when 'make clean' or 'make clobber' is typed). +++ Also added tests to 2 components, the example and cuda component. * src/: papi_defines.h, papi_internal.h, papi_pfm4_events.c, perf_events.c: Add locking to papi_pfm4_events so that adding/looking up event names doesn't have a race condition when multiple threads are doing it at once. 
Also fix the recently-added pfm_shutdown() to be called at substrate_shutdown() rather than plain shutdown() as the latter is called at thread_shutdown() time too. * src/: papi_pfm4_events.c, papi_pfm_events.c, papi_pfm_events.h, perf_events.c: Add a _papi_pfm_shutdown() function and have it clear out the native events array at PAPI_shutdown(). This makes sample code that exhibits the libpfm4 event race much easier to write. * src/ctests/multiplex2.c: Added some PAPI_set_domain's inside of #if 0's for testing. 2011-08-03 * src/papi_pfm4_events.c: Use the new ARM vendor code to force the proper default domain on ARM cpus. * src/: linux-common.c, papi.h: Add an ARM vendor string and have it properly set. The hardware detection logic is a horrible mess of parsing /proc/cpuinfo I took the easy way out and just tacked the ARM logic on the end rather than trying to clean it up at all. * src/perf_events.c: Clean up some comments, add a few debug messages. 2011-08-02 * src/linux-memory.c: The ARM warning for memory hierarchy not being implemented was in the wrong place. * src/: papi_pfm4_events.c, sys_perf_event_open.c: Fix some misleading debug messages. * src/papi_events.csv: Update ARM Cortex A9 preset events, and add ARM Cortex A8 events 2011-07-28 * src/: cycle.h, linux-context.h, linux-lock.h, linux-memory.c, linux-timer.c, mb.h: Add remaining changes needed for ARM compilation. This is enough for "papi_avail" and "papi_native_avail" to work. Lots of #warning statements scattered around. ARM is a complicated architecture and things like memory barriers and mutexes are very dependent on what version of the architecture they are running on. It will take a while to figure out the proper way to handle this in PAPI. Also, on Cortex-A8 and Cortex-A9 there is no way to separate kernel events from the user ones. So all measurements contain both. This will probably confuse our ctests. * src/papi_events.csv: Add ARM Cortex A9 preset events to the CSV file. 
* src/sys_perf_event_open.c: Add the perf_event syscall number for ARM * src/papi_fwrappers.c: Create PAPIF group in doxygen, for the papi fortran interface. 2011-07-27 * src/x86_cache_info.c: My changes yesterday broke on the --with-debug case, as noticed by buildbot. 2011-07-26 * src/: papi.c, papi_fwrappers.c: Implement doxygen comments for PAPI_get_opt; Implement doxygen comments for PAPIF_accum in papi_fwrappers.c. This is a first step in providing separate independent Fortran documentation. * doc/Doxyfile: Have doxygen parse papi_fwrappers.c for comments. * src/papi_pfm4_events.c: The last checkin broke papi_native_avail on libpfm4. Fix it. * src/papi_pfm4_events.c: Cleanup some code in papi_pfm4_events.c to avoid gcc-4.6 warnings * src/x86_cache_info.c: Fix some warnings in src/x86_cache_info.c reported by gcc-4.6 2011-07-21 * src/ctests/all_native_events.c: Change all_native_events test to create an eventset for each native event it finds. Also becomes a good test of the number of outstanding eventsets allowed. 2011-07-19 * src/papi.c: Doxygen rewrite for PAPI_set_opt. 2011-07-13 * src/: papi_events.csv, libpfm4/lib/events/intel_snb_events.h: A few more commits that get SandyBridge mostly working. * src/papi.h: Include a comment to the prototype for PAPI_read_ts. This is apparently a requirement to get doxygen to link from the prototype to the doc block for the function (a link shows up in the low_api group now). 2011-07-12 * src/libpfm4/lib/events/intel_snb_events.h: Temporarily add missing SandyBridge FP events until support gets merged upstream. * src/papi.c: Some minor Doxygen fixes. This was my run through the HTML output produced by my assigned functions. 2011-07-11 * src/libpfm4/lib/pfmlib_intel_snb.c: Temporarily add model 45 Sandy Bridge to our copy of libpfm4 until we can get this merged upstream. 
* src/ctests/: multiattach.c, multiattach2.c, reset.c, val_omp.c, zero_attach.c, zero_fork.c, zero_omp.c, zero_pthreads.c, zero_smp.c: Fix all the remaining users of the ctests add_two_events() helper * src/ctests/first.c: Fix first test bug due to add_two_events() change. Clean up validation of results. * src/ctests/zero.c: Some cleanups I made to the testing routine add_two_events() a while ago broke the zero test. (the cycles result was swapped with the other counter result). This fixes this, plus adds a validation check to try to avoid this happening in the future. * src/: configure, configure.in: Patch from William Cohen that sets LD_LIBRARY_PATH and LIBPATH to include libpfm4/lib. A better fix would probably be to include only the libpfm library we are currently configured for. I need to do more testing of the --with-static-lib=no --with-shared-lib=yes --with-shlib options * src/papi_hl.c: High level interface Doxygen comments updated to include interface overview 2011-07-08 * doc/Doxyfile, src/papi.h, src/papi_hl.c, src/papi_vector.h: Add in the PAPI component development page. Currently not linked to by anything yet, but can be found at file://$(html_dir)/CDI or http://web.eecs.utk.edu/~ralph/html/CDI for an already built page. 2011-07-07 * src/: papi.c, papi.h: Add doxygen comments for PAPI_get_executable_info(), PAPI_exe_info_t and PAPI_address_map_t * src/papi.c: Add doxygen comments for PAPI_event_code_to_name() and PAPI_event_name_to_code() * src/papi.c: Add doxygen comments for PAPI_enum_event() * src/papi.c: Add doxygen comments for PAPI_create_eventset() * src/papi.c: Add doxygen comments for PAPI_cleanup_eventset() and PAPI_destroy_eventset() * src/papi.c: Add doxygen comments for PAPI_attach() and PAPI_detach() * src/papi.c: Add doxygen comments for PAPI_assign_eventset_component() 2011-07-05 * src/components/cuda/linux-cuda.c: missing parentheses added in CUDA_Shutdown() which caused a seg fault. 
2011-07-01 * src/papi.c: Add doxygen comments for PAPI_add_event() * src/papi.c: Add doxygen comments for PAPI_add_events() +++ Updated PAPI_accum() * src/papi.c: Add doxygen comments for PAPI_accum() * src/ctests/: data_range.c, earprofile.c: Some more ia64 ctests fixes * src/papi.c: Add doxygen comments for PAPI_register_thread() * src/papi.c: Add doxygen comments for: PAPI_read() PAPI_read_ts() * src/ctests/earprofile.c: Another attempt at fixing earprofile on ia64. * src/ctests/earprofile.c: PAPI for ia64 compiles now, and now it's some of the ia64-specific ctests that are broken. There was a missing #include "papi.h" in earprofile 2011-06-30 * src/papi.c: Doxygen for: PAPI_set_multiplex PAPI_shutdown PAPI_sprofil_t PAPI_start (int EventSet) PAPI_state (int EventSet, int *status) PAPI_stop (int EventSet, long long *values) PAPI_strerror (int) * src/: linux-timer.c, perfmon-ia64-pfm.h, perfmon-ia64.c: more ia64 fixes * src/papi.c: doxygen comments for: PAPI_query_event() * src/: linux-timer.c, linux-timer.h, papi_vector.c, papi_vector.h: Some more ia64 fixes. * src/papi.c: add doxygen comments for PAPI_profil() * src/: linux-timer.c, linux-timer.h, perfmon-ia64.c: More ia64 fixes. Getting closer. * src/: linux-context.h, perfmon-ia64.c, perfmon-ia64.h: One more try at fixing ia64. The trick to cross compiling is ./configure --with-CPU=itanium2 --with-arch=ia64 --with-perfmon=2.0 --with-tls=no make __ia64__=1 and you still have to fiddle with some __ia64__ ifdefs scattered in the code 2011-06-29 * src/papi.c: Add doxygen comments for: * PAPI_num_events() * PAPI_overflow() * PAPI_perror() * src/papi.c: Doxygen for PAPI_set_domain and PAPI_set_granularity. Unfortunately, this seems to have raised more issues about Fortran support... * src/papi.c: Add doxygen comments to * PAPI_list_threads() * PAPI_lock() * PAPI_multiplex_init() * PAPI_num_hwctrs() * PAPI_num_cmp_hwctrs() * src/papi.c: Doxygen for PAPI_set_debug and minor tweaks to other function documentation.
2011-06-28 * src/: linux-common.h, linux-timer.c, papi_pfm_events.c, perfmon-ia64-pfm.h: some more itanium fixes. This won't be enough to fix things but it is a start. * src/papi.c: Check in Kiran's doxygen work. This time hopefully not clobbering anyone. * src/: linux-context.h, linux-timer.c, perfmon-ia64.h: Attempt to fix the build for itanium systems. * src/papi.c: Fix comments embedded in doxygen source to be C++ single line format. 2011-06-27 * src/papi.c: Commit documentation changes for PAPI_reset, PAPI_set_thr_specific, and PAPI_get_thr_specific. The last one wasn't on my list, but it mirrored _set_ so I did it anyway. * src/papi.c: [no log message] * src/papi.c: Commit Kiran's updates to the code documentation. 2011-06-24 * doc/Doxyfile: One got left behind... ( see previous commit about redoing doxygen procedures ) * src/Makefile.inc, src/configure, src/configure.in, doc/Doxyfile.html, doc/Doxyfile.utils, doc/Doxyfile.utils-everything, doc/Makefile, doc/doxygen_procedure.txt: Update install process for man-pages, install from pre-built pages living in $(PAPI_DIR)/man and update $(PAPI_DIR)/doc to generate doxygen pages and copy them to $(PAPI_DIR)/man. This removes doxygen from the install process. It also removes the web of doxygen configuration files, going back to just two, lite and kitchen-sink. * src/papi.c: Updates to doxygen stuff for PAPI_remove_event{s} * src/: linux-bgp.c, perfmon-ia64.c, perfmon.c, solaris-niagara2.c, solaris-ultra.c: When I made the multiattach change I forgot to update _papi_hwi_lookup_thread calls on all architectures. This should get the ones I missed. 2011-06-23 * src/papi_pfm4_events.c: For libpfm4 we were setting available counters to the number of generic counters. This was less than libpfm3, so update the code to set the number of counters to be equal to generic+fixed. In theory whether an event can be added is determined at add time, so the extra check for number of counters is unnecessarily getting in the way.
This should be fixed but might require a re-write of some PAPI internals. 2011-06-22 * src/ctests/test_utils.c: One more fix to the byte_profile code * src/ctests/byte_profile.c: Fix byte_profile ctest, as it was breaking on libpfm4. * src/: extras.c, papi.c, perf_events.c, threads.c, threads.h, ctests/multiattach.c, ctests/multiattach2.c: Add support for handling multiattach properly. This adds a pid argument to the _papi_hwi_lookup_or_create_thread() call. A pid of "0" falls back to the old behavior of using the current tid/pid. If attaching to an outside pid/tid, a new thread object is created to handle this. This seems like the right thing to do, though there's enough complicated code in the threads code that I haven't fully audited that this can't fail somehow in complicated cases where lots of attaching/detaching is done in conjunction with having a large multi-threaded program. 2011-06-13 * src/papi_pfm4_events.c: Fix the libpfm4 enumerate code. It was possible for papi_native_avail to get stuck in an infinite loop if two events had the same name on different PMUs and the "default" PMU happened later in the enumeration. This was the case on SandyBridge at least. This should be fixed now. * src/ctests/test_utils.c: Make "test_fail()" actually fail. In the comments we say we don't exit to avoid leaking memory in threads. That seems suspect. The threads should exit properly too. If they don't, then we should fix the threading code and not make our tests never exit on fail (which can make debugging a pain). 2011-06-10 * src/: papi.c, papi_hl.c: Add example code to the high level interface docs * src/papi_events.csv: Add initial Sandy Bridge event support. This is in no way tested, so be cautious if using. Sandy Bridge support is libpfm4 only, so you'll have to configure with --with-libpfm4 * src/papi_hl.c: Added an example of how to embed example code in PAPI_stop_counters documentation.
2011-06-09 * src/Makefile.inc: Makefile fix for fortran wrapper files on case-insensitive filesystems. During build, it renames the preprocessed file PAPI_FWRAPPERS.c to upper_PAPI_FWRAPPERS.c 2011-06-08 * src/: configure, Makefile.inc, configure.in: Have configure check that doxygen is installed, and have make install only attempt to build the doxygen docs if we found doxygen. 2011-06-07 * src/: run_tests_exclude_cuda.txt, components/cuda/linux-cuda.c: ctests/thrspecific works now too with the CUDA component * src/components/cuda/linux-cuda.c: clean up and indent * src/components/cuda/: linux-cuda.c, linux-cuda.h: Added CudaRemoveEvent functionality (was broken in earlier CUDA RC versions). ctests/all_native_events works now (at least for the default CUDA device). +++ Minor exit/return mods in CUDA component * doc/Doxyfile, doc/Doxyfile.html, doc/Doxyfile.utils, doc/Doxyfile.utils-everything, doc/Makefile, src/Makefile.inc, src/papi.c, src/papi.h, src/papi_hl.c: Rework doxygen to better generate manpages from code comments. 2011-06-03 * release_procedure.txt: Incorporate a note about using 2.59 autoconf to build configure. 2011-06-02 * src/utils/error_codes.c: Tweak the doxygen title text. 2011-06-01 * src/: configure, configure.in: Modified configure.in to look for a 2.59 autoconf prerequisite. Rebuilt configure with 2.59. We'll try this out on buildbot. 2011-05-31 * src/: run_tests_exclude_cuda.txt, components/cuda/linux-cuda.c, components/cuda/linux-cuda.h: 2 things: (1) Bug in CUDA v4.0 fixed. It caused a threaded application to hang when parent called cuInit() before fork() and child called also cuInit(). All fork ctests pass now if papi is configured with cuda component. (2) If running a threaded application, we need to make sure that a thread doesn't free the same memory location(s) more than once. Now all pthread ctests pass, too (again, if papi is configured with cuda component). 
2011-05-27 * src/perf_events.c: It turns out our FORMAT_ID workaround detection code was identical to FORMAT_GROUP (and not really necessary) so merge the two. 2011-05-26 * src/papi_pfm_events.h: One last try at the cray compile fix, this time using a suggestion from Steve Kaufmann. * src/perf_events.c: Update some comments on the workarounds. I've been writing some validation tests for our various workarounds. It turns out the "no multiplexing before 2.6.33" problem is actually an artifact of the check_schedulability bug on x86 (and its interaction with our event partitioning code) rather than a distinct kernel bug. * src/Rules.pfm4_pe: Now fix libpfm4. I think they should all be fixed now. Too many permutations. * src/: Rules.pfm_pe, papi_pfm_events.h: One last try at fixing the perfmon2 build. * src/papi_pfm_events.h: Fix the perfmon2 build that broke with the libpfm4 merge. The previous fix only fixed perfctr, not perfmon2. This should fix the build for cray machines. 2011-05-24 * src/utils/component.c: Add doxygen comments to components.c * src/papi_events.csv: Fix the PAPI_TOT_INS instruction for Atom, as well as update the floating point events. * src/perf_events.c: We were using some of the perf_event functionality in an unsupported way and this broke recently when the perf_event interface was made more strict. You can't use the PERF_EVENT_IOC_REFRESH ioctl on a group leader to start all sampling siblings... use PERF_EVENT_IOC_ENABLE. Don't pass NULL or 0 as the argument to the PERF_EVENT_IOC_REFRESH ioctl. These fixes seem to work and fix the Nehalem regressions. The above changes were made to PAPI back in November to fix the I/O possible error, so we should check to be sure that this doesn't reintroduce the problem. We should also probably back-port this fix to 4.1.2 and 4.2 stable 2011-05-23 * src/: configure, configure.in, papi.c, papi.h, papi_data.h, utils/Makefile, utils/error_codes.c: New utility to display PAPI error codes and description strings.
There was no API to access error descriptions, so I created PAPI_descr_error( int error_code ) too. I also updated the error table to provide strings for all defined codes. * src/aix.c: Define aix's .cmp_info.itimer_ns value to a default. The multiplexing tests are happy on power7 aix now. * src/: sys_perf_event_open.c, ctests/overflow.c: cleanup some debug messages * src/ctests/: overflow.c, test_utils.c: The overflow test depends on the exact ordering of the flags in the add_test_event() code. So my previous changes broke the test. This commit fixes the test case again. * src/ctests/: byte_profile.c, prof_utils.c, prof_utils.h, profile.c, profile_twoevents.c, sprofile.c: ctests: remove the "hw_info" field from the profile setup functions, as the field isn't used. * src/: configure, configure.in, utils/Makefile, utils/component.c: Introduce a component avail utility, lists the components we were built with, optionally with native/preset counts and version number. * src/components/example/example.c: Add number of 'native' events to the component info structure in example component. * src/ctests/: byte_profile.c, papi_test.h, prof_utils.c, prof_utils.h, profile.c, profile_twoevents.c, sprofile.c, test_utils.c, zero_smp.c: Clean up the ctest profile event section code some more. This fixes a build error on AIX that I introduced on Friday. * src/papi_events.csv: Initial PAPI Fam14h Bobcat support. Only works with libpfm4 version of PAPI. Passes most of the tests, but still need to verify as there are a number of subtle differences in the native events. 2011-05-20 * src/ctests/: byte_profile.c, mendes-alt.c, papi_test.h, prof_utils.c, test_utils.c: Fix byte_profile to work on Nehalem. Still needs some more work to print the result properly.
* src/ctests/: attach2.c, attach3.c, branches.c, byte_profile.c, case1.c, case2.c, first.c, multiattach.c, multiattach2.c, overflow.c, overflow3_pthreads.c, overflow_index.c, overflow_one_and_read.c, overflow_pthreads.c, papi_test.h, prof_utils.c, profile_pthreads.c, reset.c, sdsc.c, sprofile.c, tenth.c, test_utils.c, zero.c, zero_attach.c, zero_fork.c, zero_pthreads.c: Some cleanups to the ctests/test_utils.c code + Remove the hw_info field from the add_two_events() and add_two_nonderived_events() functions, as it wasn't used. + Make the add_test_events() function loop through all the masks, instead of having a hardcoded test for each possible mask * src/ctests/test_utils.c: buildbot didn't like the colored test messages (despite the code having fancy checks for "isatty()"). So change the color thing to require an environment variable to be set, TESTS_COLOR=y 2011-05-19 * src/ctests/test_utils.c: Add color to the testsuite results if we are running at a console. This makes it much easier to see FAILED results. I can back this out if people don't like it, but it's made my life a lot easier when running all the tests involved with the libpfm4 merge. * src/: papi_pfm_events.c, papi_pfm_events.h: Fix the build with perfctr introduced by libpfm4 changes. * src/configure.in: Documentation for the AIX heap fix. * src/: papi_pfm4_events.c, ctests/test_utils.c: power6 doesn't work with libpfm4, as it reports num_cntrs=0 have PAPI print a better error in this case until we get a fix upstream. * src/: configure, configure.in: On aix one has to ask really nicely for a usable amount of heap space. The omp tests should run now. * src/: configure, configure.in, perf_events.c, sys_perf_event_open.c: This is the last commit needed to get libpfm4 support going.
To build with libpfm4 support enabled, run configure like this: ./configure --with-libpfm4 * src/: papi_pfm_events.c, papi_pfm_events.h, perf_events.c: Pass the actual perf_attr structure around, rather than just a 64-bit event value. This allows support for generalized events and eventual offcore/uncore support. * src/: papi_pfm_events.c, perf_events.c, perf_events.h: Clean up some debugging #ifdefs * src/papi_events.csv: The papi_events.csv file requires some additions for libpfm4 to work + The CPU family names have changed from libpfm3 to libpfm4 It should be backward compatible to just add the libpfm4 ones in addition to the libpfm3 ones + libpfm4 does not provide a helper to get the instruction and cycle event names. So we have to add them for all supported CPUs * src/: Rules.pfm4_pe, papi_pfm4_events.c: New files needed for libpfm4 support 2011-05-16 * release_procedure.txt: Add note to update from cvs before tagging. Thanks, Will Cohen :) papi-5.3.0/ChangeLogP400.txt0000600003276200002170000003207412247131117015103 0ustar ralphundrgrad2010-01-14 terpstra * src/perf_events.c 1.18: More tweaks from Corey for event rescheduling problem. Also a syntax fix for POWER platforms. 2010-01-13 sbk * src/configure.in 1.166: Enable the static perf events static table to be created and compiled in again for Cray XT CLE. 2010-01-13 terpstra * release_procedure.txt 1.13: Bump the date. * src/papi_internal.c 1.138: Fix for rescheduling events after a failed add. This addresses the NULL pointer issue found in overflow_allcounters on i7. Thanks to Corey Ashford of IBM for the fix. * papi.spec 1.4: Final changes from Will Cohen. 
* src/libpfm-3.y/config.mk 1.3: * src/libpfm-3.y/examples_v2.x/self_smpl_multi.c 1.3: * src/libpfm-3.y/examples_v2.x/syst.c 1.3: * src/libpfm-3.y/examples_v2.x/syst_multi_np.c 1.3: * src/libpfm-3.y/examples_v2.x/syst_np.c 1.3: * src/libpfm-3.y/lib/pfmlib_coreduo.c 1.3: * src/libpfm-3.y/lib/power7_events.h 1.3: *** empty log message *** * src/utils/event_info.c 1.11: Change test for version number to differentiate between PAPI-C and Classic PAPI. We were testing for versions >=3 && >= .9. This was missing versions >= 4. * src/libpfm-3.y/include/perfmon/pfmlib_gen_mips64.h 1.1.1.4: * src/libpfm-3.y/lib/intel_atom_events.h 1.1.1.5: * src/libpfm-3.y/lib/intel_corei7_events.h 1.1.1.5: * src/libpfm-3.y/lib/pfmlib_gen_mips64.c 1.1.1.6: * src/libpfm-3.y/lib/pfmlib_intel_nhm.c 1.1.1.9: Importing latest libpfm * src/Makefile.in 1.44: * src/papi.h 1.193: Update version numbers for impending release of PAPI 4.0.0. 2010-01-13 jagode * src/Makefile.inc 1.152: Avoid printing conditional 'if' statements while compiling (but they are performed). * src/perf_events.h 1.7: Seg fault on i7 with perf_events. This was fixed a while ago for perfmon and perfctr but perf-events was left behind. 2010-01-13 bsheely * src/configure 1.165: Generated configure to correspond to most recent change (Cray XT) to configure.in 2010-01-12 terpstra * src/linux-bgp.c 1.3: Restore prior native naming convention: PNE_BGP_... Needed to avoid conflict with system level naming conventions. * src/ctests/bgp/Makefile 1.3: Modifications to build test files for BGP. * INSTALL.txt 1.42: Update description for BGP. 2010-01-08 terpstra * src/Rules.perfctr-pfm 1.47: * src/Rules.pfm_pe 1.10: Eliminate duplicate definitions of environment variable for the compile line. These are now defined in configure. * src/ctests/test_utils.c 1.77: Minor tweak to print native event codes in hex instead of decimal -- far more useful that way. * src/perfctr-p4.c 1.106: Minor tweak to get this file to compile with DEBUG turned on.
2010-01-07 sbk * src/Rules.pfm 1.50: The libpfm flag CONFIG_PFMLIB_OLD_PFMV2 was correctly set for when compiling and building libpfm, but it also needs to be set for installing also. The header file libpfm-3.y/include/perfmon/perfmon.h uses this flag to determine if a macro is prepended to perfmon.h when installing it. 2010-01-07 jagode * src/linux-acpi.c 1.16: * src/linux-mx.c 1.15: * src/linux-net.c 1.4: Renamed identifier 'native_name' for net, mx, and acpi components because of conflicts on POWER machines. This variable has also been defined in powerX_events.h. 2010-01-07 bsheely * src/Rules.perfctr 1.57: Added DEBUGFLAGS to OPTFLAGS since only OPTFLAGS gets used in Makefile.inc 2010-01-05 terpstra * src/multiplex.c 1.76: Modified license language for John May's LLNL portion of this code to conform with BSD as provided by LLNL. Thanks, Bronis, for bird-dogging this. 2009-12-20 terpstra * src/solaris-niagara2.c 1.4: Changes to fix overflow/profile issues in niagara2. Thanks to Fabian Gorsler. 2009-12-18 terpstra * src/ctests/bgp/papi_1.c 1.2: * src/ctests/native.c 1.61: * src/ctests/papi_test.h 1.37: * src/extras.c 1.159: * src/linux-bgp-memory.c 1.2: * src/linux-bgp-native-events.h 1.2: * src/linux-bgp-preset-events.c 1.2: * src/linux-bgp.h 1.2: * src/papi.c 1.337: * src/papiStdEventDefs.h 1.38: * src/papi_data.c 1.35: * src/papi_internal.h 1.181: * src/papi_preset.h 1.17: * src/papi_protos.h 1.69: * src/papi_vector.c 1.22: Committing changes for BG/P. Utilities and basic counting works; Not fully tested. 2009-12-16 terpstra * LICENSE.txt 1.6: Minor tweaks on the header of the license text. * src/solaris-niagara2-memory.c 1.3: * src/solaris-niagara2.h 1.3: Commit initial changes for Niagara2 support for PAPI-C. Thanks to Fabian Gorsler. Basic counting works; some unresolved issues remain for overflow and profile. 2009-12-11 terpstra * src/papi_events.csv 1.3: Add a synonym for Pentium M. 
2009-12-08 bsheely * src/linux.c 1.69: Fixed memory issue seen in testing on certain platforms 2009-12-05 terpstra * ChangeLogP372.txt 1.1: file ChangeLogP372.txt was initially added on branch papi-3-7-0. 2009-12-02 terpstra * src/sys_perf_counter_open.c 1.10: * src/syscalls.h 1.4: Slightly cleaner syntax for redefinition of perf_event_attr in KERNEL31. 2009-12-01 terpstra * src/ctests/sdsc4.c 1.14: Fix from Will Cohen to avoid round-off errors in computing small differences between large numbers, which occasionally resulted in sqrt of negative numbers. Originally applied to sdsc2; modified and applied to sdsc4. 2009-11-30 terpstra * src/x86_cache_info.c 1.7: Strip the Windows version of cpuid out to make this version compatible with the 3.7.x branch. * src/ctests/sdsc2.c 1.13: Fix from Will Cohen to avoid round-off errors in computing small differences between large numbers, which occasionally resulted in sqrt of negative numbers. Thanks Will 2009-11-25 terpstra * src/papi_hl.c 1.77: PAPI_stop_counters was returning PAPI_OK even if PAPI_stop returned something other than PAPI_OK. Uncovered as part of the BG/P merge. 2009-11-25 bsheely * src/hwinfo_linux.c 1.2: added test for topology/thread_siblings and topology/ core_siblings 2009-11-24 terpstra * src/papi_vector.h 1.10: Fix a bug in assigning signals for overflow. Also expose a vector_find_dummy routine to allow testing for component functions. If the function pointer is a dummy, it isn't implemented in the component. This is used in extras to test for the implementation of a name_to_code routine.
2009-11-24 bsheely * src/ctests/hwinfo.c 1.7: Removed invalid code (zero can be a valid value for nnodes) 2009-11-23 bsheely * src/solaris-ultra.c 1.125: resolved compile error * src/run_tests.sh 1.37: * src/run_valgrind_tests.sh 1.2: valgrind code merged into run_tests.sh and commented out by default 2009-11-20 bsheely * src/genpapifdef.c 1.41: * src/papi_events.xml 1.3: * src/papi_fwrappers.c 1.81: Applied patch from Steve Kaufmann at Cray. Removes the remaining Unicos, Catamount, T3E, X1 and X2 references. Only explicit support for XT4+/CLE remains. 2009-11-18 mucci * src/any-null.c 1.52: * src/linux-bgl.c 1.9: * src/perfmon.c 1.97: * src/windows.c 1.4: Renamed shutdown_global to shutdown_substrate to make it more obvious that this is per substrate. This callback will be important for freeing some memory up and making sure locks are reset. Looks like a big patch, but only a few lines. * src/config.h.in 1.9: Add support for detecting gettid and syscall(gettid) which results in HAVE_GETTID and HAVE_SYSCALL_GETTID being defined in config.h This will be useful for Linux where we can remove all the special casing for threads and locking and the errors with getpid. gettid all the time. * src/papi_lock.h 1.1: Beginnings of a single function with all PAPI/Linux locking functions. Note to PAPI-C developers. The multiple context concept of PAPI-C has failed to include the lock data structure. PAPI currently only has one scope of locks that span the high-level to the low-level. This will need to be revisited and the locks split into high-level and per-context locks. 2009-11-13 terpstra * ChangeLogP371.txt 1.1: file ChangeLogP371.txt was initially added on branch papi-3-7-0. 
2009-11-12 bsheely * src/papi_events_table.sh 1.1: * src/papi_pfm_events.c 1.35: * src/papi_pfm_events.h 1.4: * src/perfmon_events.csv 1.57: * src/perfmon_events_table.sh 1.6: * src/pmapi-ppc64_events.c 1.8: * src/ppc64_events.h 1.10: renamed perfmon_events.csv perfmon_events_table.h perfmon_events_table.sh to papi_events.csv papi_events_table.h papi_events_table.sh and made code changes required by the renaming 2009-11-11 terpstra * src/ctests/first.c 1.49: Fix overly restrictive verification of results. In verifying that FP_INS/FP_OPS/TOT_INS was non-zero, we were requiring it to be near theoretical FP_OPS which caused false verification failures in some edge cases. Now we just require count >= iterations. 2009-11-11 bsheely * src/ctests/inherit.c 1.13: * src/ctests/multiplex1_pthreads.c 1.49: * src/ctests/overflow.c 1.66: * src/ctests/overflow2.c 1.25: * src/ctests/overflow3_pthreads.c 1.21: * src/ctests/overflow_allcounters.c 1.5: * src/ctests/overflow_force_software.c 1.24: * src/ctests/overflow_index.c 1.9: * src/ctests/overflow_one_and_read.c 1.5: * src/ctests/overflow_single_event.c 1.45: * src/ctests/overflow_twoevents.c 1.26: * src/ctests/pthrtough2.c 1.7: * src/ctests/zero_shmem.c 1.6: * src/ftests/cost.F 1.18: * src/ftests/fmultiplex1.F 1.37: * src/ftests/ftests_util.F 1.49: * src/ftests/native.F 1.55: * src/perfmon.h 1.20: removed code for obsolete cray builds * src/ctests/do_loops.c 1.32: * src/ctests/zero_fork.c 1.9: * src/linux-memory.c 1.41: * src/linux.h 1.3: * src/perfctr-p3.c 1.91: * src/perfctr-p3.h 1.50: * src/run_cat_tests.sh 1.4: removed Catamount code 2009-11-09 bsheely * src/linux-ia64-memory.c 1.23: * src/linux-ia64.c 1.176: created hwinfo_linux.c to encapsulate code to set _papi_hw_info struct on Linux platforms * src/unicosmp-memory.c 1.4: removed obsolete file 2009-11-06 terpstra * src/libpfm-3.y/examples_v2.x/x86/smpl_nhm_lbr.c 1.1.1.2: libpfm nhm and atom fixes 2009-11-05 bsheely * src/alpha-memory.c 1.11: * src/ckcatamount.c 1.3: * 
src/dadd-alpha.c 1.43: * src/dadd-alpha.h 1.14: * src/irix-memory.c 1.20: * src/irix-mips.c 1.116: * src/irix-mips.h 1.34: * src/irix.c 1.2: * src/irix.h 1.3: * src/linux-alpha.c 1.24: * src/linux-alpha.h 1.9: * src/power3.c 1.41: * src/power3.h 1.19: * src/power3_events.c 1.9: * src/power3_events.h 1.8: * src/power4_events.h 1.9: * src/power4_events_map.c 1.6: * src/t3e_events.c 1.11: * src/tru64-alpha.c 1.66: * src/tru64-alpha.h 1.22: * src/unicos-ev5.c 1.69: * src/unicos-ev5.h 1.20: * src/unicos-memory.c 1.12: * src/unicosmp.h 1.5: * src/x1-native-presets.h 1.4: * src/x1-native.h 1.5: * src/x1-presets.h 1.7: * src/x1.c 1.38: * src/x1.h 1.11: removed files related to obsolete builds 2009-11-03 terpstra * src/libpfm-3.y/examples_v2.x/x86/Makefile 1.1.1.3: * src/libpfm-3.y/examples_v2.x/x86/smpl_core_pebs.c 1.1.1.3: * src/libpfm-3.y/examples_v2.x/x86/smpl_pebs.c 1.1.1.1: * src/libpfm-3.y/include/Makefile 1.1.1.9: * src/libpfm-3.y/include/perfmon/perfmon_pebs_smpl.h 1.1.1.1: * src/libpfm-3.y/include/perfmon/pfmlib_intel_nhm.h 1.1.1.2: * src/libpfm-3.y/lib/amd64_events_fam10h.h 1.1.1.5: * src/libpfm-3.y/lib/intel_corei7_unc_events.h 1.1.1.2: * src/libpfm-3.y/lib/pfmlib_amd64.c 1.1.1.10: * src/libpfm-3.y/lib/pfmlib_core.c 1.1.1.12: * src/libpfm-3.y/lib/pfmlib_intel_atom.c 1.1.1.6: * src/libpfm-3.y/lib/pfmlib_intel_nhm_priv.h 1.1.1.2: * src/libpfm-3.y/lib/power6_events.h 1.1.1.4: latest libpfm changes 2009-11-02 terpstra * src/utils/avail.c 1.49: * src/utils/native_avail.c 1.42: Fixes to eliminate strcpy on overlapping strings The offending calls were replaced with memmoves and encapsulated in a single function for better maintenance. 
2009-10-29 bsheely * src/solaris-ultra.h 1.41: resolved compile errors on solaris 2009-10-23 bsheely * src/Rules.pfm_pcl 1.13: * src/pcl.c 1.12: * src/pcl.h 1.5: Naming convention change from PCL to Perf Events: renamed pcl.h and pcl.c to perf_events.h and perf_events.c, renamed Rules.pfm_pcl to Rules.pfm_pe, configure option --with-pcl changed to --with-perf-events 2009-10-20 bsheely * src/ctests/byte_profile.c 1.18: corrected possible logic error in setting end point of profile buffer 2009-10-15 bsheely * src/perfctr-ppc32.c 1.9: corrected possible init error 2009-10-14 terpstra * src/ctests/calibrate.c 1.39: Error checking was missing undercount conditions. 2009-10-13 terpstra * src/run_tests_exclude.txt 1.6: This file never existed on the PAPI-C branch. * src/aix-memory.c 1.15: * src/aix.c 1.84: * src/aix.h 1.29: * src/pmapi-ppc64.c 1.8: * src/pmapi-ppc64.h 1.4: * src/threads.c 1.33: Conversion of AIX to PAPI-C. Most tests pass, except for some overflow related stuff. Haven't examined things closely yet, but thought I should check this stuff in. 2009-10-12 bsheely * src/ftests/fdmemtest.F 1.5: * src/ftests/flops.F 1.14: declare types explicitly * src/ctests/multiattach.c 1.5: * src/ctests/zero_attach.c 1.5: corrected logic error with pid type 2009-10-09 terpstra * src/power6_events.h 1.3: * src/power6_events_map.c 1.4: Somehow these got removed from the repository. papi-5.3.0/ChangeLogP500.txt0000600003276200002170000030757112247131117015113 0ustar ralphundrgrad2012-08-08 * 4b4f87ff ChangeLogP5000.txt: Changelog for PAPI5 * 6f208c06 doc/Doxyfile-common papi.spec src/Makefile.in...: Bump version numbers in prep for a 5.0 release. * c6fdbd11 release_procedure.txt: Update release_procedure.txt Change the order of when we branch git, so that the main dev branch gets some of the release related changes. 2012-04-17 * 97d4687f ChangeLogP440.txt: Pickup the changelog from papi 4.4 This was only included in the stable-4.4 branch. 
2012-08-23 * 628c2b6e src/buildbot_configure_with_components.sh: Take debug out of the with several components build test config. When built with PAPI's memory wrapper routines, the threaded stress tests will sometimes get into poor performing situations. See trac ticket 148 for discussion. http://icl.utk.edu/trac/papi/ticket/148 2012-08-22 * 46faae8e src/ctests/overflow2.c src/ctests/overflow_single_event.c src/ctests/overflow_twoevents.c...: Move find_nonderived_event() from overflow_twoevents to test_utils and call it from overflow2 and overflow_single_event to insure that we're not trying to overflow on a derived event. * 3e7d8455 src/ctests/zero_smp.c: Fix a memory leak reported on the aix power7 machine. zero_smp.c did not unregister at the end of its thread function. * 3ad5782f src/perf_events.c: perf_events: fix segfault if DEBUG is enabled Was incorrectly using "i" as an index where it should be "0" in a debug statement. 2012-08-21 * a3cadbdb src/ftests/accum.F src/ftests/avail.F src/ftests/case1.F...: Take #2. Changing len_trim function in ftests to last_char. This time, I respect 72 char line limit. * c9db8fbf src/ctests/overflow_force_software.c: overflow_force_software was the only test that used a different hard_tolerance value (0.25) than the other overflow tests (0.75). This caused trouble on Power7/AIX. Now we are using the same hard_tolerance value in all overflow tests. * 70515343 src/ftests/accum.F src/ftests/avail.F src/ftests/case1.F...: Changed name of function len_trim to last_char. * 95168d79 src/components/cuda/linux-cuda.c: Cleanup cuda shutdown code. * The shutdown_thread code cleaned out the whole component's state. This has been split into shutdown_global for the whole component, and shutdown_thread is left to cleanup some control state info. * 56284f81 src/ctests/multiplex1_pthreads.c: Fix memory leaks in pthread multiplex tests. 
* aeead8b6 src/threads.c: Remove an outdated comment about _papi_hwi_free_EventSet holding INTERNAL lock * e598647b src/perf_events.c: perf_events: fix issue where we dereference a pointer before NULL check. Fix suggested by Will Cohen, based on a coverity report. * 4e0ed976 src/ctests/calibrate.c: Modify warning message to eliminate the word "error" Hopefully this will suppress it in buildbot outputs. * 50fbba18 src/ctests/api.c src/ftests/case2.F: Cleanup a few more warnings from the PAPI_perror change. * 1f06bf28 src/ftests/case2.F: Missed an instance of perror in the fortran code. * 93e6ae2c src/ftests/ftests_util.F: Fix warning in ftest_util.F 2012-08-20 * 60c6029e src/perf_events.c: perf_events: Update multiplexing code It * turns out the PERF_EVENT_IOC_RESET ioctl resets the count but not the multiplexing info. This means that when we fiddle with the events then reset them in check_scheduability(), we are not really resetting things to zero. The effect might be small, but since the new multiplex code by definition is always scheduable, then let's skip the test if multiplexing. * 9079236c src/ctests/zero.c: Change error reporting so FLOPS > 100% above theoretical FAIL and FLOPS > 25% above theoretical WARN. 2012-08-18 * 980558af src/papi_internal.c: papi_internal: fix memory leak When I made some changes a while back I forgot to free ESI->NativeBits properly. This was causing memory leak warnings on buildbot. 2012-08-17 * 83a14612 src/perf_events.c: perf_events: more cleanups and comments We really need to go back and figure out in more detail what the profile/sampling/overflow code is doing. * 7cafb941 src/perf_events.c: perf_events: more cleanups and comments * e9e39a4b src/perf_events.c: perf_events: disable kernel multiplexing * before 2.6.34 It turns out even our simple multiplexing won't work on kernels before 2.6.34, so fall back to sw multiplex in that case. 
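Entry e9e39a4b above falls back to software multiplexing on kernels older than 2.6.34. The sketch below shows one way such a version gate could be written, assuming the release string comes from uname(2); the function name is hypothetical and this is not the code in perf_events.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `release` (in the form found in
 * utsname.release, e.g. "2.6.32-5-amd64") names kernel 2.6.34 or newer,
 * and 0 if it is older or cannot be parsed. */
int kernel_supports_hw_multiplex(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    assert(release != NULL);

    /* Accept "X.Y" as well as "X.Y.Z"; a missing patch level stays 0. */
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return 0;

    if (major != 2)
        return major > 2;
    if (minor != 6)
        return minor > 6;
    return patch >= 34;
}
```

Treating an unparsable string as "too old" picks the conservative path, which here would be software multiplexing.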
* 05801901 src/perf_events.c: perf_events: more cleanup and comments * 268e31d7 src/perf_events.c: perf_events: more cleanup and commenting * d62fc2bf src/perf_events.c: perf_events: more cleanup and comments * fb0081bc src/perf_events.c: perf_events: more cleanups and comments * a1142fc8 src/perf_events.c: perf_events: cleanup and comment the kernel * bug workarounds * b8560369 src/perf_events.c: perf_events: minor cleanups and new comments * 6c320bb2 src/perf_events.c: perf_events: fix some debug messages I forget to test with --with-debug enabled * f7a3cccf src/perf_events.c: perf_events: enable new read_code This makes the read code much simpler. It finishes the multiplexing changes. To avoid complication, we no longer enable PERF_FORMAT_ID as reading that extra info is unnecessary with the current implementation. This passes all the tests on a recent kernel, but on 2.6.32 there are still a few issues. * 15749cff src/ctests/all_events.c src/ctests/all_native_events.c: Fix warning in all_events and all_native_events. In the perror semantic change, several strings for use in the old interface were left. 2012-08-16 * afdd25fa src/perf_events.c: perf_events: always enable kernel multiplexing The new code should work on any kernel version. * 9f5e23ae src/perf_events.c: Rewrite multiplex support. Drop support for the former "partitioned" multiplexing, as we could never use it. Instead use the simple/braindead model. This still needs more work, as sometimes reads are failing. * cdd29909 src/ftests/strtest.F: Fix strtest.F ftest It was still making some assumptions about PAPI_perror() writing to a string rather than directly to standard error. * 565f60b3 src/papi_internal.c: Missing code to set num_error_chunks to 0 The new _papi_hwi_cleanup_errors() function was not resetting num_error_chunks to 0, leading to a segfault in the fmultiplex1 test. 2012-08-02 * bb85bafd src/genpapifdef.c src/papi.c src/papi_common_strings.h...: Remove usage of _papi_hwi_err. 
Move PAPI over to storing errors in a runtime list. * Functions to add/lookup errors. * Generate the list of PAPI_E* errors at library_init time. * genpapifdef pulled the values for the PAPI_* error return codes from the _papi_hwi_err structure at configure time. Since this is now built at run-time, I added the appropriate values to genpapifdef's builting describe_t table. See : _papi_hwi_publish_error _papi_hwi_init_errors For usage hints. 2012-08-10 * e27af085 src/perf_events.c: perf_event: rename BRAINDEAD_MULTIPLEXING It is now "simple_multiplexing" and is a variable not an #ifdef This is needed before perf_event multiplexing can be sorted out. It's unclear if it actually works anyway. * 7f8e8c58 src/perf_events.c: perf_event: remove context "cookie" field It was a bit of overkill, we just need an initialized field. Also revamp how context and control are initialized. * 8cb8ac6d src/perf_events.c: perf_event: move all event specific info to * the control state previously half was in the context state and half in the control state perf_event has a strange architecture with each event being created having its own fd, which is context wide. In PAPI though we usually only have one eventset (control state) active at once, so there's no need to have the context be aware of this. 2012-08-09 * 8d7782cb src/perf_events.c: perf_event: rename evt_t to perf_event_info_t This just makes the code easier to follow. * 349de05c src/perf_events.c: perf_event: remove the superfluous per_event_info_t structure 2012-08-08 * da8ad0a2 src/ctests/all_native_events.c src/ctests/get_event_component.c src/utils/native_avail.c: Fix warnings about PAPI_enum_cmp_event() return not being checked Reported by coverity checker via Will Cohen Harmless warnings, and now the checker will likely complain about the value being checked but ignored. 
* b4719888 src/papi_user_events.c: Fix unused value in papi_user_events.c Reported by Coverity checker by Will Cohen * 6a8f255c src/utils/event_chooser.c: remove unused * PAPI_get_component_info() call in event_chooser Reported by Will Cohen from coverity checker 2012-08-06 * 62cda478 src/genpapifdef.c src/papi_common_strings.h src/papi_internal.c...: Remove usage of _papi_hwi_err. genpapifdef pulled the values for the PAPI_* error return codes from the _papi_hwi_err structure at configure time. Since this is now built at run-time, I added the appropriate values to genpapifdef's builting describe_t table. 2012-08-02 * d11259f3 src/papi.c src/papi_internal.c src/papi_internal.h...: Move over to generating the list of PAPI errors at library_init time. * 097ffc44 src/papi_internal.c: Functions to add/lookup errors. 2012-08-07 * 2530533f src/papi_events.csv: tests/zero fails on Power7 due to PAPI_FP_INS Error of 50%. Preset definition has been redefined and test now passes. * 8e17836f src/components/appio/Rules.appio src/components/appio/appio.c src/components/appio/appio.h...: We now intercept recv(). The support for recv() requires dynamic linkage. For static linkage, recv is not intercepted. 2012-08-06 * 8b1eb84c src/perf_events.c: perf_events: some whitespace cleanup and extra comments * f10edba6 src/perf_events.c: perf_events: MAX_READ was no longer being used, remove it * 08c06ed1 src/perf_events.c: perf_event event_id is actually 64-bit, so make our copy match * a33e8d9c src/perf_events.c: Rename context_t pe_context_t in perf_events.c Makes the code a bit clearer and matches how other components name things. * 96ce9dcd src/perf_events.c: Rename control_state_t pe_control_state_t This makes the code a bit easier to follow and matches how other components name things. 2012-08-03 * 4c5dce7f src/ctests/zero.c: Beef up error reporting. * 83b5d28a src/ctests/cycle_ratio.c: Have the cycle_ratio test skip if PAPI_REF_CYC event is not defined. 
2012-08-02 * 25b1ba41 src/ctests/cycle_ratio.c: Removed all TESTS_QUIET blocks. They aren't needed because tests_quiet() overloads printf. We should probably remove TEST_QUIET blocks in ALL tests at some point for code clarity… * 8777d7d4 src/ctests/zero.c: Fixed error reporting. The error computation was inside a TESTS_QUIET block and wasn't getting executed when run quietly. Thus this test always passed on buildbot, even when it didn't. * 006fe8e9 src/ctests/Makefile: Fix typo in cycle_ratio make line. * 88e6d6a4 src/aix.c src/aix.h: Setting number of multiplex counters back to 32 for AIX. Before it was set equal to number of max HW counters. This caused ctests/sdsc-mpx to fail. * ab78deda src/papi_events.csv: ctests/calibrate on Power7/AIX failed with a 50% error all the way through. Updated the preset FP_OPS with a more appropriate definition. Now the calibrate errors range from 0.0002 to 0.0011% for double and single precision * fadce32f src/ctests/calibrate.c: Modify calibrate test in two ways: 1. add a -f flag to suppress failures and allow test to run to completion; 2. change error detection to allow warnings above MAX_WARN and failures above MAX_FAIL. Currently set to 10% and 80% respectively. This allows speculative over counting to pass with warning rather than fail completely. * 8a39ac9d src/papi_events.csv: LST_INS for Power7 was defined from 3 native events that cannot be counted together at the same time. Caused ctests/all_events to fail. Updated the preset with a more appropriate definition. * cdc16e5d src/papi_events.csv: L1_DCA for Power7 was defined from 3 native events that cannot be counted together at the same time. That caused ctests/tenth to fail. Updated the preset with a more appropriate definition. 2012-08-01 * 2bf44d13 src/papi_internal.c src/perf_events.c: icc does not like arithmetic on void pointers. Added cast to unsigned char* when arithmetic was being performed on void pointers in papi_internal and perf_events. 
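Entry 2bf44d13 above adds casts to unsigned char* because icc rejects arithmetic on void pointers. A minimal illustration of the pattern, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Advance a generic pointer by `nbytes`. Arithmetic directly on void* is a
 * GNU extension; going through unsigned char* is standard C and is accepted
 * by stricter compilers. */
void *ptr_add(void *base, size_t nbytes)
{
    assert(base != NULL);
    return (unsigned char *)base + nbytes;
}
```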
* 7825ec14 src/ctests/api.c src/ctests/attach2.c src/ctests/attach3.c...: Modify tests that FAIL if PAPI_FP_OPS or PAPI_FP_INS not implemented. Now they will warn and continue. This is specifically to accommodate the brain-dead IvyBridge implementation. * fd70a015 src/testlib/test_utils.c: Re-writing of test_utils introduced new bugs that caused ctests/tenth to fail. test_events struct lists the same event twice (MASK_L1_DCW), hence PAPI_add_event() fails because it's forced to add the same preset twice. * 74ece3a0 src/run_tests.sh: run_tests.sh was clobbering $EXCLUDE variable if $CUDA was defined. Changed to add entries from run_tests_exclude_cuda.txt to $EXCLUDE which should already contain entries from run_tests_exclude.txt instead of replacing the entries already contained. * 11ed2364 src/libpfm4/config.mk: Added check in libpfm4/config.mk to check if using icc. If so, the -Wno-unused-parameter flag will no longer be used because icc does not provide it and provides no alternative. * dedf73f6 src/papi_user_events.c: fget() returns an int it should be treated as an int The coverity scan flagged that the int return by fget was stored in a char. The main concern with this is the EOF that fget() could return is -1. Do not want to mess up that value by typecasting to char and then back to int. * c4fcbe7e src/ctests/kufrin.c: Check return values of PAPI_get_cmp_opt() and calloc A coverity scan showed that PAPI_get_cmp_opt() could potentially return a negative number. Also it is good form to check the return value of calloc to ensure it is a non-null pointer. * e89d6ffa src/testlib/test_utils.c: Clean up test_print_event_header() There were a couple warnings flagged by coverity on test_print_events_header(). The function now checks for error conditions flagged by PAPI_get_cmp_opt() and also frees memory allocated by a calloc() function. 
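Entry dedf73f6 above concerns storing a character-read result in an int rather than a char. The entry writes fget(); the sketch below assumes the standard fgetc() is meant. It shows why the wider type matters: a data byte of 0xFF must remain distinguishable from EOF. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes remaining in `fp`. The result of fgetc() is kept in an
 * int: the function returns either an unsigned char value (0..255) or EOF,
 * and a char cannot represent both ranges without ambiguity. */
long count_bytes(FILE *fp)
{
    int c;
    long n = 0;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```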
* c81d8b60 src/threads.h: Eliminate deadcode from threads.h If HAVE_THREAD_LOCAL_STORAGE is defined, a portion of the _papi_hwi_lookup_thread() will never be executed. This patch make either one section or the other section of code be compiled. This will eliminate a coverity scan warning about unreachable code. * f70f3f56 src/ctests/all_native_events.c: Eliminate unused variable in ctests/all_native_events.c Coverity identified a variable that was set but never used in all_native_events.c. This patch removes the unused variable to eliminate that warning. * a9f29840 src/components/appio/appio.c: A couple places in appio.c used the FD_SET() without initializing the variable. Coverity scan pointed out this issue. * 9e535ae2 src/components/rapl/linux-rapl.c: A Coverity scan pointed out that read_msr() could potentially use an invalid value of fd for pread(). Need to check the value of fd before using it. * 7b55c675 src/components/rapl/linux-rapl.c: The arrays used for initialization were hard coded to 1024 packages. Want to avoid hard coding that so the day when machines with 1025 packages are available is a non-event. Also changed the initialization code to avoid having the initialization be O((number of packages)^2) in time complexity. 2012-07-27 * 3703995a src/papi_internal.c: Fix the component name predending code. When presented with a NULL component .short_name, the code did the wrong thing. * 5258db8b src/components/cuda/linux-cuda.c: Fix a warning in cuda. 2012-07-26 * ddd6f193 src/ctests/Makefile src/ctests/cycle_ratio.c: Add a test to compute nominal CPU MHz from real cycles and use PAPI_TOT_CYC and PAPI_REF_CYC to compute effective MHz. Warns if PAPI_REF_CYC is zero, which can happen on kernels < ~3.3. * fab5e9ef src/papiStdEventDefs.h src/papi_common_strings.h src/papi_events.csv: Add PAPI_REF_CYC preset event. Define it as UNHALTED_REFERENCE_CYCLES for all Intel platforms on which this native event is supported. 
2012-07-25 * 8b9b6bef src/papi_events.csv: Modify SandyBridge and IvyBridge tables: SandyBridge FP_OPS only counts scalars; SP_OPS and DP_OPS now count correctly, including SSE and AVX. IvyBridge can't count FP at all; adjustments made to eliminate event differences with SandyBridge. 2012-07-26 * 5b11c982 src/components/cuda/linux-cuda.c: Fix the cuda component. The cuda component prepended CUDA. to all its event names; this is no longer the case. 2012-07-25 * db5b0857 src/papi_events.csv: Added 2 new preset definitions for BGQ. Note, these presets use the new feature where a generic event mask together with an ORed opcode string is used. This won't work until the new Q driver is released (currently scheduled for end of August). 2012-07-24 * af7cd721 src/components/coretemp/linux-coretemp.c src/components/coretemp/tests/coretemp_pretty.c src/components/cuda/linux-cuda.c...: Make all our components use the same naming. We settled on colons (:) as inter-event separators. This also touches a few of the components' tests: we changed the name field, so their searches needed help. 2012-07-23 * 57aeb9d4 src/papi_internal.c: Prepend component .short_name to each event name. Use ::: as a separator. 2012-07-24 * 762e9584 src/ctests/multiplex2.c src/sw_multiplex.c: Fix multiplex2 test. It complained if it tried to add a multiplex event and PAPI properly told it that it couldn't. * 531870f1 src/papi_internal.c: Add sanity check at component init time. Looks for num_cntrs being larger than num_mpx_cntrs, which doesn't make much sense. * 53ad0259 src/extras.c src/genpapifdef.c src/papi.c...: Rename PAPI_MAX_HWCTRS to PAPI_EVENTS_IN_DERIVED_EVENT. Hopefully this will make things a little less confusing. * 700af24b src/papi_internal.c: Change EventInfoArrayLength to always return num_mpx_cntrs. Things should be consistently using num_mpx_cntrs rather than num_cntrs now.
Issue reported by Steve Kaufmann * d1570bec src/sw_multiplex.c: Fix sw_multiplex bug when max SW counters is less than max HW counters this was causing kufrin and max_multiplex to fail * f47f5d6a src/aix.c src/components/appio/appio.c src/components/bgpm/CNKunit/linux-CNKunit.c...: Remove PAPI_MPX_DEF_DEG It was not well documented and being used in confused ways all over the code. Now there is a different define PAPI_MAX_SW_MPX_EVENTS used by the software multiplex code. All other components have had the value replaced with just the maximum number of counters. If a component can handle its own (non-software) multiplexing it is up to it to set .num_mpx_cntrs to a value that's different from .num_cntrs * 0d83f5db src/papi_internal.c src/papi_internal.h: Split NativeBits off of NativeInfoArray in EventSet previously we were doing some crazy thing where we allocated both at once and then split them afterward. The new code is easier to follow. * 98f2ecbd src/papi_internal.c: Clean up EventSet creation Sort out which sizes are used for allocating which structures. * e1024579 src/Makefile.inc src/multiplex.c src/multiplex.h...: Rename the multiplex files to be sw_multiplex That way it's more clear the stuff included only relates to software multiplexing, not generic multiplexing. * a6adc7ff src/multiplex.h src/papi_internal.c src/papi_internal.h: Move some sw-multiplex specific terms out of papi_internal.h and into multiplex.h 2012-07-23 * 1ddbe117 src/components/README: Added note that lmsensors component requires lmsensors version >=3.0.0 * 94676869 src/components/appio/appio.c src/components/appio/tests/appio_test_pthreads.c: proper checking of return codes in response to tests using coverity * ea958b18 src/components/appio/tests/appio_list_events.c src/components/appio/tests/appio_values_by_code.c: As component name in table has been changed from appio.c to appio, we now use appio in the tests. 
2012-07-20 * f212cc34 src/components/appio/appio.c src/components/coretemp/linux-coretemp.c src/components/coretemp_freebsd/coretemp_freebsd.c...: Add .short_name entries to each component. * 1e755836 src/papi_libpfm4_events.c src/perf_events.c: Fix use-after-free bug in perf_events.c. This turned up in the ctests/flops test, and Valgrind made it easy to track down. * 4580ed1d src/perf_events.c: Update perf_events.c rdpmc support. Use the libpfm4 definition for mmap rather than our custom one, now that libpfm4 has been updated. * 47558b2c src/libpfm4/examples/showevtinfo.c src/libpfm4/include/perfmon/perf_event.h src/libpfm4/lib/pfmlib_intel_nhm_unc.c...: Import current libpfm4 from libpfm4 git. It has some minor uncore fixes plus the header changes needed for rdpmc. 2012-07-17 * 65d4c06c src/linux-common.c: Reorder statements to ensure the fclose() calls are performed. Coverity pointed out that it was possible for resources to be leaked in linux-common.c if fscanf() encountered an error. This reordering of the statements ensures that the fclose() calls are done regardless of the results of the fscanf() calls. 2012-07-18 * 7bf071ff src/papi_user_events.c: Ensure that load_user_event_table() frees files and memory on error. A Coverity scan showed that an error condition in the load_user_event_table() function would exit the function without closing the table file or freeing allocated memory. This patch addresses those issues. 2012-07-17 * 1ba52e35 src/components/stealtime/linux-stealtime.c: Ensure that read_stealtime() closes the file in case of an error condition. A Coverity scan showed that an error condition could cause read_stealtime() to exit without closing the file. This patch ensures that the file is closed regardless of success or failure. 2012-07-18 * f37f22e5 src/papi_libpfm4_events.c: Fix warning in papi_libpfm4_events.c. We were setting a value but never using it.
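The 65d4c06c and 1ba52e35 entries above both make sure a file is closed whether or not parsing succeeds. A minimal sketch of that ordering follows, assuming a function that takes ownership of an already opened stream; the name read_int_and_close is hypothetical and is not from linux-common.c or linux-stealtime.c.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from `fp`, then close it. The fscanf() result is saved
 * first and only examined after fclose(), so there is no early return that
 * could leak the stream. Returns 0 on success, -1 on a parse failure. */
int read_int_and_close(FILE *fp, int *out)
{
    int matched;

    assert(fp != NULL && out != NULL);

    matched = fscanf(fp, "%d", out);
    fclose(fp);                     /* runs on both the success and error path */

    return (matched == 1) ? 0 : -1;
}
```

With a single exit point after the close, a later change cannot add an error branch that skips the cleanup.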
* 8e8401bc src/testlib/test_utils.c: Remove unused variable in test_utils.c Most of the machines in buildbot were complaining about this. * 133ce6a9 src/linux-timer.c: Add missing papi_vector.h include to linux-timer.c This was breaking on PPC Linux 2012-07-17 * 6fd3cedd src/perf_events.c: Fix perf_events.c warnings reported by ICC * 21c8f932 src/perfctr-x86.c: Fix perfctr-x86 build with debug enabled * 08f76743 src/configure src/configure.in src/linux-bgq.c: Attempt to fix linux-bgq compilation error. It turns out BGQ uses the standard linux-context.h header * 43457f4f src/linux-bgq.c: Made check for opcodes more robust. * d58116b4 src/perf_events.c: More cleanups of perf_events.c file * 409438b7 src/freebsd-context.h src/freebsd.c src/freebsd.h: Fix FreeBSD compile warnings Similar to the perfctr issues. * 1e6dfb02 src/aix.c src/aix.h: Fix AIX build warnings They were similar in cause to the perfctr issues. * 3d0b5785 src/Rules.perfmon2 src/components/appio/appio.c src/components/bgpm/CNKunit/linux-CNKunit.h...: Remove papi_vector.h include from papi_internal.h There were some semi-circular dependencies that came up with the context split changes. The easiest way to fix things for perfctr was just move papi_vector.h out to be included explicitly. This touches a lot of files because a lot of files include papi_internal.h This should also fix the perfctr and perfmon2 builds that were broken yesterday. 2012-07-16 * a7a14a5b src/ctests/zero.c src/testlib/test_utils.c: Modify zero test to warm up processor before measuring events, and report timing errors as signed deviations. Modify test_utils add_two_events code to check for errors after adding nominally valid events. This is a more rigorous test than just counting available registers. * de0860d3 src/perf_events.c: Remove perf_events.h module header It's no longer needed, everything important is merged into the perf_events.c file. 
* 22975f14 src/perf_events.c: Remove perf_event SYNCHRONIZED_RESET code This was never defined and never used, just remove the code. * 48750b8c src/perf_events.c: Remove papi_pe_allocate_registers On perf_event this code wasn't really doing anything useful, as update_control_state would end up re-doing any possible tests we could want to do here. * 1775566f src/Makefile.in src/Makefile.inc src/Rules.pfm4_pe...: Remove "include CPUCOMPONENT" from papi_internal.h This was the last major dependency on CPU component in common PAPI code. It was mostly necessary for the ucontext definitions when trying to get the instruction pointer when doing sampling. This change creates new OS-specific header files that handle the ucontext related code, and has papi_internal.h include that instead. * 969ce035 src/Rules.pfm4_pe src/Rules.pfm_pe src/configure...: Make perf_event libpfm4 only perf_event libpfm3 support is not really needed anymore and supporting it was cluttering up the perf_event component. 2012-07-13 * adad1d2a src/perf_events.c: Add init time error messages to perf_event component * 827ccc07 src/perf_events.c: Add perf_event rdpmc / fast_real_timer detection Currently we need a custom copy of struct perf_event_mmap_page because the version included with libpfm4 doesn't define the fields we need yet. * 4f82fe94 src/perf_events.c: Read in paranoid info on perf_events This indicates whether a regular user can read CPU-specific or system-wide measurements. * 03080450 src/perf_events.c: Add proper perf_event detection Using the official /proc/sys/kernel/perf_event_paranoid file * 6e71d3f7 src/linux-bgq.c: Updated BGQ opcode stuff; cleaned up code. 
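Entries 03080450 and 4f82fe94 above detect perf_event through /proc/sys/kernel/perf_event_paranoid and read the paranoid level from it. The sketch below parses the text of that file and interprets it. The function names are hypothetical, and the threshold is an assumption taken from the kernel's sysctl documentation (levels below 1 permit CPU-wide events for unprivileged users), not from this changelog.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the contents of the paranoid file (for example "1\n").
 * Returns 0 and stores the level on success, -1 if no integer is found. */
int parse_paranoid(const char *text, int *level)
{
    assert(text != NULL && level != NULL);
    return (sscanf(text, "%d", level) == 1) ? 0 : -1;
}

/* Assumption: levels below 1 let a regular user open CPU-wide
 * (system-wide) events; level 1 and above restrict them. */
int user_may_measure_system_wide(int level)
{
    return level < 1;
}
```

Separating parsing from interpretation keeps the policy in one small function if the kernel's meaning of the levels needs to be revisited.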
2012-07-11 * 3114d3dc src/multiplex.c src/papi_internal.c src/perf_events.c: Minor documentation improvements Plus fixes some typos 2012-07-09 * b60c0f0c src/perf_events.c: Minor cleanups to perf_events.c * 075278a0 src/aix.c src/freebsd.c src/linux-bgp.c...: Change return value for .allocate_registers For some reason it returned 1 on success and 0 on failure. Change it so you return PAPI_OK on success and a PAPI error on failure, to better match all of the other component vectors. * 29d9e62b src/testlib/test_utils.c: Fixed the print_header routine to report an error message if counters are not found, instead of a negative counter number. Tested by forcing the return value negative; not by running on a Mac, where the error first appeared. * 74257334 src/ctests/Makefile src/ctests/remove_events.c: Add remove_events test This just makes sure EventSets still work after an event has been removed. This is probably covered by other more elaborate tests, but I needed a simple test to make sure I wasn't breaking anything. * 1372714f src/papi.c src/papi_internal.c src/papi_internal.h: Clean up, rename, and comment _papi_hwi_remap_event_position I've found this section of code to be confusing for a long time. I think I finally have it mostly figured out. I've renamed it _papi_hwi_map_events_to_native() to better describe what it does. The issue is that the native event list in an EventSet can change at various times. At event add, event remove, and somewhat unexpectedly, whenever ->update_control_state is called (a component can re-arrange native events if it wants, to handle conflicts, etc.) Once the native event list has been changed, _papi_hwi_map_events_to_native() has to be called to make sure the events all map to the proper native_event again. Currently we have the _papi_hwi_map_events_to_native() calls in odd places. It seems to cover all possible needed locations, but analyzing that we do takes a lot of analysis... 
* f1b837d8 src/papi.c: Remove unused variable in papi.c * 541bcf44 src/papi_internal.h: Update comments in papi_internal.h. Some of the EventSetInfo comments were out of date. * e6587847 src/papi.c src/papi_internal.c src/papi_internal.h: Remove unused parameter from _papi_hwi_remap_event_position. The mysterious _papi_hwi_remap_event_position function had a "thisindex" parameter that was ignored. This will possibly speed up PAPI_start(), as it was running a loop over num_native_events on _papi_hwi_remap_event_position even though each call did the same thing, since the value being passed was ignored. * 3ad3d14b src/papi_internal.c: Clean up and comment add_native_events in papi_internal.c. I'm chasing some weird perf_events behavior with the papi_event_chooser. The add_native_events code is very hard to understand; I am working on commenting it more. * 4e5e7664 src/utils/event_chooser.c: Fix coverity warning in papi_event_chooser * 666249a8 src/jni/EventSet.java src/jni/FlipInfo.java src/jni/FlopInfo.java...: RIP Java. Java PAPI wrappers have not been supported for years (2005?). They are being removed to declutter the source. * e18561fc src/papi_preset.c: Update cmpinfo->num_preset_events properly. This value wasn't being set if we were reading the presets directly from the CSV file. * 290ab7c3 src/utils/component.c: Have papi_component_avail report counter and event info * 7c421b9c src/testlib/test_utils.c src/utils/native_avail.c: Remove counter number from the testlib header. The header was only reporting number of counters for the CPU component, even though the header is printed for many utils and the CPU component might not even be involved. This could be a bit confusing, so remove it. * 26432359 src/darwin-common.c src/darwin-memory.c: Improve OSX support. This properly detects CPU information now. You can get results like this: Available native events and hardware information.
- PAPI Version : 4.9.0.0 Vendor string and code : GenuineIntel (1) Model string and code : Intel(R) Core(TM) i5-2435M CPU @ 2.40GHz (42) CPU Revision : 7.000000 CPUID Info : Family: 6 Model: 42 Stepping: 7 CPU Max Megahertz : 2400 CPU Min Megahertz : 2400 CPUs per Node : 0 Total CPUs : 4 Running in a VM : no Number Hardware Counters : -4 Max Multiplex Counters : -4 - 2012-07-08 * 845d9ecb src/Makefile.inc src/configure src/configure.in...: Add Mac OSX support This is enough that things compile and simple utilities run. No CPU perf counter support. 2012-07-06 * ff6f9ab4 src/linux-bgq.c: missed to delete a debug output. 2012-04-17 * 12e9a11a RELEASENOTES.txt: Release notes for the 4.4 release. 2012-07-06 * ac2eac56 src/papi.c src/papi.h: Add a PAPI_disable_component_by_name entry point. * 8c490849 src/components/coretemp_freebsd/coretemp_freebsd.c src/freebsd.c: Fix FreeBSD to work I'm not sure how it ever worked in the past. With these changes I can at least do a papi_component_avail and a papi_native_avail and get sane results * 108b5ce6 src/freebsd.c src/freebsd.h src/freebsd/map-atom.c...: Fix FreeBSD build some of the recent changes broke the FreeBSD build * 40a60f0a src/linux-bgq.c src/linux-bgq.h: Added BGQ's opcode and generic event functionality to PAPI. For BGQ there are multiple ways to define presets. The naive way is to derive from multiple events. This eats up multiple counters and we lose sample capability as well as overflow capability. On the other side, some events come with multiple InstrGroup derived in the field. If that's the case we can use a generic event and opcodes to filter multiple groups in a single counter. This is not working properly yet due to a known error in BGPM. Bgpm_AddEvent() does not work properly when multiple generic events are added to an EventSet. The BGPM folks have been made aware of this issue, they confirmed the error, and they are currently working on a fix. 
* 6f72b70f src/papi_events_table.sh: Make this script robust enough to handle any line ending, including CR (Mac), CRLF (Windows), and LF (Unix). It appears that google mail now automagically converts attached files to CRLF format. * 765ed0d2 src/papi_internal.c: Fix a type warning in the UE code. * 94bc1b15 src/MACROS: Remove the MACROS file it held out of date info and hasn't been touched since 2004 * d19e73ba src/ctests/Makefile src/ctests/clockcore.c src/testlib/Makefile...: Move the clockcore.c file from ctests to testlib it's common code used by multiple tests, including some in the utils directory also add a function definition to fix a build-time warning * 1101a6aa src/aix-lock.h src/aix.h src/configure...: Make papi_lock.h changes for non Linux architectures 2012-07-05 * 3b82b03d src/Makefile.in src/Makefile.inc src/aix.c...: Make the PAPI locks be tied to OS, not to CPU There is not a papi_lock.h file that when included gets the proper lock include for the OS. This fixes a lot of previous build hacks where a CPU component was needed in order for locks to work. * 0632ef42 src/threads.c: Fix spurious init_thread call in threads.c threads.c was calling init_thread() on all components, even ones that were disabled Fix it to honor the disable bit, as well as for shutdown_thread(). This was causing perfctr disable code to not work. * 19d9de7f src/Makefile.in src/Makefile.inc src/Rules.pfm4_pe...: Replace SUBSTRATE with CPUCOMPONENT in build This was mostly a configure/build change but it also cleaned up some cases where we were including SUBSTRATE where we didn't have to. * 829780db src/solaris-common.c src/solaris-common.h src/solaris-niagara2.c...: Move some common solaris code to solaris-common * 681ef027 src/configure src/configure.in src/solaris-memory.c...: Merge solaris-memory.c and solaris-niagara2-memory.c * bbd41743 src/solaris-ultra/get_tick.S src/solaris.h: Remove solaris-ultra/get_tick.S Nothing was using it. 
* dc3b6920 src/papi_sys_headers.h src/solaris.h: Remove papi_sys_headers.h Solaris was the only thing including it, and it wasn't really using it. * 7ccfa9df src/Makefile.in src/Makefile.inc src/configure...: Move OS-specific code into the new OSFILESSRC. Linux in particular was using MISC for this. * 6f16c0c5 src/configure: Re-run autoconf to pick up the substrate=>component change. * cfff1ede src/Makefile.in src/Makefile.inc src/configure...: Remove MEMSUBSTR In reality, what we want instead of a Memory Substrate is an idea of the OS-specific common code that includes the memory substrate. This change adds OSFILESSRC and OSFILESOBJ to handle this case in configure * ca4729e6 src/configure.in: Separate out MEMSUBSTR and make it per-OS * 3148cba5 src/Matlab/PAPI_Matlab.dsp src/ctests/calibrate.c src/ctests/flops.c...: RIP Windows, remove the windows support code. Windows has not been actively supported since the transition to Component PAPI (4.0). This cleans up the code-base. 2012-07-03 * a366adf7 src/papi.c src/utils/error_codes.c: Change PAPI_strerror and PAPI_perror to behave more like their POSIX namesakes. PAPI_error_descr is made redundant and removed as a result. 2012-07-05 * 7df46f81 src/Rules.pfm src/aix.c src/components/coretemp/linux-coretemp.c...: Move uses of PAPI_ESBSTR to PAPI_ECMP I left PAPI_ESBSTR defined too for backward compatibility. Also some of the changes update PAPI_ESBSTR to be a more relevant error code, if one is available. 2012-07-03 * fdb348ad src/components/coretemp_freebsd/coretemp_freebsd.c src/components/example/example.c src/components/net/linux-net.c...: A few more substrate removals * 791747c1 src/cpus.c src/papi.h src/perf_events.c...: Fix bugs introduced by substrate -> component change Fix some stupid compile bugs that I missed. 
* 79b01a47 src/aix.c src/components/appio/appio.c src/components/bgpm/CNKunit/linux-CNKunit.c...: More substrate -> component changes This changes the vectors .init_substrate -> .init_component, .shutdown_substrate -> .shutdown_component, .init -> .init_thread, .shutdown -> .shutdown_thread; hopefully this will make the code clearer. * 02a10d71 src/Makefile.inc src/aix.c src/cpus.c...: Rename "substrate" to "component" This first pass only re-names things in comments. 2012-07-02 * c4bbff1c src/papi.c src/papi.h: Minor documentation fixes Found when writing up the PAPI 5.0 changes document 2012-06-30 * f9cb7346 src/components/vmware/vmware.c: Fix vmware component apparently I forgot to test the build with the vmguestlib support disabled. 2012-06-22 * 599040d1 src/components/coretemp/linux-coretemp.c src/components/rapl/linux-rapl.c src/components/stealtime/linux-stealtime.c...: Fix libpfm4 ntv_event_to_info event_info_t on other components This was actually a widespread problem due to cut-and-paste. * 2b51b439 src/papi_libpfm4_events.c: Properly fix libpfm4 ntv_event_to_info event_info_t event value The previous fix was subtly wrong. This is the proper fix, which is to do nothing inside of papi_libpfm4_events.c because papi_internal.c does the right thing for us and we were overwriting that with the wrong value. * a4f576bf src/ctests/overflow_allcounters.c src/testlib/papi_test.h src/testlib/test_utils.c: Clean up overflow_allcounters code While tracking down a previous issue I also cleaned up the overflow_allcounters test code to use some of the new interfaces. * 6903e053 src/papi_libpfm4_events.c: Fix libpfm4 ntv_event_to_info event_info_t event value The recently added libpfm4 ntv_event_to_info function was not properly ORing PAPI_NATIVE_MASK into the event value in the event_info_t struct. That means if you tried to use that event value to add an event it would fail. The overflow_allcounters test broke because of this. 
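The 6903e053 entry above turns on a single bitwise OR. The following is only a minimal sketch of that idea, not the actual libpfm4 glue code: the `SKETCH_*` constants and helper names are hypothetical, and the mask values are assumptions chosen to mirror the native and preset masks that papi.h has historically defined.

```c
#include <stdint.h>

/* Assumed mask values for illustration only (hypothetical names). */
#define SKETCH_PRESET_MASK 0x80000000u
#define SKETCH_NATIVE_MASK 0x40000000u

/* Turn a component-internal event number into a user-visible native
 * event code by setting the native bit. Forgetting this OR is the
 * kind of bug the entry above describes. */
static uint32_t sketch_to_native_code(uint32_t internal_event)
{
    return internal_event | SKETCH_NATIVE_MASK;
}

/* A code is treated as native if the native bit is set and the
 * preset bit is clear. */
static int sketch_is_native(uint32_t code)
{
    return (code & SKETCH_NATIVE_MASK) != 0 &&
           (code & SKETCH_PRESET_MASK) == 0;
}
```

An event value that is reported without the native bit fails the `sketch_is_native()` check, which is consistent with the entry's note that adding such an event would fail.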
* 420c3d11 src/ctests/Makefile src/ctests/disable_component.c src/papi.c...: Add PAPI_get_component_index() and PAPI_disable_component() PAPI_get_component_index() will return a component index if you give it the name of a component to match. This saves you having to iterate the entire component list looking. PAPI_disable_component() will manually mark a component as disabled. It has to be run before PAPI_library_init() is called. * 11946525 src/aix.c src/components/cuda/linux-cuda.c src/components/example/example.c...: Standardize component names to not end in .c We were being inconsistent; the time to make them all be the same is now before 5.0 gets out. 2012-06-21 * 274e1ad8 src/components/vmware/tests/Makefile: Fix cut-and-paste error in the vmware component Makefile * 85d6438d src/utils/event_chooser.c: Update papi_event_chooser to work with components Now you can specify events from components and it will tell you all the other events on that component that can run with it. Previously the utility was limited to the CPU component (0) only. * 3c2fcc83 src/papi_libpfm3_events.c src/papi_libpfm4_events.c src/perf_events.c: Hook up .ntv_code_to_info on perf_event * 36e864b3 src/papi_libpfm4_events.c src/papi_libpfm_events.h src/perf_events.c: Enable support for showing extended umasks on perf_event With this change, papi_native_avail now shows event umasks such as :u, :k, :c, :e, and :i. (user, kernel, cmask, edge-trigger, invert) These are boolean or integer events. They were supported by previous PAPI but they were never enumerated. * 8f3e305e src/components/coretemp/linux-coretemp.c: Fix cut-and-paste error in linux-coretemp.c that could lead to wrong size being copied * 0eedd562 src/libpfm4/lib/events/intel_atom_events.h src/libpfm4/lib/events/intel_core_events.h src/libpfm4/lib/events/intel_coreduo_events.h...: Import most recent libpfm4 git This fixes an issue where there can be confusion between :i and :i=1 type events. 
It also has initial support for Uncore, though you need a specially patched kernel and PAPI does not support it yet. * 2f86ec78 src/components/appio/tests/Makefile src/components/appio/tests/appio_test_blocking.c .../appio/tests/appio_test_fread_fwrite.c...: - Fixed tests verbosity by using TESTS_QUIET macro - Fixed Makefile to only include necessary tests for automatic builds (skip blocking tests that read from stdin) * 6936b955 src/components/appio/README src/components/appio/appio.c src/components/appio/appio.h...: Added polling of read/write descriptor to check which operations would block. * 48cacccf src/papi.h: Add back PAPI_COMPONENT_INDEX() for backward compatibility It turns out some people were using this for cases other than enumeration. The proper way to do things now is to use PAPI_get_event_component() which is what this PAPI_COMPONENT_INDEX() now maps to. * d1ed12b7 src/ctests/Makefile src/ctests/get_event_component.c src/papi.c...: Add PAPI_get_event_component() This function returns the component an event belongs to. It also adds a test to exercise this functionality. 2012-06-20 * ffccf633 src/papi.h: Add component_type field to .cmp_info The idea is we'll specify CPU, I/O, GPU, hardware, etc. * 9998eecc src/components/lmsensors/Rules.lmsensors: Another lmsensors build fix * caa94d64 src/components/lmsensors/linux-lmsensors.c: Update lmsensors component to actually compile. I finally found a machine with lmsensors installed. * fbcde325 src/components/lmsensors/linux-lmsensors.c src/components/lmsensors/linux-lmsensors.h: Update lmsensors component Unlike the other components it hadn't been updated to PAPI 5 standards. Also, it was wrongly de-allocating all state in "_shutdown" rather than "_shutdown_substrate" which was causing double-frees during tests. 
* 0d3c0ae2 src/papi_internal.c: Add some extra debugging to _papi_hwi_get_native_event_info * 5961c03d src/aix.c src/components/nvml/linux-nvml.c src/ctests/subinfo.c...: Remove cntr_groups from .cmp_info This information is better exposed by enumeration. * 2b4193fd src/utils/event_chooser.c: Cleanup and comment event_chooser code * 7f9fab2b src/ctests/all_native_events.c: Cleanup and add comments to all_native_events.c * a245b502 src/components/nvml/linux-nvml.c src/ctests/subinfo.c src/freebsd.c...: Remove profile_ear from .cmp_info The CPU components should handle this internally. * bca07f3c src/papi.c: Add comments to the PAPI_sprofil code. * b1e2090c src/papi.c: Minor papi.c cleanups Fix some minor cosmetic things, including a typo in a comment. * 8f3aef4a src/ctests/subinfo.c src/papi.h: Remove opcode_match_width field from .cmp_info This should be exposed via enumeration and not by a field in the generic cmp_info structure. * 047af629 src/components/nvml/linux-nvml.c src/ctests/subinfo.c src/papi.h...: Remove cntr_OPCM_events field from .cmp_info This should be exposed via enumeration and not by a field in the generic cmp_info structure. * 3f1f9e10 src/components/nvml/linux-nvml.c src/ctests/subinfo.c src/papi.h...: Remove cntr_DEAR_events field from .cmp_info This should be exposed via enumeration and not by a field in the generic cmp_info structure. * 962c642a src/components/nvml/linux-nvml.c src/ctests/subinfo.c src/papi.h...: Remove cntr_IEAR_events field from .cmp_info This should be exposed via enumeration and not by a field in the generic cmp_info structure. * 5aa7eac1 src/components/nvml/linux-nvml.c src/ctests/subinfo.c src/papi.h...: Remove instr_address_range from .cmp_info This feature should be detected via enumeration, not via a flag in the generic .cmp_info structure. 
* 1bf68d5d src/components/nvml/linux-nvml.c src/ctests/subinfo.c src/papi.h...: Remove data_address_range field from .cmp_info The proper way to detect this feature is via enumeration. 2012-06-19 * 90037307 src/linux-context.h: Change Linux from using "struct siginfo" to "siginfo_t" This conforms to POSIX, and fixes newer Fedora where struct siginfo is no longer supported. This can in theory break on older setups (possibly kernel 2.4). If that happens, we need to somehow detect this using autoconf. 2012-06-18 * ad48b4fa src/Rules.perfctr-pfm: Fix the perfctr-pfm build; for buildbot, mostly. Have the perfctr-pfm build only build libpfm, like the perfevents builds. The icc build was choking on warnings (-Werror => errors) in the example programs with libpfm, this is not something we depend upon. 2012-06-17 * 358b14f9 src/papi_events.csv: Update BGQ presets * cf26fc87 src/components/bgpm/CNKunit/linux-CNKunit.c src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/L2unit/linux-L2unit.c...: Update bgpm components according to the papi5 changes * a7b08a91 src/configure src/configure.in src/linux-bgq.c: Merging the BG/Q stuff from stable_4.2 to PAPI 5 did break it. It's corrected now; also predefined events are now working.) 2012-06-15 * 2d5a4205 src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/L2unit/linux-L2unit.c src/configure...: Merging the BG/Q stuff from stable_4.2 to PAPI 5 did break it. It's corrected now (almost); predefined events are not working yet.) * 1b034920 src/papi.c: Re-enable PAPI_event_name_to_code() optimization In PAPI_event_name_to_code() there was a commented-out optimization where we would check if an event name begins with "PAPI_" before searching the entire preset list for an event name. 
The comment says we had to disable this due to "user events", but a check shows that was introduced in e7bd768850ecf90 and that the "user events" it means is not the current support, but the now-removed PAPI_set_event_info() function where you could change the names of presets on the fly (even to something not starting with PAPI_). Since we don't support that anymore, we can re-enable the optimization. 2012-06-14 * 9a26b43d src/papi_internal.c src/papi_internal.h src/papi_preset.c: Remove the 16-component limit This turned out to be easier than I thought it would be. Now determining which component an event is in is a two-step process. Before, the code shifted and masked to find the component from bits 26-30. Now, _papi_hwi_component_index() is used. There's a new native event table which maps all native events (which are allocated incrementally at first use starting with 0x40000000) to two values, a component number and an "internal" event number. 2012-06-13 * d5c50353 src/papi_internal.c: Fix for the PAPI_COMPONENT_MASK change I missed two cases in papi_internal.c This was causing the overflow_allcounters test to fail * 46fd84ce src/components/bgpm/CNKunit/linux-CNKunit.c src/components/bgpm/CNKunit/linux-CNKunit.h src/components/bgpm/IOunit/linux-IOunit.c...: Updating the Q substrate according to the PAPI 5 changes * 05a8dcbf src/components/appio/appio.c src/components/appio/tests/appio_list_events.c src/components/appio/tests/appio_values_by_code.c...: First steps of removing 16-component limit This change removes PAPI_COMPONENT_INDEX(), PAPI_COMPONENT_MASK and PAPI_COMPONENT_AND_MASK. It adds the new functions _papi_hwi_component_index() _papi_hwi_native_to_eventcode() _papi_hwi_eventcode_to_native() By replacing all of the former macros with the equivalent of the latter functions, it allows all of the future 16-component limit changes to be made in the functions. 
Components now receive as events a plain 32-bit value as their internal native event; the high bits are not set. This may break some external components. This change should not break things, but a lot of testing is needed. * af4cbb86 src/run_tests_exclude.txt: Exclude iozone helper scripts from run_tests. run_tests.sh looks for executable files under components/*/tests Some of the plotting scripts in appio/iozone were getting picked up. 2012-06-12 * c10c7ccb src/configure src/configure.in: Configure does not work on BGQ due to missing subcomp feature. It worked for stable-4.2 but got lost in current git origin. * d9a58148 src/aix.c src/ctests/hwinfo.c src/ctests/overflow.c...: Update hw_info_t CPU frequency reporting. Previously PAPI reported "float mhz" and "int clock_mhz". In theory the first was the current CPU speed, and the latter was the resolution of the cycle counter. In practice they were both set to the same value (on Linux, read from /proc/cpuinfo) and not very useful when DVFS was enabled, as the value reported was usually lower than the actual frequency running once CPU started being used. This change adds two new values "cpu_max_mhz" and "cpu_min_mhz" which are read from /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq and /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq if they are available, and falls back to /proc/cpuinfo otherwise. All of the tests were updated to use cpu_max_mhz. The old mhz and clock_mhz values are left for compatibility reasons (and set to cpu_max_mhz) but are currently unused otherwise. 
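The sysfs-then-fallback lookup described in the d9a58148 entry above can be sketched as follows. This is an illustration under stated assumptions, not PAPI's implementation: the `sketch_*` function names are hypothetical, the fallback value is supplied by the caller (for example, a number already parsed from /proc/cpuinfo), and the only external fact relied on is that the cpufreq `cpuinfo_max_freq` and `cpuinfo_min_freq` files report kHz.

```c
#include <stdio.h>

/* Read a cpufreq sysfs file (value in kHz) and return MHz.
 * Returns -1 if the file is missing or cannot be parsed. */
static int sketch_read_khz_as_mhz(const char *path)
{
    FILE *f = fopen(path, "r");
    long khz = 0;
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", &khz) == 1);
    fclose(f);
    return (ok && khz > 0) ? (int)(khz / 1000) : -1;
}

/* Prefer the sysfs value; otherwise use the caller's fallback,
 * e.g. a frequency taken from /proc/cpuinfo. */
static int sketch_cpu_max_mhz(const char *sysfs_path, int fallback_mhz)
{
    int mhz = sketch_read_khz_as_mhz(sysfs_path);
    return (mhz > 0) ? mhz : fallback_mhz;
}
```

On a Linux machine, the path from the entry, /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq, would be passed as `sysfs_path`; on systems without cpufreq the fallback is returned unchanged.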
2012-06-11 * 0f124891 src/papi_events.csv: Initial PAPI Ivy Bridge Support for now try to re-use the sandy bridge event presets * a1f46077 src/libpfm4/docs/man3/libpfm_intel_ivb.3 src/libpfm4/include/perfmon/err.h src/libpfm4/lib/events/intel_ivb_events.h...: Import libpfm4 git snapshot This adds IvyBridge support * 3bb983cc src/libpfm-3.y/examples_v2.x/self_smpl_multi.c: Fix a libpfm3 example program for icc, local fix because libpfm3 is deprecated. icc does have more enjoyable warnings than gcc, : error 186: pointless comparison of unsigned integer with zero on this: unsigned int foo; ... if ( foo < 0 ) 2012-06-06 * d28adccf src/papi_user_events.c: The user events code had a call to exit, this was bad... 2012-06-04 * 6bf43022 src/testlib/test_utils.c: Further the hack for testing for perf SW events. Events like - | perf::CPU-CLOCK | | PERF_COUNT_SW_CPU_CLOCK | - were passing the check, now we also check the event_info_struct.long_descr field for PERF_COUNT_SW.... * fa4b1a28 src/components/nvml/linux-nvml.c: Cleanup nvml code a little. A few print statements were left over from debugging. Also check errors from nvml and cuda pciinfo functions, disabling the component in a few more cases. 2012-06-01 * da144a94 src/components/nvml/Makefile.nvml.in src/components/nvml/README src/components/nvml/Rules.nvml...: Rewrite and merge of the nVidia Management library component. This component attempts to expose all supported 'performance counters' on each card cuda knows about at runtime. Much like the cuda component reads happen on the card you're executing on at PAPI_read time. The test included is a copy of the cuda helloworld test, but it attempts to start/stop the event on each gpgpu. If you select an event that is not supported on the card you're running on we should fail gracefully but this has not been tested. 
2012-05-23 * b2d414dc src/components/stealtime/linux-stealtime.c: Add units to stealtime component Added the function but forgot to add a function vector for it. * ce9d4500 src/components/stealtime/linux-stealtime.c: Add units to stealtime Properly report that the units are in microseconds. * 149948c8 src/components/rapl/linux-rapl.c: Minor cleanup of RAPL code missing "void" parameter in init_substrate function * 6a7e22fa src/components/vmware/vmware.c: More vmware component fixes. This makes the component thread-safe. Also makes it fail more gracefully if the guestlib SDK is installed but does not support our hypervisor (for example, if we are running under VM Workstation). Still need to test on ESX. * 072d6473 src/components/appio/tests/appio_test_select.c: added code to intercept and time select() calls. 2012-05-22 * 12b6d0d7 src/components/vmware/vmware.c: Some more minor fixes to VMware component Try to handle things better if VMguest SDK not working * 6e015bc5 src/components/vmware/Rules.vmware src/components/vmware/vmware.c: More vmware component fixups Now works with the events from the VMguest SDK library * 5fc0f646 src/components/vmware/vmware.c: More cleanup of vmware component The pseudo-performance counters work again. Now they behave in accumulate mode, like all other PAPI counters. * f72b0967 src/components/vmware/tests/vmware_basic.c: Make vmware test a bit more complete * 070e5481 src/components/vmware/tests/Makefile src/components/vmware/tests/vmware_basic.c: Add a test for the vmware component * 7cf62498 src/components/vmware/Makefile.vmware.in src/components/vmware/Rules.vmware src/components/vmware/configure...: Clean up the vmware component. bring it up to date with other components. 
make it possible to build it without the vmguest library being installed * b32ae1ae src/components/stealtime/Rules.stealtime src/components/stealtime/linux-stealtime.c src/components/stealtime/tests/Makefile...: Add a stealtime component When running in a VM, this provides information on how much time was "stolen" by the hypervisor due to the VM being disabled. This info is gathered from column 8 of /proc/stat This currently only works on KVM. * 9e95b480 src/components/appio/tests/appio_test_blocking.c: Use a non-blocking select to determine which reads and writes would block 2012-05-19 * f60d991f src/components/appio/README src/components/appio/appio.c src/components/appio/tests/appio_test_read_write.c...: Interception of close() implemented. This allows us to correctly determine the number of currently open descriptors. 2012-05-17 * 7cd8b5a3 src/libpfm4/.gitignore src/libpfm4/config.mk src/libpfm4/lib/Makefile...: Update libpfm4 to current git tree * ebffdb7e src/components/rapl/tests/rapl_overflow.c: Skip rapl_overflow test if RAPL not available * 98d21ef3 src/components/example/example.c src/components/rapl/linux-rapl.c: Fix some component warnings. * 0447f373 src/configure src/configure.in src/linux-generic.h: Make build not stall if PAPI_EVENTS_CSV not set This is some fallout from the FreeBSD changes. PAPI_EVENTS_CSV could not be set, which would make the event creation script hang forever. Also catch various fallthroughs in the code where SUBSTR wasn't being set, which is how the above problem can happen. * ef484c00 src/linux-timer.h: Fix typo in linux-timer.h 2012-04-14 * 7c3385f4 src/components/bgpm/CNKunit/CVS/Entries src/components/bgpm/CNKunit/CVS/Repository src/components/bgpm/CNKunit/CVS/Root...: Removed CVS stuff from Q code. * 2cf4aeb2 src/configure src/configure.in src/linux-bgq.c...: Removed papi_events.csv parsing from Q code. (CVS stuff still needs to be taken care of.) 
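For the stealtime entry above (b32ae1ae), which gathers its data from column 8 of /proc/stat, the parsing step might look like the sketch below. It is a hedged illustration rather than the component's code: the function name is hypothetical, it handles only the aggregate "cpu" line, and it returns the raw tick count without converting to the microseconds the component reports.

```c
#include <stdio.h>
#include <string.h>

/* Extract the eighth numeric field ("steal") from the aggregate
 * "cpu" line of /proc/stat. The value is left in kernel ticks.
 * Returns 0 on success, -1 if the line is not an aggregate cpu
 * line or has fewer than eight numeric fields. */
static int sketch_parse_steal(const char *line, unsigned long long *steal)
{
    unsigned long long v[8];
    int n;

    /* Reject per-CPU lines such as "cpu0 ..." and unrelated lines. */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;

    n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n != 8)
        return -1;

    *steal = v[7];
    return 0;
}
```

A caller would read the first line of /proc/stat with `fgets()` and pass it in; sampling twice and subtracting gives the ticks stolen over an interval.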
2012-04-12 * 153c2bb1 INSTALL.txt: Updated INSTALL notes for Q 2012-05-17 * ff6a43fb src/Makefile.in src/Makefile.inc src/components/README...: Added missing files for Q merge. Conflicts: src/configure src/configure.in src/freq.c 2012-04-12 * 0e142630 src/Rules.bgpm src/components/bgpm/CNKunit/CVS/Entries src/components/bgpm/CNKunit/CVS/Repository...: Added PAPI support for Blue Gene/Q. 2012-05-14 * ad7e3fa0 src/components/rapl/linux-rapl.c: Properly accumulate RAPL results Previously it was resetting the counts on read, instead of continuing to count as per other PAPI events. * c79e3018 src/components/rapl/tests/rapl_overflow.c: Fix some warnings in rapl_overflow test * 731afd1a src/components/rapl/tests/Makefile src/components/rapl/tests/rapl_overflow.c: Add rapl_overflow test This test sees if we can measure RAPL in conjunction with overflow CPU performance events. * b0e201bb src/components/rapl/utils/Makefile src/components/rapl/utils/rapl_plot.c: Remove derived "uncore" values from rapl tool They weren't really measuring uncore, but were just TOTAL - PP0. It was causing some confusion. 2012-05-09 * 547e9379 doc/Doxyfile-common papi.spec src/Makefile.in...: Bump the version number to 4.9.0.0 Read 4.9 as pre-5.0 master was at version number 4.2.1, this was archaic... Sorry for the confusion Tushar, master is the correct branch for the latest development code. * 133e3d67 src/configure src/configure.in: Fix perfctr build In the FreeBSD changes I removed the CPU determination by reading /proc/cpuinfo as that was prone to failure and non-portable. This broke perfctr as it was doing a huge CPU name lookup to determine if it was on an x86 system or not. This change fixes that. 2012-05-08 * 42b21d67 src/papi_libpfm4_events.c: Fix PAPI event enumeration inside of VMware VMware disables the availability of architectural events when virtualized PMU is not available. 
libpfm4 was checking this when enumerating events, and we would end up in the situation where ix86arch was marked active but 0 events were available. We didn't check for this error condition and thus ended up thoroughly confused. 2012-05-07 * fd79a584 src/freebsd.c: Fix event enumeration on FreeBSD It was passing PAPI_OK in all cases, causing papi_native_avail to try to do things like report groups even when groups weren't available. * 53732c2e src/freebsd.c: Add Virtual Machine detection support to FreeBSD again, support for this on x86 is OS Neutral * 7b4d7c96 src/configure src/configure.in src/freebsd-memory.c...: Add x86_cacheinfo support to FreeBSD The x86 cache and memory info is OS-independent, so add support for it to FreeBSD. * 91033df6 src/Makefile.in src/Makefile.inc src/configure...: Re-enable predefined events on FreeBSD * 36f6dc1b src/freebsd.c src/freebsd/map.c src/freebsd/map.h: Modify FreeBSD to use _papi_load_preset_table * 45651746 src/freebsd.c: Cleanup the freebsd code a bit. * e1554ed8 src/configure: re-run autoconf for updated configure * 1deb2f5d src/Makefile.inc: Make sure a proper dependency for papi_events_table.h exists Our Makefile code that builds a shared library is way broken; it will fail to rebuild in many cases where the static library properly detects things. * 28e28006 src/configure.in: Make papi_events_table.h build normally, not by configure. * 9a66dfa5 src/configure.in: Another place papi_events_table.sh is called * 12e4a934 src/Makefile.inc src/papi_events_table.sh: Make papi_events_table.sh take a command line argument This way we can use it on any .csv file, not just papi_events.csv * 7018528f src/freebsd/memory.c: Remove unused freebsd/memory.c file * 819e5826 src/freebsd_events.csv: Make freebsd_events.csv a valid PAPI event file * 9cc4a468 src/freebsd.c src/freebsd/map-atom.c src/freebsd/map-core.c...: Fix FreeBSD build on head. This temporarily disables preset events. There are also a few other minor fixes. 
2012-05-01 * ab36c0a2 src/Makefile.inc src/configure src/configure.in: Update build system for FreeBSD * 2b61d8b7 src/freebsd.c src/freebsd.h: Fix various compiler warnings on FreeBSD * 2c0bcc84 src/freebsd.c: Enable new Westmere events on FreeBSD * b0499663 src/freebsd/map-i7.c src/freebsd/map-i7.h src/freebsd/map-westmere.c...: Add Westmere event support for FreeBSD * e54cabc6 src/ctests/inherit.c: Fix the inherit ctest to compile on FreeBSD * d9dbdd31 src/components/appio/appio.c: - change in appio component (appio.c): removed reference to .ntv_bits_to_info as it doesn't exist in the PAPI component interface. 2012-04-27 * 5d661b2d src/Rules.pfm src/Rules.pfm_pe: Add the libpfm -Wno-override-init bandaid to the other rules files. In b33331b66137668155c02e52c98a7e389fad402e we test if gcc -Wextra complains about some structure initialization that libpfm does. This was incorporated into Rules.pfm4_pe only. Jim Galarowicz noticed the other Rules files didn't have it. * 4349b6fd src/Rules.pfm4_pe src/Rules.pfm_pe: Cleanup the perf events Rules files. Steve Kaufmann reported that CONFIG_PFMLIB_OLD_PFMV2 is only used for libpfm3 builds targeting old versions of perfmon2. 2012-04-26 * 8a7fef68 src/mb.h: Add memory barriers for ia64 2012-04-24 * 9af4dd4a src/libpfm4/README src/libpfm4/config.mk src/libpfm4/include/perfmon/perf_event.h...: Import libpfm4 git snapshot This brings libpfm4 up to 9ffc45e048661a29c2a8ba4bfede78d3feb828f4 The important change is support for Intel Atom Cedarview. 2012-04-20 * fac6aec0 src/linux-bgp-memory.c src/linux-bgp.c: Some BG/P cleanups. Removed a lot of dead code, noticed when looking for any potential BG/P issues. * 977709f6 src/linux-bgp-preset-events.c src/linux-bgp.c: Fix PAPI compile on BG/P Thanks to Harald Servat 2012-04-19 * 5207799e release_procedure.txt: Modified release_procedure.txt to push tags. 2012-04-18 * b248ae80 doc/Makefile: Have clean remove the doxygen error file. 
* 1d4f75a3 doc/Doxyfile-man1 doc/Doxyfile-man3: Fix an error in the Doxygen config files. Doxygen includes things with @INCLUDE not @include. The html file had this, the man page files did not... 2012-04-17 * 979cda20 cvs2cl.pl delete_before_release.sh gitlog2changelog.py...: Update the release machinery for git. gitlog2changelog.py takes the output of git log and parses it to something like a changelog. * 67bdd45f doc/Doxyfile-html: Cover up an instance of doxygen using full paths. Doxygen ( up to 1.8.0, the most recent at this writing ) would use full paths in directory dependencies ignoring the use relative paths config option. 2012-04-13 * c38eb0b7 src/libpfm-3.y/lib/intel_corei7_events.h src/libpfm-3.y/lib/intel_wsm_events.h src/libpfm-3.y/lib/pfmlib_intel_nhm.c: Add missing update to libpfm3 Somehow during all of the troubles we had with importing libpfm3 into CVS, we lost some Nehalem/Westmere updates. Tested on a Nehalem machine to make sure this doesn't break anything. * 193d8d06 src/papi_libpfm3_events.c: Fix max_multiplex case on perf_event/libpfm3 num_mpx_cntrs was being set to 512 even though the real maximum is 32, causing a buffer overflow and segfault. 2012-04-12 * f1f7fb5b src/threads.h: Fix minor typo in a comment * 0373957d src/linux-timer.c: Fix potential fd leak Noticed by coverity checker. * 71727e38 src/ctests/max_multiplex.c: Improve max_multiplex ctest on perfmon2, this test was failing because the maximum number of multiplexed counters was much more than the available counters we could test with. This change modifies the test to not fail in this case. 2012-04-11 * fdbdac9f src/perfmon.c: Fix the perfmon substrate. It was missing a _papi_libpfm_init() call, which meant the number of events was being left at 0. 2012-04-09 * 2a44df97 src/libpfm-3.y/examples_v2.x/multiplex.c src/libpfm-3.y/examples_v2.x/pfmsetup.c src/libpfm-3.y/examples_v2.x/rtop.c...: Catch a few libpfm-3.y files up to libpfm-3.10. 
More skeletons keep falling out of the cvs closet. This is just what diff -q -r catches. 2012-04-04 * 0e05da68 src/components/rapl/utils/Makefile src/components/rapl/utils/README src/components/rapl/utils/rapl_plot.c: Add the rapl_plot utility to the RAPL component. This utility uses PAPI to periodically poll the RAPL counters and generate average power results suitable for plotting. There's been a lot of interest in this utility so it's probably useful to include it with the RAPL component. * 2daa03ac src/papi_internal.c: Check if a component is disabled at init time. This change modifies the code so that at PAPI_library_init() time we check the component disable field, and we don't call the init routines for components the user has disabled. This allows code like the following to happen _before_ PAPI_library_init(): numcmp = PAPI_num_components(); for(cid=0; cid<numcmp; cid++) { cmpinfo = PAPI_get_component_info(cid); if (!strcmp(cmpinfo->name,"cuda")) { cmpinfo->disabled=1; strncpy(cmpinfo->disabled_reason,"Disabled by user",PAPI_MAX_STR_LEN); } } We might want to add a specific PAPI_disable_component(int cid) call or maybe even a PAPI_disable_component(char *name) as the above code causes compiler warnings since cmpinfo is returned as a const pointer. This all works because PAPI currently statically allocates all of the components at compile time, so we can view and modify the cmp_info structure before PAPI_library_init() is called. * 3fd2b21e src/components/appio/README src/components/appio/appio.c src/components/appio/appio.h...: Added support to count reads that are interrupted or would block. 2012-04-03 * dd3a192f release_procedure.txt: Change chmod flags for doxygen stuff from 755 to 775 to allow group write permissions. 2012-03-30 * deac54cc src/components/coretemp/linux-coretemp.c src/components/coretemp/tests/coretemp_basic.c src/components/coretemp/tests/coretemp_pretty.c...: Add new PAPI_enum_cmp_event() function This will be needed when we remove the 16-component limit. 
Currently in PAPI_enum_event() the component number is gathered from bits 29-26 of the eventcode. This won't work anymore once we remove those bits. Also update the various components to not use PAPI_COMPONENT_MASK() as this too will go away in the transition. * 48331cc9 src/configure src/configure.in src/papi.c...: Place all compiled-in components in the _papi_hwd[] array. Previously we had separate compiled_in[] and _papi_hwd[] arrays. At init time a pointer to the compiled_in[] was copied to _papi_hwd[] if initialization passed. This kind of code setup makes enumerating components hard, and finding info from non-available components would require additional function entry points. This change leaves all compiled in components to _papi_hwd[]. Availability of the component can be checked with the new "disabled" field. This will make enumeration support a lot easier to add. It can possibly cause user confusion if they try to access component structures directly without checking the "disabled" field first. This change should also make any eventual support for run-time component enabling/disabling a lot easier. * 66a72f44 src/papi.c: Documentation was referring to nonexistent "PAPI_enum_events()" The actual function we have is PAPI_enum_event() * 0f2c2593 src/components/coretemp/linux-coretemp.c src/components/lustre/linux-lustre.c src/components/mx/linux-mx.c...: Add support for reporting reason for failed component initialization. This change adds the fields "disabled" and "disabled_reason" to the component_info_t structure. At initialization time, PAPI will set the "disabled" field to the value returned by component init (that is PAPI_OK if OK or an error otherwise). This can be checked later to find why component init failed. Also provided is the "disabled_reason" string. The components can set this at failure time, and this can be printed later. 
For example, this is sample output of the updated papi_component_avail routine: - Compiled-in components: Name: perf_events.c Linux perf_event CPU counters Name: linux-rapl Linux SandyBridge RAPL energy measurements \-> Disabled: Not a SandyBridge processor Name: example.c A simple example component Name: linux-coretemp Linux hwmon temperature and other info \-> Disabled: No coretemp events found Name: linux-net.c Linux network driver statistics Name: linux-mx.c Myricom MX (Myrinet Express) statistics \-> Disabled: No MX utilities found Name: linux-lustre.c Lustre filesystem statistics \-> Disabled: No lustre filesystems found Active components: Name: perf_events.c Linux perf_event CPU counters Name: example.c A simple example component Name: linux-net.c Linux network driver statistics 2012-03-29 * d84b144e src/components/rapl/Rules.rapl src/components/rapl/linux-rapl.c src/components/rapl/tests/Makefile...: Add a SandyBridge RAPL (Running Average Power Level) Component This component allows energy measurement at the package-level on Sandybridge machines. To run, you need the Linux x86-msr kernel module installed and read permissions to /dev/cpu/*/msr The output from the rapl_busy test looks like this on a SandyBridge-EP machine: Trying all RAPL events Found rapl component at cid 2 Starting measurements... Doing a naive 1024x1024 MMM... Matrix multiply sum: s=1016404871450364.375000 Stopping measurements, took 3.979s, gathering results... 
Energy measurements: PACKAGE_ENERGY:PACKAGE0 175.786011J (Average Power 44.2W) PACKAGE_ENERGY:PACKAGE1 73.451096J (Average Power 18.5W) DRAM_ENERGY:PACKAGE0 11.663467J (Average Power 2.9W) DRAM_ENERGY:PACKAGE1 8.055389J (Average Power 2.0W) PP0_ENERGY:PACKAGE0 119.215500J (Average Power 30.0W) PP0_ENERGY:PACKAGE1 16.315216J (Average Power 4.1W) Fixed values: THERMAL_SPEC:PACKAGE0 135.000W THERMAL_SPEC:PACKAGE1 135.000W MINIMUM_POWER:PACKAGE0 51.000W MINIMUM_POWER:PACKAGE1 51.000W MAXIMUM_POWER:PACKAGE0 215.000W MAXIMUM_POWER:PACKAGE1 215.000W MAXIMUM_TIME_WINDOW:PACKAGE0 0.046s MAXIMUM_TIME_WINDOW:PACKAGE1 0.046s rapl_basic.c PASSED 2012-03-26 * b44d60ca src/components/appio/appio.c src/components/appio/appio.h src/components/appio/tests/appio_test_read_write.c: Added support for intercepting open calls. 2012-03-23 * 9e9fac4b src/Makefile.in src/Rules.pfm4_pe src/configure...: Fix the test case in configure at 0cea1848 Make use of the structure we're using for the override-init test case. * 0cea1848 src/configure src/configure.in: Doctor CFLAGS when testing for a gcc warning. -Wextra was not in CFLAGS when I attempted to check for the initialized field overwritten warning. So we set -Wall -Wextra -Werror when running the test code. 2012-03-22 * b33331b6 src/Makefile.in src/Rules.pfm4_pe src/configure...: Fix initialized field overwritten warning when building libpfm4 on some gcc versions. In gcc 4.2 or so, -Woverride-init was added to -Wextra causing issues with code like struct foo { int a; int b;}; struct foo bar = { .a=0, .b=0, .b=5 }; -Wno-override-init allows us to keep -Werror for libpfm4 compiles. 2012-03-21 * ae149766 src/papi_internal.h: Delete an old comment. Yes, Dan in 2003, we should and do use MAX_COUNTER_TERMS as the size of the event position array. 2012-03-20 * b937cdd8 src/papi_user_events.c: Move the user events code over to using the new preset event data structure. 2012-03-14 * 6ca599e2 src/papi_internal.c: Fix a small memory leak. 
We weren't freeing _papi_hwd, causing a lot of MEM_LEAK warnings in buildbot. 2012-03-13 * 473b8203 src/aix.h src/configure src/configure.in...: Remove last MY_VECTOR usage. Have configure explicitly set the name of the perf counter substrate vector in the components_config.h file This removes one more special case, and gets us slightly closer to being able to have multiple CPU substrates compiled in at once. * 360c3003 src/papi.c src/papi_libpfm3_events.c src/papi_libpfm_events.h...: Clean up the papi_libpfm3_events.c code. Move code that was perfctr specific into perfctr-x86.c * 03de65e3 src/libpfm-3.y/examples_v2.x/multiplex.c src/libpfm-3.y/examples_v2.x/pfmsetup.c src/libpfm-3.y/examples_v2.x/rtop.c...: Fix some libpfm3 warnings. libpfm3 is not maintained anymore, so applied these changes locally. libpfm3 is compiled with -Werror so they broke the build with newer gcc even though they are just warnings in example programs. * ad490353 src/ctests/zero_named.c src/utils/multiplex_cost.c: Fix a few compiler warnings in the tests. * a0fec783 src/linux-timer.c: Fix another linux-timer.c compile problem. I hadn't tested with debug enabled, so all of buildbot failed last night. 2012-03-12 * a3733ecd src/linux-timer.h: Fix typo in the linux-timer.h header _linux_get_virt_usec_timess should have been _linux_get_virt_usec_times Thanks to Steve Kaufmann for noticing this. * 785db5ae src/linux-common.c src/linux-timer.c: Fix timer compile on Power machines Power, ARM, and MIPS have no get_cycles() call so provide a dummy function on these architectures. * 708090ee src/linux-common.c src/linux-timer.h: Another fix for non-POSIX timers The recent changes had the name of the fallback usec method wrong. * 88e8d355 src/papi_libpfm3_events.c: Fix a warning in the libpfm3 code. 
* 8ca63705 src/configure src/configure.in src/linux-common.c...: Fix build when not using POSIX timers The PAPI build system was being overly clever with how it defined what kind of wall clock timers were to be used, so of course I broke things when breaking the timer code out to make it a bit more understandable. This patch breaks out the timer define into two pieces; one saying it's a POSIX timer and one saying whether to use HR timers or not. 2012-03-09 * b69ad727 src/linux-common.c src/linux-timer.c src/linux-timer.h: Add Linux posix gettime() nanosecond functions * af2c9a49 src/papi.c src/papi_vector.c src/papi_vector.h: Add ->get_virt_nsec() and ->get_real_nsec() OS vectors Currently PAPI was just cheating and running the usec functions and multiplying by 1000. Make this the default, but allow the OS code to override if they have timers capable of returning nsec precision. * 24c68dbe src/aix.c src/freebsd.c src/linux-bgp.c...: Clean up ->get_virt_usec() It no longer needs to be passed a context, so remove that from all callers. Also, ->get_virt_cycles() was just a get_virt_usec()*MHz on most platforms. While this is a bit dubious (especially as MHz can't be relied on) make this a common routine that will be added at innoculate time if ->get_virt_cycles() is set to NULL. * a3ef7cef src/linux-common.c src/linux-timer.c src/linux-timer.h: Cleanup the Linux timer code. Split things up a bit to make the code more readable. * 50ce8ea0 src/papi_internal.c: Change a strcpy() to strncpy() just to be a bit safer. 
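The nanosecond-timer fallback described in af2c9a49 above can be sketched roughly as follows. This is a minimal illustration, not PAPI's actual code: the struct os_timer_vector type, real_nsec(), and the two fake_* timers are hypothetical names invented for the example. The idea is only that an optional function pointer is used when the OS layer provides one, and the microsecond timer multiplied by 1000 is used otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an OS timer vector (not PAPI's). */
struct os_timer_vector {
    long long (*get_real_usec)(void);  /* required */
    long long (*get_real_nsec)(void);  /* optional; NULL selects the fallback */
};

/* Example timers with fixed values so the behaviour is easy to follow. */
long long fake_usec(void) { return 5; }
long long fake_nsec(void) { return 5123; }

/* Return wall-clock nanoseconds, preferring a native nsec timer.
 * Falls back to usec * 1000 when no nsec timer was registered. */
long long real_nsec(const struct os_timer_vector *v)
{
    assert(v != NULL && v->get_real_usec != NULL);
    if (v->get_real_nsec != NULL)
        return v->get_real_nsec();
    return v->get_real_usec() * 1000LL;
}
```

With only fake_usec registered, real_nsec() reports a value with microsecond granularity; registering fake_nsec as well makes the native timer take over without any change to callers.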
* 0526b125 src/components/lmsensors/linux-lmsensors.c: Fix buffer overrun in lmsensors component * b088db70 src/libpfm4/config.mk src/libpfm4/docs/man3/pfm_get_os_event_encoding.3 src/libpfm4/examples/showevtinfo.c...: Update to current git libpfm4 snapshot * ccb45f61 src/aix.c src/extras.c: Fix segfault on AIX During some of the cleanups, the extras.h header was not added to aix.c This made some of the functions (silently) use default data types for the function parameters, leading to segfaults in some of the tests. 2012-03-08 * 1cb22d0b src/components/coretemp/linux-coretemp.c src/utils/native_avail.c: Make "native_avail -d" report units if available Add units support to the coretemp component, have native_avail -d (detailed mode) print it to make sure it works. * 9c54840e src/extras.c src/extras.h src/papi_internal.c...: Add new ntv_code_to_info vector This will allow components to return the extended event_info data for native events. If a component doesn't implement ntv_code_to_info then get_event_info falls back to the old way of just reporting symbol name and long description. * c4579559 src/papi.h: Add new event_info fields New fields are added to event_info that allow passing on extended information. This includes things such as measurement units, data type, location, timescope, etc. * 17533e4e src/ctests/all_events.c src/ctests/derived.c src/ctests/kufrin.c...: Restore fields to event_info structure The changes made were probably too ambitious, even for a 5.0 release. In the end it looks like we can remain API compatible while just using up a little more memory. We can still save space by shrinking preset_t behind the scenes. * 6f13a5f6 src/aix.c src/components/coretemp/linux-coretemp.c src/components/coretemp_freebsd/coretemp_freebsd.c...: Remove ->ntv_bits_to_info vector from component interface We weren't using it anymore, and many of the components were just setting it to NULL unnecessarily. 
We'll be replacing the functionality soon with ntv_code_to_info * 401f37bc src/components/example/example.c src/ctests/subinfo.c src/papi.h: Remove invert and edge_detect fields from component info These fields were there to indicate if a CPU component supported these attributes (for Intel processors) but in the end we never used these. The proper way to export this info is during event enumeration. * f32fe481 src/papi_events.csv: We had the PAPI_VEC_INS preset wrong on amd fam12h llano * 38a8d8a7 src/ctests/multiplex2.c src/papi_preset.c: Fix preset adding code to be more robust. If an invalid event is in a preset definition, we'd currently add it with an eventcode of 0 to the preset, which would break if you tried to use the event. This change properly prints a warning in this case, and sets the preset to be unavailable. * 2591a546 src/ctests/val_omp.c src/ctests/zero_omp.c: Remove the hw_info field from add_two_events calls. Two ctests missed the bus when Vince reworked the add_two_events call. * 358a2e32 src/papi_internal.c src/papi_preset.c: Fix segfault seen on an AMD fusion machine With the recent preset and component changes, we were not properly resetting papi_num_components if PAPI_library_init()/PAPI_shutdown() was called multiple times. 2012-03-07 * 7751f5d8 src/ftests/zeronamed.F: Fix a compile error on aix. Dan ran over 72 characters on a single line. xlf actually enforced that part of the Fortran spec. 2012-03-06 * 1c87d89c src/ftests/Makefile src/ftests/zeronamed.F src/papi_fwrappers.c: Add support for {add, remove, query}_named to Fortran interface; add zeronamed.F test case; modify ftests Makefile to support "all" tag. * 71bd4fdd src/configure src/configure.in: Modify configure to define the default FTEST_TARGETS as "all" * 54e39855 src/components/vmware/vmware.c: Changed triggering environment variable to PAPI_VMWARE_PSEUDOPERFORMANCE per Vince's earlier email. This should complete all the VMware component changes. 
2012-03-05 * 845503fb src/Makefile.inc: Add missing MISCSRCS line to Makefile.inc This was breaking the shared library build 2012-02-01 * 11be8e4b .../appio/tests/appio_test_fread_fwrite.c src/components/appio/tests/appio_test_pthreads.c src/components/appio/tests/appio_test_read_write.c: updated these tests to print timing information * 9ad62ab1 src/components/appio/README src/components/appio/appio.c src/components/appio/appio.h...: Added support for timing I/O calls. Updated tests and README. 2012-01-31 * beaa5ff0 src/components/appio/tests/iozone/Changes.txt src/components/appio/tests/iozone/Generate_Graphs src/components/appio/tests/iozone/Gnuplot.txt...: added the latest stable iozone to the appio tests. * 4af58174 src/components/appio/README src/components/appio/tests/Makefile src/components/appio/tests/init_fini.c: added a hook to run the appio test for iozone. 2012-01-21 * 15c733cf src/components/appio/CHANGES src/components/appio/README src/components/appio/appio.c...: Removed stray 'net' references. All remaining references are only for the purpose of giving credit. Updated change log. 2012-01-20 * ca4b6785 src/components/appio/README src/components/appio/appio.c src/components/appio/tests/appio_list_events.c...: - general cleanup - improved tests to be quiet and be conform to other PAPI tests - replaced hardwire constants in appio.c with symbolic ones - tests will now write to /dev/null to avoid filling the terminal screen with useless text - more comments added - @author added to files - updated README 2012-01-18 * bb22ed9f src/components/appio/README src/components/appio/Rules.appio src/components/appio/appio.c...: - Added support to measure bytes/calls/eof/short calls for read/write calls. - Interception of read/write and fread/fwrite calls. - Works for static and dynamic linkage (without need for LD_PRELOAD) - Tested OK on 32-bit i686 Linux 2.6.38. 
Tushar 2011-12-03 * d58b34b6 src/components/appio/tests/Makefile src/components/appio/tests/appio_list_events.c src/components/appio/tests/appio_values_by_code.c...: *** empty log message *** * cd7d7acc src/components/appio/tests/appio_values_by_name.c: file appio_values_by_name.c was added on branch appio on 2011-12-03 05:22:06 +0000 * 425e4d09 src/components/appio/tests/appio_values_by_code.c: file appio_values_by_code.c was added on branch appio on 2011-12-03 05:22:06 +0000 * 596ad9bb src/components/appio/tests/appio_list_events.c: file appio_list_events.c was added on branch appio on 2011-12-03 05:22:06 +0000 * 119543dc src/components/appio/tests/Makefile: file Makefile was added on branch appio on 2011-12-03 05:22:06 +0000 2012-03-05 * ba748a41 src/components/vmware/configure: Remove old configuration parameters from vmware/configure 2012-03-02 * 2b7e2abb src/ctests/Makefile src/ctests/max_multiplex.c: Add a new max_multiplex test This tries to use the maximum number of multiplexed events. This was written in response to the 32/64 perf_event multiplexed event limit reported by Mohammad j. Ranji * a0985ff5 src/multiplex.c src/papi_internal.c src/papi_libpfm4_events.c...: Fix issue when using more than 32 multiplexed events on perf_event On perf_event we were setting num_mpx_cntrs to 64. This broke, as the MPX_EventSet struct only allocates room for PAPI_MPX_DEF_DEG events, which is 32. This patch makes perf_event use a value of 32 for num_mpx_cntrs, especially as 64 was arbitrarily chosen at some point (the actual value perf_event can support is static, but I'm pretty sure it is higher than 64). * 331c516c src/ctests/acpi.c: Remove the acpi.c file from ctests It wasn't being built, and we removed the ACPI component a while ago. 
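The multiplexing fix in a0985ff5 above comes down to keeping the advertised event limit in step with the storage that was actually allocated. The following is a minimal sketch of that bounds check, under stated assumptions: struct mpx_set, mpx_set_add(), and MPX_CAPACITY are hypothetical names for illustration (MPX_CAPACITY is assumed to mirror the value 32 that the entry gives for PAPI_MPX_DEF_DEG), and this is not the code in multiplex.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror PAPI_MPX_DEF_DEG (32) from the entry above. */
#define MPX_CAPACITY 32

/* Hypothetical fixed-size container for multiplexed event codes. */
struct mpx_set {
    int events[MPX_CAPACITY];
    size_t count;
};

/* Append an event code. Returns 0 on success, or -1 when the set is
 * already full, instead of writing past the end of the array. */
int mpx_set_add(struct mpx_set *s, int event_code)
{
    assert(s != NULL);
    if (s->count >= MPX_CAPACITY)
        return -1;
    s->events[s->count++] = event_code;
    return 0;
}
```

Advertising 64 slots while storing into a 32-entry array is exactly the mismatch this kind of check catches: the 33rd add is rejected rather than corrupting adjacent memory.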
* 73e7d191 src/components/vmware/vmware.c: Removed all old references to #define VMWARE_PSEUDO_PPERF and switched over to getenv 2012-03-01 * 969b8aa9 src/ctests/Makefile src/ctests/zero_named.c src/papi.c: Three new APIs: PAPI_query_named_event PAPI_add_named_event PAPI_remove_named_event and a new test: zero_named Still to do: maybe test named native events and support Fortran * 97bf9bf8 src/papi.c src/papi.h: First pass implementation of {add, remove, query}_named_event * 2416af88 src/components/vmware/vmware.c: Add functionality to getenv selectors * 297f9cd6 src/papi.c: Fix possible race in _papi_hwi_gather_all_thrspec_data The valgrind helgrind tool noticed this with the thrspecific test * be599976 src/papi_internal.c: Add some locking in _papi_hwi_shutdown_global_internal This caused a glibc double-free warning, and was caught by the Valgrind helgrind tool in krentel_pthreads There are some other potential locking issues in PAPI_shutdown, especially when debug is enabled. * 8444d577 src/utils/clockres.c src/utils/command_line.c: Cleanup the doxygen markup for the utilities. * 7144394f doc/Doxyfile-html: Missed a recursive tag for the html config file. * 63b2efc4 src/papi_preset.c: Fix segfaults in tests on AMD machines The papi_preset code was wrongly calling papi_free() on some code that was allocated with strdup() (not with papi_malloc). We were only noticing this on AMD machines because it was the code for freeing developer notes in presets, and currently only AMD events have developer notes. * 0b1350df src/linux-common.c: Touch 'virtual_vendor_name' to cleanup a warning on bluegrass. 2012-02-29 * 1f17b571 src/Makefile.inc src/Rules.perfctr-pfm src/Rules.pfm4_pe...: Merge the contents of papi_libpfm_presets.c into papi_preset.c The code isn't libpfm specific at all anymore, it's the generic "read presets from a file" code. 
It makes more sense to find it in papi_preset.c * d087d49f src/papi_fwrappers.c: Fix Fortran breakage after the preset event changes * 156141ec src/papi_libpfm_presets.c src/papi_preset.c src/papi_preset.h: Simplify papi_libpfm_presets.c Previously adding presets from papi_events.csv was a three step process. 1. Load the presets from the file, put in temporary structure. 2. Convert this temporary structure to a "findem" dense structure 3. Pass this dense structure to _papi_hwi_setup_all_presets for final assignment. This change creates the final assignment directly without the intermediate two steps. * 8bc2bafd src/papi.c src/papi.h src/papi_common_strings.h...: Make the internal preset_info match the one exported by papi.h There were a lot of cases where the same structure fields were available, just with different names. That was confusing. Also, this allows using a pointer to the preset info instead of having to copy values out of the structure when gathering event info for presets. * 8fda68cb src/genpapifdef.c src/papi.c src/papi_common_strings.h...: Merge the 4 separate preset structs into one. _papi_hwi_presets was a structure containing pointers to 4 other arrays of structures which held the actual preset data. This change merges all of these into one big structure. 2012-02-28 * e69815d7 src/linux-bgp.c src/papi_internal.c src/papi_internal.h...: Removing remaining vestiges of references to bipartite routines. Now the only references are in papi_bipartite.h, perfctr-x86.c and winpmc-p3.c. * 5766b641 src/papi_bipartite.h src/perfctr-x86.c src/win2k/substrate/winpmc-p3.c: These changes implement the bipartite allocation routine as a header file to be included in whatever cpu component needs it. Right now, that's just perfctr-x86 and windows. Both components have been modified and perfctr-x86 compiles cleanly. Neither has been tested since I don't have access to a test bed. 
* 7f444b76 src/papi_libpfm_presets.c src/papi_preset.c src/papi_preset.h: Merge the hwi_dev_notes structure into hwi_preset_data * 21a1d197 src/components/vmware/vmware.c: add getenv * 08c1b474 src/perfctr-x86.c: Merge bipartite routine into perfctr-x86 component, since this is effectively the only place it is used. * 9ed9b1f5 src/papi.c: Remove a reference to PAPI_set_event_info() which was removed for PAPI 4 * c626f064 src/ctests/all_events.c src/ctests/derived.c src/ctests/kufrin.c...: Convert PAPI_event_info_t to separate preset event info This moves the preset event info to its own separate structure, which greatly reduces the large string overhead that is not used by the native events. * 787d6822 src/perfctr-x86.c: Move bipartite stuff to perfctr_x86 since that's really the only place it's currently used. * 229c8b41 src/components/vmware/vmware.h: Add env_var definition to vmware.h * 46aaf6ca src/components/vmware/vmware.c: Remove all unneeded cases * 874a5718 src/freebsd.c src/perfctr-ppc64.c: Remove more unused references to .bpt_ routines in preparation for refactoring. * 74e5a5fd src/components/vmware/vmware.h: Remove unneeded defines from vmware.h header * 58b51367 src/components/coretemp_freebsd/coretemp_freebsd.c src/components/vmware/vmware.c src/solaris-niagara2.c...: Remove unused references to .bpt_ routines in preparation for refactoring. 2012-02-27 * 6b184158 src/Makefile.inc src/components/coretemp/linux-coretemp.c src/configure...: Have separate concept of "compiled-in" versus "active" components With this change, the _papi_hwd[] component info array only contains a null-terminated list of _active_ components. The _papi_compiled_components[] array has the original full list. At init_substrate[] time a pointer to a component is only put in the _papi_hwd[] list if it is successfully initialized. 
In addition the PAPI_num_compiled_components() and PAPI_get_compiled_component_info() calls have been added, but this is probably a confusing interface so they might only be temporary additions. * 042bfd5b src/Makefile.inc src/papi.c src/papi_data.c...: Split the contents of papi_data.c to various other files. The data declarations in papi_data.c were mostly used in other files. Move these into more relevant locations. * 1877862c src/papiStdEventDefs.h src/papi_common_strings.h: Remove the BGL and BGP specific pre-defined events. They can be better replaced by user-events, and we also had already removed BGL support completely a while back. This removes some ifdefs from the pre-defined event list and keeps future pre-defined events from having different eventcodes on different platforms. * c3986b79 src/components/coretemp/linux-coretemp.c src/components/cuda/linux-cuda.c src/components/infiniband/linux-infiniband.c...: Add names and descriptions for components. Also fixes cuda and lmsensors build issues introduced by vector.h cleanup * 2c84f920 src/aix.c src/freebsd.c src/perf_events.c...: Add names and descriptions to all of the CPU substrates. * 9f3e634a src/components/example/example.c src/papi.h src/utils/component.c: Add new "description" and "short_name" fields to .cmp_info structure This description field allows components to provide extra information on what they do. The short_name field will eventually be used to pre-pend event names. The papi_component_avail utility has been updated to print the description. The example component was updated to fill in these values. 
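The "compiled-in" versus "active" split from 6b184158 above, together with the new description field, can be pictured with the short sketch below. It is an illustration under assumptions rather than PAPI's implementation: struct component, collect_active(), and the init_ok/init_fail stubs are hypothetical names, and the real arrays (_papi_compiled_components[] and _papi_hwd[]) hold PAPI's own vector type. The sketch walks a NULL-terminated list of compiled-in components and copies a pointer into the active list only when the component's init succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down component record. */
struct component {
    const char *name;
    const char *description;
    int (*init)(void);          /* returns 0 on success */
};

/* Stub init routines used only to demonstrate the selection logic. */
int init_ok(void)   { return 0; }
int init_fail(void) { return -1; }

/* Build a NULL-terminated list of successfully initialized components.
 * 'compiled' must be NULL-terminated; 'active' must have room for
 * 'capacity' pointers including the terminator. Returns the number of
 * active components stored. */
size_t collect_active(const struct component *const compiled[],
                      const struct component *active[], size_t capacity)
{
    size_t n = 0;
    assert(compiled != NULL && active != NULL && capacity > 0);
    for (size_t i = 0; compiled[i] != NULL; i++) {
        /* Leave one slot free for the terminating NULL. */
        if (n + 1 < capacity && compiled[i]->init() == 0)
            active[n++] = compiled[i];
    }
    active[n] = NULL;
    return n;
}
```

A tool in the spirit of papi_component_avail could then print the name and description of every compiled-in entry and, separately, of every active one.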
* ab61c9a7 src/Makefile.inc src/genpapifdef.c src/papi_common_strings.h...: Split papi_data.c into two parts papi_data.c was half data structure definitions for all of PAPI and half string definitions used by both PAPI *and* genpapifdef This splits the common string definitions to papi_common_strings.h so that genpapifdef can still be built w/o linking libpapi.a while making the code a lot easier to follow. * b8e6294c src/solaris-ultra.c: Remove unnecessary extern declarations from solaris-ultra.c. * 5ddaff91 src/sys_perf_event_open.c: Remove unnecessary extern declarations from sys_perf_event_open.c * a6c463b7 doc/Doxyfile-common.config: Create a common config file for doxygen. As part of streamlining the doxygen process, this is a new template doxygen config file. This is a blank template file generated by doxygen 1.7.4 (the version currently mandated by the release procedure) * dc2c11fa src/aix.c src/aix.h src/perfmon.c...: The vector pre-definition should be in the .c file, not the .h file * 0b3c83c3 src/perf_events.c: Remove unnecessary extern declarations in perf_events.c * b93efca0 src/perfmon.c src/perfmon.h: Remove unnecessary extern declarations in perfmon.c * 7f7a2359 src/papi_preset.c: Remove unnecessary extern declarations from papi_preset.c * ecec03ad src/papi_libpfm_presets.c: Remove extraneous extern declarations from papi_libpfm_presets.c * 7b5f3991 src/extras.c: remove extraneous extern declarations from extras.c * f6470e4d src/aix-memory.c src/aix.c src/aix.h: Remove unnecessary extern declarations from aix.c * f197d4ab src/papi_data.h src/papi_internal.c: Remove unnecessary extern declarations in papi_internal.c * e7b39d48 src/papi.c src/papi_data.c src/papi_data.h...: remove unnecessary extern definitions from papi.c 2012-02-24 * 92689f62 src/configure src/configure.in src/linux-common.c...: Add a --with-pthread-mutexes option to enable using pthread mutexes rather than PAPI custom locks This is useful when running on new architectures that don't 
have a custom PAPI lock coded yet, and also for running valgrind deadlock detection utilities that only work with pthread based locking. * ca51ae67 src/papi_events.csv: Fix broken Pentium 4 Prescott support We were missing the netburst_p declaration in papi_events.csv * f6460736 src/linux-common.c: Fix build on POWER, broken by the virtualization change. * 91d32585 src/perfctr-x86.c src/perfmon.c: Fix some warnings that have appeared due to recent changes. * ae0cf00f src/linux-common.c src/papi_libpfm3_events.c src/papi_libpfm4_events.c...: Clean up the Linux lock files The locking primitives for some reason were spread among the libpfm code and the substrate codes. This change moves them into linux-common and has them part of the OS code. This way they will get properly initialized even if the perf counter or libpfm code isn't being used. 2012-02-23 * 88847e52 src/papi.c src/papi_memory.h: Remove _papi_cleanup_all_memory define from papi_memory.h The code in papi_memory.h said: /* define an alternate entry point for compatibility with papi.c for 3.1.x*/ /* this line should be deleted for the papi 4.0 head */ Since we are post papi-4.0 I thought it was time to act on this. Of course papi.c was still using the old name in one place. * 1d29dfc6 src/papi_libpfm_presets.c src/perfctr.c src/perfmon.c: Fix some missing includes found after the header cleanup. * b425a9f4 src/Makefile.inc src/extras.c src/extras.h...: Header file cleanup The papi_protos.h file contained a lot of no-longer in use exports. I split up the ones that are still relevant to header files corresponding to the C file that the functions are defined in. * 07199b41 src/extras.c src/papi_vector.c src/papi_vector.h: Clean up the papi_vector code. Remove things no longer being used, mark static functions as static. * d7496311 src/linux-common.c src/x86_cpuid_info.c src/x86_cpuid_info.h: Fix a missing "return 1" which meant that the virtualization flag wasn't being set right. 
With this fix, on saturn-vm1 we now get: Running in a VM : yes VM Vendor: : VMwareVMware in the papi_native_avail header * 8da36222 src/freebsd.c src/linux-bgp.c src/papi.c...: Remove the ->add_prog_event function vector As far as I can tell this is a PAPI 2.0 remnant that was never properly removed. This also removes PAPI_add_pevent(), PAPI_save(), and PAPI_restore(), none of which were exported in papi.h so in theory no one could have been using them. Also removes _papi_hwi_add_pevent() * a5f3c8b5 src/aix.c src/freebsd.c src/linux-timer.c...: Reduce the usage of MY_VECTOR whenever possible. This is an attempt to make the cpu-counter components to be as similar as possible to external components. * abbcbf29 src/any-null.h: Missed removing any-null.h during the any-null removal. * 665d4c5c src/linux-common.c: Somehow missed an include during the virtualization addition. * 0c06147b src/perfctr-2.6.x/usr.lib/event_set_centaur.os src/perfctr-2.6.x/usr.lib/event_set_p5.os src/perfctr-2.6.x/usr.lib/event_set_p6.os: Removes the last of the binary files from perfctr2.6.x Some binary files were left out in the cold after a mishap trying to configure perfctr for the build test. * 3acb7d57 src/Makefile.inc src/configure src/configure.in...: Add support for reporting if we are running in a virtualized environment to the PAPI_hw_info_t structure. This currently only works on x86. it works by looking at bit 31 of ecx after a cpuid (the "in a VM" bit) and then using leaf 0x40000000 to get the name of the VM software (this works for VMware and Xen at least) x86_cache_info.c was renamed to x86_cpuid_info.c to better reflect what goes on in that file (it does various things based on the cpuid instruction). the testlib header was updated to report virtualization status in the papi header (printed for things like papi_native_avail). 2012-02-22 * 9c7659b5 src/Makefile.inc src/freq.c: Remove the freq.c file as nothing seemed to be using it. 
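The virtualization check described in 3acb7d57 above (bit 31 of ECX from CPUID, then leaf 0x40000000 for the vendor name) can be sketched with two small helpers that operate on register values already read from CPUID. This is an illustration, not the code in x86_cpuid_info.c: hypervisor_bit_set() and hypervisor_vendor() are hypothetical names, the sketch assumes the bit comes from CPUID leaf 1, and it assumes the 12-byte vendor signature is packed into EBX, ECX, EDX in that order, least significant byte first. Executing CPUID itself is left to the compiler's intrinsic or inline assembly, so the helpers stay portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ECX bit 31 from CPUID leaf 1 (assumed) is the "running in a VM" bit. */
int hypervisor_bit_set(uint32_t leaf1_ecx)
{
    return (int)((leaf1_ecx >> 31) & 1u);
}

/* Decode the 12-character vendor signature returned by CPUID leaf
 * 0x40000000. Bytes are taken from EBX, then ECX, then EDX, lowest
 * byte first, and the result is NUL-terminated. */
void hypervisor_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };
    assert(out != NULL);
    for (size_t r = 0; r < 3; r++) {
        for (size_t b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xffu);
    }
    out[12] = '\0';
}
```

Fed the register values EBX=0x61774d56, ECX=0x4d566572, EDX=0x65726177, the decoder produces the "VMwareVMware" string quoted in the entry above.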
* d205e2d3 src/perfctr-x86.c: Made a stupid typo when converting perfctr to call libpfm functions with the component id. * 25b41779 src/papi_libpfm3_events.c src/papi_libpfm4_events.c src/papi_libpfm_events.h...: When updating the preset code to take a component index I missed a few callers. * a713ffb1 src/papi_internal.c src/papi_vector.c: Remove any-null component * 27e1c2c5 src/any-null-memory.c src/any-null.c src/any-proc-null.c...: Remove the any-null component. * 25779ae0 PAPI_FAQ.html: Saving another version of the FAQ after adding a git section, and removing several obsolete sections. These questions still need detailed review for relevance and timeliness. * 449a1a61 src/ctests/overflow_allcounters.c: Fix overflow_allcounters which was making assumptions about component 0 existing. * f21be742 src/ctests/hwinfo.c: Make the hwinfo test not bail out if no counters are available. * ebc675e6 src/ctests/memory.c: Make sure the memory ctest runs even if no components are available. * 9b3de551 src/linux-common.c src/perf_events.c src/perfmon-ia64.c...: Make sure the system info init happens at os init time. Otherwise the system info never gets set if a perfcounter component isn't available. * 59e47e12 src/papi_internal.c: Make sure that _papi_hwi_assign_eventset() does the right thing if no components are available. * dd51e5d6 src/ctests/api.c: The api test would fail in the no cpu component case. Fix it to properly check for errors before attempting to run high-level PAPI tests. * 069e9d2f src/aix.c src/papi.c src/papi_internal.h...: Fix code that was depending on _papi_hwd[0] existing. Most of this was in the presets code. The preset code had many assumptions so that you can only code presets with component[0]. This fixes some of them by passing the component index around. * 7259eaec src/papi_vector.c: Fix up papi_vector to get rid of some warnings introduced on AIX. 
* 16fe0a61 src/aix.c src/solaris-ultra.c: Fix two last substrates where I missed some fields in the OS structure conversion. * 625871ec src/perfmon.c: Missed a cmp_info field in perfmon.c * 680919d9 PAPI_FAQ.html: Saving the latest version of the FAQ before undertaking major revisions. * 3d4fa2e5 src/linux-timer.c src/perfctr-x86.h: Fix the perfctr code to compile if configured with --with-virtualtimer=perfctr * bbd7871f src/perfctr.c: Missed two OS vector calls in the perfctr code during the conversion. * bc6d1713 src/Makefile.inc: Removed one of the two instances of MISCOBJS listed in Makefile.inc. 2012-02-21 * 40bc4c57 src/papi_vector.c src/papi_vector.h: Remove now-unused OS vectors from the main papi vector table. * 3c6a0f7b src/aix.c src/freebsd.c src/linux-bgp.c...: Convert PAPI to use the _papi_os_vector for the operating-system specific function vectors. * 568abad5 src/papi_vector.h: Add new _papi_os_vector structure to hold operating-system specific function vectors. * a39d2373 src/ctests/subinfo.c: Missed removing a field from the subinfo ctest. * 1d930868 src/papi.h: Remove fields now in PAPI_os_info_t from the component_info_t struct. * d397d74a src/components/example/example.c: Remove fields now in PAPI_os_info_t from the example component. * 8cd5c8e0 src/aix.c src/freebsd.c src/linux-bgp.c...: Modify all the substrates to use _papi_os_info instead of _papi_hwd[0]->cmp_info for the values moved to the OS struct * 58855d3a src/papi_internal.h: Add padding for future expansion to PAPI_os_info_t Add _papi_hwi_init_os(void); definition * ea1930e1 src/papi_internal.h: Add new PAPI_os_info_t structure to papi_internal.h * 0eac1b29 src/utils/multiplex_cost.c: Modify multiplex_cost to properly use the PAPI_get_opt() interface to get itimer data, rather than directly accessing the fields from the cmp_info structure. This would have broken after the OS split. * 87c2aa2f src/ctests/subinfo.c: subinfo was printing itimer data from the cmpinfo structure. 
These values will not be in cmpinfo once the OS split happens. * f2c62d50 src/components/vmware/vmware.h: Clean up the VMware Header a bit 2012-02-17 * 6f0c1230 src/aix.c src/components/coretemp/linux-coretemp.c src/components/coretemp_freebsd/coretemp_freebsd.c...: The git conversion reset all of the CVS $Id$ lines to just $Id$ Since we depend on the $Id$ lines for the component names, I had to go back and fix all of them to be the component names again. * 2d208d0e src/perfctr-2.6.x/usr.lib/event_set_centaur.o src/perfctr-2.6.x/usr.lib/event_set_p5.o src/perfctr-2.6.x/usr.lib/event_set_p6.o: Remove a few binary files in perfctr-2.6.x * f78bf1af src/libpfm-3.y/Makefile src/libpfm-3.y/README src/libpfm-3.y/docs/Makefile...: More cleanups from the migration, latest version of libpfm-3.y perfctr-2.[6,7] Version numbers got really confused in cvs and the git cvsimport didn't know that eg 1.1.1.28 > 1.1 ( see perfctr-2.6.x/CHANGES revision 1.1.1.28.6.1 :~) * e8aa2e61 INSTALL.txt: Explicitly state that 3.7 was the last version of PAPI with good windows support. * 546901fa src/components/cuda/linux-cuda.c: Modified CUDA component so that a PAPI version - that was configured with CUDA - will successfully build on a machine that does not have GPUs. 2012-02-16 * 49d9f71c src/.gitignore: Add a .gitignore file with the files that PAPI autogenerates. This way they won't clutter up "git status" messages papi-5.3.0/ChangeLogP410.txt 2010-06-21 terpstra * src/Makefile.in 1.52: * src/configure 1.224: * src/configure.in 1.224: Change version numbers in anticipation of the impending 4.1 release. 2010-06-18 vweaver1 * src/components/example/example.c 1.4: Correct a comment. 2010-06-18 ralph * doc/Doxyfile 1.5: * doc/Doxyfile-everything 1.2: Upped the version number in doxygen config files for upcoming release. * INSTALL.txt 1.47: Friday afternoon typo... 
the command given for generating all documentation was wrong * src/components/lustre/linux-lustre.c 1.6: * src/components/lustre/linux-lustre.h 1.5: Fixed some of the comments to get doxygen's attention /* -> /** I'm still working out how to best do the papi_components group but for now I just put the .h file for the component into the group (@ingroup papi_components) so that one file per component shows up in the listing. * src/papi.h 1.208: Added a small section about components on the main doxygen generated page. 2010-06-17 jagode * src/components/lustre/Rules.lustre 1.3: * src/components/lustre/host_counter.c 1.2: * src/components/lustre/host_counter.h 1.2: Added new component for infiniband devices. Major changes for lustre component. * src/components/README 1.4: Added documentation (Doxygen) for InfiniBand (and lustre) component. 2010-06-15 ralph * src/components/acpi/linux-acpi.c 1.3: * src/components/acpi/linux-acpi.h 1.2: * src/components/lmsensors/linux-lmsensors.h 1.3: * src/components/mx/linux-mx.h 1.2: * src/components/net/linux-net.h 1.2: * src/papi.c 1.360: * src/papi_hl.c 1.85: * src/utils/avail.c 1.53: * src/utils/clockres.c 1.25: * src/utils/command_line.c 1.15: * src/utils/cost.c 1.40: * src/utils/decode.c 1.9: * src/utils/event_chooser.c 1.18: * src/utils/mem_info.c 1.17: * src/utils/native_avail.c 1.47: Added documentation for the several components. Doxygen will now search recursively under the components directory for documented *.[c|h] files ( /** @file */ somewhere in it). Several other files got brief descriptions of what is in the file. 2010-06-14 terpstra * papi.spec 1.9: Minor tweak to make sure libpfm builds without warnings. 2010-06-11 jagode * src/components/lmsensors/linux-lmsensors.c 1.2: removed compiler warnings for lm-sensors component; switched to stderr so that papi_xml_event_info creates a clean output. 
2010-06-11 bsheely * src/ctests/api.c 1.2: Added first few api test cases 2010-06-10 bsheely * src/ctests/papi_test.h 1.39: * src/ctests/test_utils.c 1.82: Added test_fail_exit for use in single threaded tests 2010-06-09 vweaver1 * src/perfctr-2.6.x/patches/aliases 1.13: * src/perfctr-2.6.x/usr.lib/Makefile 1.31: Fix conflicts from import. * src/perfctr-2.6.x/CHANGES 1.1.1.28: ... * src/perfctr-2.6.x/usr.lib/x86.c 1.1.1.11: Import of perfctr 2.6.41 2010-06-07 bsheely * src/any-null.c 1.60: * src/freq.c 1.1: * src/papi_vector.c 1.31: Moved timer impl from any-null.c into papi_vector.c and added generic functionality to compute frequency if unable to determine based on platform * src/papi_data.c 1.40: * src/papi_data.h 1.6: Added new error code * src/Makefile.inc 1.163: Added freq.c to build * src/configure 1.223: * src/configure.in 1.223: ctests/api (not yet implemented) added to default ctests 2010-06-03 bsheely * src/ctests/Makefile 1.155: Initial commit for ctests/api which is not yet implemented 2010-06-02 bsheely * src/papi_lock.h 1.7: Fixed for BG/P 2010-06-01 vweaver1 * README 1.10: Fix typo in README 2010-06-01 bsheely * src/config.h.in 1.13: Added code to define _rtc when Cray is compiled with gcc * src/cycle.h 1.4: Rolled back previous changes 2010-05-27 bsheely * src/papi_internal.c 1.158: * src/threads.h 1.15: --with-no-cpu-component renamed --with-no-cpu-counters * src/components/mx/configure 1.3: * src/components/mx/configure.in 1.3: Rollback last change * src/ctests/multiattach.c 1.8: * src/ctests/zero_attach.c 1.8: Attempt to fix xlc compile errors 2010-05-21 bsheely * src/Rules.perfctr 1.66: * src/Rules.perfctr-pfm 1.57: * src/Rules.pfm 1.57: * src/Rules.pfm_pe 1.18: Use MISCHDRS from configure 2010-05-20 bsheely * src/components/mx/linux-mx.c 1.2: Fixed compile error and warnings. Added option to configure 2010-05-19 terpstra * src/ctests/all_native_events.c 1.24: Hard-code an exception for Nehalem OFFCORE_RESPONSE_0. 
This event can't be counted because it uses a shared chip-level register. 2010-05-19 bsheely * src/linux-ia64-memory.c 1.25: * src/linux-ia64.c 1.183: * src/pfmwrap.h 1.43: Fixed warning in ia64 * src/components/net/linux-net.c 1.2: Fixed compile warnings * src/Makefile.in 1.51: Extra compiler warning flags are not added until after the libpfm build 2010-05-14 vweaver1 * src/linux-bgp.c 1.5: Temporary fix to emulate cycles HW counter on BlueGeneP using the get_cycles() call. 2010-05-13 bsheely * src/x86_cache_info.c 1.13: added missing C library headers * src/hwinfo_linux.c 1.7: fixed compile errors on torc0 by including missing C library headers * src/ftests/Makefile 1.66: * src/utils/Makefile 1.16: Replaced missing MEMSUBSTR macro in configure. AC_ARG_ENABLE macros replaced with AC_ARG_WITH macros. Continued changes for -- with-no-cpu-component 2010-05-07 ralph * doc/Doxyfile-everything 1.1: * doc/Makefile 1.1: Added makefile in doc to generate user and developer documentation. from src, make doc builds the user documentation in doc/html (do we want this?) 2010-05-07 jagode * src/utils/event_info.c 1.14: papi_xml_event_info generated some invalid xml output. This bug was introduced in Revision 1.10 2010-05-07 bsheely * src/any-null-memory.c 1.11: * src/any-null.h 1.23: * src/extras.c 1.170: * src/multiplex.c 1.85: * src/papi_preset.c 1.29: * src/papi_vector.h 1.14: * src/threads.c 1.36: Added --with-no-cpu-component option which has only been tested on x86 2010-05-03 ralph * src/freebsd-memory.c 1.1: * src/freebsd.c 1.9: * src/freebsd.h 1.6: * src/papi_fwrappers.c 1.86: Updated Harald Servat's freebsd work to Component Papi. Has had cursory testing, but should be considered alpha quality. (there is a really nasty bug when running the overflow_pthreads test) * src/genpapifdef.c 1.43: Removed a holdout from catamount support, are there any platforms where we don't get malloc from stdlib? 
2010-05-03 bsheely * src/papi_table.c 1.5: Removed obsolete file 2010-04-30 terpstra * release_procedure.txt 1.17: Add a few more steps on testing a patch. 2010-04-30 bsheely * src/components/acpi/Rules.acpi 1.2: * src/components/lmsensors/Rules.lmsensors 1.2: * src/components/lustre/Rules.lustre 1.2: * src/components/mx/Rules.mx 1.2: * src/components/net/Rules.net 1.2: Adding new components no longer requires modification of Papi code 2010-04-29 bsheely * src/components/Rules.components 1.1: * src/components/acpi/linux-acpi-memory.c 1.1: * src/components/lmsensors/Makefile.lmsensors.in 1.1: * src/components/lmsensors/configure 1.1: * src/components/lmsensors/configure.in 1.1: * src/components/lustre/host_counter.c 1.1: * src/components/lustre/host_counter.h 1.1: * src/components/mx/Makefile.mx.in 1.1: * src/components/net/Makefile.net.in 1.1: * src/components/net/configure 1.1: * src/components/net/configure.in 1.1: * src/host_counter.c 1.2: * src/host_counter.h 1.2: * src/linux-acpi-memory.c 1.4: * src/linux-acpi.c 1.18: * src/linux-acpi.h 1.10: * src/linux-lmsensors.c 1.4: * src/linux-lmsensors.h 1.4: * src/linux-lustre.c 1.4: * src/linux-lustre.h 1.2: * src/linux-mx.c 1.17: * src/linux-mx.h 1.10: * src/linux-net.c 1.6: * src/linux-net.h 1.4: Created new build environment for components 2010-04-21 bsheely * src/perfmon.c 1.105: removed code that was commented out (accidentally uncommented out on last commit 2010-04-20 bsheely * src/freebsd/map-i7.c 1.3: * src/freebsd/map-i7.h 1.3: Updated on 3.7 branch * src/linux-bgl-events.c 1.4: * src/linux-bgl-memory.c 1.4: * src/linux-bgl.c 1.11: * src/linux-bgl.h 1.4: * src/linux-ia64.h 1.61: * src/linux.c 1.77: * src/papi_events.csv 1.9: * src/papi_pfm_events.c 1.40: * src/perf_events.c 1.26: * src/perf_events.h 1.11: * src/perfctr-ppc64.c 1.19: * src/perfctr-x86.c 1.4: * src/perfmon.h 1.24: * src/pmapi-ppc64.c 1.11: * src/solaris-ultra.c 1.128: Removed code for obsolete platforms 2010-04-16 jagode * src/ctests/native.c 
1.63: * src/papiStdEventDefs.h 1.41: * src/papi_internal.h 1.190: * src/papi_preset.h 1.22: * src/papi_protos.h 1.74: After further investigations of the stack corruption issue on BGP, the real problem has been nailed down. The size of the PAPI_event_info_t struct is different on BGP systems which is due to a bigger PAPI_MAX_INFO_TERMS value. A _BGP was defined at configure time to differentiate between BGP and other systems. However, the problem is that a user program does not know this macro. When PAPI_event_info_t is initialized to zero, the beginning of the user program's stack frame is zeroed out --> BAD. It was fun, though. * src/aix.c 1.87: Fixed compilation errors for AIX which were due to missing inclusion of new header file papi_defines.h. 2010-04-15 bsheely * src/freebsd/map-atom.c 1.5: ... * src/freebsd/memory.c 1.4: Added files 2010-04-09 bsheely * src/linux-ppc64-memory.c 1.9: * src/perfctr-ppc32.c 1.11: * src/perfctr-ppc32.h 1.4: * src/perfctr-ppc64.h 1.11: * src/ppc32_events.c 1.8: * src/ppc64_events.c 1.9: * src/ppc64_events.h 1.12: Removed support for ppc32 architectures. Removed support for perfmon versions older than 2.5 except for Itanium. Removed all code related to POWER3 and POWER4. 2010-04-08 bsheely * src/solaris-niagara2.h 1.5: Added new include file * src/solaris-niagara2.c 1.7: Removed recently added include file since that file is now included in the header which is included here 2010-04-06 jagode * src/linux-bgp.h 1.4: Missing declaration of PAPI_MAX_LOCK (fixed for linux-bgp only) 2010-04-05 bsheely * src/papi_memory.c 1.23: Resolved compile warning * src/ctests/profile.c 1.60: Modified code to exit properly on test failure 2010-04-01 bsheely * src/ctests/clockcore.c 1.21: Prevent output after test failure 2010-03-30 vweaver1 * src/libpfm-3.y/lib/pfmlib_intel_nhm.c 1.4: Fix conflict from merge. 
* src/libpfm-3.y/lib/intel_corei7_events.h 1.1.1.6: * src/libpfm-3.y/lib/pfmlib_itanium2.c 1.1.1.3: * src/libpfm-3.y/lib/pfmlib_montecito.c 1.1.1.4: import libpfm CVS adds additional i7 model 46 support, fixes ia64 builds 2010-03-29 bsheely * src/ctests/pthrtough.c 1.11: Fixed buffer overflow debug output related to threads.c. Rolled back change to pthrtough.c 2010-03-19 bsheely * src/solaris-ultra.h 1.43: Add new include for remaining substrates 2010-03-18 bsheely * src/ctests/p4_lst_ins.c 1.5: * src/ftests/native.F 1.56: * src/p3_pfm_events.c 1.14: * src/p4_events.c 1.56: * src/p4_events.h 1.10: * src/papi_defines.h 1.2: * src/papi_memory.h 1.12: * src/perfctr-p3.c 1.95: * src/perfctr-p3.h 1.52: * src/perfctr-p4.c 1.109: * src/perfctr-p4.h 1.47: * src/perfctr-x86.h 1.2: Merge bsheely-temp branch by hand 2010-03-12 vweaver1 * src/ctests/multiplex1.c 1.53: * src/ctests/multiplex1_pthreads.c 1.54: * src/solaris-memory.c 1.14: Fix PAPI support for solaris-ultra. This code had not worked for some time. * Derived events now work (although the events are still hard-coded and not read from the csv file) * Add cache size detection routines * Fix ntv_code_to_name() * Modify the multiplex* ctests to use proper events on UltraSPARC All of the regression tests pass except for profile_pthreads. This is because overflow handling is still partially broken. 2010-03-05 ralph * doc/doxygen_procedure.txt 1.1: doc/doxygen_procedure.txt provides a quick overview of how to use doxygen for commenting the PAPI code. The utilities are now commented, cloning the wiki man pages. The high level api is also documented, cloning the wiki again. In the low level api, PAPI_accum - PAPI_destroy_eventset are documented. 2010-03-05 bsheely * src/ctests/thrspecific.c 1.6: Test now passes while testing the same functionality without memory leaks 2010-03-04 vweaver1 * src/libpfm-3.y/lib/pfmlib_priv.h 1.7: Fix conflicts from the libpfm import. 
* src/libpfm-3.y/docs/man3/libpfm_westmere.3 1.1.1.1: * src/libpfm-3.y/examples_v2.x/showevtinfo.c 1.1.1.3: * src/libpfm-3.y/include/perfmon/pfmlib.h 1.1.1.13: * src/libpfm-3.y/lib/intel_wsm_events.h 1.1.1.1: * src/libpfm-3.y/lib/intel_wsm_unc_events.h 1.1.1.1: * src/libpfm-3.y/lib/pfmlib_common.c 1.1.1.14: * src/libpfm-3.y/lib/pfmlib_intel_nhm_priv.h 1.1.1.3: Import latest libpfm, which includes Westmere support 2010-03-04 bsheely * src/ctests/fork.c 1.7: * src/ctests/fork2.c 1.4: * src/ctests/krentel_pthreads.c 1.8: * src/ctests/kufrin.c 1.15: * src/ctests/overflow_pthreads.c 1.43: * src/ctests/profile_pthreads.c 1.37: Fixed memory leaks 2010-03-03 vweaver1 * src/p3_ath_event_tables.h 1.4: * src/p3_core_event_tables.h 1.5: * src/p3_events.c 1.65: * src/p3_opt_event_tables.h 1.4: * src/p3_p2_event_tables.h 1.4: * src/p3_p3_event_tables.h 1.4: * src/p3_pm_event_tables.h 1.4: Now that Athlon and Pentium II events use libpfm, remove the old hard coded event table files. * src/perfctr-2.6.x/README 1.1.1.6: * src/perfctr-2.6.x/patches/patch-kernel-2.6.18-164.el5-redhat 1.1.1.1: * src/perfctr-2.6.x/patches/patch-kernel-2.6.31 1.1.1.1: * src/perfctr-2.6.x/patches/patch-kernel-2.6.32 1.1.1.1: Import of perfctr 2.6.40 2010-03-03 bsheely * src/ctests/clockres_pthreads.c 1.11: * src/ctests/fork_exec_overflow.c 1.12: * src/ctests/zero_pthreads.c 1.29: Fixed memory leaks 2010-02-24 bsheely * src/linux-memory.c 1.44: Removed hack to compile without warnings using Wconversion 2010-02-23 bsheely * src/ctests/all_events.c 1.15: * src/ctests/multiplex2.c 1.36: * src/ctests/multiplex3_pthreads.c 1.45: Fixed (debug) compile warnings 2010-02-22 jagode * src/.indent.pro 1.1: ... * src/utils/version.c 1.4: Added and applied new PAPI-coding-style profile file * src/windows.c 1.6: Added missing comment closer */ This misindented the rest of the source code in windows.c 2010-02-16 terpstra * src/ctests/prof_utils.h 1.8: Cleaned up a bunch of implicit type conversions. 
2010-02-15 terpstra * src/run_tests_exclude.txt 1.7: Remove the PAPI_set_event_info and PAPI_encode_event API calls, since they were never supported, and generally come to be thought of as a bad idea. * src/ctests/encode.c 1.7: * src/ctests/encode2.c 1.5: Remove the encode and encode2 tests that exercise PAPI_set_event_info and PAPI_encode_event API calls, since they were never supported, and generally come to be thought of as a bad idea. 2010-01-25 bsheely * src/examples/PAPI_flips.c 1.4: * src/examples/PAPI_flops.c 1.4: * src/examples/PAPI_get_opt.c 1.5: * src/examples/PAPI_ipc.c 1.4: * src/examples/PAPI_overflow.c 1.5: * src/examples/PAPI_profil.c 1.7: * src/examples/high_level.c 1.4: * src/examples/locks_pthreads.c 1.3: * src/examples/overflow_pthreads.c 1.5: Fixed remaining compile warnings * src/examples/sprofile.c 1.5: Fixed compile warnings
papi-5.3.0/ChangeLogP510.txt
2013-01-15 * 0917f567 src/threads.c: Cleaned up compiler warning (gcc version 4.4.6) * 06ca3faa src/components/bgpm/CNKunit/linux-CNKunit.c src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/L2unit/linux-L2unit.c...: Cleaned up compiler warnings on BG/Q (gcc version 4.4.6 (BGQ-V1R1M2-120920)) 2013-01-14 * 56400627 .../build/lib.linux-x86_64-2.7/perfmon/__init__.py .../lib.linux-x86_64-2.7/perfmon/perfmon_int.py .../build/lib.linux-x86_64-2.7/perfmon/pmu.py...: libpfm4: remove extraneous build artifacts. Steve Kaufmann reported differences between the libpfm4 I imported into PAPI and the libpfm4 that can be attained with a git clone git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4 Self: Do libpfm4 imports from a fresh clone of libpfm4. 2013-01-11 * 4ad994bc src/papi_events.csv: Clean up armv7 cortex a15 presets Clean up armv7 cortex a15 presets and add presets for L1 and L2 cache * d54dabf5 ChangeLogP510.txt RELEASENOTES.txt doc/Doxyfile-common...: Prepare the repo for a 5.1 release. 
* Bump the version number to 5.1 * Update the man pages * Create a changelog for 5.1 * Update RELEASENOTES * 8816a3b8 INSTALL.txt: Update INSTALL.txt Add information about installing PAPI on Intel MIC. Based upon information from Vince Weaver's PAPI MIC support page. http://www.eece.maine.edu/~vweaver/projects/mic/ * 8dc1ca23 TEST.TXT: Remove TEST.TXT This was a leftover from a switch over to git. * 292d6c9b src/papi_libpfm3_events.c: Fix build on ia64 When trying to build papi 5.0.1 for IA64, my colleague got compile errors due to perfmon.h not being included. We're not sure if this actually is a configure bug, but this patch fixed it. * 25424f41 src/extras.c: Fix kernel warning in _papi_hwi_stop_timer() In _papi_hwi_stop_timer() we were calling setitimer( timer, NULL, NULL ) to disable the itimer. Recent Linux kernels print warnings if you do this; NULL is not a valid second argument to setitimer() and possibly this wasn't really working before. According to the manpage the proper fix is to call setitimer() with a valid "new_value" field but with the values all 0. That is what this patch does. 2012-11-30 * a7d70127 src/components/micpower/README src/components/micpower/Rules.micpower src/components/micpower/linux-micpower.c...: MIC power component The Intel MIC (Xeon PHI) card reports power of several components of the card. These values are reported in a sysfs file, so this component is cloned from the coretemp component. 2013-01-08 * 121cd0a6 src/Makefile.in src/Rules.pfm4_pe src/configure...: configure: Add shortcut for mic support. * Add a --with-mic flag to enable the several options to cross compile for mic. MIC builds are cross-compiled and Matt and I were unable to figure out how to trigger cross compilation with just our flag. 
This is short-hand for setting --with-arch=k1om --without-ffsll --with-walltimer=clock_realtime_hr \ --with-perf-events --with-tls=__thread --with-virtualtimer=cputime_id * Automatically cause make to pass CONFIG_PFMLIB_ARCH_X86=y to libpfm4's make. So to build for the mic card one has to do: {Set pathing to find the x86_64-k1om-linux-gcc cross-compiler} $ ./configure --host=x86_64-k1om-linux --with-mic $ make Thanks to Matt Johnson for the legwork on configure shortcutting. 2013-01-07 * f65c9d9e src/papi_events.csv: Add preset events for ARM Cortex A15 2012-12-14 * 61a9c7b1 man/man3/PAPI_get_eventset_component.3 src/papi.c: Doxygen: Add a new API entry Add the manpage for the new PAPI_get_eventset_component api entry. 2013-01-02 * 38d969ab doc/Doxyfile-man1 doc/Doxyfile-man3 doc/Makefile...: Doxygen: Cleanup generated man pages. Mark a few \page sections as \htmlonly so that man pages are not built for them. Modify the makefile to rm some data structures that are generated. Doxyfile-man3: * Take out papi_vector.h, this file only defines a few data structures from which we don't need manpages. papi.h: * PAPI_get_component_index's inline comment had the close /**> to delimit its description, but doxygen uses /**<. papi_fwrappers.c: * Mark the group PAPIF as internal so that a man page is not generated for it. utils/*: * Remove some useless htmlonly directives, doxygen will generate pages for any data structure, htmlonly doesn't stop that. Doxyfile-man1: * Change a flag in Doxyfile-man1 so that we don't document internal data structures in the utilities. We don't do this in -man3 because of the \class workaround we use to create manpages for each of the PAPI_* api entry points. Because we call them classes, they would be caught in the no data structures flag. * 7b790c09 doc/Doxyfile-html src/papi.h src/papi_fwrappers.c...: Doxygen: Cleanup some of the markup We were not using htmlonly correctly... The idea was to use \htmlonly to not build manpages for a few things. 
To properly hide \page s you want things like: /** \htmlonly \page Foo I don't want this to generate a manpage. \endhtmlonly */ 2012-12-07 * 152bac19 src/papi.c: Doxygen: Cleanup papi.c Cleanup some \ref s, \ref PAPI_function() isn't happy, use \ref PAPI_function it'll put in the proper links. Remove _papi_overflow_handler doc block. We had the block but no code. 2012-12-20 * 7a40c769 src/components/rapl/tests/rapl_overflow.c: RAPL test code: Add flexibility to the test code. Per Will Cohen; ------------------ I was reviewing some test results for the papi test and found that the rapl_overflow.c tests makes an assumption that there are exactly two packages. As a result the test will fail on machines with a single package. The following is a patch to make it a bit more flexible allow 1-n packages in the test. -Will ----------------- 2012-12-19 * 96c9afb0 src/components/appio/README src/components/appio/appio.c src/components/appio/appio.h...: Added events for seek statistics and support for intercepting lseek() calls. 2012-12-14 * 003abf6d src/Rules.perfctr-pfm: Rules.perfctr-pfm: pass CC in all cases. Perfctr user library was not being passed CC when built. 2012-12-05 * e2c05b29 src/papi_internal.c: papi_internal.c: Refactor duplicated code in cleanup and free eventset. Currently the code to free runtime state is duplicated in cleanup and free. The perf_event_uncore test exposed an issue where free cleaned up cpu_attach state but cleanup did not, causing a leak. Have _papi_hwi_free_EventSet call _papi_hwi_cleanup_eventset to free most of the runtime state of the eventset and then allow free_eventset to free the Eventset Info struct. 2012-12-13 * 7d020224 src/configure src/configure.in: configure: Change fortran compiler search order. Bandaid fix to buildbot errors. By default, configure would find icc before gcc but gfortran would be used before ifort. The real fix is to test that object code from the c compiler can be linked to by the fortran compiler. 
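Note on 25424f41 (setitimer fix, listed under 2013-01-11 above): the entry says the timer is now disabled by passing a valid, all-zero new_value instead of NULL. A minimal sketch of that pattern follows. The helper name stop_timer is hypothetical and is not PAPI's _papi_hwi_stop_timer(); only the POSIX setitimer() call itself is taken as given.

```c
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper, for illustration only: disarm an interval timer
 * by handing setitimer() a zeroed itimerval rather than a NULL pointer. */
static int stop_timer(int which)
{
    struct itimerval zero;

    /* it_value == 0 disarms the timer; it_interval == 0 means no reload. */
    memset(&zero, 0, sizeof(zero));

    /* The old value is not needed here, so the third argument may be NULL. */
    return setitimer(which, &zero, NULL);
}
```

Calling stop_timer(ITIMER_REAL) after arming a timer should leave getitimer() reporting a zero it_value.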
2012-12-12 * 87b6e913 src/papi_events.csv: ivy_bridge: remove PAPI_HW_INT event Apparently recent Intel Vol3B documentation removed this event, and the most recent libpfm4 merge followed suit. I asked at Intel about this and possibly they only removed it because they didn't think anyone was using it. Maybe they'll add it back 2012-12-10 * 293b26b9 src/Makefile.inc: Makefile.inc: Fix library link ordering. Per Will Cohen ----------------------------------------------------------- I ran across a problem when trying to build papi with the bundled libpfm and an earlier incompatible version of libpfm was already installed on the machine. The make would use the /usr/lib{64}/libpfm.so before trying to use the locally built version and this would cause problems. The attached patch changes the order of the linking and uses the local built libpfm before it tries the installed version. -Will ----------------------------------------------------------- 2012-12-12 * 57e6aa0d src/Makefile.in: Makefile.in: export CC_COMMON_NAME In 17cfcb4a I started using CC_COMMON_NAME in Rules.pfm4 but failed to have configure put it in Makefile. 2012-12-11 * 17cfcb4a src/Rules.pfm4_pe src/configure src/configure.in: Cleanup icc build Start using -diag-disable to quiet down some of the remarks icc carps about in libpfm4. Also have configure export CC_COMMON_NAME and check against that in Rules.pfm4_pe. afec8fc9a reverted us to passing -Wno-unused-parameter to icc, polluting buildbot. 2012-12-10 * afec8fc9 src/configure src/configure.in: configure: Attempt to better detect which C compiler we are using. This attempts to address trac bug 162. http://icl.cs.utk.edu/trac/papi/ticket/162 Specifying full paths for CC caused issues in our configure logic. We set several flags specific to gcc or icc and this was breaking down EG "/usr/bin/gcc" != "gcc" Now we attempt to execute whatever CC we are going to use and grep its version string. 
We set a CC_COMMON_NAME \in {"gcc", "icc", "xlc", "unknown"} based upon the above and later check CC_COMMON_NAME in place of CC to set compiler specific flags. * 14432aa0 src/linux-timer.c src/papi.c: Minor Coverity fixes. Thanks, Will Cohen. 2012-12-07 * ba5e83d4 src/papi_user_events.c: papi_user_events.c: Fix memory leak. Reported by William Cohen as detected by the coverity tool. * 166498a8 src/components/nvml/linux-nvml.c: nvml component: fix detectDevices() The routine detectDevices() always returned with the error PAPI_ESYS when there was a device available. As a result, no nvml events were available. Fixed. * 11ad5894 src/components/nvml/linux-nvml.c: nvml component: add missing variable declaration In the routine _papi_nvml_init_componen(), the variable papi_errorcode was not declared which prevented this component from building. Added declaration of papi_errorcode as int. 2012-12-06 * 9567dfef src/ftests/first.F src/ftests/second.F: Fix warning messages issued by gfortran 4.6.x regarding loss of precision when casting REAL to INT. Thanks to Heike for identifying the proper intrinsics. * 72588227 src/papi.c src/papi.h: Add PAPI_get_eventset_component() to get the component index from an event set. This is symmetric with PAPI_get_event_component which extracts the information from an event. In response to a request from John Mellor-Crummey. * 2e055d40 src/components/rapl/linux-rapl.c: Fix a compiler warning about a possibly uninitialized return value. 2012-12-05 * 1aae2246 src/utils/command_line.c: Reformat the floating point output string to recognize that you can't cast the *value* of a long long to a double and expect to get the right answer; you need to cast the *pointer* to a double, then everything works. * 0e834fc2 src/utils/command_line.c: Incorporated use of the new PAPI_add_named_event API. 
Restructured output to support formatted printing of built-in DATATYPEs: UINT64 prints as unsigned followed by (u); INT64 prints as signed; FP64 prints as float (but I don't like the default format); BIT64 prints as hex, prefixed by '0x'. Also if info.units is not empty, units are appended to output values. These features can be demo'd with the RAPL component. * af6abec2 src/papi.h: Rearranged DATATYPE enums so INT64 is now default (0) value. Also added a BIT64 type for unspecified bitfields. 2012-12-04 * 862033e0 src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/IOunit/linux-IOunit.h src/components/bgpm/L2unit/linux-L2unit.c...: Resolved multiple components conflict on BG/Q when overflow is enabled for multiple events from different components at the same time. * 44744002 src/utils/command_line.c: Add -x and -u options to papi_command_line to allow printing counter values in hexadecimal and unsigned formats. 2012-11-30 * 25a914c5 src/papi_user_events.c: Cleanup unused variable warnings in user_events code. 2012-11-28 * 9a75f872 src/Rules.pfm4_pe src/configure src/configure.in: Cleanup the build under icc. libpfm4's build system uses a gcc specific flag, -Wno-unused-parameter. It does this via a variable, DBG, in config.mk: DBG?=-g -Wall -Werror -Wextra -Wno-unused-parameter The Intel compiler doesn't understand -Wno-unused-parameter and complains about it. In Rules.pfm4_pe we set DBG for icc builds. 2012-11-27 * 4def827b src/configure src/configure.in: Fix the perfctr build that was breaking due to missing CPU Mark Gates was reporting PAPI 5 wasn't running properly on Keeneland. It looks like some CPU cleanups in the configure code broke things. Hopefully this helps the situation. 2012-11-21 * 4316f172 src/perf_events.c: perf_events: get rid of "PAPI Error: Didn't close all events" error This was more meant as a warning; it could trigger when closing an EventSet that had an event partially added but failed for some reason. 
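Note on 1aae2246 (2012-12-05 above): the entry distinguishes converting the value of a long long to double from reinterpreting its storage as a double. The sketch below shows the difference, assuming a counter slot that holds the bit pattern of an FP64 value. The function names are hypothetical, and memcpy is used in place of the pointer cast the entry mentions because it gives the same reinterpretation without running into strict-aliasing rules.

```c
#include <string.h>

/* The reinterpretation only makes sense if both types have the same size. */
_Static_assert(sizeof(long long) == sizeof(double),
               "long long and double must be the same size");

/* Value conversion: the integer's numeric value expressed as a double. */
static double as_numeric(long long v)
{
    return (double)v;
}

/* Bit reinterpretation: the double whose bytes are stored in v.
 * Equivalent in effect to reading v through a double pointer. */
static double as_bits(long long v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}
```

For a slot that was filled by copying the bytes of 1.5 into a long long, as_bits() recovers 1.5, while as_numeric() yields a very large unrelated number.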
* 671e10bd src/utils/command_line.c: papi_command_line: fix error output The error messages got a bit weird looking due to the PAPI error printing changes a while back. * 959afa49 src/papi_internal.c: Fix _papi_hwi_add_event to report errors back to user. Previously _papi_hwi_add_event would report all errors returned by add_native_events() as being PAPI_ECNFLCT even though add_native_events() returned a wider range of errors. * 8ecb70ba src/perf_events.c: Have perf_event return PAPI_EPERM rather than PAPI_ECNFLCT if the kernel itself returns EPERM * 9053ca1c src/perf_events.c: Work around kernel issue with PERF_EVENT_IOC_REFRESH It's unclear exactly the best way to restart sampling. Refreshing with 1 is the "official" way as espoused by the kernel developers, but it doesn't work on Power. 0 works for Power and most other machines, but the kernel developers say not to use it. This makes Power use 0 until we can figure out exactly what is going on. * e85df04b src/components/appio/tests/appio_test_socket.c: - added support distinguishing between network and file I/O. - added events to measure statistics for sockets - updated README 2012-11-15 * 248694ef src/x86_cpuid_info.c: Update x86_cpuid_info code for KNC. On Knight's Corner the leaf2 code returns 0 for the count value. We were printing a warning on this; better would be to just skip the cache detection code if we get this result. 2012-11-08 * 82c93156 src/linux-bgp-memory.c src/linux-bgp.c src/linux-bgp.h: There was more cleaning up necessary in order to get PAPI compiled on BG/P. It should work now with the recommended configure steps described in INSTALL. 2012-11-07 * 77da80b3 src/Makefile.inc src/configure src/configure.in...: Make BGP use papi_events.csv This was easier than trying to clean up the linux-bgp-preset-events.c file to have the proper file layout. * fc8a4168 src/linux-bgp.c: Fix some linux-bgp build issues. No one has tried compiling after all the PAPI 5.0 changes so many bugs slipped in. 
* c16ef312 src/ctests/perf_event_uncore.c: Fix type warnings in perf_event_uncore test. * 3947e9c8 src/ctests/perf_event_uncore.c: Put a bandaid on the perf_event_uncore test. Check for an Intel family 6 model 45 processor (sandybridge ep) before executing the test. 2012-09-27 * a23d95f8 src/papi.c src/papi.h src/papi_fwrappers.c...: Mark some comments @htmlonly. This cleans up what man pages are generated. 2012-11-07 * d239c350 src/Makefile.inc src/Rules.pfm4_pe: Factor out duplicate install code from Rules.pfm4_pe The Makefile.inc has a rule to installed shared libraries. However, Rules.pfm4_pe also has a slightly different set of rules to install code for shared libraries. This leads to the same shared library being installed under two different names. The duplicate code has been removed from Rules.pfm4_pe and a symbolic link has been added to ensure that any code that might have linked with libpapi.so.$(PAPIVER).$(PAPIREV).$(PAPIAGE) still runs. 2012-10-30 * fcc64ff9 src/papi_events.csv: Add PAPI_HW_INT event for IvyBridge 2012-10-26 * ef89fc56 src/papi_events.csv: MIC: update PAPI_FP_INS / PAPI_VEC_INS instruction We were using VPU_INSTRUCTIONS_EXECUTED for PAPI_FP_INS but really it's more appropriate for PAPI_VEC_INS This leaves PAPI_FP_INS undefined, which breaks a lot of the ctests. A long term goal should probably be modifying the tests to use another counter if PAPI_FP_INS isn't available (this affects Ivy Bridge too). 2012-10-25 * 975c03f1 src/perf_events.c: perf_event: fix granularity bug cut-and paste error in the last set of changes. Would have meant if you tried to explicitly set granularity to thread you'd get system instead. * 3cd3a62d src/configure src/configure.in src/ctests/Makefile...: Add perf_event_uncore ctest Also add a new type of ctest, perf_event specific In theory we should have configure only enable this if perf_event support is being used. 
* 5ee97430 src/perf_events.c: perf_event: add PAPI_DOM_SUPERVISOR to allowed perf_event domains perf_event supports this domain but since we didn't have it in the list PAPI wasn't letting us set/unset this. This is needed for uncore support, as for uncore domain must be set to allow monitoring everything. * c9325560 src/perf_events.c: perf_event enable granularity support Add support for PAPI_GRAN_SYS to perf_event. This is needed for uncore support. 2012-10-18 * 59d3d758 src/mb.h src/perf_events.c: Update the memory barriers It turns out PAPI fails on older 32-bit x86 machines because it tries to use an SSE rmb() memory barrier. (Yes, I'm trying to run PAPI on a Pentium II. Don't ask) It looks like our memory barriers were copied out of the kernel, which doesn't quite work because it expects some kernel infrastructure instead. This patch uses the definitions used by the "perf" tool instead. Also dropped the use of the mb() memory barrier on mmap tail write, as the perf tool itself did a while ago so I'm hoping it's safe to do so as well. It makes these definitions a lot simpler. 2012-10-08 * bcdce5bc src/perf_events.c: perf_event: clarify an error message The message was saying detecting rdpmc support broke, but the real error is that perf_events itself is totally broken on this machine and it's just rdpmc was the first code that tried to access it. 2012-10-02 * 3bb3558f src/mb.h: Update memory barriers for Knights Corner Despite being x86_64 they don't support the SSE memory barrier instructions, so add a case in mb.h to handle this properly. 
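Note on 59d3d758 and 3bb3558f (memory barriers, above): the two entries describe choosing a read barrier per architecture, with no SSE fence on old 32-bit x86 and none on Knights Corner. The sketch below shows one way such a selection can look. It is not the contents of PAPI's mb.h: the __k1om__ macro test, the locked-add fallback for KNC, and the helper read_head are all assumptions made for illustration, and the inline assembly is GCC/Clang syntax.

```c
/* Pick a read memory barrier for the target; a sketch, not PAPI's mb.h. */
#if defined(__k1om__)
/* Assumption: no lfence on KNC, so use a locked no-op add as a fence. */
#define rmb() __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory")
#elif defined(__x86_64__)
/* SSE2 load fence, available on ordinary x86_64 parts. */
#define rmb() __asm__ __volatile__("lfence" ::: "memory")
#elif defined(__i386__)
/* Pre-SSE2 32-bit x86: a locked add on the stack serializes memory access. */
#define rmb() __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory")
#else
/* Anything else: fall back to the compiler's full barrier builtin. */
#define rmb() __sync_synchronize()
#endif

/* Hypothetical reader: load a ring-buffer head index, then fence so that
 * later loads of the buffer contents are not reordered before it. */
static unsigned long long read_head(volatile unsigned long long *head)
{
    unsigned long long h = *head;
    rmb();
    return h;
}
```

The KNC case is tested first because that target also defines __x86_64__.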
2012-10-01 * 38a5d74c src/libpfm4/README src/libpfm4/docs/Makefile src/libpfm4/docs/man3/libpfm_intel_atom.3...: Merge libpfm4 with Knights Corner Support * bf959960 src/papi_events.csv: Change "phi" to "knc" to match libpfm4 for Xeon Phi / Knights Corner support 2012-09-20 * d9249635 ChangeLogP501.txt RELEASENOTES.txt: Update releasenotes and add a changelog for 5.0.1 * a1e30348 man/man1/papi_avail.1 man/man1/papi_clockres.1 man/man1/papi_command_line.1...: Rebuild the manpages for a 5.0.1 release.
papi-5.3.0/ChangeLogP411.txt
2010-09-30 * src/: configure, configure.in: When --with-OS=CLE is enabled, check the kernel version and use perfmon2 for old kernels and perf_events for new kernels. * src/: configure, configure.in: If no sources of perf counters are available, then use the generic_platform substrate instead. Currently the code would always fall back on perfctr even if no perfctr support was available. * src/: configure, configure.in: If you specify --with-perf-events or --with-pe-include but the required perf_event.h header is not available, then have configure fail with an error. * papi.spec: Bump version number to 4.1.1 in affected files. Also bump requirement for kernel from 2.6.31 to 2.6.32. This in prep for the pending release. * src/: configure, Makefile.in, configure.in, papi.h: Bump version number to 4.1.1 in affected files. This in prep for the pending release. * INSTALL.txt: Hope this late commit doesn't interfere with anything. This updates the INSTALL.txt to reflect all of the improvements we've made to perf_event support since the last release. 2010-09-29 * src/Rules.pfm: The -Werror problem was still occurring on ia64/perfmon compiles, as I hadn't updated Rules.pfm * src/: configure, configure.in, perf_events.c, perf_events.h, sys_perf_counter_open.c, sys_perf_event_open.c, syscalls.h: Remove support for the perf_counter interface in kernel 2.6.31. 
Now supports only the perf_event interface in kernel 2.6.32 and above. 2010-09-22 * src/perf_events.c: Attempt to add mmtimer support to perf_events substrate. * src/: multiplex.c, papi.c, papi_protos.h: The multiplex code currently does not make a final adjustment at the time of MPX_read(). This is to avoid the case where counts could be decreasing if you have multiple reads returning estimated values before the next actual counter read. While this code works to keep the results non-decreasing, it can cause significant differences from expected results for final reads, especially if many counters are being multiplexed. This is seen in the sdsc-mpx test. It was failing occasionally on some machines by having error of over 20% (the cutoff for a test error) when multiplexing 11 events. What this fix does is to special case the PAPI_stop() case when multiplexing is enabled, having the PAPI_stop() do a final adjustment. The intermediate PAPI_read() case is not changed. This fixes the sdsc-mpx case, while still passing the mendes-alt case (which checks for non-decreasing values). There is a #define that can be set in multiplex.c to restore the previous behavior. * src/ctests/mendes-alt.c: This is our only test that checks to see if multiplexed values are non-decreasing or not. Unfortunately the test currently doesn't fail if values do go backward. This change causes the test to fail if it finds multiplexed counts that decrease. 2010-09-17 * src/libpfm-3.y/: config.mk, lib/intel_wsm_events.h: Fix conflicts from merge. 2010-09-15 * src/: Makefile.inc, Rules.perfctr-pfm, Rules.pfm_pe: Finally fix the -WExtra problem. The issue was -WExtra was being passed to libpfm, but only in the case where the user had a CFLAGS env variable. 
It turns out this is due to the following from section 5.7.2 of the gmake manual: Except by explicit request, make exports a variable only if it is either defined in the environment initially or set on the command line. And the fix is also described: If you want to prevent a variable from being exported, use the unexport directive. So I've added an "unexport CFLAGS" directive, which seems to be the right thing as our Makefile explicitly passes CFLAGS to the sub-Makefiles that need it. This seems to fix the build. 2010-09-13 * src/libpfm-3.y/: docs/man3/libpfm_westmere.3, lib/intel_wsm_events.h, lib/intel_wsm_unc_events.h, lib/pfmlib_intel_nhm.c, lib/pfmlib_priv.h: Fix the missing files from the import (CVS claims this as a "conflict") 2010-09-08 * src/Makefile.inc: Fixed the recipes for [c|f]tests and utils. $(LIBRARY) => $(papiLIBS) (this way we don't build libpapi.a if we don't want it) 2010-09-03 * src/ctests/sdsc.c: Had a "%d" instead of "%lld" in that last commit. * src/ctests/sdsc.c: Give a more detailed error message on the sdsc-mpx test. We're seeing sporadic failures (probably due to results being close to the threshold value) but it's hard to tell on buildbot which counter is failing because the error message didn't print the value. 2010-09-02 * src/papi.c: Remove code that reported ENOSUPP if HW multiplexing is not available. PAPI can automatically perform SW multiplexing if HW is not available. With this part of my previous multiplexing patch reverted, multiplexing seems to work even on 2.6.32 perf_events (by reverting to SW mode on those machines) 2010-08-31 * src/perf_events.c: Explicitly set the disabled flag to zero in perf_events for new events. With an event set, if you removed an event and then added a new one, the disabled flag could inherit the value from the previously removed event. This fix doesn't seem to break anything, but the code involved is a bit tricky to follow. This fixes the sdsc4-mpx test on sol. 
* src/components/coretemp/: Rules.coretemp, linux-coretemp.c, linux-coretemp.h: Initial stab at a coretemp component. This component exposes every thing that looks like a useful file under /sys/class/hwmon. 2010-08-30 * src/perf_events.c: F_SETOWN_EX is not available until 2.6.32, so don't use it unless we are running on a recent enough kernel. * src/perf_events.c: Pentium 4 was not supported by perf_events until version 2.6.35. Print an error if we attempt to use it on an older kernel. 2010-08-27 * src/ctests/overflow_allcounters.c: The "overflow_allcounters" test failed on perfmon2 kernels because the behavior of a counter on overflow differs between the various substrates. Therefore detect if we're running on perfmon2 and print a warning, but still pass the test. * src/libpfm-3.y/lib/: intel_wsm_events.h, intel_wsm_unc_events.h, pfmlib_intel_nhm.c, pfmlib_priv.h: updating * src/libpfm-3.y/docs/man3/libpfm_westmere.3: removing westmere documentation * src/perf_events.c: Fix warning in compile due to missing parameter in a debug statement. * src/ctests/test_utils.c: In the ctests, test_skip() was attempting a PAPI_shutdown() before exiting. On multithreaded tests (that had already spawned threads before the decision to skip) this really causes the programs to end up confused and reports spurious memory errors. So remove the PAPI_shutdown() from test_skip(). There's a comment in test_fail() that indicates this was already done there for similar reasons. 2010-08-26 * src/ctests/byte_profile.c: byte_profile was failing on systems where fp_ops is a derived event. modify the test so it gives a warning instead of failing and avoids using the derived event. * src/perf_events.c: At PAPI_stop() time a counter with overflow enabled is being adjusted by a value equal to the sampling period. It looks like this isn't needed (and is generating an overcount that breaks overflow_allcounters). 
I'm still checking up on this code; if it turns out to be necessary I may have to revert this later. * src/ctests/overflow_allcounters.c: Add validation check to overflow_allcounters It turns out perf_event kernels overcount overflows for some reason, while perfctr doesn't. I'm investigating. * src/ctests/: overflow_allcounters.c, papi_test.h, test_utils.c: On Power5 and Power6, hardware counters 5 and 6 cannot generate interrupts. This means the overflow_allcounters test was failing because overflow could not be generated for events 5 and 6. Add code that special cases Power5 and Power6 for this test (and generate a warning) * src/perf_events.c: Change some debug messages to be warnings instead of errors. * src/: papi.c, ctests/second.c: Fix ctests/second on bluegrass (POWER6) The test was testing domains by trying PAPI_DOM_ALL^PAPI_DOM_SUPERVISOR in an attempt to turn off the SUPERVISOR bit. This fails on Power6 as it leaves the PAPI_DOM_OTHER bit set, which isn't allowed. How did the test earlier measure PAPI_DOM_ALL then, which has all bits set? Well it turns out papi.c silently corrects PAPI_DOM_ALL to be available_domains. But if you fiddle any of the bits this correction is lost. This is probably not the right thing to do, but the best way to fix it is not clear. For now this modifies the "second" test to clear the DOM_OTHER bit too if the domain setting fails with it set. 2010-08-25 * src/: papi.c, papi.h, perf_events.c, ctests/kufrin.c, ctests/mendes-alt.c, ctests/multiplex1.c, ctests/multiplex1_pthreads.c, ctests/multiplex2.c, ctests/multiplex3_pthreads.c, ctests/sdsc.c, ctests/sdsc2.c, ctests/sdsc4.c, ftests/fmultiplex1.F, ftests/fmultiplex2.F: Add support for including the OS version in the component_info_t struct. Use this support under perf_events to disable multiplexing support if the kernel is < 2.6.33 Modify the various multiplexing tests to "skip" if they get a PAPI_ENOSUPP when attempting to set up multiplexing. 
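The kernel-version gate described in the 2010-08-25 entry can be illustrated with a minimal sketch. This is not the code that went into perf_events.c; the names KERNEL_VERSION_CODE, parse_kernel_release() and kernel_supports_hw_multiplex() are hypothetical, and the only detail taken from the entry is the 2.6.33 cutoff. It assumes the release string has the form reported by uname(2), such as "2.6.32-71.el6.x86_64".

```c
#include <stdio.h>

/* Hypothetical helper (not a PAPI macro): pack a kernel version into one
 * integer so that versions can be compared with ordinary operators. */
#define KERNEL_VERSION_CODE(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Parse a release string such as "2.6.32-71.el6.x86_64".
 * Returns the packed version, or -1 if no leading "major.minor" is found.
 * A missing patch level counts as 0; large patch levels are clamped to 255
 * so they cannot spill into the minor field. */
static int parse_kernel_release(const char *release)
{
    int major = 0, minor = 0, patch = 0;
    int fields = sscanf(release, "%d.%d.%d", &major, &minor, &patch);

    if (fields < 2 || major < 0 || minor < 0 || patch < 0)
        return -1;
    if (patch > 255)
        patch = 255;
    return KERNEL_VERSION_CODE(major, minor, patch);
}

/* Returns 1 if the release is at least 2.6.33 (the cutoff named in the
 * changelog entry), 0 otherwise, including for unparseable strings. */
static int kernel_supports_hw_multiplex(const char *release)
{
    return parse_kernel_release(release) >= KERNEL_VERSION_CODE(2, 6, 33);
}
```

A caller would typically obtain the release string from uname() once at init time and, when the check fails, report "not supported" so that tests can skip rather than fail.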
* src/ctests/all_native_events.c: Update all_native_events ctest to print warning in the case where we skip events because they aren't implemented yet (offcore and uncore mostly). 2010-08-24 * src/ctests/: papi_test.h, profile.c, test_utils.c: Adds a new "test_warn()" function for the ctests. This allows you to let tests pass with a warning. This is useful in cases where you don't want to forget that an option needs implementing, but that the feature being missed isn't important enough to fail the test. The first user of this is the "profile" test. We warn that PAPI_PROFIL_RANDOM is not supported on perf_events. * src/perf_events.c: From what I can tell, on perf_events the overflow PAPI_OVERFLOW_FORCE_SW case was improperly falling through in _papi_pe_dispatch_timer() to also run the HARDWARE code. This meant that we were attempting to read non-existent hardware overflow data, causing a lot of errors to be printed to the screen. This shows up in the overflow_force_software test * src/ctests/: ipc.c, multiplex2.c, multiplex3_pthreads.c, test_utils.c: Some minor changes to the ctests. + ipc -- fail if the reported IPC value is zero + multiplex2 -- fail if all 32 counter values report as zero + multiplex3_pthread -- give up sooner if each counter returns zero. otherwise the test can take upwards of an hour to finish and makes the fan on my laptop sound like it's going to explode in the process 2010-08-20 * src/Makefile.inc: Disable CFLAGS += $(EXTRA_CFLAGS) (-Wextra) for now. This will get buildbot running again, and if I can manage to figure out exactly what the Makefiles are doing I'll re-enable it again. * src/perf_events.c: Add support for Pentium 4 under perf events. This requires a 2.6.35 kernel. On p4 perf events requires a special format for the raw event, so we modify the results from libpfm3 to conform to what the kernel expects. 
* release_procedure.txt: release_procedure updated to reflect files to keep under /doc 2010-08-18 * src/perf_events.c: Patch from Gary Mohr that allows PAPI on perf events to catch permissions problems at the time of configuration, rather than only appearing once papi_start() is called. Quick summary of changes: + Adds a check_permissions() routine PERF_COUNT_HW_INSTRUCTIONS is used as the test event. + check_permissions() is called during PAPI_ATTACH, PAPI_CPU_ATTACH and PAPI_DOMAIN + Various "ctl" structures renamed "pe_ctl" + Some minor debug changes 2010-08-05 * src/perf_events.c: Use F_SETOWN_EX instead of F_SETOWN in tune_up_fd() This fixes a multi-thread overflow bug found with the Rice test-suite. F_SETOWN_EX doesn't exist until Linux 2.6.32. We really need some infrastructure that detects the running kernel at init time and warns that things like F_SETOWN_EX, multiplexing, etc., are unavailable if the kernel is too old. 2010-08-04 * src/: Makefile.inc, cpus.c, cpus.h, genpapifdef.c, papi.c, papi.h, papi_defines.h, papi_internal.c, papi_internal.h, perf_events.c, perf_events.h, threads.h: This is the PAPI_CPU_ATTACH patch from Gary Mohr that also fixes a problem with multiple event sets on perf events. 
Changes by file: papi.h + Add PAPI_CPU_ATTACHED + Add structures needed for CPU_ATTACH Makefile.in + include the new cpus.c file papi_internal.c + add call to _papi_hwi_shutdown_cpu() in _papi_hwi_free_EventSet() + make remap_event_position() non-static + add_native_events() and remove_native_events() use _papi_hwi_get_context() + _papi_hw_read() has some whitespace and debug message changes, and removes an extraneous loop index papi_internal.h + a new CPUS_LOCK is added + cpuinfo struct added to various structures + an inline call called _papi_hwi_get_context() added perf_events.h + a cpu_num field added to control_state_t perf_events.c + open_pe_events() allows per-cpu counting, additional debug was added + set_cpu() function added + new debug messages in set_granularity() and _papi_pe_read() + _papi_pe_ctl() has PAPI_CPU_ATTACH code added + _papi_pe_update_control_state() has the default domain set to be PAPI_DOM_USER instead of pe_ctl->domain genpapifdef.c + PAPI_CPU_ATTACHED added threads.h + an ESI field added to ThreadInfo_t papi.c + many new ABIDBG() debug messages added + PAPI_start() updated to check for CPU_ATTACH conflicts, has whitespace fixes, gets context now, if dirty calls update_control_state() + PAPI_stop(), PAPI_reset(), PAPI_read(), PAPI_read_ts(), PAPI_accum(), PAPI_write(), PAPI_cleanup_eventset(), all use _papi_hwi_get_context() to get context + PAPI_read() has some braces added + PAPI_get_opt() and PAPI_set_opt() have CPU_ATTACHED code added. + PAPI_overflow() and PAPI_sprofil() now report errors if CPU_ATTACH enabled cpus.c, cpus.h + New files based on threads.c and threads.h I made some additional changes, based on warnings given by gcc + Added a few missing function prototypes in cpus.h + Update PAPI_MAX_LOCK as it wasn't increased to handle the new addition of CPUS_LOCK + Removed various variables and functions reported as being unused. 
2010-08-03 * src/: papi_internal.h, papi_lock.h: The option --with-no-cpu-counters was not supported on AIX. This has been fixed and works now. Also the get_{real|virt}_{cycles|usec} implementations for AIX (checked in Jul 29) have now been tested and work correctly. 2010-07-29 * src/: configure, configure.in, papi_lock.h, papi_vector.c: Added AIX support for the get_{real|virt}_{cycles|usec} functions +++ Fortran tests are now compiling on AIX. Wrong compiler flags were used for the AIX compilers. 2010-07-26 * src/papi_events.csv: add PAPI_L1_DCM for atom * src/x86_cache_info.c: Update the x86 cache_info table. The data from this table now comes from figure 3-17 in the Intel Architectures Software Reference Manual 2A (cpuid instruction section) This fixes an issue on my Atom N270 machine where the L2 cache was not reported. 2010-07-16 * INSTALL.txt, src/perf_events.c, src/perf_events.h: Perf Events now support attach and detach. The patch for supporting this was written by Gary Mohr * src/papi_events.csv: Add a few missing events to Nehalem, based on reading Intel Volume 3b. * src/papi_events.csv: Fix Westmere to not use L1D_ALL_REF:ANY I tested this on a Nehalem which has the proper behavior, unfortunately no Westmere here to test on. * src/: papi_events.csv, papi_pfm_events.c, perfctr-x86.c: Enable support for having more than one CPU block with the same name in the .csv file. This allows easier support for sharing events between similar architectures. I *think* this is needed and *think* it shouldn't break anything, but I might have to back it out. Also fixes event support for Pentium Pro / Pentium III/ P6 on perfmon2 and perf events kernels. Also fixed some confusion where perfctr called chips "Intel Core" meaning Core Duo wheras pfmon called "Intel Core" meaning Core2. 
This was tested on actual Pentium Pro and PIII hardware (as well as on a few Pentium 4 machines plus a Core2 machine) 2010-07-02 * src/: papi_hl.c, ctests/api.c: Added remaining low-level api tests papi-5.3.0/.gitattributes0000600003276200002170000000035612247131117015040 0ustar ralphundrgradPAPI_FAQ.html -diff release_procedure.txt -diff gitlog2changelog.py -diff doc/DataRange.html -diff doc/PAPI-C.html -diff doc/README -diff src/buildbot_configure_with_components.sh -diff delete_before_release.sh -diff .gitattributes -diff papi-5.3.0/ChangeLogP511.txt0000600003276200002170000001732412247131117015107 0ustar ralphundrgrad2013-05-21 * 602d8dbc man/man1/papi_avail.1 man/man1/papi_clockres.1 man/man1/papi_command_line.1...: Rebuild man pages for a 5.1.1 release. * 93d9be34 doc/Doxyfile-common papi.spec src/Makefile.in...: Bump version number for a 5.1.1 release. 2013-04-15 * 8e47838d src/components/cuda/linux-cuda.c: When creating two event sets - one for the CUDA and one for the CPU component - the order of event set creation appears crucial. When the CPU event set has been created before the CUDA event set then PAPI_start() for the CUDA event set works fine. However, if the CUDA event set has been created before the CPU event set, then PAPI_start(CUDA_event_set) forces the CUDA control state to be updated one more time, even if the CUDA event set has not been modified. The CUDA control state function did not properly handle this case and hence caused PAPI_start() to fail. This has been fixed. 2013-05-13 * c93dfa68 src/perf_events.c: perf_event component: update error returns This passes more error return values back to PAPI. Before this change a lot of places were hardcoded to PAPI_EPERM even if sys_perf_event_open() was reporting a different error. 
2013-05-08 * d1db58e8 src/configure src/configure.in: Force the use of pthread_mutexes on ARM This lets the system libraries worry about the best way to define mutexes, rather than trying to hand-code in assembly around all of the various issues there are with atomic instructions in the ARM architecture. It might make sense to enable this for *all* Linux architectures, but for now just do it for ARM. * 29662e3e src/linux-lock.h: Commit 59d3d7584b2925bd05b4b5d0f4fe89666eb8494a removed the definition of mb(). mb() was defined as rmb(). This just corrects it back. (Note from VMW -- this fixes some things, but ARM still won't build on a Cortex A9 pandaboard due to the use of the "swp" instruction. Proper fix is probably to enforce posix-mutexes on ARM) 2013-04-22 * ff29fd12 src/run_tests.sh: The test for determining whether to run valgrind was backwards. Correcting that allows the run_test.sh script to stay the same and one just needs to define "VALGRIND=yes" (or any non-null string) to make run_test.sh use valgrind. --- src/run_tests.sh | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/run_tests.sh b/src/run_tests.sh index d1ce205..9337ff2 100755 --- a/src/run_tests.sh +++ b/src/run_tests.sh @@ -19,10 +19,8 @@ else export TESTS_QUIET fi -if [ "x$VALGRIND" = "x" ]; then -# Uncomment the following line to run tests using Valgrind -# VALGRIND="valgrind --leak-check=full"; - VALGRIND=""; +if [ "x$VALGRIND" != "x" ]; then + VALGRIND="valgrind --leak-check=full"; fi #CTESTS=`find ctests -maxdepth 1 -perm -u+x -type f`; -- 2013-03-28 * 1e8101f6 src/run_tests.sh: run_tests.sh: further refine component test find Exclude *.cu when looking for component tests. 2013-03-25 * 0b600bc5 src/run_tests.sh: run_tests.sh: File mode changes. run_tests.sh is now expected to run from the install location in addition to src. The script tried to remove execute from *.[c|h], now it just excludes *.[c|h] from the find commands. 
2013-03-18 * 06f9c43b src/perfctr-x86.c: perfctr: don't read in event table multiple times papi_libpfm3_events.c now reads in the predefined events, we don't also need to do this in perfctr setup_x86_presets() * 48d7330c src/perfctr.c: Fix segfault in perfctr.c The preset lookup uses the cidx index, but in perfctr.c we weren't passing a cidx value (it was being left off). The old perfctr code plays games with defining extern functions so the compiler wasn't giving us a warning. 2013-03-14 * eda94e50 src/components/bgpm/L2unit/linux-L2unit.c src/linux-bgq.c: If a counter is not set to overflow (threshold==0; happens when PAPI_shutdown is called) then we do not want to rebuild the BGPM event set, even if the event set has been used previously and hence "applied or attached". Usually if an event set has been applied or attached prior to setting overflow, the BGPM event set needs to be deleted and recreated (which implies malloc() from within BGPM). Not so, though, if threshold is 0 which is the case when PAPI_shutdown is called. Note, this only applies to Punit and L2unit, not IOunit since an IOunit event set is not applied or attached. 2013-03-13 * 46f6123a src/components/bgpm/IOunit/linux-IOunit.c src/components/bgpm/IOunit/linux-IOunit.h src/components/bgpm/L2unit/linux-L2unit.c...: Overflow issue on BG/Q resolved. Overflow with multiple components worked; overflow with multiple components and multiple events did not work as intended. 2013-03-07 * 6a0813f8 src/linux-common.c src/linux-memory.c: Fix the build on Linux-SPARC I dug out an old SPARC machine and fixed the PAPI build on it. * 51fe7e53 src/perf_events.c: More comprehensive sys_perf_open to PAPI error mappings This tries to cover more of the errors returned by sys_perf_open and map them to better results. EINVAL is a problem because it can mean Conflict as well as Event not found and many other things, so it's unclear what to do with it. 
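The errno-to-PAPI mapping idea in the 51fe7e53 entry can be sketched as a small switch. This is an illustration only, not the table in perf_events.c: the SKETCH_* codes and map_perf_open_errno() are hypothetical stand-ins rather than the real PAPI_E* constants, and sending EINVAL to the generic code is an assumption made here because the entry says EINVAL is ambiguous.

```c
#include <errno.h>

/* Hypothetical error codes for illustration; these are NOT the values
 * defined in papi.h. */
enum sketch_err {
    SKETCH_OK      =  0,
    SKETCH_ESYS    = -1,  /* generic system-call failure */
    SKETCH_EPERM   = -2,  /* insufficient permission */
    SKETCH_ENOEVNT = -3,  /* event does not exist */
    SKETCH_ENOMEM  = -4   /* out of memory */
};

/* Translate the errno left by a failed perf_event_open() call into a
 * library-level error code instead of hardcoding one value everywhere. */
static int map_perf_open_errno(int err)
{
    switch (err) {
    case 0:
        return SKETCH_OK;
    case EPERM:
    case EACCES:
        return SKETCH_EPERM;
    case ENOENT:
        return SKETCH_ENOEVNT;
    case ENOMEM:
        return SKETCH_ENOMEM;
    case EINVAL:
        /* Ambiguous: may mean a conflict, an unknown event, or something
         * else, so this sketch falls back to the generic code. */
        return SKETCH_ESYS;
    default:
        return SKETCH_ESYS;
    }
}
```

The point of the design is that the caller reads errno immediately after the failed syscall and passes it through unchanged; overwriting errno beforehand would make every failure look the same.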
* 1479a67f src/perf_events.c src/sys_perf_event_open.c: Return proper error codes for sys_perf_event_open For some reason on x86 and x86_64 we were trying to set errno manually and thus over-writing the proper errno value, causing all errors to look like PAPI_EPERM This removes that code, as well as adds code to report ENOENT as PAPI_ENOEVENT. With this change, on IVY this happens which looks more correct. ./utils/papi_command_line perf::L1-ICACHE-PREFETCHES Failed adding: perf::L1-ICACHE-PREFETCHES because: Event does not exist command_line.c PASSED 2013-03-06 * 7a3e75e8 src/papi_libpfm4_events.c src/papi_user_events.c: Coverity fixes: Coverity pointed out that there was a case where load_user_event_table() could leak memory. The change in the location of the papi_free(foo) ensures that the allocated memory is freed. Coverity pointed out one path through the code in _papi_libpfm4_ntv_code_to_descr() that did not free up memory allocated in the function. Added a free on the path to free up that memory. Thanks Will Cohen. 2013-03-04 * b19bd1a2 src/components/rapl/linux-rapl.c: Remove a stray debug statement. Thanks to Harald Servat for catching this. 2013-03-01 * 6e5be510 src/utils/command_line.c: Wrestled some horribly convoluted indexing into shape. The -u and -x options now print as expected (I think). 2013-01-31 * 02bd70ad src/components/nvml/linux-nvml.c: linux-nvml.c: Fix type warning. CUDA and NVML have a signed vs unsigned thing going on in their returned device counts, cast away the warning. 2013-01-23 * a5bed384 src/linux-memory.c src/linux-timer.c: ia64 fixes. Thanks to Tony Jones for patches. 2013-01-16 * 021db23a src/components/nvml/linux-nvml.c: nvml component: cleanup a memory leak We did not free a buffer at shutdown time. 2013-05-17 * b25fc417 src/perf_events.c: perf_event: allow running when perf_event_paranoid is 2 perf_event_paranoid set to 2 means allow user monitoring only (no kernel domain). 
The code before this mistakenly disabled all events in this case. Also set the allowed domains to exclude PAPI_DOM_KERNEL. 2013-05-16 * 12768bec src/papi_events.csv: papi_events.csv Revert a little mishap in adding ivbep support Somehow the contents of papi_hl.c ended up in the events file. * 5e97ad7f src/papi_events.csv: Add identifier for ivb_ep 2013-01-29 * e201b8eb src/papi.c: General doxygen cleanup: remove all "No known bugs" messages; correct and cleanup examples for PAPI_code_to_name and PAPI_name_to_code papi-5.3.0/man/0000700003276200002170000000000012247131117012712 5ustar ralphundrgradpapi-5.3.0/man/Makefile0000600003276200002170000000053012247131117014352 0ustar ralphundrgradclean: rm -f *~ core man3/*~ install: @echo "Man pages (MANDIR) being installed in: \"$(MANDIR)\""; -mkdir -p $(MANDIR)/man3 -chmod go+rx $(MANDIR)/man3 -cp man3/PAPI*.3 $(MANDIR)/man3 -chmod go+r $(MANDIR)/man3/PAPI*.3 -mkdir -p $(MANDIR)/man1 -chmod go+rx $(MANDIR)/man1 -cp man1/*.1 $(MANDIR)/man1 -chmod go+r $(MANDIR)/man1/*.1 papi-5.3.0/man/README0000600003276200002170000000102612247131117013573 0ustar ralphundrgrad/* * File: README * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ This directory contains: Makefile Installs man pages. man1/ Man pages for the PAPI utility applications. man3/ Man pages for the PAPI API functions. Makefile Usage: make make install DESTDIR= Beginning with PAPI 4.2.0, man pages are generated from the PAPI sources using doxygen scripts found in the papi/doc directory. 
They are updated prior to each release.papi-5.3.0/man/man3/0000700003276200002170000000000012247131120013542 5ustar ralphundrgradpapi-5.3.0/man/man3/PAPI_event_info_t.30000600003276200002170000000622512247131120017125 0ustar ralphundrgrad.TH "PAPI_event_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_event_info_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "unsigned int \fBevent_code\fP" .br .ti -1c .RI "char \fBsymbol\fP [1024]" .br .ti -1c .RI "char \fBshort_descr\fP [64]" .br .ti -1c .RI "char \fBlong_descr\fP [1024]" .br .ti -1c .RI "int \fBcomponent_index\fP" .br .ti -1c .RI "char \fBunits\fP [64]" .br .ti -1c .RI "int \fBlocation\fP" .br .ti -1c .RI "int \fBdata_type\fP" .br .ti -1c .RI "int \fBvalue_type\fP" .br .ti -1c .RI "int \fBtimescope\fP" .br .ti -1c .RI "int \fBupdate_type\fP" .br .ti -1c .RI "int \fBupdate_freq\fP" .br .ti -1c .RI "unsigned int \fBcount\fP" .br .ti -1c .RI "unsigned int \fBevent_type\fP" .br .ti -1c .RI "char \fBderived\fP [64]" .br .ti -1c .RI "char \fBpostfix\fP [256]" .br .ti -1c .RI "unsigned int \fBcode\fP [12]" .br .ti -1c .RI "char \fBname\fP [12][256]" .br .ti -1c .RI "char \fBnote\fP [1024]" .br .in -1c .SH "Field Documentation" .PP .SS "unsigned int PAPI_event_info_t::code[12]" array of values that further describe the event: .IP "\(bu" 2 presets: native event_code values .IP "\(bu" 2 native:, register values(?) 
.PP .SS "int PAPI_event_info_t::component_index" component this event belongs to .SS "unsigned int PAPI_event_info_t::count" number of terms (usually 1) in the code and name fields .IP "\(bu" 2 presets: these are native events .IP "\(bu" 2 native: these are unused .PP .SS "int PAPI_event_info_t::data_type" data type returned by PAPI .SS "char PAPI_event_info_t::derived[64]" name of the derived type .IP "\(bu" 2 presets: usually NOT_DERIVED .IP "\(bu" 2 native: empty string .PP .SS "unsigned int PAPI_event_info_t::event_code" preset (0x8xxxxxxx) or native (0x4xxxxxxx) event code .SS "unsigned int PAPI_event_info_t::event_type" event type or category for preset events only .SS "int PAPI_event_info_t::location" location event applies to .SS "char PAPI_event_info_t::long_descr[1024]" a longer description: typically a sentence for presets, possibly a paragraph from vendor docs for native events .SS "char PAPI_event_info_t::name[12][256]" names of code terms: - presets: native event names, .IP "\(bu" 2 native: descriptive strings for each register value(?) .PP .SS "char PAPI_event_info_t::note[1024]" .PP .nf an optional developer note supplied with a preset event to delineate platform specific .fi .PP anomalies or restrictions .SS "char PAPI_event_info_t::postfix[256]" string containing postfix operations; only defined for preset events of derived type DERIVED_POSTFIX .SS "char PAPI_event_info_t::short_descr[64]" a short description suitable for use as a label .SS "char PAPI_event_info_t::symbol[1024]" name of the event .SS "int PAPI_event_info_t::timescope" from start, etc\&. .SS "char PAPI_event_info_t::units[64]" units event is measured in .SS "int PAPI_event_info_t::update_freq" how frequently event is updated .SS "int PAPI_event_info_t::update_type" how event is updated .SS "int PAPI_event_info_t::value_type" sum or absolute .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_get_real_nsec.30000600003276200002170000000114612247131120017235 0ustar ralphundrgrad.TH "PAPI_get_real_nsec" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_real_nsec \- .PP Get real time counter value in nanoseconds\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP This function returns the total real time passed since some arbitrary starting point\&. The time is returned in nanoseconds\&. This call is equivalent to wall clock time\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_virt_usec\fP .PP \fBPAPI_get_virt_cyc\fP .PP \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_read_ts.30000600003276200002170000000271312247131120016065 0ustar ralphundrgrad.TH "PAPI_read_ts" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_read_ts \- .PP Read hardware counters with a timestamp\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_read_ts(int EventSet, long long *values, long long *cycles )\fP; .RE .PP \fBPAPI_read_ts()\fP copies the counters of the indicated event set into the provided array\&. It also places a real-time cycle timestamp into the cycles array\&. .PP The counters continue counting after the read\&. .PP \fBPAPI_read_ts()\fP assumes an initialized PAPI library and a properly added event set\&. .PP \fBParameters:\fP .RS 4 \fIEventSet\fP -- an integer handle for a PAPI Event Set as created by \fBPAPI_create_eventset()\fP .br \fI*values\fP -- an array to hold the counter values of the counting events .br \fI*cycles\fP -- an array to hold the timestamp values .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. 
.br \fIPAPI_ENOEVST\fP The event set specified does not exist\&. .RE .PP \fBExamples\fP .RS 4 .PP .nf * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_read\fP .PP \fBPAPI_accum\fP .PP \fBPAPI_start\fP .PP \fBPAPI_stop\fP .PP \fBPAPI_reset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_multiplex.30000600003276200002170000000431512247131120017326 0ustar ralphundrgrad.TH "PAPI_get_multiplex" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_multiplex \- .PP Get the multiplexing status of specified event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_get_multiplex( int EventSet )\fP; .RE .PP \fBFortran Interface:\fP .RS 4 #include fpapi\&.h .br \fBPAPIF_get_multiplex( C_INT EventSet, C_INT check )\fP .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP an integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid, or the EventSet is already multiplexed\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP The EventSet is currently counting events\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .RE .PP \fBPAPI_get_multiplex\fP tests the state of the PAPI_MULTIPLEXING flag in the specified event set, returning \fITRUE\fP if a PAPI event set is multiplexed, or FALSE if not\&. 
.PP \fBExample:\fP .RS 4 .PP .nf * int EventSet = PAPI_NULL; * int ret; * * // Create an empty EventSet * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Bind it to the CPU component * ret = PAPI_assign_eventset_component(EventSet, 0); * if (ret != PAPI_OK) handle_error(ret); * * // Check current multiplex status * ret = PAPI_get_multiplex(EventSet); * if (ret == TRUE) printf("This event set is ready for multiplexing\&.\n"); * if (ret == FALSE) printf("This event set is not enabled for multiplexing\&.\n"); * if (ret < 0) handle_error(ret); * * // Turn on multiplexing * ret = PAPI_set_multiplex(EventSet); * if ((ret == PAPI_EINVAL) && (PAPI_get_multiplex(EventSet) == TRUE)) * printf("This event set already has multiplexing enabled\n"); * else if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_multiplex_init\fP .PP \fBPAPI_set_opt\fP .PP \fBPAPI_create_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_multiplex.30000600003276200002170000000101412247131120017441 0ustar ralphundrgrad.TH "PAPIF_set_multiplex" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_multiplex \- .PP Convert a standard event set to a multiplexed event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_multiplex( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_multiplex\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_multiplex.30000600003276200002170000000100712247131117017435 0ustar ralphundrgrad.TH "PAPIF_get_multiplex" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_multiplex \- .PP Get the multiplexing status of specified event set\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_multiplex( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_multiplex\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_cmp_granularity.30000600003276200002170000000111512247131120020620 0ustar ralphundrgrad.TH "PAPIF_set_cmp_granularity" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_cmp_granularity \- .PP Set the default counting granularity for eventsets bound to the specified component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_cmp_granularity( C_INT granularity, C_INT cidx, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_cmp_granularity\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_dmem_info.30000600003276200002170000000102412247131117017346 0ustar ralphundrgrad.TH "PAPIF_get_dmem_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_dmem_info \- .PP get information about the dynamic memory usage of the current program .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_dmem_info( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_dmem_info\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_mh_tlb_info_t.30000600003276200002170000000106112247131120017242 0ustar ralphundrgrad.TH "PAPI_mh_tlb_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_mh_tlb_info_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBtype\fP" .br .ti -1c .RI "int \fBnum_entries\fP" .br .ti -1c .RI "int \fBpage_size\fP" .br .ti -1c .RI "int \fBassociativity\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Field Documentation" .PP .SS "int PAPI_mh_tlb_info_t::type" Empty, instr, data, vector, unified .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_hw_info_t.30000600003276200002170000000514712247131120016424 0ustar ralphundrgrad.TH "PAPI_hw_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_hw_info_t \- .PP Hardware info structure\&. .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBncpu\fP" .br .ti -1c .RI "int \fBthreads\fP" .br .ti -1c .RI "int \fBcores\fP" .br .ti -1c .RI "int \fBsockets\fP" .br .ti -1c .RI "int \fBnnodes\fP" .br .ti -1c .RI "int \fBtotalcpus\fP" .br .ti -1c .RI "int \fBvendor\fP" .br .ti -1c .RI "char \fBvendor_string\fP [128]" .br .ti -1c .RI "int \fBmodel\fP" .br .ti -1c .RI "char \fBmodel_string\fP [128]" .br .ti -1c .RI "float \fBrevision\fP" .br .ti -1c .RI "int \fBcpuid_family\fP" .br .ti -1c .RI "int \fBcpuid_model\fP" .br .ti -1c .RI "int \fBcpuid_stepping\fP" .br .ti -1c .RI "int \fBcpu_max_mhz\fP" .br .ti -1c .RI "int \fBcpu_min_mhz\fP" .br .ti -1c .RI "\fBPAPI_mh_info_t\fP \fBmem_hierarchy\fP" .br .ti -1c .RI "int \fBvirtualized\fP" .br .ti -1c .RI "char \fBvirtual_vendor_string\fP [128]" .br .ti -1c .RI "char \fBvirtual_vendor_version\fP [128]" .br .ti -1c .RI "float \fBmhz\fP" .br .ti -1c .RI "int \fBclock_mhz\fP" .br .ti -1c .RI "int \fBreserved\fP [8]" .br .in -1c .SH "Field Documentation" .PP .SS "int PAPI_hw_info_t::clock_mhz" Deprecated 
.SS "int PAPI_hw_info_t::cores" Number of cores per socket .SS "int PAPI_hw_info_t::cpu_max_mhz" Maximum supported CPU speed .SS "int PAPI_hw_info_t::cpu_min_mhz" Minimum supported CPU speed .SS "int PAPI_hw_info_t::cpuid_family" cpuid family .SS "int PAPI_hw_info_t::cpuid_model" cpuid model .SS "int PAPI_hw_info_t::cpuid_stepping" cpuid stepping .SS "\fBPAPI_mh_info_t\fP PAPI_hw_info_t::mem_hierarchy" PAPI memory hierarchy description .SS "float PAPI_hw_info_t::mhz" Deprecated .SS "int PAPI_hw_info_t::model" Model number of CPU .SS "char PAPI_hw_info_t::model_string[128]" Model string of CPU .SS "int PAPI_hw_info_t::ncpu" Number of CPUs per NUMA Node .SS "int PAPI_hw_info_t::nnodes" Total Number of NUMA Nodes .SS "float PAPI_hw_info_t::revision" Revision of CPU .SS "int PAPI_hw_info_t::sockets" Number of sockets .SS "int PAPI_hw_info_t::threads" Number of hardware threads per core .SS "int PAPI_hw_info_t::totalcpus" Total number of CPUs in the entire system .SS "int PAPI_hw_info_t::vendor" Vendor number of CPU .SS "char PAPI_hw_info_t::vendor_string[128]" Vendor string of CPU .SS "char PAPI_hw_info_t::virtual_vendor_string[128]" Vendor for virtual machine .SS "char PAPI_hw_info_t::virtual_vendor_version[128]" Version of virtual machine .SS "int PAPI_hw_info_t::virtualized" Running in virtual machine .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_component_info.30000600003276200002170000000234712247131120020323 0ustar ralphundrgrad.TH "PAPI_get_component_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_component_info \- .PP get information about a specific software component .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @param cidx Component index This function returns a pointer to a structure containing detailed information about a specific software component in the PAPI library.
This includes versioning information, preset and native event information, and more. For full details, see @ref PAPI_component_info_t. @par Examples: .fi .PP .PP .nf const PAPI_component_info_t *cmpinfo = NULL; if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1); if ((cmpinfo = PAPI_get_component_info(0)) == NULL) exit(1); printf("This component supports %d Preset Events and %d Native events\&.\n", cmpinfo->num_preset_events, cmpinfo->num_native_events); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_executable_info\fP .PP \fBPAPI_get_hardware_info\fP .PP \fBPAPI_get_dmem_info\fP .PP \fBPAPI_get_opt\fP .PP \fBPAPI_component_info_t\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_domain_option_t.30000600003276200002170000000105412247131120017623 0ustar ralphundrgrad.TH "PAPI_domain_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_domain_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBdef_cidx\fP" .br .ti -1c .RI "int \fBeventset\fP" .br .ti -1c .RI "int \fBdomain\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Field Documentation" .PP .SS "int PAPI_domain_option_t::def_cidx" this structure requires a component index to set default domains .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_inherit.30000600003276200002170000000100212247131120017055 0ustar ralphundrgrad.TH "PAPIF_set_inherit" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_inherit \- .PP Turn on inheriting of counts from daughter to parent process\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_inherit( C_INT inherit, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_eventset_component.30000600003276200002170000000156112247131120021222 0ustar ralphundrgrad.TH "PAPI_get_eventset_component" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_eventset_component \- .PP return index for component an eventset is assigned to .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_ENOEVST eventset does not exist @retval PAPI_ENOCMP component is invalid or does not exist @retval positive value valid component index @param EventSet EventSet for which we want to know the component index @par Examples: .fi .PP .PP .nf int cidx; cidx = PAPI_get_eventset_component(eventset); * .fi .PP \fBPAPI_get_eventset_component()\fP returns the index of the component an event set is assigned to\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_event_component\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_remove_named_event.30000600003276200002170000000463012247131120020306 0ustar ralphundrgrad.TH "PAPI_remove_named_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_remove_named_event \- .PP removes a named hardware event from a PAPI event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf A hardware event can be either a PAPI Preset or a native hardware event code. For a list of PAPI preset events, see PAPI_presets or run the papi_avail utility in the PAPI distribution. PAPI Presets can be passed to PAPI_query_event to see if they exist on the underlying architecture.
For a list of native events available on the current platform, run papi_native_avail in the PAPI distribution. @par C Interface: \#include <papi.h> @n int PAPI_remove_named_event( int EventSet, char *EventName ); @param[in] EventSet -- an integer handle for a PAPI event set as created by PAPI_create_eventset @param[in] EventName -- a defined event such as PAPI_TOT_INS or a native event. @retval PAPI_OK Everything worked. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOINIT The PAPI library has not been initialized. @retval PAPI_ENOEVST The EventSet specified does not exist. @retval PAPI_EISRUN The EventSet is currently counting events. @retval PAPI_ECNFLCT The underlying counter hardware can not count this event and other events in the EventSet simultaneously. @retval PAPI_ENOEVNT The PAPI preset is not available on the underlying hardware. @par Example: .fi .PP .PP .nf * char *EventName = "PAPI_TOT_INS"; * int EventSet = PAPI_NULL; * int ret; * * // Create an empty EventSet * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Add Total Instructions Executed to our EventSet * ret = PAPI_add_named_event(EventSet, EventName); * if (ret != PAPI_OK) handle_error(ret); * * // Start counting * ret = PAPI_start(EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Stop counting, ignore values * ret = PAPI_stop(EventSet, NULL); * if (ret != PAPI_OK) handle_error(ret); * * // Remove event * ret = PAPI_remove_named_event(EventSet, EventName); * if (ret != PAPI_OK) handle_error(ret); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_remove_event\fP .br \fBPAPI_query_named_event\fP .br \fBPAPI_add_named_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&.
papi-5.3.0/man/man3/PAPIF_enum_event.30000600003276200002170000000101012247131117016712 0ustar ralphundrgrad.TH "PAPIF_enum_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_enum_event \- .PP Enumerate PAPI preset or native events\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_enum_event( C_INT EventCode, C_INT modifier, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_enum_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_query_named_event.30000600003276200002170000000272212247131120020156 0ustar ralphundrgrad.TH "PAPI_query_named_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_query_named_event \- .PP Query if a named PAPI event exists\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_query_named_event(char *EventName)\fP; .RE .PP \fBPAPI_query_named_event()\fP asks the PAPI library if the PAPI named event can be counted on this architecture\&. If the event CAN be counted, the function returns PAPI_OK\&. If the event CANNOT be counted, the function returns an error code\&. This function also can be used to check the syntax of native and user events\&. .PP \fBParameters:\fP .RS 4 \fIEventName\fP -- a defined event such as PAPI_TOT_INS\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOEVNT\fP The PAPI preset is not available on the underlying hardware\&. .RE .PP \fBExamples\fP .RS 4 .PP .nf * int retval; * // Initialize the library * retval = PAPI_library_init(PAPI_VER_CURRENT); * if (retval != PAPI_VER_CURRENT) { * fprintf(stderr,"PAPI library init error!\n"); * exit(1); * } * if (PAPI_query_named_event("PAPI_TOT_INS") != PAPI_OK) { * fprintf(stderr,"No instruction counter? 
How lame\&.\n"); * exit(1); * } * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_query_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_event_code_to_name.30000600003276200002170000000105312247131117020371 0ustar ralphundrgrad.TH "PAPIF_event_code_to_name" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_event_code_to_name \- .PP Convert a numeric hardware event code to a name\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_event_code_to_name( C_INT EventCode, C_STRING EventName, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_event_code_to_name\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_virt_cyc.30000600003276200002170000000076612247131117017247 0ustar ralphundrgrad.TH "PAPIF_get_virt_cyc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_virt_cyc \- .PP Get virtual time counter value in clock cycles\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_virt_cyc( C_LONG_LONG virt_cyc )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_virt_cyc\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_real_nsec.30000600003276200002170000000076212247131117017354 0ustar ralphundrgrad.TH "PAPIF_get_real_nsec" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_real_nsec \- .PP Get real time counter value in nanoseconds\&.
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_real_nsec( C_LONG_LONG time )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_real_nsec\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_overflow_event_index.30000600003276200002170000000344312247131120021537 0ustar ralphundrgrad.TH "PAPI_get_overflow_event_index" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_overflow_event_index \- .PP converts an overflow vector into an array of indexes to overflowing events .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @param EventSet an integer handle to a PAPI event set as created by PAPI_create_eventset @param overflow_vector a vector with bits set for each counter that overflowed. This vector is passed by the system to the overflow handler routine. @param *array an array of indexes for events in EventSet. No more than *number indexes will be stored into the array. @param *number On input the variable determines the size of the array. On output the variable contains the number of indexes in the array. @retval PAPI_EINVAL One or more of the arguments is invalid. This could occur if the overflow_vector is empty (zero), if the array or number pointers are NULL, if the value of number is less than one, or if the EventSet is empty. @retval PAPI_ENOEVST The EventSet specified does not exist. @par Examples .fi .PP .PP .nf void handler(int EventSet, void *address, long_long overflow_vector, void *context){ int Events[4], number, i; int total = 0, retval; printf("Overflow #%d\n Handler(%d) Overflow at %p! 
vector=0x%llx\n", total, EventSet, address, overflow_vector); total++; number = 4; retval = PAPI_get_overflow_event_index(EventSet, overflow_vector, Events, &number); if(retval == PAPI_OK) for(i=0; i .br int \fBPAPI_list_events(int EventSet, int *Events, int *number )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP An integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .br \fI*Events\fP A pointer to a preallocated array of codes for events, such as PAPI_INT_INS\&. No more than *number codes will be stored into the array\&. .br \fI*number\fP On input, the size of the Events array, or maximum number of event codes to be returned\&. A value of 0 can be used to probe an event set\&. On output, the number of events actually in the event set\&. This value may be greater than the number of event codes actually stored\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP .br \fIPAPI_ENOEVST\fP .RE .PP \fBExamples:\fP .RS 4 .PP .nf if (PAPI_event_name_to_code("PAPI_TOT_INS",&EventCode) != PAPI_OK) exit(1); if (PAPI_add_event(EventSet, EventCode) != PAPI_OK) exit(1); /* Convert a second event name to an event code */ if (PAPI_event_name_to_code("PAPI_L1_LDM",&EventCode) != PAPI_OK) exit(1); if (PAPI_add_event(EventSet, EventCode) != PAPI_OK) exit(1); number = 0; if(PAPI_list_events(EventSet, NULL, &number)) exit(1); if(number != 2) exit(1); if(PAPI_list_events(EventSet, Events, &number)) exit(1); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_event_code_to_name\fP .PP \fBPAPI_event_name_to_code\fP .PP \fBPAPI_add_event\fP .PP \fBPAPI_create_eventset\fP .RE .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_list_events\fP( C_INT EventSet, C_INT(*) Events, C_INT number, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_list_events\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&.
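The PAPI_list_events parameters above describe a two-call pattern: pass 0 in *number to probe the size of the event set, allocate an array of that size, then call again to fetch the codes. The following is a minimal sketch of that pattern only. It does not link against PAPI: mock_list_events, list_all_events and the event codes 101, 202, 303 are hypothetical stand-ins written for this illustration, and the mock only imitates the contract described on the page (store at most *number codes, report the full count on output).

```c
#include <stdlib.h>

/* Hypothetical stand-in for PAPI_list_events(). This is NOT PAPI code;
 * it only imitates the documented contract so the calling pattern can
 * be compiled without the library. The codes below are made up. */
static const int mock_event_set[] = { 101, 202, 303 };
#define MOCK_COUNT ((int)(sizeof mock_event_set / sizeof mock_event_set[0]))

static int mock_list_events(int *events, int *number)
{
    int i, n;

    if (number == NULL || *number < 0)
        return -1;                    /* stands in for PAPI_EINVAL */
    if (*number > 0 && events == NULL)
        return -1;

    /* Store no more than *number codes. */
    n = (*number < MOCK_COUNT) ? *number : MOCK_COUNT;
    for (i = 0; i < n; i++)
        events[i] = mock_event_set[i];

    /* Report how many events are in the set, which may exceed n. */
    *number = MOCK_COUNT;
    return 0;                         /* stands in for PAPI_OK */
}

/* Probe for the count, allocate, then fetch. Returns a malloc'd array
 * that the caller must free, or NULL on error or for an empty set.
 * On success *count receives the number of codes stored. */
static int *list_all_events(int *count)
{
    int number = 0;
    int *events;

    *count = 0;
    if (mock_list_events(NULL, &number) != 0)    /* first call: probe */
        return NULL;
    if (number == 0)
        return NULL;

    events = malloc((size_t)number * sizeof *events);
    if (events == NULL)
        return NULL;

    if (mock_list_events(events, &number) != 0) { /* second call: fetch */
        free(events);
        return NULL;
    }
    *count = number;
    return events;
}
```

With a real event set, the two mock_list_events calls would presumably be replaced by PAPI_list_events(EventSet, NULL, &number) and PAPI_list_events(EventSet, events, &number), as in the example on the page.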
papi-5.3.0/man/man3/PAPI_remove_event.30000600003276200002170000000454112247131120017143 0ustar ralphundrgrad.TH "PAPI_remove_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_remove_event \- .PP removes a hardware event from a PAPI event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf A hardware event can be either a PAPI Preset or a native hardware event code. For a list of PAPI preset events, see PAPI_presets or run the papi_avail utility in the PAPI distribution. PAPI Presets can be passed to PAPI_query_event to see if they exist on the underlying architecture. For a list of native events available on the current platform, run papi_native_avail in the PAPI distribution. @par C Interface: \#include <papi.h> @n int PAPI_remove_event( int EventSet, int EventCode ); @param[in] EventSet -- an integer handle for a PAPI event set as created by PAPI_create_eventset @param[in] EventCode -- a defined event such as PAPI_TOT_INS or a native event. @retval PAPI_OK Everything worked. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOEVST The EventSet specified does not exist. @retval PAPI_EISRUN The EventSet is currently counting events. @retval PAPI_ECNFLCT The underlying counter hardware can not count this event and other events in the EventSet simultaneously. @retval PAPI_ENOEVNT The PAPI preset is not available on the underlying hardware.
@par Example: .fi .PP .PP .nf * int EventSet = PAPI_NULL; * int ret; * * // Create an empty EventSet * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Add Total Instructions Executed to our EventSet * ret = PAPI_add_event(EventSet, PAPI_TOT_INS); * if (ret != PAPI_OK) handle_error(ret); * * // Start counting * ret = PAPI_start(EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Stop counting, ignore values * ret = PAPI_stop(EventSet, NULL); * if (ret != PAPI_OK) handle_error(ret); * * // Remove event * ret = PAPI_remove_event(EventSet, PAPI_TOT_INS); * if (ret != PAPI_OK) handle_error(ret); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_cleanup_eventset\fP .PP \fBPAPI_destroy_eventset\fP .PP \fBPAPI_event_name_to_code\fP .PP PAPI_presets .PP \fBPAPI_add_event\fP .PP \fBPAPI_add_events\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_domain.30000600003276200002170000000101712247131117016662 0ustar ralphundrgrad.TH "PAPIF_get_domain" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_domain \- .PP Get the domain setting for the specified EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_domain( C_INT eventset, C_INT domain, C_INT mode, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_multiplex_init.30000600003276200002170000000076612247131120017626 0ustar ralphundrgrad.TH "PAPIF_multiplex_init" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_multiplex_init \- .PP Initialize multiplex support in the PAPI library\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_multiplex_init( C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_multiplex_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_mh_info_t.30000600003276200002170000000060612247131120016405 0ustar ralphundrgrad.TH "PAPI_mh_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_mh_info_t \- .PP Memory hierarchy description\&. .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBlevels\fP" .br .ti -1c .RI "\fBPAPI_mh_level_t\fP \fBlevel\fP [4]" .br .in -1c .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_opt.30000600003276200002170000001036612247131120016110 0ustar ralphundrgrad.TH "PAPI_get_opt" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_opt \- .PP Get PAPI library or event set options\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_get_opt( int option, PAPI_option_t * ptr )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIoption\fP Defines the option to get\&. Possible values are briefly described in the table below\&. .br \fIptr\fP Pointer to a structure determined by the selected option\&. See \fBPAPI_option_t\fP for a description of possible structures\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP The specified option or parameter is invalid\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_ECMP\fP The option is not implemented for the current component\&. .br \fIPAPI_ENOINIT\fP PAPI has not been initialized\&. .RE .PP \fBPAPI_get_opt()\fP queries the options of the PAPI library or a specific event set created by \fBPAPI_create_eventset\fP\&.
Some options may require that the eventset be bound to a component before they can execute successfully\&. This can be done either by adding an event or by explicitly calling \fBPAPI_assign_eventset_component\fP\&. .PP Ptr is a pointer to the \fBPAPI_option_t\fP structure, which is actually a union of different structures for different options\&. Not all options require or return information in these structures\&. Each returns different values in the structure\&. Some options require a component index to be provided\&. These options are handled explicitly by the \fBPAPI_get_cmp_opt()\fP call\&. .PP \fBNote:\fP .RS 4 Some options, such as PAPI_DOMAIN and PAPI_MULTIPLEX are also available as separate entry points in both C and Fortran\&. .RE .PP The reader is encouraged to peruse the ctests code in the PAPI distribution for examples of usage of \fBPAPI_set_opt\fP\&. .PP \fBPossible values for the PAPI_get_opt option parameter\fP .RS 4 OPTION DEFINITION PAPI_DEFDOM Get default counting domain for newly created event sets. Requires a component index. PAPI_DEFGRN Get default counting granularity. Requires a component index. PAPI_DEBUG Get the PAPI debug state and the debug handler. The debug state is specified in ptr->debug.level. The debug handler is specified in ptr->debug.handler. For further information regarding debug states and the behavior of the handler, see PAPI_set_debug. PAPI_MULTIPLEX Get current multiplexing state for specified EventSet. PAPI_DEF_ITIMER Get the type of itimer used in software multiplexing, overflowing and profiling. PAPI_DEF_MPX_NS Get the sampling time slice in nanoseconds for multiplexing and overflow. PAPI_DEF_ITIMER_NS See PAPI_DEF_MPX_NS. PAPI_ATTACH Get thread or process id to which event set is attached. Returns TRUE if currently attached. PAPI_CPU_ATTACH Get ptr->cpu.cpu_num and Attach state for EventSet specified in ptr->cpu.eventset. PAPI_DETACH Get thread or process id to which event set is attached. 
Returns TRUE if currently attached. PAPI_DOMAIN Get domain for EventSet specified in ptr->domain.eventset. Will error if eventset is not bound to a component. PAPI_GRANUL Get granularity for EventSet specified in ptr->granularity.eventset. Will error if eventset is not bound to a component. PAPI_INHERIT Get current inheritance state for specified EventSet. PAPI_PRELOAD Get LD_PRELOAD environment equivalent. PAPI_CLOCKRATE Get clockrate in MHz. PAPI_MAX_CPUS Get number of CPUs. PAPI_EXEINFO Get Executable addresses for text/data/bss. PAPI_HWINFO Get information about the hardware. PAPI_LIB_VERSION Get the full PAPI version of the library. PAPI_MAX_HWCTRS Get number of counters. Requires a component index. PAPI_MAX_MPX_CTRS Get maximum number of multiplexing counters. Requires a component index. PAPI_SHLIBINFO Get shared library information used by the program. PAPI_COMPONENTINFO Get the PAPI features the specified component supports. Requires a component index. .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_multiplex\fP .PP \fBPAPI_get_cmp_opt\fP .PP \fBPAPI_set_opt\fP .PP \fBPAPI_option_t\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_cmp_domain.30000600003276200002170000000106412247131120017531 0ustar ralphundrgrad.TH "PAPIF_set_cmp_domain" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_cmp_domain \- .PP Set the default counting domain for new event sets bound to the specified component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_cmp_domain( C_INT domain, C_INT cidx, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_cmp_domain\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_mh_cache_info_t.30000600003276200002170000000113512247131120017526 0ustar ralphundrgrad.TH "PAPI_mh_cache_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_mh_cache_info_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBtype\fP" .br .ti -1c .RI "int \fBsize\fP" .br .ti -1c .RI "int \fBline_size\fP" .br .ti -1c .RI "int \fBnum_lines\fP" .br .ti -1c .RI "int \fBassociativity\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Field Documentation" .PP .SS "int PAPI_mh_cache_info_t::type" Empty, instr, data, vector, trace, unified .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_thread_init.30000600003276200002170000000251112247131120016732 0ustar ralphundrgrad.TH "PAPI_thread_init" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_thread_init \- .PP Initialize thread support in the PAPI library\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @param *id_fn Pointer to a function that returns current thread ID. PAPI_thread_init initializes thread support in the PAPI library. Applications that make no use of threads do not need to call this routine. The function passed as id_fn MUST return a UNIQUE thread ID for every new thread/LWP created. The OpenMP call omp_get_thread_num() violates this rule, as the underlying LWPs may have been killed off by the run-time system or by a call to omp_set_num_threads(). In that case, it may still be possible to use omp_get_thread_num() in conjunction with PAPI_unregister_thread() when the OpenMP thread has finished. However, it is much better to use the underlying thread subsystem's call, which is pthread_self() on Linux platforms.
.fi .PP .PP .PP .nf if ( PAPI_thread_init(pthread_self) != PAPI_OK ) exit(1); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_register_thread\fP \fBPAPI_unregister_thread\fP \fBPAPI_get_thr_specific\fP \fBPAPI_set_thr_specific\fP \fBPAPI_thread_id\fP \fBPAPI_list_threads\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_set_debug.30000600003276200002170000000413612247131120016406 0ustar ralphundrgrad.TH "PAPI_set_debug" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_debug \- .PP Set the current debug level for error output from PAPI\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Prototype:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_set_debug( int level )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIlevel\fP one of the constants shown in the table below and defined in the \fBpapi\&.h\fP header file\&. .br The possible debug levels for debugging are shown below\&. .PD 0 .IP "\(bu" 2 PAPI_QUIET Do not print anything, just return the error code .IP "\(bu" 2 PAPI_VERB_ECONT Print error message and continue .IP "\(bu" 2 PAPI_VERB_ESTOP Print error message and exit .br .PP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP The debug level is invalid\&. .br .br The current debug level is used by both the internal error and debug message handler subroutines\&. .br The debug handler is only used if the library was compiled with -DDEBUG\&. .br The debug handler is called when there is an error upon a call to the PAPI API\&. .br The error handler is always active and its behavior cannot be modified except for whether or not it prints anything\&. 
.RE .PP The default PAPI debug handler prints out messages in the following form: .br PAPI Error: Error Code code, symbol, description .PP If the error was caused by a system call and the return code is PAPI_ESYS, the message will have a colon, a space, and the error string as reported by strerror() appended to the end\&. .PP The PAPI error handler prints out messages in the following form: .br PAPI Error: message\&. .br .PP \fBNote:\fP .RS 4 This is the ONLY function that may be called BEFORE \fBPAPI_library_init()\fP\&. .br .RE .PP \fBExample:\fP .RS 4 .PP .nf int ret; ret = PAPI_set_debug(PAPI_VERB_ECONT); if ( ret != PAPI_OK ) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_library_init\fP .PP \fBPAPI_get_opt\fP .PP \fBPAPI_set_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_thr_specific.30000600003276200002170000000371512247131120017750 0ustar ralphundrgrad.TH "PAPI_get_thr_specific" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_thr_specific \- .PP Retrieve a pointer to a thread specific data structure\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par Prototype: \#include <papi.h> @n int PAPI_get_thr_specific( int tag, void **ptr ); @param tag An identifier, the value of which is either PAPI_USR1_TLS or PAPI_USR2_TLS. This identifier indicates which of several data structures associated with this thread is to be accessed. @param ptr A pointer to the memory containing the data structure. @retval PAPI_OK @retval PAPI_EINVAL The @em tag argument is out of range. In C, PAPI_get_thr_specific will retrieve the pointer from the array with index @em tag. There are 2 user available locations and @em tag can be either PAPI_USR1_TLS or PAPI_USR2_TLS. The array mentioned above is managed by PAPI and allocated to each thread which has called PAPI_thread_init. There is no Fortran equivalent function.
@par Example: .fi .PP .PP .nf int ret; HighLevelInfo *state = NULL; ret = PAPI_thread_init(pthread_self); if (ret != PAPI_OK) handle_error(ret); // Do we have the thread specific data set up yet? ret = PAPI_get_thr_specific(PAPI_USR1_TLS, (void **) &state); if (ret != PAPI_OK || state == NULL) { state = (HighLevelInfo *) malloc(sizeof(HighLevelInfo)); if (state == NULL) return (PAPI_ESYS); memset(state, 0, sizeof(HighLevelInfo)); state->EventSet = PAPI_NULL; ret = PAPI_create_eventset(&state->EventSet); if (ret != PAPI_OK) return (PAPI_ESYS); ret = PAPI_set_thr_specific(PAPI_USR1_TLS, state); if (ret != PAPI_OK) return (ret); } * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_register_thread\fP \fBPAPI_thread_init\fP \fBPAPI_thread_id\fP \fBPAPI_set_thr_specific\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_query_event.30000600003276200002170000000074412247131120017122 0ustar ralphundrgrad.TH "PAPIF_query_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_query_event \- .PP Query if a PAPI event exists\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_query_event(C_INT EventCode, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_query_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_is_initialized.30000600003276200002170000000267612247131120017454 0ustar ralphundrgrad.TH "PAPI_is_initialized" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_is_initialized \- .PP check for initialization .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_NOT_INITED Library has not been initialized @retval PAPI_LOW_LEVEL_INITED Low level has called library init @retval PAPI_HIGH_LEVEL_INITED High level has called library init @retval PAPI_THREAD_LEVEL_INITED Threads have been inited @param version upon initialization, PAPI checks the argument against the internal value of PAPI_VER_CURRENT when the library was compiled. This guards against portability problems when updating the PAPI shared libraries on your system. @par Examples: .fi .PP .PP .nf int retval; retval = PAPI_library_init(PAPI_VER_CURRENT); if (retval != PAPI_VER_CURRENT && retval > 0) { fprintf(stderr,"PAPI library version mismatch!\en"); exit(1); } if (retval < 0) handle_error(retval); retval = PAPI_is_initialized(); if (retval != PAPI_LOW_LEVEL_INITED) handle_error(retval); * .fi .PP \fBPAPI_is_initialized()\fP returns the status of the PAPI library\&. The PAPI library can be in one of four states, as described under RETURN VALUES\&. .PP \fBSee Also:\fP .RS 4 PAPI .PP \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_event_info.30000600003276200002170000000115212247131117017547 0ustar ralphundrgrad.TH "PAPIF_get_event_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_event_info \- .PP Get the event's name and description info\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_event_info\fP(C_INT EventCode, C_STRING symbol, C_STRING long_descr, C_STRING short_descr, C_INT count, C_STRING event_note, C_INT flags, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_event_info\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_flops.30000600003276200002170000000110712247131117015677 0ustar ralphundrgrad.TH "PAPIF_flops" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_flops \- .PP Simplified call to get Mflops/s (floating point instruction rate), real and processor time\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_flops( C_FLOAT real_time, C_FLOAT proc_time, C_LONG_LONG flpops, C_FLOAT mflops, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_flops\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_stop.30000600003276200002170000000352512247131120015433 0ustar ralphundrgrad.TH "PAPI_stop" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_stop \- .PP Stop counting hardware events in an event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_stop( int EventSet, long long * values )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP -- an integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .br \fIvalues\fP -- an array to hold the counter values of the counting events .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. 
.br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_ENOTRUN\fP The EventSet is currently not running\&. .RE .PP \fBPAPI_stop\fP halts the counting of a previously defined event set and copies the counter values contained in that EventSet into the values array\&. This call assumes an initialized PAPI library and a properly added event set\&. .PP \fBExample:\fP .RS 4 .PP .nf * int EventSet = PAPI_NULL; * long long values[2]; * int ret; * * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Add Total Instructions Executed to our EventSet * ret = PAPI_add_event(EventSet, PAPI_TOT_INS); * if (ret != PAPI_OK) handle_error(ret); * * // Start counting * ret = PAPI_start(EventSet); * if (ret != PAPI_OK) handle_error(ret); * poorly_tuned_function(); * ret = PAPI_stop(EventSet, values); * if (ret != PAPI_OK) handle_error(ret); * printf("%lld\\n",values[0]); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_create_eventset\fP \fBPAPI_start\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_reset.30000600003276200002170000000300412247131120015560 0ustar ralphundrgrad.TH "PAPI_reset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_reset \- .PP Reset the hardware event counts in an event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Prototype: \#include <papi.h> @n int PAPI_reset( int EventSet ); @param EventSet an integer handle for a PAPI event set as created by PAPI_create_eventset @retval PAPI_OK @retval PAPI_ESYS A system or C library call failed inside PAPI, see the errno variable. @retval PAPI_ENOEVST The EventSet specified does not exist. PAPI_reset() zeroes the values of the counters contained in EventSet. 
This call assumes an initialized PAPI library and a properly added event set @par Example: .fi .PP .PP .nf int EventSet = PAPI_NULL; int Events[] = {PAPI_TOT_INS, PAPI_FP_OPS}; int ret; // Create an empty EventSet ret = PAPI_create_eventset(&EventSet); if (ret != PAPI_OK) handle_error(ret); // Add two events to our EventSet ret = PAPI_add_events(EventSet, Events, 2); if (ret != PAPI_OK) handle_error(ret); // Start counting ret = PAPI_start(EventSet); if (ret != PAPI_OK) handle_error(ret); // Stop counting, ignore values ret = PAPI_stop(EventSet, NULL); if (ret != PAPI_OK) handle_error(ret); // reset the counters in this EventSet ret = PAPI_reset(EventSet); if (ret != PAPI_OK) handle_error(ret); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_create_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_remove_events.30000600003276200002170000000504612247131120017327 0ustar ralphundrgrad.TH "PAPI_remove_events" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_remove_events \- .PP Remove an array of hardware event codes from a PAPI event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP A hardware event can be either a PAPI Preset or a native hardware event code\&. For a list of PAPI preset events, see PAPI_presets or run the papi_avail utility in the PAPI distribution\&. PAPI Presets can be passed to \fBPAPI_query_event\fP to see if they exist on the underlying architecture\&. For a list of native events available on current platform, run papi_native_avail in the PAPI distribution\&. It should be noted that \fBPAPI_remove_events\fP can partially succeed, exactly like \fBPAPI_add_events\fP\&. 
.PP \fBC Prototype:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_remove_events( int EventSet, int * EventCode, int number )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP an integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .br \fI*Events\fP an array of defined events .br \fInumber\fP an integer indicating the number of events in the array *EventCode .RE .PP \fBReturn values:\fP .RS 4 \fIPositive\fP integer The number of consecutive elements that succeeded before the error\&. .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP The EventSet is currently counting events\&. .br \fIPAPI_ECNFLCT\fP The underlying counter hardware can not count this event and other events in the EventSet simultaneously\&. .br \fIPAPI_ENOEVNT\fP The PAPI preset is not available on the underlying hardware\&. .RE .PP \fBExample:\fP .RS 4 .PP .nf int EventSet = PAPI_NULL; int Events[] = {PAPI_TOT_INS, PAPI_FP_OPS}; int ret; // Create an empty EventSet ret = PAPI_create_eventset(&EventSet); if (ret != PAPI_OK) handle_error(ret); // Add two events to our EventSet ret = PAPI_add_events(EventSet, Events, 2); if (ret != PAPI_OK) handle_error(ret); // Start counting ret = PAPI_start(EventSet); if (ret != PAPI_OK) handle_error(ret); // Stop counting, ignore values ret = PAPI_stop(EventSet, NULL); if (ret != PAPI_OK) handle_error(ret); // Remove event ret = PAPI_remove_events(EventSet, Events, 2); if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP .PP .nf @see PAPI_cleanup_eventset PAPI_destroy_eventset PAPI_event_name_to_code PAPI_presets PAPI_add_event PAPI_add_events.fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_remove_named_event.30000600003276200002170000000105612247131120020413 0ustar ralphundrgrad.TH "PAPIF_remove_named_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_remove_named_event \- .PP Remove a named hardware event from a PAPI event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_remove_named_event( C_INT EventSet, C_STRING EventName, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_remove_named_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_dmem_info.30000600003276200002170000000235012247131120017235 0ustar ralphundrgrad.TH "PAPI_get_dmem_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_dmem_info \- .PP Get information about the dynamic memory usage of the current program\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Prototype:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_get_dmem_info( PAPI_dmem_info_t *dest )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIdest\fP structure to be filled in \fBPAPI_dmem_info_t\fP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_ECMP\fP The function is not implemented for the current component\&. .br \fIPAPI_EINVAL\fP Any value in the structure or array may be undefined as indicated by this error value\&. .br \fIPAPI_ESYS\fP A system error occurred\&. .RE .PP \fBNote:\fP .RS 4 This function is only implemented for the Linux operating system\&. This function takes a pointer to a \fBPAPI_dmem_info_t\fP structure and returns with the structure fields filled in\&. A value of PAPI_EINVAL in any field indicates an undefined parameter\&. 
.RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_executable_info\fP \fBPAPI_get_hardware_info\fP \fBPAPI_get_opt\fP \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_accum_counters.30000600003276200002170000000305212247131120017453 0ustar ralphundrgrad.TH "PAPI_accum_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_accum_counters \- .PP Accumulate and reset counters\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_accum_counters( long long *values, int array_len ); .fi .PP .PP \fBParameters:\fP .RS 4 \fI*values\fP an array to hold the counter values of the counting events .br \fIarray_len\fP the number of items in the *values array .RE .PP \fBPrecondition:\fP .RS 4 These calls assume an initialized PAPI library and a properly added event set\&. .RE .PP \fBPostcondition:\fP .RS 4 The counters are reset and left running after the call\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. .RE .PP \fBPAPI_accum_counters()\fP adds the event counters into the array *values\&. .PP .PP .nf do_100events(); if ( PAPI_read_counters( values, num_hwcntrs ) != PAPI_OK ) handle_error(1); // values[0] now equals 100 do_100events(); if ( PAPI_accum_counters( values, num_hwcntrs ) != PAPI_OK ) handle_error(1); // values[0] now equals 200 values[0] = -100; do_100events(); if ( PAPI_accum_counters(values, num_hwcntrs ) != PAPI_OK ) handle_error(1); // values[0] now equals 0 * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_opt()\fP \fBPAPI_start_counters()\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_granularity_option_t.30000600003276200002170000000110412247131120020711 0ustar ralphundrgrad.TH "PAPI_granularity_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_granularity_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBdef_cidx\fP" .br .ti -1c .RI "int \fBeventset\fP" .br .ti -1c .RI "int \fBgranularity\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Field Documentation" .PP .SS "int PAPI_granularity_option_t::def_cidx" this structure requires a component index to set default granularity .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_shutdown.30000600003276200002170000000141312247131120016313 0ustar ralphundrgrad.TH "PAPI_shutdown" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_shutdown \- .PP Finish using PAPI and free all related resources\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Prototype:\fP .RS 4 #include <\fBpapi\&.h\fP> .br void \fBPAPI_shutdown( void )\fP; .RE .PP \fBPAPI_shutdown()\fP is an exit function used by the PAPI Library to free resources and shut down when certain error conditions arise\&. The user is not required to call this function, but calling it frees the memory and resources used by the PAPI Library\&. .PP \fBSee Also:\fP .RS 4 PAPI_library_init .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_flips.30000600003276200002170000000372012247131120015560 0ustar ralphundrgrad.TH "PAPI_flips" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_flips \- .PP Simplified call to get Mflips/s (floating point instruction rate), real and processor time\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface: \fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_flips( float *rtime, float *ptime, long long *flpins, float *mflips )\fP; .RE .PP \fBParameters:\fP .RS 4 \fI*rtime\fP total realtime since the first call .br \fI*ptime\fP total process time since the first call .br \fI*flpins\fP total floating point instructions since the first call .br \fI*mflips\fP incremental (Mega) floating point instructions per second since the last call .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP The counters were already started by something other than \fBPAPI_flips()\fP\&. .br \fIPAPI_ENOEVNT\fP The floating point instructions event does not exist\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .RE .PP The first call to \fBPAPI_flips()\fP will initialize the PAPI High Level interface, set up the counters to monitor the PAPI_FP_INS event and start the counters\&. .PP Subsequent calls will read the counters and return total real time, total process time, total floating point instructions since the start of the measurement and the Mflip/s rate since the latest call to \fBPAPI_flips()\fP\&. A call to \fBPAPI_stop_counters()\fP will stop the counters from running, after which calls such as \fBPAPI_start_counters()\fP or other rate calls can safely be used\&. .PP \fBPAPI_flips\fP returns information related to floating point instructions using the PAPI_FP_INS event\&. This is intended to measure instruction rate through the floating point pipe with no massaging\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_flops()\fP .PP \fBPAPI_ipc()\fP .PP \fBPAPI_epc()\fP .PP \fBPAPI_stop_counters()\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_address_map_t.30000600003276200002170000000215112247131120017245 0ustar ralphundrgrad.TH "PAPI_address_map_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_address_map_t \- .PP get the executable's address space info .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "char \fBname\fP [1024]" .br .ti -1c .RI "caddr_t \fBtext_start\fP" .br .ti -1c .RI "caddr_t \fBtext_end\fP" .br .ti -1c .RI "caddr_t \fBdata_start\fP" .br .ti -1c .RI "caddr_t \fBdata_end\fP" .br .ti -1c .RI "caddr_t \fBbss_start\fP" .br .ti -1c .RI "caddr_t \fBbss_end\fP" .br .in -1c .SH "Field Documentation" .PP .SS "caddr_t PAPI_address_map_t::bss_end" End address of program bss segment .SS "caddr_t PAPI_address_map_t::bss_start" Start address of program bss segment .SS "caddr_t PAPI_address_map_t::data_end" End address of program data segment .SS "caddr_t PAPI_address_map_t::data_start" Start address of program data segment .SS "caddr_t PAPI_address_map_t::text_end" End address of program text segment .SS "caddr_t PAPI_address_map_t::text_start" Start address of program text segment .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_inherit_option_t.30000600003276200002170000000060012247131120020012 0ustar ralphundrgrad.TH "PAPI_inherit_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_inherit_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBeventset\fP" .br .ti -1c .RI "int \fBinherit\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_num_counters.30000600003276200002170000000077612247131120017302 0ustar ralphundrgrad.TH "PAPIF_num_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_num_counters \- .PP Get the number of hardware counters available on the system\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_num_counters( C_INT numevents )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_num_counters\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_ipc.30000600003276200002170000000355612247131120015225 0ustar ralphundrgrad.TH "PAPI_ipc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_ipc \- .PP Simplified call to get instructions per cycle, real and processor time\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface: \fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_ipc( float *rtime, float *ptime, long long *ins, float *ipc )\fP; .RE .PP \fBParameters:\fP .RS 4 \fI*rtime\fP total realtime since the first call .br \fI*ptime\fP total process time since the first call .br \fI*ins\fP total instructions since the first call .br \fI*ipc\fP incremental instructions per cycle since the last call .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP The counters were already started by something other than \fBPAPI_ipc()\fP\&. .br \fIPAPI_ENOEVNT\fP The floating point operations event does not exist\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .RE .PP The first call to \fBPAPI_ipc()\fP will initialize the PAPI High Level interface, set up the counters to monitor PAPI_TOT_INS and PAPI_TOT_CYC events and start the counters\&. 
.PP Subsequent calls will read the counters and return total real time, total process time, total instructions since the start of the measurement and the IPC rate since the latest call to \fBPAPI_ipc()\fP\&. .PP A call to \fBPAPI_stop_counters()\fP will stop the counters from running, after which calls such as \fBPAPI_start_counters()\fP or other rate calls can safely be used\&. .PP \fBPAPI_ipc\fP should return a ratio greater than 1\&.0, indicating instruction level parallelism within the chip\&. The larger this ratio, the more efficiently the program is running\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_flips()\fP .PP \fBPAPI_flops()\fP .PP \fBPAPI_epc()\fP .PP \fBPAPI_stop_counters()\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_domain.30000600003276200002170000000102212247131120016664 0ustar ralphundrgrad.TH "PAPIF_set_domain" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_domain \- .PP Set the default counting domain for new event sets bound to the cpu component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_domain( C_INT domain, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_domain\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_reset.30000600003276200002170000000074212247131120015674 0ustar ralphundrgrad.TH "PAPIF_reset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_reset \- .PP Reset the hardware event counts in an event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_reset( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_reset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_stop_counters.30000600003276200002170000000104112247131120017452 0ustar ralphundrgrad.TH "PAPIF_stop_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_stop_counters \- .PP Stop counting hardware events and reset values to zero\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_stop_counters\fP( C_LONG_LONG(*) values, C_INT array_len, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_stop_counters\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_num_hwctrs.30000600003276200002170000000100512247131120016734 0ustar ralphundrgrad.TH "PAPIF_num_hwctrs" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_num_hwctrs \- .PP Return the number of hardware counters on the cpu\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_num_hwctrs( C_INT num )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_num_hwctrs\fP .PP \fBPAPI_num_cmp_hwctrs\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_list_threads.30000600003276200002170000000242312247131120017127 0ustar ralphundrgrad.TH "PAPI_list_threads" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_list_threads \- .PP List the registered thread ids\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP \fBPAPI_list_threads()\fP returns to the caller a list of all thread IDs known to PAPI\&. .PP This call assumes an initialized PAPI library\&. .PP \fBC Interface\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_list_threads(PAPI_thread_id_t *tids, int * number )\fP; .RE .PP \fBParameters:\fP .RS 4 \fI*tids\fP -- A pointer to a preallocated array\&. This may be NULL to only return a count of threads\&. 
No more than *number codes will be stored in the array\&. .br \fI*number\fP -- An input and output parameter\&. Input specifies the number of allocated elements in *tids (if non-NULL) and output specifies the number of threads\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP The call returned successfully\&. .br \fIPAPI_EINVAL\fP *number has an improper value .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_thr_specific\fP .PP \fBPAPI_set_thr_specific\fP .PP \fBPAPI_register_thread\fP .PP \fBPAPI_unregister_thread\fP .PP \fBPAPI_thread_init\fP \fBPAPI_thread_id\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_multiplex_init.30000600003276200002170000000201212247131120017502 0ustar ralphundrgrad.TH "PAPI_multiplex_init" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_multiplex_init \- .PP Initialize multiplex support in the PAPI library\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf PAPI_multiplex_init() enables and initializes multiplex support in the PAPI library. Multiplexing allows a user to count more events than total physical counters by time sharing the existing counters at some loss in precision. Applications that make no use of multiplexing do not need to call this routine. .fi .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_multiplex_init\fP (void); .RE .PP \fBExamples\fP .RS 4 .PP .nf * retval = PAPI_multiplex_init(); * .fi .PP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP This call always returns PAPI_OK .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_multiplex\fP .PP \fBPAPI_get_multiplex\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_attach_option_t.30000600003276200002170000000060412247131120017620 0ustar ralphundrgrad.TH "PAPI_attach_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_attach_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBeventset\fP" .br .ti -1c .RI "unsigned long \fBtid\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_assign_eventset_component.30000600003276200002170000000111112247131117022032 0ustar ralphundrgrad.TH "PAPIF_assign_eventset_component" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_assign_eventset_component \- .PP Assign a component index to an existing but empty EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_assign_eventset_component( C_INT EventSet, C_INT cidx, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_assign_eventset_component\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_destroy_eventset.30000600003276200002170000000264112247131120020052 0ustar ralphundrgrad.TH "PAPI_destroy_eventset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_destroy_eventset \- .PP Empty and destroy an EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_destroy_eventset( int * EventSet ); .fi .PP .PP \fBPAPI_destroy_eventset\fP deallocates the memory associated with an empty PAPI EventSet\&. .PP \fBParameters:\fP .RS 4 \fI*EventSet\fP A pointer to the integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP\&. The value pointed to by EventSet is then set to PAPI_NULL on success\&. 
.RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. Attempting to destroy a non-empty event set or passing in a null pointer to be destroyed\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP The EventSet is currently counting events\&. .br \fIPAPI_EBUG\fP Internal error, send mail to ptools-perfapi@ptools.org and complain\&. .RE .PP \fBExamples:\fP .RS 4 .PP .nf * // Free all memory and data structures, EventSet must be empty\&. * if ( PAPI_destroy_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .RE .PP .PP .nf @see PAPI_profil @n PAPI_create_eventset @n PAPI_add_event @n PAPI_stop.fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_strerror.30000600003276200002170000000320512247131120016323 0ustar ralphundrgrad.TH "PAPI_strerror" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_strerror \- .PP Returns a string describing the PAPI error code\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br char * \fBPAPI_strerror( int errorCode )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIcode\fP -- the error code to interpret .RE .PP \fBReturn values:\fP .RS 4 \fI*error\fP -- a pointer to the error string\&. .br \fINULL\fP -- the input error code to \fBPAPI_strerror()\fP is invalid\&. .RE .PP \fBPAPI_strerror()\fP returns a pointer to the error message corresponding to the error code code\&. If the call fails the function returns the NULL pointer\&. This function is not implemented in Fortran\&. 
.PP \fBExample:\fP .RS 4 .PP .nf * int ret; * int EventSet = PAPI_NULL; * int native = 0x0; * * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) * { * fprintf(stderr, "PAPI error %d: %s\n", ret, PAPI_strerror(ret)); * exit(1); * } * // Add Total Instructions Executed to our EventSet * ret = PAPI_add_event(EventSet, PAPI_TOT_INS); * if (ret != PAPI_OK) * { * PAPI_perror( "PAPI_add_event"); * fprintf(stderr,"PAPI_error %d: %s\n", ret, PAPI_strerror(ret)); * exit(1); * } * // Start counting * ret = PAPI_start(EventSet); * if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_perror\fP \fBPAPI_set_opt\fP \fBPAPI_get_opt\fP \fBPAPI_shutdown\fP \fBPAPI_set_debug\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_enum_event.30000600003276200002170000001051212247131120016605 0ustar ralphundrgrad.TH "PAPI_enum_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_enum_event \- .PP Enumerate PAPI preset or native events\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_enum_event( int * EventCode, int modifier ); Given a preset or native event code, PAPI_enum_event replaces the event code with the next available event in either the preset or native table. The modifier argument affects which events are returned. For all platforms and event types, a value of PAPI_ENUM_ALL (zero) directs the function to return all possible events. @n For preset events, a TRUE (non-zero) value currently directs the function to return event codes only for PAPI preset events available on this platform. This may change in the future. For native events, the effect of the modifier argument is different on each platform. See the discussion below for platform-specific definitions. @param *EventCode A defined preset or native event such as PAPI_TOT_INS. 
@param modifier Modifies the search logic. See below for full list. For native events, each platform behaves differently. See platform-specific documentation for details. @retval PAPI_ENOEVNT The next requested PAPI preset or native event is not available on the underlying hardware. @par Examples: .fi .PP .PP .nf * // Scan for all supported native events on this platform * printf( "Name\t\t\t Code\t Description\n" ); * do { * retval = PAPI_get_event_info( i, &info ); * if ( retval == PAPI_OK ) { * printf( "%-30s 0x%-10x\n%s\n", info\&.symbol, info\&.event_code, info\&.long_descr ); * } * } while ( PAPI_enum_event( &i, PAPI_ENUM_ALL ) == PAPI_OK ); * .fi .PP .PP \fBGeneric Modifiers\fP .RS 4 The following values are implemented for preset events .PD 0 .IP "\(bu" 2 PAPI_ENUM_EVENTS -- Enumerate all (default) .IP "\(bu" 2 PAPI_ENUM_FIRST -- Enumerate first event (preset or native) preset/native chosen based on type of EventCode .PP .RE .PP \fBNative Modifiers\fP .RS 4 The following values are implemented for native events .PD 0 .IP "\(bu" 2 PAPI_NTV_ENUM_UMASKS -- Given an event, iterate through possible umasks one at a time .IP "\(bu" 2 PAPI_NTV_ENUM_UMASK_COMBOS -- Given an event, iterate through all possible combinations of umasks\&. This is not implemented on libpfm4\&. 
.PP .RE .PP \fBPreset Modifiers\fP .RS 4 The following values are implemented for preset events .PD 0 .IP "\(bu" 2 PAPI_PRESET_ENUM_AVAIL -- enumerate only available presets .IP "\(bu" 2 PAPI_PRESET_ENUM_MSC -- Miscellaneous preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_INS -- Instruction related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_IDL -- Stalled or Idle preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_BR -- Branch related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_CND -- Conditional preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_MEM -- Memory related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_CACH -- Cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_L1 -- L1 cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_L2 -- L2 cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_L3 -- L3 cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_TLB -- Translation Lookaside Buffer events .IP "\(bu" 2 PAPI_PRESET_ENUM_FP -- Floating Point related preset events .PP .RE .PP \fBITANIUM Modifiers\fP .RS 4 The following values are implemented for modifier on Itanium: .PD 0 .IP "\(bu" 2 PAPI_NTV_ENUM_IARR - Enumerate IAR (instruction address ranging) events .IP "\(bu" 2 PAPI_NTV_ENUM_DARR - Enumerate DAR (data address ranging) events .IP "\(bu" 2 PAPI_NTV_ENUM_OPCM - Enumerate OPC (opcode matching) events .IP "\(bu" 2 PAPI_NTV_ENUM_IEAR - Enumerate IEAR (instr event address register) events .IP "\(bu" 2 PAPI_NTV_ENUM_DEAR - Enumerate DEAR (data event address register) events .PP .RE .PP \fBPOWER Modifiers\fP .RS 4 The following values are implemented for POWER .PD 0 .IP "\(bu" 2 PAPI_NTV_ENUM_GROUPS - Enumerate groups to which an event belongs .PP .RE .PP \fBSee Also:\fP .RS 4 PAPI .br PAPIF .br \fBPAPI_enum_cmp_event\fP .br \fBPAPI_get_event_info\fP .br \fBPAPI_event_name_to_code\fP .br PAPI_preset .br PAPI_native .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_event_name_to_code.30000600003276200002170000000332112247131120020255 0ustar ralphundrgrad.TH "PAPI_event_name_to_code" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_event_name_to_code \- .PP Convert a name to a numeric hardware event code\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include @n int PAPI_event_name_to_code( char * EventName, int * EventCode ); PAPI_event_name_to_code is used to translate an ASCII PAPI event name into an integer PAPI event code. @param *EventCode The numeric code for the event. @param *EventName A string containing the event name as listed in PAPI_presets or discussed in PAPI_native. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOTPRESET The hardware event specified is not a valid PAPI preset. @retval PAPI_ENOINIT The PAPI library has not been initialized. @retval PAPI_ENOEVNT The hardware event is not available on the underlying hardware. @par Examples: .fi .PP .PP .nf * int EventCode, EventSet = PAPI_NULL; * // Convert to integer * if ( PAPI_event_name_to_code( "PAPI_TOT_INS", &EventCode ) != PAPI_OK ) * handle_error( 1 ); * // Create the EventSet * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_event( EventSet, EventCode ) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_event_code_to_name\fP .PP \fBPAPI_remove_event\fP .PP \fBPAPI_get_event_info\fP .PP \fBPAPI_enum_event\fP .PP \fBPAPI_add_event\fP .PP \fBPAPI_add_named_event\fP .PP PAPI_presets .PP PAPI_native .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_accum_counters.30000600003276200002170000000101412247131117017563 0ustar ralphundrgrad.TH "PAPIF_accum_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_accum_counters \- .PP Accumulate and reset counters\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_accum_counters\fP( C_LONG_LONG(*) values, C_INT array_len, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_accum_counters\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_unlock.30000600003276200002170000000112712247131120015735 0ustar ralphundrgrad.TH "PAPI_unlock" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_unlock \- .PP Unlock one of the mutex variables defined in \fBpapi\&.h\fP\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBParameters:\fP .RS 4 \fIlck\fP an integer value specifying one of the two user locks: PAPI_USR1_LOCK or PAPI_USR2_LOCK .RE .PP \fBPAPI_unlock()\fP unlocks the mutex acquired by a call to \fBPAPI_lock\fP \&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_get_hardware_info.30000600003276200002170000000120412247131117020221 0ustar ralphundrgrad.TH "PAPIF_get_hardware_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_hardware_info \- .PP get information about the system hardware .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_hardware_info\fP( C_INT ncpu, C_INT nnodes, C_INT totalcpus, .br C_INT vendor, C_STRING vendor_str, C_INT model, C_STRING model_str, .br C_FLOAT revision, C_FLOAT mhz ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_hardware_info\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_detach.30000600003276200002170000000357712247131120015705 0ustar ralphundrgrad.TH "PAPI_detach" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_detach \- .PP Detach PAPI event set from previously specified thread id and restore to executing thread\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_detach( int EventSet, unsigned long tid ); PAPI_detach is a wrapper function that calls PAPI_set_opt to allow PAPI to monitor performance counts on a thread other than the one currently executing. This is sometimes referred to as third party monitoring. PAPI_attach connects the specified EventSet to the specified thread; PAPI_detach breaks that connection and restores the EventSet to the original executing thread. @param EventSet An integer handle for a PAPI EventSet as created by PAPI_create_eventset. @param tid A thread id as obtained from, for example, PAPI_list_threads or PAPI_thread_id. @retval PAPI_ECMP This feature is unsupported on this component. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOEVST The event set specified does not exist. 
@retval PAPI_EISRUN The event set is currently counting events. @par Examples: .fi .PP .PP .nf * int EventSet = PAPI_NULL; * unsigned long pid; * pid = fork( ); * if ( pid <= 0 ) * exit( 1 ); * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * exit( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_event( EventSet, PAPI_TOT_INS ) != PAPI_OK ) * exit( 1 ); * // Attach this EventSet to the forked process * if ( PAPI_attach( EventSet, pid ) != PAPI_OK ) * exit( 1 ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_opt\fP .br \fBPAPI_list_threads\fP .br \fBPAPI_thread_id\fP .br \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_component_info_t.30000600003276200002170000001165012247131120020004 0ustar ralphundrgrad.TH "PAPI_component_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_component_info_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "char \fBname\fP [128]" .br .ti -1c .RI "char \fBshort_name\fP [64]" .br .ti -1c .RI "char \fBdescription\fP [128]" .br .ti -1c .RI "char \fBversion\fP [64]" .br .ti -1c .RI "char \fBsupport_version\fP [64]" .br .ti -1c .RI "char \fBkernel_version\fP [64]" .br .ti -1c .RI "char \fBdisabled_reason\fP [128]" .br .ti -1c .RI "int \fBdisabled\fP" .br .ti -1c .RI "int \fBCmpIdx\fP" .br .ti -1c .RI "int \fBnum_cntrs\fP" .br .ti -1c .RI "int \fBnum_mpx_cntrs\fP" .br .ti -1c .RI "int \fBnum_preset_events\fP" .br .ti -1c .RI "int \fBnum_native_events\fP" .br .ti -1c .RI "int \fBdefault_domain\fP" .br .ti -1c .RI "int \fBavailable_domains\fP" .br .ti -1c .RI "int \fBdefault_granularity\fP" .br .ti -1c .RI "int \fBavailable_granularities\fP" .br .ti -1c .RI "int \fBhardware_intr_sig\fP" .br .ti -1c .RI "int \fBcomponent_type\fP" .br .ti -1c .RI "int \fBreserved\fP [8]" .br .ti -1c .RI "unsigned int \fBhardware_intr\fP:1" .br .ti -1c .RI "unsigned int 
\fBprecise_intr\fP:1" .br .ti -1c .RI "unsigned int \fBposix1b_timers\fP:1" .br .ti -1c .RI "unsigned int \fBkernel_profile\fP:1" .br .ti -1c .RI "unsigned int \fBkernel_multiplex\fP:1" .br .ti -1c .RI "unsigned int \fBfast_counter_read\fP:1" .br .ti -1c .RI "unsigned int \fBfast_real_timer\fP:1" .br .ti -1c .RI "unsigned int \fBfast_virtual_timer\fP:1" .br .ti -1c .RI "unsigned int \fBattach\fP:1" .br .ti -1c .RI "unsigned int \fBattach_must_ptrace\fP:1" .br .ti -1c .RI "unsigned int \fBcntr_umasks\fP:1" .br .ti -1c .RI "unsigned int \fBcpu\fP:1" .br .ti -1c .RI "unsigned int \fBinherit\fP:1" .br .ti -1c .RI "unsigned int \fBreserved_bits\fP:12" .br .in -1c .SH "Detailed Description" .PP .SH "Field Documentation" .PP .SS "unsigned int PAPI_component_info_t::attach" Supports attach .SS "unsigned int PAPI_component_info_t::attach_must_ptrace" Attach must first ptrace and stop the thread/process .SS "int PAPI_component_info_t::available_domains" Available domains .SS "int PAPI_component_info_t::available_granularities" Available granularities .SS "int PAPI_component_info_t::CmpIdx" Index into the vector array for this component; set at init time .SS "unsigned int PAPI_component_info_t::cntr_umasks" counters have unit masks .SS "int PAPI_component_info_t::component_type" Type of component .SS "unsigned int PAPI_component_info_t::cpu" Supports specifying cpu number to use with event set .SS "int PAPI_component_info_t::default_domain" The default domain when this component is used .SS "int PAPI_component_info_t::default_granularity" The default granularity when this component is used .SS "char PAPI_component_info_t::description[128]" Description of the component .SS "int PAPI_component_info_t::disabled" 0 if enabled, otherwise error code from initialization .SS "char PAPI_component_info_t::disabled_reason[128]" Reason for failure of initialization .SS "unsigned int PAPI_component_info_t::fast_counter_read" Supports a user level PMC read instruction .SS "unsigned int 
PAPI_component_info_t::fast_real_timer" Supports a fast real timer .SS "unsigned int PAPI_component_info_t::fast_virtual_timer" Supports a fast virtual timer .SS "unsigned int PAPI_component_info_t::hardware_intr" hw overflow intr, does not need to be emulated in software .SS "int PAPI_component_info_t::hardware_intr_sig" Signal used by hardware to deliver PMC events .SS "unsigned int PAPI_component_info_t::inherit" Supports child processes inheriting parents counters .SS "unsigned int PAPI_component_info_t::kernel_multiplex" In kernel multiplexing .SS "unsigned int PAPI_component_info_t::kernel_profile" Has kernel profiling support (buffered interrupts or sprofil-like) .SS "char PAPI_component_info_t::kernel_version[64]" Version of the kernel PMC support driver .SS "char PAPI_component_info_t::name[128]" Name of the component we're using .SS "int PAPI_component_info_t::num_cntrs" Number of hardware counters the component supports .SS "int PAPI_component_info_t::num_mpx_cntrs" Number of hardware counters the component or PAPI can multiplex supports .SS "int PAPI_component_info_t::num_native_events" Number of native events the component supports .SS "int PAPI_component_info_t::num_preset_events" Number of preset events the component supports .SS "unsigned int PAPI_component_info_t::posix1b_timers" Using POSIX 1b interval timers (timer_create) instead of setitimer .SS "unsigned int PAPI_component_info_t::precise_intr" Performance interrupts happen precisely .SS "char PAPI_component_info_t::short_name[64]" .PP .nf Short name of component, .fi .PP to be prepended to event names .SS "char PAPI_component_info_t::support_version[64]" Version of the support library .SS "char PAPI_component_info_t::version[64]" Version of this component .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_write.30000600003276200002170000000075412247131120015707 0ustar ralphundrgrad.TH "PAPIF_write" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_write \- .PP Write counter values into counters\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_write\fP( C_INT EventSet, C_LONG_LONG(*) values, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_write\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_num_components.30000600003276200002170000000104112247131120017501 0ustar ralphundrgrad.TH "PAPI_num_components" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_num_components \- .PP Get the number of components available on the system\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @return Number of components available on the system .fi .PP .PP .PP .nf // Query the library for a component count\&. printf("%d components installed\&.", PAPI_num_components() ); * .fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_read.30000600003276200002170000000343412247131120015360 0ustar ralphundrgrad.TH "PAPI_read" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_read \- .PP Read hardware counters from an event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_read(int EventSet, long_long * values )\fP; .RE .PP \fBPAPI_read()\fP copies the counters of the indicated event set into the provided array\&. .PP The counters continue counting after the read\&. .PP Note the differences between \fBPAPI_read()\fP and \fBPAPI_accum()\fP, specifically that \fBPAPI_accum()\fP adds the counters into the values array and then resets the counters to zero\&. 
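The difference is easiest to see with a toy model. The sketch below does not call PAPI; toy_counter, toy_count(), toy_read() and toy_accum() are hypothetical stand-ins that only mirror the behavior described in the PAPI_read and PAPI_accum pages: a read copies the running counter and leaves it running, while an accumulate adds the counter into the caller's value and then zeroes the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-event "hardware counter" used only in this sketch. */
typedef struct {
    long long count;
} toy_counter;

/* Simulates n events being counted. */
void toy_count(toy_counter *c, long long n)
{
    assert(c != NULL && n >= 0);
    c->count += n;
}

/* Models the described PAPI_read(): copy the counter, keep counting. */
void toy_read(const toy_counter *c, long long *value)
{
    assert(c != NULL && value != NULL);
    *value = c->count;
}

/* Models the described PAPI_accum(): add the counter into *value,
 * then zero the counter. */
void toy_accum(toy_counter *c, long long *value)
{
    assert(c != NULL && value != NULL);
    *value += c->count;
    c->count = 0;
}
```

Starting from a zeroed toy_counter, counting 100 events and calling toy_read() yields 100; counting 50 more and reading again yields 150, because the counter kept running. Calling toy_accum() on a value that starts at 0 then yields 150 and zeroes the counter, so a toy_read() after 50 further events yields 50.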
.PP \fBPAPI_read()\fP assumes an initialized PAPI library and a properly added event set\&. .PP \fBParameters:\fP .RS 4 \fIEventSet\fP -- an integer handle for a PAPI Event Set as created by \fBPAPI_create_eventset()\fP .br \fI*values\fP -- an array to hold the counter values of the counting events .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. .br \fIPAPI_ENOEVST\fP The event set specified does not exist\&. .RE .PP \fBExamples\fP .RS 4 .PP .nf * do_100events(); * if (PAPI_read(EventSet, values) != PAPI_OK) * handle_error(1); * // values[0] now equals 100 * do_100events(); * if (PAPI_accum(EventSet, values) != PAPI_OK) * handle_error(1); * // values[0] now equals 300 (100 already in values plus 200 counted since the start) * values[0] = -100; * do_100events(); * if (PAPI_accum(EventSet, values) != PAPI_OK) * handle_error(1); * // values[0] now equals 0 * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_accum\fP .PP \fBPAPI_start\fP .PP \fBPAPI_stop\fP .PP \fBPAPI_reset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_cpu_option_t.30000600003276200002170000000060112247131120017140 0ustar ralphundrgrad.TH "PAPI_cpu_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_cpu_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBeventset\fP" .br .ti -1c .RI "unsigned int \fBcpu_num\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_debug.30000600003276200002170000000076612247131120016521 0ustar ralphundrgrad.TH "PAPIF_set_debug" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_debug \- .PP Set the current debug level for error output from PAPI\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_debug( C_INT level, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_debug\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_event_info.30000600003276200002170000000203612247131120017435 0ustar ralphundrgrad.TH "PAPI_get_event_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_event_info \- .PP Get the event's name and description info\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBParameters:\fP .RS 4 \fIEventCode\fP event code (preset or native) .br \fIinfo\fP structure with the event information \fBPAPI_event_info_t\fP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOTPRESET\fP The PAPI preset mask was set, but the hardware event specified is not a valid PAPI preset\&. .br \fIPAPI_ENOEVNT\fP The PAPI preset is not available on the underlying hardware\&. .RE .PP This function fills the event information into a structure\&. In Fortran, some fields of the structure are returned explicitly\&. This function works with existing PAPI preset and native event codes\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_event_name_to_code\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_cmp_opt.30000600003276200002170000000440212247131120016741 0ustar ralphundrgrad.TH "PAPI_get_cmp_opt" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_cmp_opt \- .PP Get component specific PAPI options\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBParameters:\fP .RS 4 \fIoption\fP is an input parameter describing the course of action\&. Possible values are defined in \fBpapi\&.h\fP and briefly described in the table below\&. 
The Fortran calls are implementations of specific options\&. .br \fIptr\fP is a pointer to a structure that acts as both an input and output parameter\&. .br \fIcidx\fP An integer identifier for a component\&. By convention, component 0 is always the cpu component\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .RE .PP \fBPAPI_get_opt()\fP and \fBPAPI_set_opt()\fP query or change the options of the PAPI library or a specific event set created by \fBPAPI_create_eventset\fP \&. Some options may require that the eventset be bound to a component before they can execute successfully\&. This can be done either by adding an event or by explicitly calling \fBPAPI_assign_eventset_component\fP \&. .PP The C interface for these functions passes a pointer to the \fBPAPI_option_t\fP structure\&. Not all options require or return information in this structure, and not all options are implemented for both get and set\&. Some options require a component index to be provided\&. These options are handled explicitly by the \fBPAPI_get_cmp_opt()\fP call for 'get' and implicitly through the option structure for 'set'\&. The Fortran interface is a series of calls implementing various subsets of the C interface\&. Not all options in C are available in Fortran\&. .PP \fBNote:\fP .RS 4 Some options, such as PAPI_DOMAIN and PAPI_MULTIPLEX, are also available as separate entry points in both C and Fortran\&. .RE .PP The reader is urged to see the example code in the PAPI distribution for usage of \fBPAPI_get_opt\fP\&. The file \fBpapi\&.h\fP contains definitions for the structures unioned in the \fBPAPI_option_t\fP structure\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_debug\fP \fBPAPI_set_multiplex\fP \fBPAPI_set_domain\fP \fBPAPI_option_t\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_get_component_index.30000600003276200002170000000160312247131120020471 0ustar ralphundrgrad.TH "PAPI_get_component_index" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_component_index \- .PP returns the component index for the named component .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_ENOCMP component does not exist @param name name of component to find index for @par Examples: .fi .PP .PP .nf int cidx; cidx = PAPI_get_component_index("cuda"); if (cidx >= 0) { printf("The CUDA component is cidx %d\n",cidx); } * .fi .PP \fBPAPI_get_component_index()\fP returns the component index of the named component\&. This is useful for finding out if a specified component exists\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_event_component\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_num_events.30000600003276200002170000000253012247131120016624 0ustar ralphundrgrad.TH "PAPI_num_events" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_num_events \- .PP Return the number of events in an event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP \fBPAPI_num_events()\fP returns the number of preset and/or native events contained in an event set\&. The event set should be created by \fBPAPI_create_eventset\fP \&. .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_num_events(int EventSet )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP -- an integer handle for a PAPI event set created by \fBPAPI_create_eventset\fP\&. .br \fI*count\fP -- (Fortran only) On output the variable contains the number of events in the event set .RE .PP \fBReturn values:\fP .RS 4 \fIOn\fP success, this function returns the positive number of events in the event set\&. .br \fIPAPI_EINVAL\fP The event count is zero; only if code is compiled with debug enabled\&. 
.br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .RE .PP \fBExample\fP .RS 4 .PP .nf * // Count the events in our EventSet * printf("%d events found in EventSet\&.\\n", PAPI_num_events(EventSet)); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_add_event\fP .PP \fBPAPI_create_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_stop_counters.30000600003276200002170000000271412247131120017354 0ustar ralphundrgrad.TH "PAPI_stop_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_stop_counters \- .PP Stop counting hardware events and reset values to zero\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_stop_counters( long long *values, int array_len ); .fi .PP .PP \fBParameters:\fP .RS 4 \fI*values\fP an array in which to store the counter values .br \fIarray_len\fP the number of items in the *values array .RE .PP \fBPostcondition:\fP .RS 4 After this function is called, the counters are reset to zero\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOTRUN\fP The EventSet is not started yet\&. .br \fIPAPI_ENOEVST\fP The EventSet has not been added yet\&. .RE .PP The \fBPAPI_stop_counters()\fP function stops the counters and copies the counts into the *values array\&. The counters must have been started by a previous call to \fBPAPI_start_counters()\fP\&. .PP .PP .nf int Events[2] = { PAPI_TOT_CYC, PAPI_TOT_INS }; long long values[2]; if ( PAPI_start_counters( Events, 2 ) != PAPI_OK ) handle_error(1); your_slow_code(); if ( PAPI_stop_counters( values, 2 ) != PAPI_OK ) handle_error(1); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_read_counters()\fP \fBPAPI_start_counters()\fP \fBPAPI_set_opt()\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_start.30000600003276200002170000000074112247131120015706 0ustar ralphundrgrad.TH "PAPIF_start" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_start \- .PP Start counting hardware events in an event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_start( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_start\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_add_event.30000600003276200002170000000101012247131117016476 0ustar ralphundrgrad.TH "PAPIF_add_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_add_event \- .PP add PAPI preset or native hardware event to an event set .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_add_event( C_INT EventSet, C_INT EventCode, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_add_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_dmem_info_t.30000600003276200002170000000150412247131120016721 0ustar ralphundrgrad.TH "PAPI_dmem_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_dmem_info_t \- .PP A pointer to the following is passed to \fBPAPI_get_dmem_info()\fP .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "long long \fBpeak\fP" .br .ti -1c .RI "long long \fBsize\fP" .br .ti -1c .RI "long long \fBresident\fP" .br .ti -1c .RI "long long \fBhigh_water_mark\fP" .br .ti -1c .RI "long long \fBshared\fP" .br .ti -1c .RI "long long \fBtext\fP" .br .ti -1c .RI "long long \fBlibrary\fP" .br .ti -1c .RI "long long \fBheap\fP" .br .ti -1c .RI "long long \fBlocked\fP" .br .ti -1c .RI "long long \fBstack\fP" .br .ti -1c .RI "long long \fBpagesize\fP" .br .ti -1c .RI "long long \fBpte\fP" .br .in -1c .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_unlock.30000600003276200002170000000074112247131120016044 0ustar ralphundrgrad.TH "PAPIF_unlock" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_unlock \- .PP Unlock one of the mutex variables defined in \fBpapi\&.h\fP\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_unlock( C_INT lock )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_unlock\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_granularity.30000600003276200002170000000105012247131117017751 0ustar ralphundrgrad.TH "PAPIF_get_granularity" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_granularity \- .PP Get the granularity setting for the specified EventSet\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_granularity( C_INT eventset, C_INT granularity, C_INT mode, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_accum.30000600003276200002170000000331312247131120015531 0ustar ralphundrgrad.TH "PAPI_accum" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_accum \- .PP Accumulate and reset counters in an EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_accum( int EventSet, long_long * values ); This call assumes an initialized PAPI library and a properly added event set. PAPI_accum adds the counters of the indicated event set into the array values. The counters are zeroed and continue counting after the operation. Note the differences between PAPI_read and PAPI_accum, specifically that PAPI_accum adds into the values array and resets the counters to zero, whereas PAPI_read overwrites the values array and leaves the counters running. @param EventSet an integer handle for a PAPI Event Set as created by PAPI_create_eventset @param *values an array to hold the counter values of the counting events @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ESYS A system or C library call failed inside PAPI, see the errno variable. @retval PAPI_ENOEVST The event set specified does not exist. 
@par Examples: .fi .PP .PP .nf * do_100events( ); * if ( PAPI_read( EventSet, values) != PAPI_OK ) * handle_error( 1 ); * // values[0] now equals 100 * do_100events( ); * if (PAPI_accum( EventSet, values ) != PAPI_OK ) * handle_error( 1 ); * // values[0] now equals 300 (100 already in values plus 200 counted since the start) * values[0] = -100; * do_100events( ); * if (PAPI_accum( EventSet, values ) != PAPI_OK ) * handle_error( 1 ); * // values[0] now equals 0 * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPIF_accum\fP .PP \fBPAPI_start\fP .PP \fBPAPI_set_opt\fP .PP \fBPAPI_reset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_assign_eventset_component.30000600003276200002170000000340412247131120021725 0ustar ralphundrgrad.TH "PAPI_assign_eventset_component" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_assign_eventset_component \- .PP Assign a component index to an existing but empty EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_assign_eventset_component( int EventSet, int cidx ); @param EventSet An integer identifier for an existing EventSet. @param cidx An integer identifier for a component. By convention, component 0 is always the cpu component. @retval PAPI_ENOCMP The argument cidx is not a valid component. @retval PAPI_ENOEVST The EventSet doesn't exist. @retval PAPI_ENOMEM Insufficient memory to complete the operation. PAPI_assign_eventset_component assigns a specific component index, as specified by cidx, to a new EventSet identified by EventSet, as obtained from PAPI_create_eventset. EventSets are ordinarily automatically bound to components when the first event is added. This routine is useful to explicitly bind an EventSet to a component before setting component related options. 
@par Examples: .fi .PP .PP .nf * int EventSet = PAPI_NULL; * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * // Bind our EventSet to the cpu component * if ( PAPI_assign_eventset_component( EventSet, 0 ) != PAPI_OK ) * handle_error( 1 ); * // Convert our EventSet to multiplexing * if ( PAPI_set_multiplex( EventSet ) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_opt\fP .br \fBPAPI_create_eventset\fP .br \fBPAPI_add_events\fP .br \fBPAPI_set_multiplex\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_unregister_thread.30000600003276200002170000000077612247131120020277 0ustar ralphundrgrad.TH "PAPIF_unregister_thread" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_unregister_thread \- .PP Notify PAPI that a thread has 'disappeared'\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_unregister_thread( C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_unregister_thread\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_shutdown.30000600003276200002170000000072312247131120016424 0ustar ralphundrgrad.TH "PAPIF_shutdown" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_shutdown \- .PP finish using PAPI and free all related resources\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_shutdown( )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_shutdown\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_register_thread.30000600003276200002170000000234512247131120017620 0ustar ralphundrgrad.TH "PAPI_register_thread" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_register_thread \- .PP Notify PAPI that a thread has 'appeared'\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_register_thread\fP (void); .RE .PP \fBPAPI_register_thread()\fP should be called when the user wants to force PAPI to initialize a thread that PAPI has not seen before\&. .PP Usually this is not necessary as PAPI implicitly detects the thread when an eventset is created or other thread local PAPI functions are called\&. However, it can be useful for debugging and performance enhancements in the run-time systems of performance tools\&. .PP \fBReturn values:\fP .RS 4 \fIPAPI_ENOMEM\fP Space could not be allocated to store the new thread information\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. .br \fIPAPI_ECMP\fP Hardware counters for this thread could not be initialized\&. .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_unregister_thread\fP .PP \fBPAPI_thread_id\fP .PP \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_register_thread.30000600003276200002170000000076312247131120017730 0ustar ralphundrgrad.TH "PAPIF_register_thread" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_register_thread \- .PP Notify PAPI that a thread has 'appeared'\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_register_thread( C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_register_thread\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_num_counters.30000600003276200002170000000310312247131120017157 0ustar ralphundrgrad.TH "PAPI_num_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_num_counters \- .PP Get the number of hardware counters available on the system\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_num_counters( void ); .fi .PP .PP \fBPostcondition:\fP .RS 4 Initializes the library to PAPI_HIGH_LEVEL_INITED if necessary\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP \fBpapi\&.h\fP is different from the version used to compile the PAPI library\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. .RE .PP \fBExamples:\fP .RS 4 .PP .nf * int num_hwcntrs; * // The installation does not support PAPI * if ((num_hwcntrs = PAPI_num_counters()) < 0 ) * handle_error(1); * // The installation supports PAPI, but has no counters * if ((num_hwcntrs = PAPI_num_counters()) == 0 ) * fprintf(stderr,"Info:: This machine does not provide hardware counters\&.\n"); * .fi .PP .RE .PP \fBPAPI_num_counters()\fP returns the optimal length of the values array for the high level functions\&. This value corresponds to the number of hardware counters supported by the current CPU component\&. .PP \fBNote:\fP .RS 4 This function only works for the CPU component\&. To determine the number of counters on another component, use the low level \fBPAPI_num_cmp_hwctrs()\fP\&. .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_set_granularity.30000600003276200002170000000363712247131120017666 0ustar ralphundrgrad.TH "PAPI_set_granularity" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_granularity \- .PP Set the default counting granularity for eventsets bound to the cpu component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Prototype: \#include <papi.h> @n int PAPI_set_granularity( int granularity ); @param -- granularity one of the following constants as defined in the papi.h header file @arg PAPI_GRN_THR -- Count each individual thread @arg PAPI_GRN_PROC -- Count each individual process @arg PAPI_GRN_PROCG -- Count each individual process group @arg PAPI_GRN_SYS -- Count the current CPU @arg PAPI_GRN_SYS_CPU -- Count all CPUs individually @arg PAPI_GRN_MIN -- The finest available granularity @arg PAPI_GRN_MAX -- The coarsest available granularity .fi .PP .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .RE .PP \fBPAPI_set_granularity\fP sets the default counting granularity for all new event sets created by \fBPAPI_create_eventset\fP\&. This call implicitly sets the granularity for the cpu component (component 0) and is included to preserve backward compatibility\&. .PP \fBExample:\fP .RS 4 .PP .nf int ret; int EventSet = PAPI_NULL; // Initialize the library ret = PAPI_library_init(PAPI_VER_CURRENT); if (ret > 0 && ret != PAPI_VER_CURRENT) { fprintf(stderr,"PAPI library version mismatch!\n"); exit(1); } if (ret < 0) handle_error(ret); // Set the default granularity for the cpu component ret = PAPI_set_granularity(PAPI_GRN_PROC); if (ret != PAPI_OK) handle_error(ret); ret = PAPI_create_eventset(&EventSet); if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_cmp_granularity\fP \fBPAPI_set_domain\fP \fBPAPI_set_opt\fP \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_remove_event.30000600003276200002170000000101512247131120017242 0ustar ralphundrgrad.TH "PAPIF_remove_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_remove_event \- .PP Remove a hardware event from a PAPI event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_remove_event( C_INT EventSet, C_INT EventCode, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_remove_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_num_cmp_hwctrs.30000600003276200002170000000105512247131120017600 0ustar ralphundrgrad.TH "PAPIF_num_cmp_hwctrs" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_num_cmp_hwctrs \- .PP Return the number of hardware counters on the specified component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_num_cmp_hwctrs( C_INT cidx, C_INT num )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_num_hwctrs\fP .PP \fBPAPI_num_cmp_hwctrs\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_event_code_to_name.30000600003276200002170000000374412247131120020266 0ustar ralphundrgrad.TH "PAPI_event_code_to_name" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_event_code_to_name \- .PP Convert a numeric hardware event code to a name\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_event_code_to_name( int EventCode, char * EventName ); PAPI_event_code_to_name is used to translate a 32-bit integer PAPI event code into an ASCII PAPI event name. Either Preset event codes or Native event codes can be passed to this routine. Native event codes and names differ from platform to platform. 
@param EventCode The numeric code for the event. @param *EventName A string containing the event name as listed in PAPI_presets or discussed in PAPI_native. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOTPRESET The hardware event specified is not a valid PAPI preset. @retval PAPI_ENOEVNT The hardware event is not available on the underlying hardware. @par Examples: .fi .PP .PP .nf * int EventCode, EventSet = PAPI_NULL; * int Event, number; * char EventCodeStr[PAPI_MAX_STR_LEN]; * // Create the EventSet * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_event( EventSet, PAPI_TOT_INS ) != PAPI_OK ) * handle_error( 1 ); * number = 1; * if ( PAPI_list_events( EventSet, &Event, &number ) != PAPI_OK ) * handle_error(1); * // Convert integer code to name string * if ( PAPI_event_code_to_name( Event, EventCodeStr ) != PAPI_OK ) * handle_error( 1 ); * printf( "Event Name: %s\n", EventCodeStr ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_event_name_to_code\fP .PP \fBPAPI_remove_event\fP .PP \fBPAPI_get_event_info\fP .PP \fBPAPI_enum_event\fP .PP \fBPAPI_add_event\fP .PP PAPI_presets .PP PAPI_native .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_addr_range_option_t.30000600003276200002170000000201412247131120020437 0ustar ralphundrgrad.TH "PAPI_addr_range_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_addr_range_option_t \- .PP address range specification for range restricted counting if both are zero, range is disabled .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBeventset\fP" .br .ti -1c .RI "caddr_t \fBstart\fP" .br .ti -1c .RI "caddr_t \fBend\fP" .br .ti -1c .RI "int \fBstart_off\fP" .br .ti -1c .RI "int \fBend_off\fP" .br .in -1c .SH "Field Documentation" .PP .SS "caddr_t PAPI_addr_range_option_t::end" user requested end address of an address range .SS "int PAPI_addr_range_option_t::end_off" hardware specified offset from end address .SS "int PAPI_addr_range_option_t::eventset" eventset to restrict .SS "caddr_t PAPI_addr_range_option_t::start" user requested start address of an address range .SS "int PAPI_addr_range_option_t::start_off" hardware specified offset from start address .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_start_counters.30000600003276200002170000000100712247131120017624 0ustar ralphundrgrad.TH "PAPIF_start_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_start_counters \- .PP Start counting hardware events\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_start_counters\fP( C_INT(*) events, C_INT array_len, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_start_counters\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_write.30000600003276200002170000000234712247131120015601 0ustar ralphundrgrad.TH "PAPI_write" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_write \- .PP Write counter values into counters\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBParameters:\fP .RS 4 \fIEventSet\fP an integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .br \fI*values\fP an array to hold the counter values of the counting events .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_ECMP\fP \fBPAPI_write()\fP is not implemented for this architecture\&. .br \fIPAPI_ESYS\fP The EventSet is currently counting events and the component could not change the values of the running counters\&. .RE .PP \fBPAPI_write()\fP writes the counter values provided in the array values into the event set EventSet\&. The virtual counters managed by the PAPI library will be set to the values provided\&. If the event set is running, an attempt will be made to write the values to the running counters\&. This operation is not permitted by all components and may result in a run-time error\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_read\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_preload_info_t.30000600003276200002170000000075312247131120017432 0ustar ralphundrgrad.TH "PAPI_preload_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_preload_info_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "char \fBlib_preload_env\fP [128]" .br .ti -1c .RI "char \fBlib_preload_sep\fP" .br .ti -1c .RI "char \fBlib_dir_env\fP [128]" .br .ti -1c .RI "char \fBlib_dir_sep\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_read_ts.30000600003276200002170000000101612247131120016166 0ustar ralphundrgrad.TH "PAPIF_read_ts" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_read_ts \- .PP Read hardware counters with a timestamp\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_read_ts\fP(C_INT EventSet, C_LONG_LONG(*) values, C_LONG_LONG(*) cycles, C_INT check) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_read_ts\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_debug_option_t.30000600003276200002170000000061212247131120017441 0ustar ralphundrgrad.TH "PAPI_debug_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_debug_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBlevel\fP" .br .ti -1c .RI "PAPI_debug_handler_t \fBhandler\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_disable_component.30000600003276200002170000000251212247131120020126 0ustar ralphundrgrad.TH "PAPI_disable_component" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_disable_component \- .PP disables the specified component .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_ENOCMP component does not exist @retval PAPI_ENOINIT cannot disable as PAPI has already been initialized @param cidx component index of component to be disabled @par Examples: .fi .PP .PP .nf int cidx, result; cidx = PAPI_get_component_index("example"); if (cidx>=0) { result = PAPI_disable_component(cidx); if (result==PAPI_OK) printf("The example component is disabled\n"); } // \&.\&.\&. PAPI_library_init(PAPI_VER_CURRENT); * .fi .PP \fBPAPI_disable_component()\fP allows the user to disable components before \fBPAPI_library_init()\fP time\&. 
This is useful if the user knows they do not wish to use events from that component and want to reduce the PAPI library overhead\&. .PP \fBPAPI_disable_component()\fP must be called before \fBPAPI_library_init()\fP\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_event_component\fP .PP \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_stop.30000600003276200002170000000076212247131120015541 0ustar ralphundrgrad.TH "PAPIF_stop" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_stop \- .PP Stop counting hardware events in an EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_stop\fP( C_INT EventSet, C_LONG_LONG(*) values, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_stop\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_create_eventset.30000600003276200002170000000335012247131120017622 0ustar ralphundrgrad.TH "PAPI_create_eventset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_create_eventset \- .PP Create a new empty PAPI EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_create_eventset( int * EventSet ); PAPI_create_eventset creates a new EventSet pointed to by EventSet, which must be initialized to PAPI_NULL before calling this routine. The user may then add hardware events to the event set by calling PAPI_add_event or similar routines. @note PAPI-C uses a late binding model to bind EventSets to components. When an EventSet is first created it is not bound to a component. This will cause some API calls that modify EventSet options to fail. An EventSet can be bound to a component explicitly by calling PAPI_assign_eventset_component or implicitly by calling PAPI_add_event or similar routines. 
@param *EventSet Address of an integer location to store the new EventSet handle. @exception PAPI_EINVAL The argument handle has not been initialized to PAPI_NULL or the argument is a NULL pointer. @exception PAPI_ENOMEM Insufficient memory to complete the operation. @par Examples: .fi .PP .PP .nf * int EventSet = PAPI_NULL; * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_event( EventSet, PAPI_TOT_INS) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_add_event\fP .br \fBPAPI_assign_eventset_component\fP .br \fBPAPI_destroy_eventset\fP .br \fBPAPI_cleanup_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_cleanup_eventset.30000600003276200002170000000262312247131120020010 0ustar ralphundrgrad.TH "PAPI_cleanup_eventset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_cleanup_eventset \- .PP Remove all events from an EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_cleanup_eventset( int EventSet ); .fi .PP .PP \fBPAPI_cleanup_eventset\fP removes all events from a PAPI event set and turns off profiling and overflow for all events in the EventSet\&. It cannot be called while the EventSet is still counting; stop the EventSet first\&. .PP \fBParameters:\fP .RS 4 \fIEventSet\fP An integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. Attempting to destroy a non-empty event set or passing in a null pointer to be destroyed\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP The EventSet is currently counting events\&. .br \fIPAPI_EBUG\fP Internal error, send mail to ptools-perfapi@ptools.org and complain\&. 
.RE .PP \fBExamples:\fP .RS 4 .PP .nf * // Remove all events in the eventset * if ( PAPI_cleanup_eventset( EventSet ) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .RE .PP .PP .nf @see PAPI_profil @n PAPI_create_eventset @n PAPI_add_event @n PAPI_stop .fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_exe_info_t.30000600003276200002170000000111312247131120016554 0ustar ralphundrgrad.TH "PAPI_exe_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_exe_info_t \- .PP get the executable's info .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "char \fBfullname\fP [1024]" .br .ti -1c .RI "\fBPAPI_address_map_t\fP \fBaddress_info\fP" .br .in -1c .SH "Field Documentation" .PP .SS "\fBPAPI_address_map_t\fP PAPI_exe_info_t::address_info" executable's address space info .SS "char PAPI_exe_info_t::fullname[1024]" path + name .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_set_thr_specific.30000600003276200002170000000364512247131120017766 0ustar ralphundrgrad.TH "PAPI_set_thr_specific" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_thr_specific \- .PP Store a pointer to a thread specific data structure\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par Prototype: \#include <papi.h> @n int PAPI_set_thr_specific( int tag, void *ptr ); @param tag An identifier, the value of which is either PAPI_USR1_TLS or PAPI_USR2_TLS. This identifier indicates which of several data structures associated with this thread is to be accessed. @param ptr A pointer to the memory containing the data structure. @retval PAPI_OK @retval PAPI_EINVAL The @em tag argument is out of range. In C, PAPI_set_thr_specific will save @em ptr into an array indexed by @em tag. There are 2 user available locations and @em tag can be either PAPI_USR1_TLS or PAPI_USR2_TLS. 
The array mentioned above is managed by PAPI and allocated to each thread which has called PAPI_thread_init. There is no Fortran equivalent function. @par Example: .fi .PP .PP .nf int ret; HighLevelInfo *state = NULL; ret = PAPI_thread_init(pthread_self); if (ret != PAPI_OK) handle_error(ret); // Do we have the thread specific data setup yet? ret = PAPI_get_thr_specific(PAPI_USR1_TLS, (void *) &state); if (ret != PAPI_OK || state == NULL) { state = (HighLevelInfo *) malloc(sizeof(HighLevelInfo)); if (state == NULL) return (PAPI_ESYS); memset(state, 0, sizeof(HighLevelInfo)); state->EventSet = PAPI_NULL; ret = PAPI_create_eventset(&state->EventSet); if (ret != PAPI_OK) return (PAPI_ESYS); ret = PAPI_set_thr_specific(PAPI_USR1_TLS, state); if (ret != PAPI_OK) return (ret); } * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_register_thread\fP \fBPAPI_thread_init\fP \fBPAPI_thread_id\fP \fBPAPI_get_thr_specific\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_query_event.30000600003276200002170000000275312247131120017016 0ustar ralphundrgrad.TH "PAPI_query_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_query_event \- .PP Query if PAPI event exists\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_query_event(int EventCode)\fP; .RE .PP \fBPAPI_query_event()\fP asks the PAPI library if the PAPI Preset event can be counted on this architecture\&. If the event CAN be counted, the function returns PAPI_OK\&. If the event CANNOT be counted, the function returns an error code\&. This function also can be used to check the syntax of native and user events\&. .PP \fBParameters:\fP .RS 4 \fIEventCode\fP -- a defined event such as PAPI_TOT_INS\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. 
.br \fIPAPI_ENOEVNT\fP The PAPI preset is not available on the underlying hardware\&. .RE .PP \fBExamples\fP .RS 4 .PP .nf * int retval; * // Initialize the library * retval = PAPI_library_init(PAPI_VER_CURRENT); * if (retval != PAPI_VER_CURRENT) { * fprintf(stderr,"PAPI library init error!\en"); * exit(1); * } * if (PAPI_query_event(PAPI_TOT_INS) != PAPI_OK) { * fprintf(stderr,"No instruction counter? How lame\&.\en"); * exit(1); * } * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_remove_event\fP .PP \fBPAPI_remove_events\fP .PP PAPI_presets .PP PAPI_native .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_thread_id.30000600003276200002170000000150412247131120016364 0ustar ralphundrgrad.TH "PAPI_thread_id" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_thread_id \- .PP Get the thread identifier of the current thread\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_EMISC is returned if there are no threads registered. @retval -1 is returned if the thread id function returns an error. This function returns a valid thread identifier. It calls the function registered with PAPI through a call to PAPI_thread_init(). .fi .PP .PP .PP .nf unsigned long tid; if ((tid = PAPI_thread_id()) == (unsigned long int)-1 ) exit(1); printf("Initial thread id is: %lu\n", tid ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_get_hardware_info.30000600003276200002170000000233112247131120020107 0ustar ralphundrgrad.TH "PAPI_get_hardware_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_hardware_info \- .PP get information about the system hardware .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf In C, this function returns a pointer to a structure containing information about the hardware on which the program runs. In Fortran, the values of the structure are returned explicitly. @retval PAPI_EINVAL One or more of the arguments is invalid. .fi .PP .PP .PP .nf @note The C structure contains detailed information about cache and TLB sizes. This information is not available from Fortran. @par Examples: .fi .PP .PP .nf const PAPI_hw_info_t *hwinfo = NULL; if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1); if ((hwinfo = PAPI_get_hardware_info()) == NULL) exit(1); printf("%d CPUs at %f Mhz\&.\en",hwinfo->totalcpus,hwinfo->mhz); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_hw_info_t\fP .PP \fBPAPI_get_executable_info\fP, \fBPAPI_get_opt\fP, \fBPAPI_get_dmem_info\fP, \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_lock.30000600003276200002170000000072712247131120015505 0ustar ralphundrgrad.TH "PAPIF_lock" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_lock \- .PP Lock one of two mutex variables defined in \fBpapi\&.h\fP\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_lock( C_INT lock )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_lock\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_get_real_usec.30000600003276200002170000000076312247131117017364 0ustar ralphundrgrad.TH "PAPIF_get_real_usec" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_real_usec \- .PP Get real time counter value in microseconds\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_real_usec( C_LONG_LONG time )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_real_usec\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_granularity.30000600003276200002170000000105312247131120017762 0ustar ralphundrgrad.TH "PAPIF_set_granularity" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_granularity \- .PP Set the default counting granularity for eventsets bound to the cpu component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_granularity( C_INT granularity, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_granularity\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_virt_usec.30000600003276200002170000000300212247131120017276 0ustar ralphundrgrad.TH "PAPI_get_virt_usec" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_virt_usec \- .PP get virtual time counter values in microseconds .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_ECNFLCT If there is no master event set. This will happen if the library has not been initialized, or for threaded applications, if there has been no thread id function defined by the PAPI_thread_init function. 
@retval PAPI_ENOMEM For threaded applications, if there has not yet been any thread specific master event created for the current thread, and if the allocation of such an event set fails, the call will return PAPI_ENOMEM or PAPI_ESYS. This function returns the total number of virtual units from some arbitrary starting point. Virtual units accrue every time the process is running in user-mode on behalf of the process. Like the real time counters, this count is guaranteed to exist on every platform PAPI supports. However on some platforms, the resolution can be as bad as 1/Hz as defined by the operating system. @par Examples: .fi .PP .PP .nf s = PAPI_get_virt_usec(); your_slow_code(); e = PAPI_get_virt_usec(); printf("Process has run for microseconds: %lld\en",e-s); * .fi .PP .PP \fBSee Also:\fP .RS 4 PAPIF .PP PAPI .PP PAPI .PP \fBPAPI_get_real_cyc\fP .PP \fBPAPI_get_virt_cyc\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_exe_info.30000600003276200002170000000130312247131117017205 0ustar ralphundrgrad.TH "PAPIF_get_exe_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_exe_info \- .PP get information about the executable and address space of the current program .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_exe_info\fP( C_STRING fullname, C_STRING name, .br C_LONG_LONG text_start, C_LONG_LONG text_end, .br C_LONG_LONG data_start, C_LONG_LONG data_end, .br C_LONG_LONG bss_start, C_LONG_LONG bss_end, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_executable_info\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_option_t.30000600003276200002170000000251412247131120016276 0ustar ralphundrgrad.TH "PAPI_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_option_t \- .PP A pointer to the following is passed to PAPI_set/get_opt() .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "\fBPAPI_preload_info_t\fP \fBpreload\fP" .br .ti -1c .RI "\fBPAPI_debug_option_t\fP \fBdebug\fP" .br .ti -1c .RI "\fBPAPI_inherit_option_t\fP \fBinherit\fP" .br .ti -1c .RI "\fBPAPI_granularity_option_t\fP \fBgranularity\fP" .br .ti -1c .RI "\fBPAPI_granularity_option_t\fP \fBdefgranularity\fP" .br .ti -1c .RI "\fBPAPI_domain_option_t\fP \fBdomain\fP" .br .ti -1c .RI "\fBPAPI_domain_option_t\fP \fBdefdomain\fP" .br .ti -1c .RI "\fBPAPI_attach_option_t\fP \fBattach\fP" .br .ti -1c .RI "\fBPAPI_cpu_option_t\fP \fBcpu\fP" .br .ti -1c .RI "\fBPAPI_multiplex_option_t\fP \fBmultiplex\fP" .br .ti -1c .RI "\fBPAPI_itimer_option_t\fP \fBitimer\fP" .br .ti -1c .RI "\fBPAPI_hw_info_t\fP * \fBhw_info\fP" .br .ti -1c .RI "\fBPAPI_shlib_info_t\fP * \fBshlib_info\fP" .br .ti -1c .RI "\fBPAPI_exe_info_t\fP * \fBexe_info\fP" .br .ti -1c .RI "\fBPAPI_component_info_t\fP * \fBcmp_info\fP" .br .ti -1c .RI "\fBPAPI_addr_range_option_t\fP \fBaddr\fP" .br .ti -1c .RI "PAPI_user_defined_events_file_t \fBevents_file\fP" .br .in -1c .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_perror.30000600003276200002170000000265312247131120015760 0ustar ralphundrgrad.TH "PAPI_perror" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_perror \- .PP Produces a string on standard error, describing the last library error\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br void \fBPAPI_perror( char *s )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIs\fP -- Optional message to print before the string describing the last error message\&. .RE .PP The routine \fBPAPI_perror()\fP produces a message on the standard error output, describing the last error encountered during a call to PAPI\&. If s is not NULL, s is printed, followed by a colon and a space\&. Then the error message and a new-line are printed\&. .PP \fBExample:\fP .RS 4 .PP .nf * int ret; * int EventSet = PAPI_NULL; * int native = 0x0; * * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) * { * fprintf(stderr, "PAPI error %d: %s\en", ret, PAPI_strerror(ret)); * exit(1); * } * // Add Total Instructions Executed to our EventSet * ret = PAPI_add_event(EventSet, PAPI_TOT_INS); * if (ret != PAPI_OK) * { * PAPI_perror( "PAPI_add_event" ); * exit(1); * } * // Start counting * ret = PAPI_start(EventSet); * if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_strerror\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_sprofil.30000600003276200002170000001074112247131120016122 0ustar ralphundrgrad.TH "PAPI_sprofil" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_sprofil \- .PP Generate PC histogram data from multiple code regions where hardware counter overflow occurs\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_sprofil( PAPI_sprofil_t * prof, int profcnt, int EventSet, int EventCode, int threshold, int flags )\fP; .RE .PP \fBParameters:\fP .RS 4 \fI*prof\fP pointer to an array of \fBPAPI_sprofil_t\fP structures\&. 
Each copy of the structure contains the following: .PD 0 .IP "\(bu" 2 buf -- pointer to a buffer of bufsiz bytes in which the histogram counts are stored in an array of unsigned short, unsigned int, or unsigned long long values, or 'buckets'\&. The size of the buckets is determined by values in the flags argument\&. .IP "\(bu" 2 bufsiz -- the size of the histogram buffer in bytes\&. It is computed from the length of the code region to be profiled, the size of the buckets, and the scale factor as discussed below\&. .IP "\(bu" 2 offset -- the start address of the region to be profiled\&. .IP "\(bu" 2 scale -- broadly and historically speaking, a contraction factor that indicates how much smaller the histogram buffer is than the region to be profiled\&. More precisely, scale is interpreted as an unsigned 16-bit fixed-point fraction with the decimal point implied on the left\&. Its value is the reciprocal of the number of addresses in a subdivision, per counter of histogram buffer\&. .PP .br \fIprofcnt\fP number of structures in the prof array for hardware profiling\&. .br \fIEventSet\fP The PAPI EventSet to profile\&. This EventSet is marked as profiling-ready, but profiling doesn't actually start until a \fBPAPI_start()\fP call is issued\&. .br \fIEventCode\fP Code of the Event in the EventSet to profile\&. This event must already be a member of the EventSet\&. .br \fIthreshold\fP minimum number of events that must occur before the PC is sampled\&. If hardware overflow is supported for your component, this threshold will trigger an interrupt when reached\&. Otherwise, the counters will be sampled periodically and the PC will be recorded for the first sample that exceeds the threshold\&. If the value of threshold is 0, profiling will be disabled for this event\&. .br \fIflags\fP bit pattern to control profiling behavior\&. 
Defined values are given in a table in the documentation for \fBPAPI_profil\fP\&. .RE .PP \fBReturn values:\fP .RS 4 \fIReturn\fP values for \fBPAPI_sprofil()\fP are identical to those for \fBPAPI_profil\fP\&. Please refer to that page for further details\&. .RE .PP \fBPAPI_sprofil()\fP is a structure driven profiler that profiles one or more disjoint regions of code in a single call\&. It accepts a pointer to a preinitialized array of sprofil structures, and initiates profiling based on the values contained in the array\&. Each structure in the array defines the profiling parameters that are normally passed to \fBPAPI_profil()\fP\&. For more information on profiling, see \fBPAPI_profil\fP\&. .PP \fBExample:\fP .RS 4 .PP .nf * int retval; * unsigned long length; * const PAPI_exe_info_t *prginfo; * unsigned short *profbuf1, *profbuf2, profbucket; * PAPI_sprofil_t sprof[3]; * * prginfo = PAPI_get_executable_info(); * if (prginfo == NULL) handle_error( NULL ); * length = (unsigned long)(prginfo->address_info\&.text_end - prginfo->address_info\&.text_start); * // Allocate 2 buffers of equal length * profbuf1 = (unsigned short *)malloc(length); * profbuf2 = (unsigned short *)malloc(length); * if ((profbuf1 == NULL) || (profbuf2 == NULL)) * handle_error( NULL ); * memset(profbuf1,0x00,length); * memset(profbuf2,0x00,length); * // First buffer * sprof[0]\&.pr_base = profbuf1; * sprof[0]\&.pr_size = length; * sprof[0]\&.pr_off = (caddr_t) DO_FLOPS; * sprof[0]\&.pr_scale = 0x10000; * // Second buffer * sprof[1]\&.pr_base = profbuf2; * sprof[1]\&.pr_size = length; * sprof[1]\&.pr_off = (caddr_t) DO_READS; * sprof[1]\&.pr_scale = 0x10000; * // Overflow bucket * sprof[2]\&.pr_base = &profbucket; * sprof[2]\&.pr_size = 1; * sprof[2]\&.pr_off = 0; * sprof[2]\&.pr_scale = 0x0002; * // Profile all 3 regions described in sprof * retval = PAPI_sprofil(sprof, 3, EventSet, PAPI_FP_INS, 1000000, * PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16); * if ( retval != PAPI_OK ) handle_error( retval ); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_overflow\fP .PP 
\fBPAPI_get_executable_info\fP .PP \fBPAPI_profil\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_cleanup_eventset.30000600003276200002170000000077012247131117020125 0ustar ralphundrgrad.TH "PAPIF_cleanup_eventset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_cleanup_eventset \- .PP empty and destroy an EventSet .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_cleanup_eventset( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_cleanup_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_multiplex_option_t.30000600003276200002170000000064112247131120020400 0ustar ralphundrgrad.TH "PAPI_multiplex_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_multiplex_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBeventset\fP" .br .ti -1c .RI "int \fBns\fP" .br .ti -1c .RI "int \fBflags\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_library_init.30000600003276200002170000000333212247131120017131 0ustar ralphundrgrad.TH "PAPI_library_init" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_library_init \- .PP initialize the PAPI library\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @param version upon initialization, PAPI checks the argument against the internal value of PAPI_VER_CURRENT when the library was compiled. This guards against portability problems when updating the PAPI shared libraries on your system. @retval PAPI_EINVAL papi.h is different from the version used to compile the PAPI library. 
@retval PAPI_ENOMEM Insufficient memory to complete the operation. @retval PAPI_ECMP This component does not support the underlying hardware. @retval PAPI_ESYS A system or C library call failed inside PAPI, see the errno variable. PAPI_library_init() initializes the PAPI library. It must be called before any low level PAPI functions can be used. PAPI_is_initialized() checks whether the library has been initialized. If your application makes use of threads, PAPI_thread_init must also be called prior to making any calls to the library other than PAPI_library_init(). @par Examples: .fi .PP .PP .nf * int retval; * retval = PAPI_library_init(PAPI_VER_CURRENT); * if (retval != PAPI_VER_CURRENT && retval > 0) { * fprintf(stderr,"PAPI library version mismatch!\en"); * exit(1); } * if (retval < 0) * handle_error(retval); * retval = PAPI_is_initialized(); * if (retval != PAPI_LOW_LEVEL_INITED) * handle_error(retval); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_thread_init\fP PAPI .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_ipc.30000600003276200002170000000102212247131117015323 0ustar ralphundrgrad.TH "PAPIF_ipc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_ipc \- .PP Get instructions per cycle, real and processor time\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_ipc( C_FLOAT real_time, C_FLOAT proc_time, C_LONG_LONG ins, C_FLOAT ipc, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_ipc\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_perror.30000600003276200002170000000076212247131120016065 0ustar ralphundrgrad.TH "PAPIF_perror" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_perror \- .PP Convert PAPI error codes to strings, and print error message to stderr\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_perror( C_STRING message )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_perror\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_set_opt.30000600003276200002170000001057112247131120016122 0ustar ralphundrgrad.TH "PAPI_set_opt" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_opt \- .PP Set PAPI library or event set options\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_set_opt( int option, PAPI_option_t * ptr )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIoption\fP Defines the option to be set\&. Possible values are briefly described in the table below\&. .br \fIptr\fP Pointer to a structure determined by the selected option\&. See \fBPAPI_option_t\fP for a description of possible structures\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP The specified option or parameter is invalid\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP The EventSet is currently counting events\&. .br \fIPAPI_ECMP\fP The option is not implemented for the current component\&. .br \fIPAPI_ENOINIT\fP PAPI has not been initialized\&. .br \fIPAPI_EINVAL_DOM\fP Invalid domain has been requested\&. .RE .PP \fBPAPI_set_opt()\fP changes the options of the PAPI library or a specific EventSet created by \fBPAPI_create_eventset\fP\&. Some options may require that the EventSet be bound to a component before they can execute successfully\&. This can be done either by adding an event or by explicitly calling \fBPAPI_assign_eventset_component\fP\&. .PP Ptr is a pointer to the \fBPAPI_option_t\fP structure, which is actually a union of different structures for different options\&. 
Not all options require or return information in these structures\&. Each requires different values to be set\&. Some options require a component index to be provided\&. These options are handled implicitly through the option structures\&. .PP \fBNote:\fP .RS 4 Some options, such as PAPI_DOMAIN and PAPI_MULTIPLEX, are also available as separate entry points in both C and Fortran\&. .RE .PP The reader is encouraged to peruse the ctests code in the PAPI distribution for examples of usage of \fBPAPI_set_opt\fP\&. .PP \fBPossible values for the PAPI_set_opt option parameter\fP .RS 4 OPTION DEFINITION PAPI_DEFDOM Set default counting domain for newly created event sets. Requires a component index. PAPI_DEFGRN Set default counting granularity. Requires a component index. PAPI_DEBUG Set the PAPI debug state and the debug handler. The debug state is specified in ptr->debug.level. The debug handler is specified in ptr->debug.handler. For further information regarding debug states and the behavior of the handler, see PAPI_set_debug. PAPI_MULTIPLEX Enable specified EventSet for multiplexing. PAPI_DEF_ITIMER Set the type of itimer used in software multiplexing, overflowing and profiling. PAPI_DEF_MPX_NS Set the sampling time slice in nanoseconds for multiplexing and overflow. PAPI_DEF_ITIMER_NS See PAPI_DEF_MPX_NS. PAPI_ATTACH Attach EventSet specified in ptr->attach.eventset to thread or process id specified in ptr->attach.tid. PAPI_CPU_ATTACH Attach EventSet specified in ptr->cpu.eventset to cpu specified in ptr->cpu.cpu_num. PAPI_DETACH Detach EventSet specified in ptr->attach.eventset from any thread or process id. PAPI_DOMAIN Set domain for EventSet specified in ptr->domain.eventset. Will error if eventset is not bound to a component. PAPI_GRANUL Set granularity for EventSet specified in ptr->granularity.eventset. Will error if eventset is not bound to a component. PAPI_INHERIT Enable or disable inheritance for specified EventSet. 
PAPI_DATA_ADDRESS Set data address range to restrict event counting for EventSet specified in ptr->addr.eventset. Starting and ending addresses are specified in ptr->addr.start and ptr->addr.end, respectively. If exact addresses cannot be instantiated, offsets are returned in ptr->addr.start_off and ptr->addr.end_off. Currently implemented on Itanium only. PAPI_INSTR_ADDRESS Set instruction address range as described above. Itanium only. .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_debug\fP .PP \fBPAPI_set_multiplex\fP .PP \fBPAPI_set_domain\fP .PP \fBPAPI_option_t\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_event_name_to_code.30000600003276200002170000000105312247131117020371 0ustar ralphundrgrad.TH "PAPIF_event_name_to_code" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_event_name_to_code \- .PP Convert a name to a numeric hardware event code\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_event_name_to_code( C_STRING EventName, C_INT EventCode, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_event_name_to_code\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_start_counters.30000600003276200002170000000365712247131120017533 0ustar ralphundrgrad.TH "PAPI_start_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_start_counters \- .PP Start counting hardware events\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_start_counters( int *events, int array_len )\fP; .RE .PP \fBParameters:\fP .RS 4 \fI*events\fP an array of codes for events such as PAPI_INT_INS or a native event code .br \fIarray_len\fP the number of items in the *events array .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_EISRUN\fP Counters have already been started; you must call \fBPAPI_stop_counters()\fP before you call this function again\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .br \fIPAPI_ECNFLCT\fP The underlying counter hardware cannot count this event and other events in the EventSet simultaneously\&. .br \fIPAPI_ENOEVNT\fP The PAPI preset is not available on the underlying hardware\&. .RE .PP \fBPAPI_start_counters()\fP starts counting the events named in the *events array\&. This function cannot be called if the counters have already been started\&. To call this function again, the user must first stop the events explicitly with \fBPAPI_stop_counters()\fP\&. It is the user's responsibility to choose events that can be counted simultaneously by reading the vendor's documentation\&. The length of the *events array should be no longer than the value returned by \fBPAPI_num_counters()\fP\&. .PP .PP .nf if( PAPI_start_counters( Events, num_hwcntrs ) != PAPI_OK ) handle_error(1); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_stop_counters()\fP \fBPAPI_add_event()\fP \fBPAPI_create_eventset()\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_destroy_eventset.30000600003276200002170000000077012247131117020167 0ustar ralphundrgrad.TH "PAPIF_destroy_eventset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_destroy_eventset \- .PP empty and destroy an EventSet .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_destroy_eventset( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_destroy_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_epc.30000600003276200002170000000114412247131117015324 0ustar ralphundrgrad.TH "PAPIF_epc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_epc \- .PP Get named events per cycle, real and processor time, reference and core cycles\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_epc( C_STRING EventName, C_FLOAT real_time, C_FLOAT proc_time, C_LONG_LONG ref, C_LONG_LONG core, C_LONG_LONG evt, C_FLOAT epc, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_epc\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_remove_events.30000600003276200002170000000106212247131120017427 0ustar ralphundrgrad.TH "PAPIF_remove_events" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_remove_events \- .PP Remove an array of hardware event codes from a PAPI event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_remove_events\fP( C_INT EventSet, C_INT(*) EventCode, C_INT number, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_remove_events\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_enum_cmp_event.30000600003276200002170000001032712247131120017450 0ustar ralphundrgrad.TH "PAPI_enum_cmp_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_enum_cmp_event \- .PP Enumerate PAPI preset or native events for a given component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_enum_cmp_event( int *EventCode, int modifier, int cidx ); Given an event code, PAPI_enum_cmp_event replaces the event code with the next available event. The modifier argument affects which events are returned. For all platforms and event types, a value of PAPI_ENUM_ALL (zero) directs the function to return all possible events. @n For native events, the effect of the modifier argument may be different on each platform. See the discussion below for platform-specific definitions. @param *EventCode A defined preset or native event such as PAPI_TOT_INS. @param modifier Modifies the search logic. See below for full list. For native events, each platform behaves differently. See platform-specific documentation for details. @param cidx Specifies the component to search in @retval PAPI_ENOEVNT The next requested PAPI preset or native event is not available on the underlying hardware. 
@par Examples: .fi .PP .PP .nf * // Scan for all supported native events on the first component * int i = 0 | PAPI_NATIVE_MASK; * int retval; * PAPI_event_info_t info; * * // Position i on the first native event of component 0 * retval = PAPI_enum_cmp_event( &i, PAPI_ENUM_FIRST, 0 ); * if ( retval != PAPI_OK ) handle_error( retval ); * printf( "Name\t\t\t Code\t Description\n" ); * do { * retval = PAPI_get_event_info( i, &info ); * if ( retval == PAPI_OK ) { * printf( "%-30s 0x%-10x\n%s\n", info\&.symbol, info\&.event_code, info\&.long_descr ); * } * } while ( PAPI_enum_cmp_event( &i, PAPI_ENUM_ALL, 0 ) == PAPI_OK ); * .fi .PP .PP \fBGeneric Modifiers\fP .RS 4 The following values are implemented for preset and native events .PD 0 .IP "\(bu" 2 PAPI_ENUM_EVENTS -- Enumerate all (default) .IP "\(bu" 2 PAPI_ENUM_FIRST -- Enumerate first event (preset or native, chosen based on the type of EventCode) .PP .RE .PP \fBNative Modifiers\fP .RS 4 The following values are implemented for native events .PD 0 .IP "\(bu" 2 PAPI_NTV_ENUM_UMASKS -- Given an event, iterate through possible umasks one at a time .IP "\(bu" 2 PAPI_NTV_ENUM_UMASK_COMBOS -- Given an event, iterate through all possible combinations of umasks\&. This is not implemented on libpfm4\&. 
.PP .RE .PP \fBPreset Modifiers\fP .RS 4 The following values are implemented for preset events .PD 0 .IP "\(bu" 2 PAPI_PRESET_ENUM_AVAIL -- enumerate only available presets .IP "\(bu" 2 PAPI_PRESET_ENUM_MSC -- Miscellaneous preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_INS -- Instruction related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_IDL -- Stalled or Idle preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_BR -- Branch related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_CND -- Conditional preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_MEM -- Memory related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_CACH -- Cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_L1 -- L1 cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_L2 -- L2 cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_L3 -- L3 cache related preset events .IP "\(bu" 2 PAPI_PRESET_ENUM_TLB -- Translation Lookaside Buffer events .IP "\(bu" 2 PAPI_PRESET_ENUM_FP -- Floating Point related preset events .PP .RE .PP \fBITANIUM Modifiers\fP .RS 4 The following values are implemented for modifier on Itanium: .PD 0 .IP "\(bu" 2 PAPI_NTV_ENUM_IARR - Enumerate IAR (instruction address ranging) events .IP "\(bu" 2 PAPI_NTV_ENUM_DARR - Enumerate DAR (data address ranging) events .IP "\(bu" 2 PAPI_NTV_ENUM_OPCM - Enumerate OPC (opcode matching) events .IP "\(bu" 2 PAPI_NTV_ENUM_IEAR - Enumerate IEAR (instr event address register) events .IP "\(bu" 2 PAPI_NTV_ENUM_DEAR - Enumerate DEAR (data event address register) events .PP .RE .PP \fBPOWER Modifiers\fP .RS 4 The following values are implemented for POWER .PD 0 .IP "\(bu" 2 PAPI_NTV_ENUM_GROUPS - Enumerate groups to which an event belongs .PP .RE .PP \fBSee Also:\fP .RS 4 PAPI .br PAPIF .br \fBPAPI_enum_event\fP .br \fBPAPI_get_event_info\fP .br \fBPAPI_event_name_to_code\fP .br PAPI_preset .br PAPI_native .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
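.PP
The PAPI_ENUM_FIRST modifier described above starts a preset or a native enumeration depending on the type of EventCode\&. As a rough illustration of how that type can be read from the code itself, the sketch below classifies an event code by its two high bits\&. This is a minimal sketch, not code from the PAPI distribution: the helper classify_event and the enum event_kind are hypothetical, and the mask values are stand-ins assumed here so that the sketch compiles without \fBpapi\&.h\fP; real code should include \fBpapi\&.h\fP and use the masks it defines\&.

```c
#include <assert.h>

/* Stand-in mask values, assumed for this sketch only.
 * Real code should take PAPI_PRESET_MASK and PAPI_NATIVE_MASK from papi.h. */
#ifndef PAPI_PRESET_MASK
#define PAPI_PRESET_MASK 0x80000000u
#define PAPI_NATIVE_MASK 0x40000000u
#endif

/* Hypothetical result type for this sketch. */
enum event_kind { EVENT_OTHER = 0, EVENT_PRESET = 1, EVENT_NATIVE = 2 };

/* Hypothetical helper: decide whether an event code looks like a preset
 * or a native event by testing the two mask bits. Codes with neither bit
 * or both bits set are reported as EVENT_OTHER. */
enum event_kind classify_event(unsigned int code)
{
    int preset = (code & (unsigned int)PAPI_PRESET_MASK) != 0;
    int native = (code & (unsigned int)PAPI_NATIVE_MASK) != 0;

    if (preset && !native)
        return EVENT_PRESET;
    if (native && !preset)
        return EVENT_NATIVE;
    return EVENT_OTHER;
}
```

.PP
A caller could use such a check to decide which modifier table on this page applies before calling \fBPAPI_enum_cmp_event\fP, for example the preset modifiers for a code classified as EVENT_PRESET\&.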
papi-5.3.0/man/man3/PAPI_state.30000600003276200002170000000442412247131120015565 0ustar ralphundrgrad.TH "PAPI_state" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_state \- .PP Return the counting state of an EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_state( int EventSet, int * status )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP -- an integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .br \fIstatus\fP -- an integer containing a boolean combination of one or more of the following nonzero constants as defined in the PAPI header file \fBpapi\&.h\fP: .PD 0 .IP "\(bu" 2 PAPI_STOPPED -- EventSet is stopped .IP "\(bu" 2 PAPI_RUNNING -- EventSet is running .IP "\(bu" 2 PAPI_PAUSED -- EventSet temporarily disabled by the library .IP "\(bu" 2 PAPI_NOT_INIT -- EventSet defined, but not initialized .IP "\(bu" 2 PAPI_OVERFLOWING -- EventSet has overflowing enabled .IP "\(bu" 2 PAPI_PROFILING -- EventSet has profiling enabled .IP "\(bu" 2 PAPI_MULTIPLEXING -- EventSet has multiplexing enabled .IP "\(bu" 2 PAPI_ACCUMULATING -- reserved for future use .IP "\(bu" 2 PAPI_HWPROFILING -- reserved for future use .PP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .RE .PP \fBPAPI_state()\fP returns the counting state of the specified event set\&. 
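.PP
Because status is a combination of flags, it is usually tested bit by bit rather than compared against a single constant\&. The following is a minimal sketch of that decoding, not code from the PAPI distribution\&. The helper name describe_state is hypothetical, and the numeric flag values are stand-ins assumed here only so that the sketch compiles without \fBpapi\&.h\fP; real code should include \fBpapi\&.h\fP and use the constants it defines\&.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in flag values, assumed for this sketch only.
 * Real code should take these constants from papi.h. */
#ifndef PAPI_STOPPED
#define PAPI_STOPPED      0x01
#define PAPI_RUNNING      0x02
#define PAPI_PAUSED       0x04
#define PAPI_NOT_INIT     0x08
#define PAPI_OVERFLOWING  0x10
#define PAPI_PROFILING    0x20
#define PAPI_MULTIPLEXING 0x40
#endif

/* Hypothetical helper: write the names of all flags set in status into
 * buf, separated by a vertical bar, and return how many flags were
 * recognized. The text is truncated if buf is too small. */
int describe_state(int status, char *buf, size_t len)
{
    static const struct { int bit; const char *name; } flags[] = {
        { PAPI_STOPPED,      "STOPPED" },
        { PAPI_RUNNING,      "RUNNING" },
        { PAPI_PAUSED,       "PAUSED" },
        { PAPI_NOT_INIT,     "NOT_INIT" },
        { PAPI_OVERFLOWING,  "OVERFLOWING" },
        { PAPI_PROFILING,    "PROFILING" },
        { PAPI_MULTIPLEXING, "MULTIPLEXING" }
    };
    int found = 0;
    size_t i;

    assert(buf != NULL);
    if (len == 0)
        return 0;
    buf[0] = 0;                      /* start with an empty string */

    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (status & flags[i].bit) {
            if (found > 0)
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, flags[i].name, len - strlen(buf) - 1);
            found++;
        }
    }
    return found;
}
```

.PP
In a program, the status value filled in by \fBPAPI_state\fP would be passed to such a helper together with a character buffer, and the resulting text printed in place of the raw integer\&.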
.PP \fBExample:\fP .RS 4 .PP .nf * int EventSet = PAPI_NULL; * int status = 0; * int ret; * * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Add Total Instructions Executed to our EventSet * ret = PAPI_add_event(EventSet, PAPI_TOT_INS); * if (ret != PAPI_OK) handle_error(ret); * * // Start counting * ret = PAPI_state(EventSet, &status); * if (ret != PAPI_OK) handle_error(ret); * printf("State is now %d\n",status); * ret = PAPI_start(EventSet); * if (ret != PAPI_OK) handle_error(ret); * ret = PAPI_state(EventSet, &status); * if (ret != PAPI_OK) handle_error(ret); * printf("State is now %d\n",status); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_stop\fP \fBPAPI_start\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_thread_id.30000600003276200002170000000073712247131120016501 0ustar ralphundrgrad.TH "PAPIF_thread_id" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_thread_id \- .PP Get the thread identifier of the current thread\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_thread_id( C_INT id )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_thread_id\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_get_event_component.30000600003276200002170000000133612247131120020506 0ustar ralphundrgrad.TH "PAPI_get_event_component" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_event_component \- .PP return component an event belongs to .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_ENOCMP component does not exist @param EventCode EventCode for which we want to know the component index @par Examples: .fi .PP .PP .nf int cidx, eventcode = PAPI_TOT_INS; cidx = PAPI_get_event_component(eventcode); * .fi .PP \fBPAPI_get_event_component()\fP returns the component an event belongs to\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_event_info\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_lock.30000600003276200002170000000176412247131120015401 0ustar ralphundrgrad.TH "PAPI_lock" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_lock \- .PP Lock one of two mutex variables defined in \fBpapi\&.h\fP\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP \fBPAPI_lock()\fP grabs access to one of the two PAPI mutex variables\&. This function is provided to the user to have a platform independent call to a (hopefully) efficiently implemented mutex\&. .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br void \fBPAPI_lock(int lock)\fP; .RE .PP \fBParameters:\fP .RS 4 \fIlock\fP -- an integer value specifying one of the two user locks: PAPI_USR1_LOCK or PAPI_USR2_LOCK .RE .PP \fBReturns:\fP .RS 4 There is no return value for this call\&. Upon return from \fBPAPI_lock\fP the current thread has acquired exclusive access to the specified PAPI mutex\&. .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_unlock\fP .PP \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_get_virt_nsec.30000600003276200002170000000231612247131120017276 0ustar ralphundrgrad.TH "PAPI_get_virt_nsec" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_virt_nsec \- .PP Get virtual time counter values in nanoseconds\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBReturn values:\fP .RS 4 \fIPAPI_ECNFLCT\fP If there is no master event set\&. This will happen if the library has not been initialized, or for threaded applications, if there has been no thread id function defined by the \fBPAPI_thread_init\fP function\&. .br \fIPAPI_ENOMEM\fP For threaded applications, if there has not yet been any thread specific master event created for the current thread, and if the allocation of such an event set fails, the call will return PAPI_ENOMEM or PAPI_ESYS \&. .RE .PP This function returns the total number of virtual units from some arbitrary starting point\&. Virtual units accrue every time the process is running in user-mode on behalf of the process\&. Like the real time counters, this count is guaranteed to exist on every platform PAPI supports\&. However on some platforms, the resolution can be as bad as 1/Hz as defined by the operating system\&. .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_epc.30000600003276200002170000000470212247131120015213 0ustar ralphundrgrad.TH "PAPI_epc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_epc \- .PP Simplified call to get arbitrary events per cycle, real and processor time\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface: \fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_epc( int event, float *rtime, float *ptime, long long *ref, long long *core, long long *evt, float *epc )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIevent\fP event code to be measured (0 defaults to PAPI_TOT_INS) .br \fI*rtime\fP total realtime since the first call .br \fI*ptime\fP total process time since the first call .br \fI*ref\fP incremental reference clock cycles since the last call .br \fI*core\fP incremental core clock cycles since the last call .br \fI*evt\fP total events since the first call .br \fI*epc\fP incremental events per cycle since the last call .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP The counters were already started by something other than \fBPAPI_epc()\fP\&. .br \fIPAPI_ENOEVNT\fP One of the requested events does not exist\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .RE .PP The first call to \fBPAPI_epc()\fP will initialize the PAPI High Level interface, set up the counters to monitor the user specified event, PAPI_TOT_CYC, and PAPI_REF_CYC (if it exists) and start the counters\&. .PP Subsequent calls will read the counters and return total real time, total process time, total event counts since the start of the measurement and the core and reference cycle count and EPC rate since the latest call to \fBPAPI_epc()\fP\&. .PP A call to \fBPAPI_stop_counters()\fP will stop the counters from running and then calls such as \fBPAPI_start_counters()\fP or other rate calls can safely be used\&. .PP \fBPAPI_epc\fP can provide a more detailed look at algorithm efficiency in light of clock variability in modern cpus\&. MFLOPS is no longer an adequate description of peak performance if clock rates can arbitrarily speed up or slow down\&. 
By allowing a user specified event and reporting reference cycles, core cycles and real time, \fBPAPI_epc\fP provides the information to compute an accurate effective clock rate, and an accurate measure of computational throughput\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_flips()\fP .PP \fBPAPI_flops()\fP .PP \fBPAPI_ipc()\fP .PP \fBPAPI_stop_counters()\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_sprofil_t.30000600003276200002170000000150212247131120016440 0ustar ralphundrgrad.TH "PAPI_sprofil_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_sprofil_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "void * \fBpr_base\fP" .br .ti -1c .RI "unsigned \fBpr_size\fP" .br .ti -1c .RI "caddr_t \fBpr_off\fP" .br .ti -1c .RI "unsigned \fBpr_scale\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Field Documentation" .PP .SS "void* PAPI_sprofil_t::pr_base" buffer base .SS "caddr_t PAPI_sprofil_t::pr_off" pc start address (offset) .SS "unsigned PAPI_sprofil_t::pr_scale" pc scaling factor: fixed point fraction 0xffff ~= 1, 0x8000 == \&.5, 0x4000 == \&.25, etc\&. also, two extensions 0x1000 == 1, 0x2000 == 2 .SS "unsigned PAPI_sprofil_t::pr_size" buffer size .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_virt_cyc.30000600003276200002170000000263012247131120017123 0ustar ralphundrgrad.TH "PAPI_get_virt_cyc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_virt_cyc \- .PP get virtual time counter value in clock cycles .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @retval PAPI_ECNFLCT If there is no master event set. This will happen if the library has not been initialized, or for threaded applications, if there has been no thread id function defined by the PAPI_thread_init function. 
@retval PAPI_ENOMEM For threaded applications, if there has not yet been any thread specific master event created for the current thread, and if the allocation of such an event set fails, the call will return PAPI_ENOMEM or PAPI_ESYS . This function returns the total number of virtual units from some arbitrary starting point. Virtual units accrue every time the process is running in user-mode on behalf of the process. Like the real time counters, this count is guaranteed to exist on every platform PAPI supports. However on some platforms, the resolution can be as bad as 1/Hz as defined by the operating system. @par Examples: .fi .PP .PP .nf s = PAPI_get_virt_cyc(); your_slow_code(); e = PAPI_get_virt_cyc(); printf("Process has run for cycles: %lld\en",e-s); * .fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_executable_info.30000600003276200002170000000330712247131120020437 0ustar ralphundrgrad.TH "PAPI_get_executable_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_executable_info \- .PP Get the executable's address space info\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include @n const PAPI_exe_info_t *PAPI_get_executable_info( void ); This function returns a pointer to a structure containing information about the current program. @param fullname Fully qualified path + filename of the executable. @param name Filename of the executable with no path information. @param text_start, text_end Start and End addresses of program text segment. @param data_start, data_end Start and End addresses of program data segment. @param bss_start, bss_end Start and End addresses of program bss segment. @retval PAPI_EINVAL One or more of the arguments is invalid. 
@par Examples: .fi .PP .PP .nf * const PAPI_exe_info_t *prginfo = NULL; * if ( ( prginfo = PAPI_get_executable_info( ) ) == NULL ) * exit( 1 ); * printf( "Path+Program: %s\n", prginfo->fullname ); * printf( "Program: %s\n", prginfo->address_info\&.name ); * printf( "Text start: %p, Text end: %p\n", prginfo->address_info\&.text_start, prginfo->address_info\&.text_end ); * printf( "Data start: %p, Data end: %p\n", prginfo->address_info\&.data_start, prginfo->address_info\&.data_end ); * printf( "Bss start: %p, Bss end: %p\n", prginfo->address_info\&.bss_start, prginfo->address_info\&.bss_end ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_opt\fP .PP \fBPAPI_get_hardware_info\fP .PP \fBPAPI_exe_info_t\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_disable_component_by_name.30000600003276200002170000000231512247131120021621 0ustar ralphundrgrad.TH "PAPI_disable_component_by_name" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_disable_component_by_name \- .PP disables the named component .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf \retval ENOCMP component does not exist \retval ENOINIT unable to disable the component, the library has already been initialized \param component_name name of the component to disable. \par Example: .fi .PP .PP .nf int result; result = PAPI_disable_component_by_name("example"); if (result==PAPI_OK) printf("component \"example\" has been disabled\n"); //\&.\&.\&. PAPI_library_init(PAPI_VER_CURRENT); * .fi .PP \fBPAPI_disable_component_by_name()\fP allows the user to disable a component before \fBPAPI_library_init()\fP time\&. This is useful if the user knows they do not wish to use events from that component and wants to reduce the PAPI library overhead\&. .PP \fBPAPI_disable_component_by_name()\fP must be called before \fBPAPI_library_init()\fP\&. 
.PP \fBSee Also:\fP .RS 4 \fBPAPI_library_init\fP .PP \fBPAPI_disable_component\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_flops.30000600003276200002170000000376512247131120015577 0ustar ralphundrgrad.TH "PAPI_flops" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_flops \- .PP Simplified call to get Mflops/s (floating point operation rate), real and processor time\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface: \fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_flops( float *rtime, float *ptime, long long *flpops, float *mflops )\fP; .RE .PP \fBParameters:\fP .RS 4 \fI*rtime\fP total realtime since the first call .br \fI*ptime\fP total process time since the first call .br \fI*flpops\fP total floating point operations since the first call .br \fI*mflops\fP incremental (Mega) floating point operations per seconds since the last call .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP The counters were already started by something other than \fBPAPI_flops()\fP\&. .br \fIPAPI_ENOEVNT\fP The floating point operations event does not exist\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .RE .PP The first call to \fBPAPI_flops()\fP will initialize the PAPI High Level interface, set up the counters to monitor the PAPI_FP_OPS event and start the counters\&. .PP Subsequent calls will read the counters and return total real time, total process time, total floating point operations since the start of the measurement and the Mflop/s rate since latest call to \fBPAPI_flops()\fP\&. A call to \fBPAPI_stop_counters()\fP will stop the counters from running and then calls such as \fBPAPI_start_counters()\fP or other rate calls can safely be used\&. .PP \fBPAPI_flops\fP returns information related to theoretical floating point operations rather than simple instructions\&. 
It uses the PAPI_FP_OPS event which attempts to 'correctly' account for, e\&.g\&., FMA undercounts and FP Store overcounts, etc\&. .PP \fBSee Also:\fP .RS 4 \fBPAPI_flips()\fP .PP \fBPAPI_ipc()\fP .PP \fBPAPI_epc()\fP .PP \fBPAPI_stop_counters()\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_profil.30000600003276200002170000001627712247131120015751 0ustar ralphundrgrad.TH "PAPI_profil" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_profil \- .PP Generate a histogram of hardware counter overflows vs\&. PC addresses\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_profil\fP(void *buf, unsigned bufsiz, unsigned long offset, unsigned scale, int EventSet, int EventCode, int threshold, int flags ); .RE .PP \fBFortran Interface\fP .RS 4 The profiling routines have no Fortran interface\&. .RE .PP \fBParameters:\fP .RS 4 \fI*buf\fP -- pointer to a buffer of bufsiz bytes in which the histogram counts are stored in an array of unsigned short, unsigned int, or unsigned long long values, or 'buckets'\&. The size of the buckets is determined by values in the flags argument\&. .br \fIbufsiz\fP -- the size of the histogram buffer in bytes\&. It is computed from the length of the code region to be profiled, the size of the buckets, and the scale factor as discussed above\&. .br \fIoffset\fP -- the start address of the region to be profiled\&. .br \fIscale\fP -- broadly and historically speaking, a contraction factor that indicates how much smaller the histogram buffer is than the region to be profiled\&. More precisely, scale is interpreted as an unsigned 16-bit fixed-point fraction with the decimal point implied on the left\&. Its value is the reciprocal of the number of addresses in a subdivision, per counter of histogram buffer\&. Below is a table of representative values for scale\&. 
.br \fIEventSet\fP -- The PAPI EventSet to profile\&. This EventSet is marked as profiling-ready, but profiling doesn't actually start until a \fBPAPI_start()\fP call is issued\&. .br \fIEventCode\fP -- Code of the Event in the EventSet to profile\&. This event must already be a member of the EventSet\&. .br \fIthreshold\fP -- minimum number of events that must occur before the PC is sampled\&. If hardware overflow is supported for your component, this threshold will trigger an interrupt when reached\&. Otherwise, the counters will be sampled periodically and the PC will be recorded for the first sample that exceeds the threshold\&. If the value of threshold is 0, profiling will be disabled for this event\&. .br \fIflags\fP -- bit pattern to control profiling behavior\&. Defined values are listed below\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP The EventSet is currently counting events\&. .br \fIPAPI_ECNFLCT\fP The underlying counter hardware cannot count this event and other events in the EventSet simultaneously\&. .br \fIPAPI_ENOEVNT\fP The PAPI preset is not available on the underlying hardware\&. .RE .PP \fBPAPI_profil()\fP provides hardware event statistics by profiling the occurrence of specified hardware counter events\&. It is designed to mimic the UNIX SVR4 profil call\&. .PP The statistics are generated by creating a histogram of hardware counter event overflows vs\&. program counter addresses for the current process\&. The histogram is defined for a specific region of program code to be profiled, and the identified region is logically broken up into a set of equal size subdivisions, each of which corresponds to a count in the histogram\&. 
.PP With each hardware event overflow, the current subdivision is identified and its corresponding histogram count is incremented\&. These counts establish a relative measure of how many hardware counter events are occurring in each code subdivision\&. .PP The resulting histogram counts for a profiled region can be used to identify those program addresses that generate a disproportionately high percentage of the event of interest\&. .PP Events to be profiled are specified with the EventSet and EventCode parameters\&. More than one event can be simultaneously profiled by calling \fBPAPI_profil()\fP several times with different EventCode values\&. Profiling can be turned off for a given event by calling \fBPAPI_profil()\fP with a threshold value of 0\&. .PP \fBRepresentative values for the scale variable\fP .RS 4
.nf
HEX      DECIMAL  DEFINITION
0x20000  131072   Maps precisely one instruction address to a unique bucket in buf.
0x10000  65536    Maps precisely two instruction addresses to a unique bucket in buf.
0x0FFFF  65535    Maps approximately two instruction addresses to a unique bucket in buf.
0x08000  32768    Maps every four instruction addresses to a bucket in buf.
0x04000  16384    Maps every eight instruction addresses to a bucket in buf.
0x00002  2        Maps all instruction addresses to the same bucket in buf.
0x00001  1        Undefined.
0x00000  0        Undefined.
.fi
.RE .PP Historically, the scale factor was introduced to allow the allocation of buffers smaller than the code size to be profiled\&. Data and instruction sizes were assumed to be multiples of 16-bits\&. These assumptions are no longer necessarily true\&. \fBPAPI_profil()\fP has preserved the traditional definition of scale where appropriate, but deprecated the definitions for 0 and 1 (disable scaling) and extended the range of scale to include 65536 and 131072 to allow for exactly two addresses and exactly one address per profiling bucket\&. 
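.PP The scale table above can be read as a simple ratio: the number of instruction addresses sharing one bucket is 131072 divided by scale\&. The following is a minimal sketch of that arithmetic, not PAPI's internal implementation\&. The helper names addrs_per_bucket and bucket_index are hypothetical and exist only for illustration; the shift by 17 bits is an assumption derived from the table (0x20000 == 2^17 gives one address per bucket)\&.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PAPI): how many instruction addresses
 * share one histogram bucket for a given scale, following the table of
 * representative values (0x20000 -> 1, 0x10000 -> 2, 0x8000 -> 4, ...). */
static double addrs_per_bucket(unsigned scale)
{
    assert(scale >= 2);              /* 0 and 1 are listed as undefined */
    return 131072.0 / (double)scale;
}

/* Hypothetical helper: index of the bucket that a program counter value
 * falls into, assuming index = (pc - offset) * scale / 2^17. */
static size_t bucket_index(uintptr_t pc, uintptr_t offset, unsigned scale)
{
    assert(scale >= 2 && pc >= offset);
    return (size_t)(((uint64_t)(pc - offset) * scale) >> 17);
}
```

Under these assumptions, a PC five bytes past the offset lands in bucket 5 with scale 0x20000 and in bucket 2 with scale 0x10000, which matches the one-address and two-address rows of the table\&.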
.PP The value of bufsiz is computed as follows: .PP bufsiz = (end - start)*(bucket_size/2)*(scale/65536) where .PD 0 .IP "\(bu" 2 bufsiz - the size of the buffer in bytes .IP "\(bu" 2 end, start - the ending and starting addresses of the profiled region .IP "\(bu" 2 bucket_size - the size of each bucket in bytes; 2, 4, or 8 as defined in flags .PP \fBDefined bits for the flags variable:\fP .RS 4 .PD 0 .IP "\(bu" 2 PAPI_PROFIL_POSIX Default type of profiling, similar to profil (3)\&. .br .IP "\(bu" 2 PAPI_PROFIL_RANDOM Drop a random 25% of the samples\&. .br .IP "\(bu" 2 PAPI_PROFIL_WEIGHTED Weight the samples by their value\&. .br .IP "\(bu" 2 PAPI_PROFIL_COMPRESS Ignore samples as values in the hash buckets get big\&. .br .IP "\(bu" 2 PAPI_PROFIL_BUCKET_16 Use unsigned short (16 bit) buckets\&. This is the default bucket size\&. .br .IP "\(bu" 2 PAPI_PROFIL_BUCKET_32 Use unsigned int (32 bit) buckets\&. .br .IP "\(bu" 2 PAPI_PROFIL_BUCKET_64 Use unsigned long long (64 bit) buckets\&. .br .IP "\(bu" 2 PAPI_PROFIL_FORCE_SW Force software overflow in profiling\&. .br .PP .RE .PP \fBExample\fP .RS 4 .PP .nf * int retval; * unsigned long start, length; * const PAPI_exe_info_t *prginfo; * unsigned short *profbuf; * * if ((prginfo = PAPI_get_executable_info()) == NULL) * handle_error(1); * * // Profile the whole text segment of the executable * start = (unsigned long)prginfo->address_info\&.text_start; * length = (unsigned long)prginfo->address_info\&.text_end - start; * * // With 16-bit buckets and scale 65536, bufsiz equals length * profbuf = (unsigned short *)malloc(length); * if (profbuf == NULL) * handle_error(1); * memset(profbuf,0x00,length); * * if ((retval = PAPI_profil(profbuf, length, start, 65536, EventSet, * PAPI_FP_INS, 1000000, PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16)) * != PAPI_OK) * handle_error(retval); * .fi .PP .RE .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_overflow\fP .PP \fBPAPI_sprofil\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_add_events.30000600003276200002170000000547612247131120016571 0ustar ralphundrgrad.TH "PAPI_add_events" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_add_events \- .PP add multiple PAPI presets or native hardware events to an event set .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include @n int PAPI_add_events( int EventSet, int * EventCodes, int number ); PAPI_add_event adds one event to a PAPI Event Set. PAPI_add_events does the same, but for an array of events. @n A hardware event can be either a PAPI preset or a native hardware event code. For a list of PAPI preset events, see PAPI_presets or run the avail test case in the PAPI distribution. PAPI presets can be passed to PAPI_query_event to see if they exist on the underlying architecture. For a list of native events available on current platform, run native_avail test case in the PAPI distribution. For the encoding of native events, see PAPI_event_name_to_code to learn how to generate native code for the supported native event on the underlying architecture. @param EventSet An integer handle for a PAPI Event Set as created by PAPI_create_eventset. @param *EventCode An array of defined events. @param number An integer indicating the number of events in the array *EventCode. It should be noted that PAPI_add_events can partially succeed, exactly like PAPI_remove_events. @retval Positive-Integer The number of consecutive elements that succeeded before the error. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOMEM Insufficient memory to complete the operation. @retval PAPI_ENOEVST The event set specified does not exist. @retval PAPI_EISRUN The event set is currently counting events. @retval PAPI_ECNFLCT The underlying counter hardware can not count this event and other events in the event set simultaneously. @retval PAPI_ENOEVNT The PAPI preset is not available on the underlying hardware. 
@retval PAPI_EBUG Internal error, please send mail to the developers. @par Examples: .fi .PP .PP .nf * int EventSet = PAPI_NULL; * unsigned int native = 0x0; * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_event( EventSet, PAPI_TOT_INS ) != PAPI_OK ) * handle_error( 1 ); * // Add native event PM_CYC to EventSet * if ( PAPI_event_name_to_code( "PM_CYC", &native ) != PAPI_OK ) * handle_error( 1 ); * if ( PAPI_add_event( EventSet, native ) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .PP .PP .nf @see PAPI_cleanup_eventset @n PAPI_destroy_eventset @n PAPI_event_code_to_name @n PAPI_remove_events @n PAPI_query_event @n PAPI_presets @n PAPI_native @n PAPI_remove_event.fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_shlib_info_t.30000600003276200002170000000061012247131120017075 0ustar ralphundrgrad.TH "PAPI_shlib_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_shlib_info_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "\fBPAPI_address_map_t\fP * \fBmap\fP" .br .ti -1c .RI "int \fBcount\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_create_eventset.30000600003276200002170000000076712247131117017747 0ustar ralphundrgrad.TH "PAPIF_create_eventset" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_create_eventset \- .PP create a new empty PAPI EventSet .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_create_eventset( C_INT EventSet, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_create_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_read_counters.30000600003276200002170000000334612247131120017304 0ustar ralphundrgrad.TH "PAPI_read_counters" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_read_counters \- .PP Read and reset counters\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_read_counters( long long *values, int array_len ); .fi .PP .PP \fBParameters:\fP .RS 4 \fI*values\fP an array to hold the counter values of the counting events .br \fIarray_len\fP the number of items in the *values array .RE .PP \fBPrecondition:\fP .RS 4 These calls assume an initialized PAPI library and a properly added event set\&. .RE .PP \fBPostcondition:\fP .RS 4 The counters are reset and left running after the call\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. .RE .PP \fBPAPI_read_counters()\fP copies the event counters into the array *values\&. .PP .PP .nf do_100events(); if ( PAPI_read_counters( values, num_hwcntrs ) != PAPI_OK ) handle_error(1); // values[0] now equals 100 do_100events(); if ( PAPI_accum_counters( values, num_hwcntrs ) != PAPI_OK ) handle_error(1); // values[0] now equals 200 values[0] = -100; do_100events(); if ( PAPI_accum_counters(values, num_hwcntrs ) != PAPI_OK ) handle_error(1); // values[0] now equals 0 * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_opt()\fP \fBPAPI_start_counters()\fP .RE .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br PAPIF_read_counters( C_LONG_LONG(*) values, C_INT array_len, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_read_counters\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_set_domain.30000600003276200002170000000353312247131120016567 0ustar ralphundrgrad.TH "PAPI_set_domain" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_domain \- .PP Set the default counting domain for new event sets bound to the cpu component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Prototype: \#include @n int PAPI_set_domain( int domain ); @param domain one of the following constants as defined in the papi.h header file @arg PAPI_DOM_USER User context counted @arg PAPI_DOM_KERNEL Kernel/OS context counted @arg PAPI_DOM_OTHER Exception/transient mode counted @arg PAPI_DOM_SUPERVISOR Supervisor/hypervisor context counted @arg PAPI_DOM_ALL All above contexts counted @arg PAPI_DOM_MIN The smallest available context @arg PAPI_DOM_MAX The largest available context .fi .PP .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .RE .PP \fBPAPI_set_domain\fP sets the default counting domain for all new event sets created by \fBPAPI_create_eventset\fP in all threads\&. This call implicitly sets the domain for the cpu component (component 0) and is included to preserve backward compatibility\&. .PP \fBExample:\fP .RS 4 .PP .nf int ret; // Initialize the library ret = PAPI_library_init(PAPI_VER_CURRENT); if (ret > 0 && ret != PAPI_VER_CURRENT) { fprintf(stderr,"PAPI library version mismatch!\n"); exit(1); } if (ret < 0) handle_error(ret); // Set the default domain for the cpu component ret = PAPI_set_domain(PAPI_DOM_KERNEL); if (ret != PAPI_OK) handle_error(ret); ret = PAPI_create_eventset(&EventSet); if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_cmp_domain\fP \fBPAPI_set_granularity\fP \fBPAPI_set_opt\fP \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPIF_add_named_event.30000600003276200002170000000105312247131117017651 0ustar ralphundrgrad.TH "PAPIF_add_named_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_add_named_event \- .PP add PAPI preset or native hardware event to an event set by name .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_add_named_event( C_INT EventSet, C_STRING EventName, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_add_named_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_overflow.30000600003276200002170000001265712247131120016317 0ustar ralphundrgrad.TH "PAPI_overflow" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_overflow \- .PP Set up an event set to begin registering overflows\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP \fBPAPI_overflow()\fP marks a specific EventCode in an EventSet to generate an overflow signal after every threshold events are counted\&. More than one event in an event set can be used to trigger overflows\&. In such cases, the user must call this function once for each overflowing event\&. To turn off overflow on a specified event, call this function with a threshold value of 0\&. .PP Overflows can be implemented in either software or hardware, but the scope is the entire event set\&. PAPI defaults to hardware overflow if it is available\&. In the case of software overflow, a periodic timer interrupt causes PAPI to compare the event counts against the threshold values and call the overflow handler if one or more events have exceeded their threshold\&. In the case of hardware overflow, the counters are typically set to the negative of the threshold value and count up to 0\&. This zero-crossing triggers a hardware interrupt that calls the overflow handler\&. 
Because of this counter interrupt, the counter values for overflowing counters may be very small or even negative numbers, and cannot be relied upon as accurate\&. In such cases the overflow handler can approximate the counts by supplying the threshold value whenever an overflow occurs\&. .PP _papi_overflow_handler() is a placeholder for a user-defined function to process overflow events\&. A pointer to this function is passed to the \fBPAPI_overflow\fP routine, where it is invoked whenever a software or hardware overflow occurs\&. This handler receives the EventSet of the overflowing event, the Program Counter address when the interrupt occurred, an overflow_vector that can be processed to determine which event(s) caused the overflow, and a pointer to the machine context, which can be used in a platform-specific manner to extract register information about what was happening when the overflow occurred\&. .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_overflow\fP (int EventSet, int EventCode, int threshold, int flags, PAPI_overflow_handler_t handler ); .br .br (*PAPI_overflow_handler_t) _papi_overflow_handler (int EventSet, void *address, long_long overflow_vector, void *context ); .RE .PP \fBFortran Interface:\fP .RS 4 Not implemented .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP -- an integer handle to a PAPI event set as created by \fBPAPI_create_eventset\fP .br \fIEventCode\fP -- the preset or native event code to be set for overflow detection\&. This event must have already been added to the EventSet\&. .br \fIthreshold\fP -- the overflow threshold value for this EventCode\&. .br \fIflags\fP -- bitmap that controls the overflow mode of operation\&. Set to PAPI_OVERFLOW_FORCE_SW to force software overflowing, even if hardware overflow support is available\&. If hardware overflow support is available on a given system, it will be the default mode of operation\&. 
There are situations where it is advantageous to use software overflow instead\&. Although software overflow is inherently less accurate, with more latency and processing overhead, it does allow for overflowing on derived events, and for the accurate recording of overflowing event counts\&. These two features are typically not available with hardware overflow\&. Only one type of overflow is allowed per event set, so setting one event to hardware overflow and another to forced software overflow will result in an error being returned\&. .br \fIhandler\fP -- pointer to the user supplied handler function to call upon overflow .br \fIaddress\fP -- the Program Counter address at the time of the overflow .br \fIoverflow_vector\fP -- a long long word containing flag bits to indicate which hardware counter(s) caused the overflow .br \fI*context\fP -- pointer to a machine specific structure that defines the register context at the time of overflow\&. This parameter is often unused and can be ignored in the user function\&. .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP On success, \fBPAPI_overflow\fP returns PAPI_OK\&. .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. Most likely a bad threshold value\&. .br \fIPAPI_ENOMEM\fP Insufficient memory to complete the operation\&. .br \fIPAPI_ENOEVST\fP The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP The EventSet is currently counting events\&. .br \fIPAPI_ECNFLCT\fP The underlying counter hardware cannot count this event and other events in the EventSet simultaneously\&. Also can happen if you are trying to overflow both by hardware and by forced software at the same time\&. .br \fIPAPI_ENOEVNT\fP The PAPI event is not available on the underlying hardware\&. .RE .PP \fBExample\fP .RS 4 .PP .nf * // Define a simple overflow handler: * void handler(int EventSet, void *address, long_long overflow_vector, void *context) * { * fprintf(stderr,\"Overflow at %p! 
bit=0x%llx \\n\", * address,overflow_vector); * } * * // Call PAPI_overflow for an EventSet containing PAPI_TOT_INS, * // setting the threshold to 100000\&. Use the handler defined above\&. * retval = PAPI_overflow(EventSet, PAPI_TOT_INS, 100000, 0, handler); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_overflow_event_index\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_add_events.30000600003276200002170000000105112247131117016666 0ustar ralphundrgrad.TH "PAPIF_add_events" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_add_events \- .PP add multiple PAPI presets or native hardware events to an event set .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_add_events\fP( C_INT EventSet, C_INT(*) EventCodes, C_INT number, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_add_events\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_itimer_option_t.30000600003276200002170000000070412247131120017646 0ustar ralphundrgrad.TH "PAPI_itimer_option_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_itimer_option_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBitimer_num\fP" .br .ti -1c .RI "int \fBitimer_sig\fP" .br .ti -1c .RI "int \fBns\fP" .br .ti -1c .RI "int \fBflags\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_read.30000600003276200002170000000075512247131120015471 0ustar ralphundrgrad.TH "PAPIF_read" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_read \- .PP Read hardware counters from an event set\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_read\fP(C_INT EventSet, C_LONG_LONG(*) values, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_read\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_is_initialized.30000600003276200002170000000073712247131120017556 0ustar ralphundrgrad.TH "PAPIF_is_initialized" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_is_initialized \- .PP Check for initialization\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_is_initialized( C_INT level )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_is_initialized\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_mh_level_t.30000600003276200002170000000064112247131120016560 0ustar ralphundrgrad.TH "PAPI_mh_level_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_mh_level_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "\fBPAPI_mh_tlb_info_t\fP \fBtlb\fP [6]" .br .ti -1c .RI "\fBPAPI_mh_cache_info_t\fP \fBcache\fP [6]" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_clockrate.30000600003276200002170000000112712247131117017364 0ustar ralphundrgrad.TH "PAPIF_get_clockrate" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_clockrate \- .PP Get the clockrate in MHz for the current cpu\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_clockrate( C_INT cr )\fP .RE .PP \fBNote:\fP .RS 4 This is a Fortran only interface that returns a value from the \fBPAPI_get_opt\fP call\&. 
.RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_preload.30000600003276200002170000000115412247131117017043 0ustar ralphundrgrad.TH "PAPIF_get_preload" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_preload \- .PP Get the LD_PRELOAD environment variable\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_preload( C_STRING lib_preload_env, C_INT check )\fP .RE .PP \fBNote:\fP .RS 4 This is a Fortran only interface that returns a value from the \fBPAPI_get_opt\fP call\&. .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_add_named_event.30000600003276200002170000000454612247131120017547 0ustar ralphundrgrad.TH "PAPI_add_named_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_add_named_event \- .PP add PAPI preset or native hardware event by name to an EventSet .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include @n int PAPI_add_named_event( int EventSet, char *EventName ); PAPI_add_named_event adds one event to a PAPI EventSet. @n A hardware event can be either a PAPI preset or a native hardware event code. For a list of PAPI preset events, see PAPI_presets or run the avail test case in the PAPI distribution. PAPI presets can be passed to PAPI_query_event to see if they exist on the underlying architecture. For a list of native events available on current platform, run the papi_native_avail utility in the PAPI distribution. @param EventSet An integer handle for a PAPI Event Set as created by PAPI_create_eventset. @param EventCode A defined event such as PAPI_TOT_INS. 
@retval Positive-Integer The number of consecutive elements that succeeded before the error. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOINIT The PAPI library has not been initialized. @retval PAPI_ENOMEM Insufficient memory to complete the operation. @retval PAPI_ENOEVST The event set specified does not exist. @retval PAPI_EISRUN The event set is currently counting events. @retval PAPI_ECNFLCT The underlying counter hardware can not count this event and other events in the event set simultaneously. @retval PAPI_ENOEVNT The PAPI preset is not available on the underlying hardware. @retval PAPI_EBUG Internal error, please send mail to the developers. @par Examples: .fi .PP .PP .nf * char *EventName = "PAPI_TOT_INS"; * int EventSet = PAPI_NULL; * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_named_event( EventSet, EventName ) != PAPI_OK ) * handle_error( 1 ); * // Add native event PM_CYC to EventSet * if ( PAPI_add_named_event( EventSet, "PM_CYC" ) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .PP .PP .nf @see PAPI_add_event @n PAPI_query_named_event @n PAPI_remove_named_event .fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_set_cmp_granularity.30000600003276200002170000000437012247131120020520 0ustar ralphundrgrad.TH "PAPI_set_cmp_granularity" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_cmp_granularity \- .PP Set the default counting granularity for eventsets bound to the specified component\&.
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Prototype: \#include @n int PAPI_set_cmp_granularity( int granularity, int cidx ); @param granularity one of the following constants as defined in the papi.h header file @arg PAPI_GRN_THR Count each individual thread @arg PAPI_GRN_PROC Count each individual process @arg PAPI_GRN_PROCG Count each individual process group @arg PAPI_GRN_SYS Count the current CPU @arg PAPI_GRN_SYS_CPU Count all CPUs individually @arg PAPI_GRN_MIN The finest available granularity @arg PAPI_GRN_MAX The coarsest available granularity @param cidx An integer identifier for a component. By convention, component 0 is always the cpu component. .fi .PP .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOCMP\fP The argument cidx is not a valid component\&. .RE .PP \fBPAPI_set_cmp_granularity\fP sets the default counting granularity for all new event sets, and requires an explicit component argument\&. Event sets that are already in existence are not affected\&. .PP To change the granularity of an existing event set, please see \fBPAPI_set_opt\fP\&. The reader should note that the granularity of an event set affects only the mode in which the counter continues to run\&. .PP \fBExample:\fP .RS 4 .PP .nf int ret; // Initialize the library ret = PAPI_library_init(PAPI_VER_CURRENT); if (ret > 0 && ret != PAPI_VER_CURRENT) { fprintf(stderr,"PAPI library version mismatch!\n"); exit(1); } if (ret < 0) handle_error(ret); // Set the default granularity for the cpu component ret = PAPI_set_cmp_granularity(PAPI_GRN_PROC, 0); if (ret != PAPI_OK) handle_error(ret); ret = PAPI_create_eventset(&EventSet); if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_granularity\fP \fBPAPI_set_domain\fP \fBPAPI_set_opt\fP \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_mpx_info_t.30000600003276200002170000000135212247131120016604 0ustar ralphundrgrad.TH "PAPI_mpx_info_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_mpx_info_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBtimer_sig\fP" .br .ti -1c .RI "int \fBtimer_num\fP" .br .ti -1c .RI "int \fBtimer_us\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Field Documentation" .PP .SS "int PAPI_mpx_info_t::timer_num" Number of the itimer or POSIX 1 timer used by the multiplex timer: PAPI_ITIMER .SS "int PAPI_mpx_info_t::timer_sig" Signal number used by the multiplex timer, 0 if not: PAPI_SIGNAL .SS "int PAPI_mpx_info_t::timer_us" uS between switching of sets: PAPI_MPX_DEF_US .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_real_cyc.30000600003276200002170000000143612247131120017065 0ustar ralphundrgrad.TH "PAPI_get_real_cyc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_real_cyc \- .PP get real time counter value in clock cycles Returns the total real time passed since some arbitrary starting point\&. The time is returned in clock cycles\&. This call is equivalent to wall clock time\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par Examples: .fi .PP .PP .nf s = PAPI_get_real_cyc(); your_slow_code(); e = PAPI_get_real_cyc(); printf("Wallclock cycles: %lld\en",e-s); * .fi .PP .PP \fBSee Also:\fP .RS 4 PAPIF PAPI \fBPAPI_get_virt_usec\fP \fBPAPI_get_virt_cyc\fP \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_virt_usec.30000600003276200002170000000076612247131117017430 0ustar ralphundrgrad.TH "PAPIF_get_virt_usec" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_virt_usec \- .PP Get virtual time counter value in microseconds\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_virt_usec( C_LONG_LONG time )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_virt_usec\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_library_init.30000600003276200002170000000073212247131120017240 0ustar ralphundrgrad.TH "PAPIF_library_init" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_library_init \- .PP Initialize the PAPI library\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_library_init( C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_get_real_cyc.30000600003276200002170000000076312247131117017203 0ustar ralphundrgrad.TH "PAPIF_get_real_cyc" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_get_real_cyc \- .PP Get real time counter value in clock cycles\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_get_real_cyc( C_LONG_LONG real_cyc )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_get_real_cyc\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_num_events.30000600003276200002170000000075512247131120016741 0ustar ralphundrgrad.TH "PAPIF_num_events" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_num_events \- .PP Enumerate PAPI preset or native events\&. 
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_num_events(C_INT EventSet, C_INT count)\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_num_events\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_set_event_domain.30000600003276200002170000000106512247131120020074 0ustar ralphundrgrad.TH "PAPIF_set_event_domain" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_set_event_domain \- .PP Set the default counting domain for specified EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Prototype:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_set_event_domain( C_INT EventSet, C_INT domain, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_domain\fP .PP \fBPAPI_set_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_set_cmp_domain.30000600003276200002170000000504412247131120017425 0ustar ralphundrgrad.TH "PAPI_set_cmp_domain" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_cmp_domain \- .PP Set the default counting domain for new event sets bound to the specified component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Prototype: \#include @n int PAPI_set_cmp_domain( int domain, int cidx ); @param domain one of the following constants as defined in the papi.h header file @arg PAPI_DOM_USER User context counted @arg PAPI_DOM_KERNEL Kernel/OS context counted @arg PAPI_DOM_OTHER Exception/transient mode counted @arg PAPI_DOM_SUPERVISOR Supervisor/hypervisor context counted @arg PAPI_DOM_ALL All above contexts counted @arg PAPI_DOM_MIN The smallest available context @arg PAPI_DOM_MAX The largest available context @arg PAPI_DOM_HWSPEC Something other than CPU like stuff. 
Individual components can decode low order bits for more meaning @param cidx An integer identifier for a component. By convention, component 0 is always the cpu component. .fi .PP .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP One or more of the arguments is invalid\&. .br \fIPAPI_ENOCMP\fP The argument cidx is not a valid component\&. .RE .PP \fBPAPI_set_cmp_domain\fP sets the default counting domain for all new event sets in all threads, and requires an explicit component argument\&. Event sets that are already in existence are not affected\&. To change the domain of an existing event set, please see \fBPAPI_set_opt\fP\&. The reader should note that the domain of an event set affects only the mode in which the counter continues to run\&. Counts are still aggregated for the current process, and not for any other processes in the system\&. Thus when requesting PAPI_DOM_KERNEL , the user is asking for events that occur on behalf of the process, inside the kernel\&. .PP \fBExample:\fP .RS 4 .PP .nf int ret; // Initialize the library ret = PAPI_library_init(PAPI_VER_CURRENT); if (ret > 0 && ret != PAPI_VER_CURRENT) { fprintf(stderr,"PAPI library version mismatch!\n"); exit(1); } if (ret < 0) handle_error(ret); // Set the default domain for the cpu component ret = PAPI_set_cmp_domain(PAPI_DOM_KERNEL,0); if (ret != PAPI_OK) handle_error(ret); ret = PAPI_create_eventset(&EventSet); if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_domain\fP \fBPAPI_set_granularity\fP \fBPAPI_set_opt\fP \fBPAPI_get_opt\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
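.PP
The PAPI_DOM_* arguments listed for PAPI_set_cmp_domain read as combinable flags, with PAPI_DOM_ALL covering all of the contexts above it. As a minimal sketch of that idea, the helper below renders a domain mask as text. The DEMO_DOM_* values and demo_domain_to_string are hypothetical stand-ins introduced only for illustration; real code should include papi.h and use the PAPI_DOM_* constants it defines.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the PAPI_DOM_* constants (assumed to be
 * single-bit flags). Real code should take the values from papi.h. */
enum {
    DEMO_DOM_USER       = 0x1,
    DEMO_DOM_KERNEL     = 0x2,
    DEMO_DOM_OTHER      = 0x4,
    DEMO_DOM_SUPERVISOR = 0x8,
    DEMO_DOM_ALL        = 0xF   /* bitwise OR of the four flags above */
};

/* Write a '|'-separated list of the domains set in `domain` into `buf`
 * (at most `len` bytes including the terminator) and return `buf`.
 * Names that do not fit are dropped rather than partially written. */
const char *demo_domain_to_string(int domain, char *buf, size_t len)
{
    static const struct { int bit; const char *name; } names[] = {
        { DEMO_DOM_USER,       "USER" },
        { DEMO_DOM_KERNEL,     "KERNEL" },
        { DEMO_DOM_OTHER,      "OTHER" },
        { DEMO_DOM_SUPERVISOR, "SUPERVISOR" },
    };
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(domain & names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* undo the partial write and stop */
            break;
        }
        used += (size_t)n;
    }
    return buf;
}
```

A helper of this kind can be handy when logging which domain was requested before calling PAPI_set_cmp_domain; it does not query PAPI itself.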
papi-5.3.0/man/man3/PAPIF_accum.30000600003276200002170000000076312247131117015653 0ustar ralphundrgrad.TH "PAPIF_accum" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_accum \- .PP accumulate and reset counters in an event set .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_accum\fP( C_INT EventSet, C_LONG_LONG(*) values, C_INT check ) .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_accum\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_real_usec.30000600003276200002170000000145712247131120017251 0ustar ralphundrgrad.TH "PAPI_get_real_usec" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_real_usec \- .PP get real time counter value in microseconds .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf This function returns the total real time passed since some arbitrary starting point. The time is returned in microseconds. This call is equivalent to wall clock time. @par Examples: .fi .PP .PP .nf s = PAPI_get_real_usec(); your_slow_code(); e = PAPI_get_real_usec(); printf("Wallclock microseconds: %lld\en",e-s); * .fi .PP .PP \fBSee Also:\fP .RS 4 PAPIF .PP PAPI .PP \fBPAPI_get_virt_usec\fP .PP \fBPAPI_get_virt_cyc\fP .PP \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_query_named_event.30000600003276200002170000000100512247131120020255 0ustar ralphundrgrad.TH "PAPIF_query_named_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_query_named_event \- .PP Query if named PAPI event exists\&.
.SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_query_named_event(C_STRING EventName, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_query_named_event\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_get_shared_lib_info.30000600003276200002170000000164712247131120020417 0ustar ralphundrgrad.TH "PAPI_get_shared_lib_info" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_get_shared_lib_info \- .PP Get address info about the shared libraries used by the process\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP In C, this function returns a pointer to a structure containing information about the shared library used by the program\&. There is no Fortran equivalent call\&. .PP \fBNote:\fP .RS 4 This data will be incorporated into the \fBPAPI_get_executable_info\fP call in the future\&. \fBPAPI_get_shared_lib_info\fP will be deprecated and should be used with caution\&. .RE .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_shlib_info_t\fP .PP \fBPAPI_get_hardware_info\fP .PP \fBPAPI_get_executable_info\fP .PP \fBPAPI_get_dmem_info\fP .PP \fBPAPI_get_opt\fP \fBPAPI_library_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_state.30000600003276200002170000000075012247131120015671 0ustar ralphundrgrad.TH "PAPIF_state" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_state \- .PP Return the counting state of an EventSet\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_state(C_INT EventSet, C_INT status, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_state\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_start.30000600003276200002170000000401012247131120015571 0ustar ralphundrgrad.TH "PAPI_start" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_start \- .PP Start counting hardware events in an event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_start( int EventSet )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP -- an integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP -- One or more of the arguments is invalid\&. .br \fIPAPI_ESYS\fP -- A system or C library call failed inside PAPI, see the errno variable\&. .br \fIPAPI_ENOEVST\fP -- The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP -- The EventSet is currently counting events\&. .br \fIPAPI_ECNFLCT\fP -- The underlying counter hardware can not count this event and other events in the EventSet simultaneously\&. .br \fIPAPI_ENOEVNT\fP -- The PAPI preset is not available on the underlying hardware\&. .RE .PP \fBPAPI_start\fP starts counting all of the hardware events contained in the previously defined EventSet\&. All counters are implicitly set to zero before counting\&. Assumes an initialized PAPI library and a properly added event set\&. 
.PP \fBExample:\fP .RS 4 .PP .nf * int EventSet = PAPI_NULL; * long long values[2]; * int ret; * * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Add Total Instructions Executed to our EventSet * ret = PAPI_add_event(EventSet, PAPI_TOT_INS); * if (ret != PAPI_OK) handle_error(ret); * * // Start counting * ret = PAPI_start(EventSet); * if (ret != PAPI_OK) handle_error(ret); * poorly_tuned_function(); * ret = PAPI_stop(EventSet, values); * if (ret != PAPI_OK) handle_error(ret); * printf("%lld\\n",values[0]); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_create_eventset\fP \fBPAPI_add_event\fP \fBPAPI_stop\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_flips.30000600003276200002170000000110712247131117015671 0ustar ralphundrgrad.TH "PAPIF_flips" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_flips \- .PP Simplified call to get Mflips/s (floating point instruction rate), real and processor time\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_flips( C_FLOAT real_time, C_FLOAT proc_time, C_LONG_LONG flpins, C_FLOAT mflips, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_flips\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_unregister_thread.30000600003276200002170000000246012247131120020161 0ustar ralphundrgrad.TH "PAPI_unregister_thread" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_unregister_thread \- .PP Notify PAPI that a thread has 'disappeared'\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBReturn values:\fP .RS 4 \fIPAPI_ENOMEM\fP Space could not be allocated to store the new thread information\&. .br \fIPAPI_ESYS\fP A system or C library call failed inside PAPI, see the errno variable\&. 
.br \fIPAPI_ECMP\fP Hardware counters for this thread could not be initialized\&. .RE .PP \fBPAPI_unregister_thread\fP should be called when the user wants to shut down a particular thread and free the associated thread ID\&. THIS IS IMPORTANT IF YOUR THREAD LIBRARY REUSES THE SAME THREAD ID FOR A NEW KERNEL LWP\&. OpenMP does this\&. OpenMP parallel regions, if separated by a call to omp_set_num_threads(), will often kill off the underlying kernel LWPs and then start new ones for the next region\&. However, omp_get_thread_num() does not reflect this, as the thread IDs for the new LWPs will be the same as those of the old LWPs\&. PAPI needs to know that the underlying LWP has changed so it can set up the counters for that new thread\&. This is accomplished by calling this function\&. .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_set_multiplex.30000600003276200002170000000552612247131120017347 0ustar ralphundrgrad.TH "PAPI_set_multiplex" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_set_multiplex \- .PP Convert a standard event set to a multiplexed event set\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_set_multiplex( int EventSet )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIEventSet\fP an integer handle for a PAPI event set as created by \fBPAPI_create_eventset\fP .RE .PP \fBReturn values:\fP .RS 4 \fIPAPI_OK\fP .br \fIPAPI_EINVAL\fP -- One or more of the arguments is invalid, or the EventSet is already multiplexed\&. .br \fIPAPI_ENOCMP\fP -- The EventSet specified is not yet bound to a component\&. .br \fIPAPI_ENOEVST\fP -- The EventSet specified does not exist\&. .br \fIPAPI_EISRUN\fP -- The EventSet is currently counting events\&. .br \fIPAPI_ENOMEM\fP -- Insufficient memory to complete the operation\&.
.RE .PP \fBPAPI_set_multiplex\fP converts a standard PAPI event set created by a call to \fBPAPI_create_eventset\fP into an event set capable of handling multiplexed events\&. This must be done after calling \fBPAPI_multiplex_init\fP, and either \fBPAPI_add_event\fP or \fBPAPI_assign_eventset_component\fP, but prior to calling \fBPAPI_start()\fP\&. .PP Events can be added to an event set either before or after converting it into a multiplexed set, but the conversion must be done prior to using it as a multiplexed set\&. .PP \fBNote:\fP .RS 4 Multiplexing can't be enabled until PAPI knows which component is targeted\&. Due to the late binding nature of PAPI event sets, this only happens after adding an event to an event set or explicitly binding the component with a call to \fBPAPI_assign_eventset_component\fP\&. .RE .PP \fBExample:\fP .RS 4 .PP .nf * int EventSet = PAPI_NULL; * int ret; * * // Create an empty EventSet * ret = PAPI_create_eventset(&EventSet); * if (ret != PAPI_OK) handle_error(ret); * * // Bind it to the CPU component * ret = PAPI_assign_eventset_component(EventSet, 0); * if (ret != PAPI_OK) handle_error(ret); * * // Check current multiplex status * ret = PAPI_get_multiplex(EventSet); * if (ret == TRUE) printf("This event set is ready for multiplexing\&.\en"); * if (ret == FALSE) printf("This event set is not enabled for multiplexing\&.\en"); * if (ret < 0) handle_error(ret); * * // Turn on multiplexing * ret = PAPI_set_multiplex(EventSet); * if ((ret == PAPI_EINVAL) && (PAPI_get_multiplex(EventSet) == TRUE)) * printf("This event set already has multiplexing enabled\en"); * else if (ret != PAPI_OK) handle_error(ret); * .fi .PP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_multiplex_init\fP .PP \fBPAPI_get_multiplex\fP .PP \fBPAPI_set_opt\fP .PP \fBPAPI_create_eventset\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&.
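.PP
A multiplexed event set time-shares the hardware counters, so each event is counted only part of the time and its reported value is an estimate. The sketch below illustrates the general extrapolation idea behind time-sliced counting; it is not PAPI's internal estimator, and mpx_scale_count with its parameters is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of PAPI): extrapolate a raw count that
 * was collected while the event occupied a hardware counter for
 * `active_us` microseconds out of a `total_us` microsecond run.
 * Returns the estimated count for the whole run, rounded to nearest. */
int64_t mpx_scale_count(int64_t raw_count, int64_t active_us, int64_t total_us)
{
    /* Precondition: an event cannot be active longer than the run. */
    assert(active_us >= 0 && total_us >= active_us);

    if (active_us == 0)
        return 0;   /* never scheduled: no basis for an estimate */

    /* Scale in floating point so raw_count * total_us cannot overflow. */
    long double scaled =
        (long double)raw_count * (long double)total_us / (long double)active_us;
    return (int64_t)(scaled + 0.5L);
}
```

For instance, an event that was live for a quarter of the run and counted 1000 occurrences would be estimated at 4000. The estimate is only as good as the assumption that the event rate is uniform over the run, which is why short measurement intervals tend to give noisy multiplexed results.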
papi-5.3.0/man/man3/PAPI_attach.30000600003276200002170000000352212247131120015707 0ustar ralphundrgrad.TH "PAPI_attach" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_attach \- .PP Attach PAPI event set to the specified thread id\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include <papi.h> @n int PAPI_attach( int EventSet, unsigned long tid ); PAPI_attach is a wrapper function that calls PAPI_set_opt to allow PAPI to monitor performance counts on a thread other than the one currently executing. This is sometimes referred to as third party monitoring. PAPI_attach connects the specified EventSet to the specified thread; PAPI_detach breaks that connection and restores the EventSet to the original executing thread. @param EventSet An integer handle for a PAPI EventSet as created by PAPI_create_eventset. @param tid A thread id as obtained from, for example, PAPI_list_threads or PAPI_thread_id. @retval PAPI_ECMP This feature is unsupported on this component. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOEVST The event set specified does not exist. @retval PAPI_EISRUN The event set is currently counting events. @par Examples: .fi .PP .PP .nf * int EventSet = PAPI_NULL; * pid_t pid; * pid = fork( ); * // Exit on fork failure (pid < 0) and in the child (pid == 0) * if ( pid <= 0 ) * exit( 1 ); * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * exit( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_event( EventSet, PAPI_TOT_INS ) != PAPI_OK ) * exit( 1 ); * // Attach this EventSet to the forked process * if ( PAPI_attach( EventSet, ( unsigned long ) pid ) != PAPI_OK ) * exit( 1 ); * .fi .PP .PP \fBSee Also:\fP .RS 4 \fBPAPI_set_opt\fP .PP \fBPAPI_list_threads\fP .PP \fBPAPI_thread_id\fP .PP \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&.
papi-5.3.0/man/man3/PAPI_all_thr_spec_t.30000600003276200002170000000064612247131120017431 0ustar ralphundrgrad.TH "PAPI_all_thr_spec_t" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_all_thr_spec_t \- .SH SYNOPSIS .br .PP .SS "Data Fields" .in +1c .ti -1c .RI "int \fBnum\fP" .br .ti -1c .RI "PAPI_thread_id_t * \fBid\fP" .br .ti -1c .RI "void ** \fBdata\fP" .br .in -1c .SH "Detailed Description" .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPIF_thread_init.30000600003276200002170000000100212247131120017032 0ustar ralphundrgrad.TH "PAPIF_thread_init" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPIF_thread_init \- .PP Initialize thread support in the PAPI library\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP \fBFortran Interface:\fP .RS 4 #include 'fpapi\&.h' .br \fBPAPIF_thread_init( C_INT FUNCTION handle, C_INT check )\fP .RE .PP \fBSee Also:\fP .RS 4 \fBPAPI_thread_init\fP .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man3/PAPI_add_event.30000600003276200002170000000506512247131120016400 0ustar ralphundrgrad.TH "PAPI_add_event" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_add_event \- .PP add PAPI preset or native hardware event to an event set .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP .PP .nf @par C Interface: \#include @n int PAPI_add_event( int EventSet, int EventCode ); PAPI_add_event adds one event to a PAPI Event Set. @n A hardware event can be either a PAPI preset or a native hardware event code. For a list of PAPI preset events, see PAPI_presets or run the avail test case in the PAPI distribution. PAPI presets can be passed to PAPI_query_event to see if they exist on the underlying architecture. 
For a list of native events available on current platform, run the papi_native_avail utility in the PAPI distribution. For the encoding of native events, see PAPI_event_name_to_code to learn how to generate native code for the supported native event on the underlying architecture. @param EventSet An integer handle for a PAPI Event Set as created by PAPI_create_eventset. @param EventCode A defined event such as PAPI_TOT_INS. @retval Positive-Integer The number of consecutive elements that succeeded before the error. @retval PAPI_EINVAL One or more of the arguments is invalid. @retval PAPI_ENOMEM Insufficient memory to complete the operation. @retval PAPI_ENOEVST The event set specified does not exist. @retval PAPI_EISRUN The event set is currently counting events. @retval PAPI_ECNFLCT The underlying counter hardware can not count this event and other events in the event set simultaneously. @retval PAPI_ENOEVNT The PAPI preset is not available on the underlying hardware. @retval PAPI_EBUG Internal error, please send mail to the developers. @par Examples: .fi .PP .PP .nf * int EventSet = PAPI_NULL; * unsigned int native = 0x0; * if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) * handle_error( 1 ); * // Add Total Instructions Executed to our EventSet * if ( PAPI_add_event( EventSet, PAPI_TOT_INS ) != PAPI_OK ) * handle_error( 1 ); * // Add native event PM_CYC to EventSet * if ( PAPI_event_name_to_code( "PM_CYC", &native ) != PAPI_OK ) * handle_error( 1 ); * if ( PAPI_add_event( EventSet, native ) != PAPI_OK ) * handle_error( 1 ); * .fi .PP .PP .PP .nf @see PAPI_cleanup_eventset @n PAPI_destroy_eventset @n PAPI_event_code_to_name @n PAPI_remove_events @n PAPI_query_event @n PAPI_presets @n PAPI_native @n PAPI_remove_event.fi .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. 
papi-5.3.0/man/man3/PAPI_num_cmp_hwctrs.30000600003276200002170000000411212247131120017467 0ustar ralphundrgrad.TH "PAPI_num_cmp_hwctrs" 3 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME PAPI_num_cmp_hwctrs \- .PP Return the number of hardware counters for the specified component\&. .SH SYNOPSIS .br .PP .SH "Detailed Description" .PP \fBPAPI_num_cmp_hwctrs()\fP returns the number of counters present in the specified component\&. By convention, component 0 is always the cpu\&. .PP On some components, especially for CPUs, the value returned is a theoretical maximum for estimation purposes only\&. It might not be possible to easily create an EventSet that contains the full number of events\&. This can be due to a variety of reasons: 1)\&. Some CPUs (especially Intel and POWER) have the notion of fixed counters that can only measure one thing, usually cycles\&. 2)\&. Some CPUs have very explicit rules about which event can run in which counter\&. In this case it might not be possible to add a wanted event even if counters are free\&. 3)\&. Some CPUs halve the number of counters available when running with SMT (multiple CPU threads) enabled\&. 4)\&. Some operating systems 'steal' a counter to use for things such as NMI Watchdog timers\&. The only sure way to see if events will fit is to attempt adding events to an EventSet, and doing something sensible if an error is generated\&. .PP \fBPAPI_library_init()\fP must be called in order for this function to return anything greater than 0\&. .PP \fBC Interface:\fP .RS 4 #include <\fBpapi\&.h\fP> .br int \fBPAPI_num_cmp_hwctrs(int cidx )\fP; .RE .PP \fBParameters:\fP .RS 4 \fIcidx\fP -- An integer identifier for a component\&. By convention, component 0 is always the cpu component\&. .RE .PP \fBExample\fP .RS 4 .PP .nf * // Query the cpu component for the number of counters\&. 
* printf("%d hardware counters found\&.\\n", PAPI_num_cmp_hwctrs(0)); * .fi .PP .RE .PP \fBReturns:\fP .RS 4 On success, this function returns a value greater than zero\&. .br A zero result usually means the library has not been initialized\&. .RE .PP .SH "Author" .PP Generated automatically by Doxygen for PAPI from the source code\&. papi-5.3.0/man/man1/0000700003276200002170000000000012247131117013546 5ustar ralphundrgradpapi-5.3.0/man/man1/papi_version.10000600003276200002170000000105312247131117016327 0ustar ralphundrgrad.TH "papi_version" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_version \- papi_version utility\&. .PP file version\&.c .SH "Name" .PP papi_version - provides version information for PAPI\&. .SH "Synopsis" .PP papi_version .SH "Description" .PP papi_version is a PAPI utility program that reports version information about the current PAPI installation\&. .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_native_avail.10000600003276200002170000000355012247131117017310 0ustar ralphundrgrad.TH "papi_native_avail" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_native_avail \- papi_native_avail utility\&. .PP file native_avail\&.c .SH "NAME" .PP papi_native_avail - provides detailed information for PAPI native events\&. .SH "Synopsis" .PP .SH "Description" .PP papi_native_avail is a PAPI utility program that reports information about the native events available on the current platform\&. A native event is an event specific to a particular hardware platform\&. On many platforms, a specific native event may have a number of optional settings\&. In such cases, the native event and the valid settings are presented, rather than every possible combination of those settings\&.
For each native event, a name, a description, and specific bit patterns are provided\&. .SH "Options" .PP .PD 0 .IP "\(bu" 2 --help, -h print this help message .IP "\(bu" 2 -d display detailed information about native events .IP "\(bu" 2 -e EVENTNAME display detailed information about named native event .IP "\(bu" 2 -i EVENTSTR include only event names that contain EVENTSTR .IP "\(bu" 2 -x EVENTSTR exclude any event names that contain EVENTSTR .IP "\(bu" 2 --noumasks suppress display of Unit Mask information .PP .PP Processor-specific options .PD 0 .IP "\(bu" 2 --darr display events supporting Data Address Range Restriction .IP "\(bu" 2 --dear display Data Event Address Register events only .IP "\(bu" 2 --iarr display events supporting Instruction Address Range Restriction .IP "\(bu" 2 --iear display Instruction Event Address Register events only .IP "\(bu" 2 --opcm display events supporting OpCode Matching .IP "\(bu" 2 --nogroups suppress display of Event grouping information .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_error_codes.10000600003276200002170000000147712247131117017162 0ustar ralphundrgrad.TH "papi_error_codes" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_error_codes \- papi_error_codes utility\&. .PP file error_codes\&.c .SH "NAME" .PP papi_error_codes - lists all currently defined PAPI error codes\&. .SH "Synopsis" .PP papi_error_codes .SH "Description" .PP papi_error_codes is a PAPI utility program that displays all defined error codes from papi\&.h and their error strings from papi_data\&.h\&. If an error string is not defined, a warning is generated\&. This can help trap newly defined error codes for which error strings are not yet defined\&. .SH "Options" .PP This utility has no command line options\&. .SH "Bugs" .PP There are no known bugs in this utility\&. 
If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_avail.10000600003276200002170000000223712247131117015743 0ustar ralphundrgrad.TH "papi_avail" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_avail \- papi_avail utility\&. .PP file avail\&.c .SH "Name" .PP papi_avail - provides availability and detail information for PAPI preset events\&. .SH "Synopsis" .PP papi_avail [-adht] [-e event] .SH "Description" .PP papi_avail is a PAPI utility program that reports information about the current PAPI installation and supported preset events\&. Using the -e option, it will also display information about specific native events\&. .SH "Options" .PP .PD 0 .IP "\(bu" 2 -a Display only the available PAPI preset events\&. .IP "\(bu" 2 -d Display PAPI preset event information in a more detailed format\&. .IP "\(bu" 2 -h Display help information about this utility\&. .IP "\(bu" 2 -t Display the PAPI preset event information in a tabular format\&. This is the default\&. .IP "\(bu" 2 -e < event > Display detailed event information for the named event\&. This event can be either a preset or a native event\&. .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_component_avail.10000600003276200002170000000132012247131117020015 0ustar ralphundrgrad.TH "papi_component_avail" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_component_avail \- papi_component_avail utility\&. .PP file component\&.c .SH "NAME" .PP papi_component_avail - provides detailed information about the components PAPI was built with\&. .SH "Synopsis" .PP .SH "Description" .PP papi_component_avail is a PAPI utility program that reports information about the components PAPI was built with\&.
.SH "Options" .PP .PD 0 .IP "\(bu" 2 -h help message .IP "\(bu" 2 -d provide detailed information about each component\&. .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_mem_info.10000600003276200002170000000141312247131117016433 0ustar ralphundrgrad.TH "papi_mem_info" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_mem_info \- papi_mem_info utility\&. .PP file mem_info\&.c .SH "NAME" .PP papi_mem_info - provides information on the memory architecture of the current processor\&. .SH "Synopsis" .PP .SH "Description" .PP papi_mem_info is a PAPI utility program that reports information about the cache memory architecture of the current processor, including number, types, sizes and associativities of instruction and data caches and Translation Lookaside Buffers\&. .SH "Options" .PP This utility has no command line options\&. .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_xml_event_info.10000600003276200002170000000204012247131117017653 0ustar ralphundrgrad.TH "papi_xml_event_info" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_xml_event_info \- papi_xml_event_info utility\&. .PP file event_info\&.c .SH "NAME" .PP papi_xml_event_info - provides detailed information for PAPI events in XML format\&. .SH "Synopsis" .PP .SH "Description" .PP papi_xml_event_info is a PAPI utility program that reports information about the events available on the current platform in an XML format\&. .PP It will attempt to create an EventSet with each event in it, which can be slow\&.
.SH "Options" .PP .PD 0 .IP "\(bu" 2 -h print help message .IP "\(bu" 2 -p print only preset events .IP "\(bu" 2 -n print only native events .IP "\(bu" 2 -c COMPONENT print only events from component number COMPONENT event1, event2, \&.\&.\&. Print only events that can be created in the same event set with the events event1, event2, etc\&. .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_decode.10000600003276200002170000000317612247131117016075 0ustar ralphundrgrad.TH "papi_decode" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_decode \- papi_decode utility\&. .PP file decode\&.c .SH "NAME" .PP papi_decode - converts PAPI preset events into a comma separated value format\&. .SH "Synopsis" .PP papi_decode [-ah] .SH "Description" .PP papi_decode is a PAPI utility program that converts the PAPI presets for the existing library into a comma separated value format that can then be viewed or modified in spreadsheet applications or text editors, and can be supplied to PAPI_encode_events (3) as a way of adding or modifying event definitions for specialized applications\&. The format for the csv output consists of a line of field names, followed by a blank line, followed by one line of comma separated values for each event contained in the preset table\&. A portion of this output (for Pentium 4) is shown below: .PP .nf * name,derived,postfix,short_descr,long_descr,note,[native,\&.\&.\&.]
* PAPI_L1_ICM,NOT_DERIVED,,"L1I cache misses","Level 1 instruction cache misses",,BPU_fetch_request_TCMISS * PAPI_L2_TCM,NOT_DERIVED,,"L2 cache misses","Level 2 cache misses",,BSQ_cache_reference_RD_2ndL_MISS_WR_2ndL_MISS * PAPI_TLB_DM,NOT_DERIVED,,"Data TLB misses","Data translation lookaside buffer misses",,page_walk_type_DTMISS * .fi .PP .SH "Options" .PP .PD 0 .IP "\(bu" 2 -a Convert only the available PAPI preset events\&. .IP "\(bu" 2 -h Display help information about this utility\&. .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_event_chooser.10000600003276200002170000000137612247131117017515 0ustar ralphundrgrad.TH "papi_event_chooser" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_event_chooser \- papi_event_chooser utility\&. .PP file event_chooser\&.c .SH "NAME" .PP papi_event_chooser - given a list of named events, lists other events that can be counted with them\&. .SH "Synopsis" .PP papi_event_chooser NATIVE | PRESET < event > < event > \&.\&.\&. .SH "Description" .PP papi_event_chooser is a PAPI utility program that, given a list of named preset or native events, lists the other events that can be counted together with them\&. .SH "Options" .PP This utility has no command line options\&. .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_hybrid_native_avail.10000600003276200002170000000410212247131117020643 0ustar ralphundrgrad.TH "papi_hybrid_native_avail" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_hybrid_native_avail \- papi_hybrid_native_avail utility\&. .PP file hybrid_native_avail\&.c .SH "NAME" .PP papi_hybrid_native_avail - provides detailed information for PAPI native events\&.
.SH "Synopsis" .PP .SH "Description" .PP papi_hybrid_native_avail is a PAPI utility program that reports information about the native events available on the current platform or on an attached MIC card\&. A native event is an event specific to a specific hardware platform\&. On many platforms, a specific native event may have a number of optional settings\&. In such cases, the native event and the valid settings are presented, rather than every possible combination of those settings\&. For each native event, a name, a description, and specific bit patterns are provided\&. .SH "Options" .PP .PD 0 .IP "\(bu" 2 --help, -h print this help message .IP "\(bu" 2 -d display detailed information about native events .IP "\(bu" 2 -e EVENTNAME display detailed information about named native event .IP "\(bu" 2 -i EVENTSTR include only event names that contain EVENTSTR .IP "\(bu" 2 -x EVENTSTR exclude any event names that contain EVENTSTR .IP "\(bu" 2 --noumasks suppress display of Unit Mask information .IP "\(bu" 2 --mic < index > report events on the specified target MIC device .PP .PP Processor-specific options .PD 0 .IP "\(bu" 2 --darr display events supporting Data Address Range Restriction .IP "\(bu" 2 --dear display Data Event Address Register events only .IP "\(bu" 2 --iarr display events supporting Instruction Address Range Restriction .IP "\(bu" 2 --iear display Instruction Event Address Register events only .IP "\(bu" 2 --opcm display events supporting OpCode Matching .IP "\(bu" 2 --nogroups suppress display of Event grouping information .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. .PP Modified by Gabriel Marin gmarin@icl.utk.edu to use offloading\&. 
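The -i and -x options described above select events by substring. A minimal sketch of such a filter follows, assuming a plain case-sensitive substring match; the helper name event_name_selected and the event names in the usage note are hypothetical and are not taken from the utility's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `name` passes both filters, 0 otherwise.
 * include_str mimics "-i EVENTSTR" (keep only names containing it).
 * exclude_str mimics "-x EVENTSTR" (drop names containing it).
 * Passing NULL for either filter disables that filter.
 */
static int event_name_selected(const char *name,
                               const char *include_str,
                               const char *exclude_str)
{
    assert(name != NULL);

    /* An include filter is set and the name does not contain it: reject. */
    if (include_str != NULL && strstr(name, include_str) == NULL)
        return 0;

    /* An exclude filter is set and the name contains it: reject. */
    if (exclude_str != NULL && strstr(name, exclude_str) != NULL)
        return 0;

    return 1;
}
```

With this sketch, event_name_selected("L2_CACHE_MISS", "CACHE", NULL) keeps the event, while event_name_selected("L2_CACHE_MISS", NULL, "MISS") drops it.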
papi-5.3.0/man/man1/papi_command_line.10000600003276200002170000000162112247131117017270 0ustar ralphundrgrad.TH "papi_command_line" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_command_line \- executes PAPI preset or native events from the command line\&. .SH "Synopsis" .PP papi_command_line < event > < event > \&.\&.\&. .SH "Description" .PP papi_command_line is a PAPI utility program that adds named events from the command line to a PAPI EventSet and does some work with that EventSet\&. This serves as a handy way to see if events can be counted together, and if they give reasonable results for known work\&. .SH "Options" .PP .PD 0 .IP "\(bu" 2 -u Display output values as unsigned integers .IP "\(bu" 2 -x Display output values as hexadecimal .IP "\(bu" 2 -h Display help information about this utility\&. .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_clockres.10000600003276200002170000000135312247131117016452 0ustar ralphundrgrad.TH "papi_clockres" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_clockres \- The papi_clockres utility\&. .PP file clockres\&.c .SH "Name" .PP papi_clockres - measures and reports clock latency and resolution for PAPI timers\&. .SH "Synopsis" .PP .SH "Description" .PP papi_clockres is a PAPI utility program that measures and reports the latency and resolution of the four PAPI timer functions: PAPI_get_real_cyc(), PAPI_get_virt_cyc(), PAPI_get_real_usec() and PAPI_get_virt_usec()\&. .SH "Options" .PP This utility has no command line options\&. .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. 
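The kind of measurement papi_clockres reports can be sketched without PAPI itself. The example below is an illustration only: it uses the POSIX clock_gettime() call as a stand-in for a PAPI timer such as PAPI_get_real_usec(), and the helper names now_ns, timer_latency_ns and timer_resolution_ns are hypothetical. It is not the utility's actual implementation.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Stand-in timer (assumption): monotonic wall-clock time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Latency: average cost of one timer call over `iters` back-to-back calls. */
static double timer_latency_ns(int iters)
{
    assert(iters > 0);
    long long start = now_ns();
    for (int i = 0; i < iters; i++)
        (void)now_ns();
    long long end = now_ns();
    return (double)(end - start) / iters;
}

/*
 * Resolution: smallest nonzero step seen between consecutive readings.
 * Returns -1 if the timer never advanced during the loop.
 */
static long long timer_resolution_ns(int iters)
{
    assert(iters > 0);
    long long best = -1;
    long long prev = now_ns();
    for (int i = 0; i < iters; i++) {
        long long cur = now_ns();
        long long delta = cur - prev;
        if (delta > 0 && (best < 0 || delta < best))
            best = delta;
        prev = cur;
    }
    return best;
}
```

Calling timer_latency_ns(100000) and timer_resolution_ns(100000) yields one latency and one resolution estimate for the stand-in timer; the results depend on the machine and its load, so no particular values should be expected.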
papi-5.3.0/man/man1/papi_multiplex_cost.10000600003276200002170000000232412247131117017717 0ustar ralphundrgrad.TH "papi_multiplex_cost" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_multiplex_cost \- papi_multiplex_cost utility\&. .PP file multiplex\&.c .SH "NAME" .PP papi_multiplex_cost - computes execution time costs for basic PAPI operations on multiplexed EventSets\&. .SH "Synopsis" .PP papi_multiplex_cost [-m, --min < min >] [-x, --max < max >] [-k,-s] .SH "Description" .PP papi_multiplex_cost is a PAPI utility program that computes the min / max / mean / std\&. deviation of execution times for PAPI start/stop pairs and for PAPI reads on multiplexed EventSets\&. This information provides the basic operating cost to a user's program for collecting hardware counter data\&. Command line options control display capabilities\&. .SH "Options" .PP .PD 0 .IP "\(bu" 2 -m < Min number of events to test > .IP "\(bu" 2 -x < Max number of events to test > .IP "\(bu" 2 -k, Do not time kernel multiplexing .IP "\(bu" 2 -s, Do not time software multiplexed EventSets .IP "\(bu" 2 -t THRESHOLD, Test with THRESHOLD iterations of counting loop\&. .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/man/man1/papi_cost.10000600003276200002170000000247012247131117015616 0ustar ralphundrgrad.TH "papi_cost" 1 "Mon Nov 18 2013" "Version 5.3.0.0" "PAPI" \" -*- nroff -*- .ad l .nh .SH NAME papi_cost \- papi_cost utility\&. .PP file cost\&.c .SH "NAME" .PP papi_cost - computes execution time costs for basic PAPI operations\&. .SH "Synopsis" .PP papi_cost [-dhs] [-b bins] [-t threshold] .SH "Description" .PP papi_cost is a PAPI utility program that computes the min / max / mean / std\&. deviation of execution times for PAPI start/stop pairs and for PAPI reads\&.
This information provides the basic operating cost to a user's program for collecting hardware counter data\&. Command line options control display capabilities\&. .SH "Options" .PP .PD 0 .IP "\(bu" 2 -b < bins > Define the number of bins into which the results are partitioned for display\&. The default is 100\&. .IP "\(bu" 2 -d Display a graphical distribution of costs in a vertical histogram\&. .IP "\(bu" 2 -h Display help information about this utility\&. .IP "\(bu" 2 -s Show the number of iterations in each of the first 10 standard deviations above the mean\&. .IP "\(bu" 2 -t < threshold > Set the threshold for the number of iterations to measure costs\&. The default is 100,000\&. .PP .SH "Bugs" .PP There are no known bugs in this utility\&. If you find a bug, it should be reported to the PAPI Mailing List at ptools-perfapi@ptools.org\&. papi-5.3.0/papi.spec0000600003276200002170000000536612247131120013752 0ustar ralphundrgradSummary: Performance Application Programming Interface Name: papi Version: 5.3.0.0 Release: 1%{?dist} License: BSD Group: Development/System URL: http://icl.cs.utk.edu/papi/ Source0: http://icl.cs.utk.edu/projects/papi/downloads/%{name}-%{version}.tar.gz BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root BuildRequires: ncurses-devel BuildRequires: gcc-gfortran BuildRequires: kernel-headers >= 2.6.32 BuildRequires: chrpath #Right now libpfm does not know anything about s390 and will fail ExcludeArch: s390 s390x %description PAPI provides a programmer interface to monitor the performance of running programs. %package devel Summary: Header files for the compiling programs with PAPI Group: Development/System Requires: papi = %{version}-%{release} %description devel PAPI-devel includes the C header files that specify the PAPI userspace libraries and interfaces. This is required for rebuilding any program that uses PAPI. 
%prep %setup -q %build cd src %configure --with-static-lib=no --with-shared-lib=yes --with-shlib #DBG workaround to make sure libpfm just uses the normal CFLAGS DBG="" make #%check #cd src #make fulltest %install rm -rf $RPM_BUILD_ROOT cd src make DESTDIR=$RPM_BUILD_ROOT install chrpath --delete $RPM_BUILD_ROOT%{_libdir}/*.so* # Remove the static libraries. Static libraries are undesirable: # https://fedoraproject.org/wiki/Packaging/Guidelines#Packaging_Static_Libraries rm -rf $RPM_BUILD_ROOT%{_libdir}/*.a %post -p /sbin/ldconfig %postun -p /sbin/ldconfig %clean rm -rf $RPM_BUILD_ROOT %files %defattr(-,root,root,-) %{_bindir}/* %{_libdir}/*.so.* /usr/share/papi %doc INSTALL.txt README LICENSE.txt RELEASENOTES.txt %files devel %defattr(-,root,root,-) %{_includedir}/*.h %{_includedir}/perfmon %{_libdir}/*.so %doc %{_mandir}/man3/* %doc %{_mandir}/man1/* %changelog * Tue Jan 31 2012 Dan Terpstra - 4.2.1 - Rebase to papi-4.2.1 * Wed Dec 8 2010 Dan Terpstra - 4.1.2-1 - Rebase to papi-4.1.2 * Mon Jun 8 2010 William Cohen - 4.1.0-1 - Rebase to papi-4.1.0 * Mon May 17 2010 William Cohen - 4.0.0-5 - Test run with upstream cvs version. * Wed Feb 10 2010 William Cohen - 4.0.0-4 - Resolves: rhbz562935 Rebase to papi-4.0.0 (correct ExcludeArch). * Wed Feb 10 2010 William Cohen - 4.0.0-3 - Resolves: rhbz562935 Rebase to papi-4.0.0 (bump nvr). * Wed Feb 10 2010 William Cohen - 4.0.0-2 - correct the ctests/shlib test - have PAPI_set_multiplex() return proper value - properly handle event unit masks - correct PAPI_name_to_code() to match events - Resolves: rhbz562935 Rebase to papi-4.0.0 * Wed Jan 13 2010 William Cohen - 4.0.0-1 - Generate papi.spec file for papi-4.0.0. 
papi-5.3.0/ChangeLogP4121.txt0000600003276200002170000000017312247131117015162 0ustar ralphundrgrad2011-01-20 * src/papi_events.csv: Remove HW_INT:RCV event that was mistakenly enabled for Westmere papi-5.3.0/src/0000700003276200002170000000000012247131324012726 5ustar ralphundrgradpapi-5.3.0/src/papivi.h0000600003276200002170000007241612247131124014401 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ /* * File: papivi.h * CVS: $Id$ * Author: dan terpstra * terpstra@cs.utk.edu * Mods: your name here * yourname@cs.esu.edu * * Include this file INSTEAD OF "papi.h" in your application code * to provide semitransparent version independent PAPI support. * Follow the rules described below and elsewhere to facilitate * this support. * */ #ifndef _PAPIVI #define _PAPIVI #include "papi.h" /*************************************************************************** * If PAPI_VERSION is not defined, then papi.h is for PAPI 2. * The preprocessor block below contains the definitions, data structures, * macros and code needed to emulate much of the PAPI 3 interface in code * linking to the PAPI 2 library. ****************************************************************************/ #ifndef PAPI_VERSION #define PAPI_VERSION_NUMBER(maj,min,rev) (((maj)<<16) | ((min)<<8) | (rev)) #define PAPI_VERSION_MAJOR(x) (((x)>>16) & 0xffff) #define PAPI_VERSION_MINOR(x) (((x)>>8) & 0xff) #define PAPI_VERSION_REVISION(x) ((x) & 0xff) /* This is the PAPI version on which we are running */ #define PAPI_VERSION PAPI_VERSION_NUMBER(2,3,4) /* This is the PAPI 3 version with which we are compatible */ #define PAPI_VI_VERSION PAPI_VERSION_NUMBER(3,0,6) /* PAPI 3 has an error code not defined for PAPI 2 */ #define PAPI_EPERM PAPI_EMISC /* You lack the necessary permissions */ /* * These are defined in papi_internal.h for PAPI 2. * They need to be exposed for version independent PAPI code to work. 
*/ //#define PRESET_MASK 0x80000000 #define PAPI_PRESET_MASK 0x80000000 //#define PRESET_AND_MASK 0x7FFFFFFF #define PAPI_PRESET_AND_MASK 0x7FFFFFFF #define PAPI_NATIVE_MASK 0x40000000 #define PAPI_NATIVE_AND_MASK 0x3FFFFFFF /* * Some PAPI 3 definitions for PAPI_{set,get}_opt() map * onto single definitions in PAPI 2. The new definitions * (shown below) should be used to guarantee PAPI 3 compatibility. */ #define PAPI_CLOCKRATE PAPI_GET_CLOCKRATE #define PAPI_MAX_HWCTRS PAPI_GET_MAX_HWCTRS #define PAPI_HWINFO PAPI_GET_HWINFO #define PAPI_EXEINFO PAPI_GET_EXEINFO #define PAPI_MAX_CPUS PAPI_GET_MAX_CPUS #define PAPI_CPUS PAPI_GET_CPUS #define PAPI_THREADS PAPI_GET_THREADS /* * PAPI 2 defined only one string length. * PAPI 3 defines three. This insures limited compatibility. */ #define PAPI_MIN_STR_LEN PAPI_MAX_STR_LEN #define PAPI_HUGE_STR_LEN PAPI_MAX_STR_LEN /* * PAPI 2 always profiles into 16-bit buckets. * PAPI 3 supports multiple bucket sizes. * Exercise caution if these defines appear in your code. * There is a potential for data overflow in PAPI 2. */ #define PAPI_PROFIL_BUCKET_16 0 #define PAPI_PROFIL_BUCKET_32 0 #define PAPI_PROFIL_BUCKET_64 0 /* * PAPI 3 defines a new eventcode that can often be emulated * successfully on PAPI 2. PAPI 3 also deprecates two eventcodes * found in PAPI 2: * PAPI_IPS (instructions per second) * PAPI_FLOPS (floating point instructions per second) * Don't use these eventcodes in version independent code */ #define PAPI_FP_OPS PAPI_FP_INS /* * Two new data structures are introduced in PAPI 3 that are * required to support the functionality of: * PAPI_get_event_info() and * PAPI_get_executable_info() * These structures are reproduced below. 
* They MUST stay synchronized with their counterparts in papi.h */ #define PAPI_MAX_INFO_TERMS 8 typedef struct event_info { unsigned int event_code; unsigned int count; char symbol[PAPI_MAX_STR_LEN + 3]; char short_descr[PAPI_MIN_STR_LEN]; char long_descr[PAPI_HUGE_STR_LEN]; char derived[PAPI_MIN_STR_LEN]; char postfix[PAPI_MIN_STR_LEN]; unsigned int code[PAPI_MAX_INFO_TERMS]; char name[PAPI_MAX_INFO_TERMS] [PAPI_MIN_STR_LEN]; char note[PAPI_HUGE_STR_LEN]; } PAPI_event_info_t; /* Possible values for the 'modifier' parameter of the PAPI_enum_event call. This enumeration is new in PAPI 3. It will act as a nop in PAPI 2, but must be defined for code compatibility. */ enum { PAPI_ENUM_ALL = 0, /* Always enumerate all events */ PAPI_PRESET_ENUM_AVAIL, /* Enumerate events that exist here */ /* PAPI PRESET section */ PAPI_PRESET_ENUM_INS, /* Instruction related preset events */ PAPI_PRESET_ENUM_BR, /* branch related preset events */ PAPI_PRESET_ENUM_MEM, /* memory related preset events */ PAPI_PRESET_ENUM_TLB, /* Translation Lookaside Buffer events */ PAPI_PRESET_ENUM_FP, /* Floating Point related preset events */ /* Pentium 4 specific section */ PAPI_PENT4_ENUM_GROUPS = 0x100, /* 45 groups + custom + user */ PAPI_PENT4_ENUM_COMBOS, /* all combinations of mask bits for given group */ PAPI_PENT4_ENUM_BITS, /* all individual bits for given group */ /* POWER 4 specific section */ PAPI_PWR4_ENUM_GROUPS = 0x200 /* Enumerate groups an event belongs to */ }; typedef struct _papi_address_map { char mapname[PAPI_HUGE_STR_LEN]; caddr_t text_start; /* Start address of program text segment */ caddr_t text_end; /* End address of program text segment */ caddr_t data_start; /* Start address of program data segment */ caddr_t data_end; /* End address of program data segment */ caddr_t bss_start; /* Start address of program bss segment */ caddr_t bss_end; /* End address of program bss segment */ } PAPI_address_map_t; /* * PAPI 3 beta 3 introduces new structures for static memory 
description. * These include structures for tlb and cache description, a structure * to describe a level in the memory hierarchy, and a structure * to describe all levels of the hierarchy. * These structures, and the requisite data types are defined below. */ /* All sizes are in BYTES */ /* Except tlb size, which is in entries */ #define PAPI_MAX_MEM_HIERARCHY_LEVELS 3 #define PAPI_MH_TYPE_EMPTY 0x0 #define PAPI_MH_TYPE_INST 0x1 #define PAPI_MH_TYPE_DATA 0x2 #define PAPI_MH_TYPE_UNIFIED (PAPI_MH_TYPE_INST|PAPI_MH_TYPE_DATA) typedef struct _papi_mh_tlb_info { int type; /* Empty, unified, data, instr */ int num_entries; int associativity; } PAPI_mh_tlb_info_t; typedef struct _papi_mh_cache_info { int type; /* Empty, unified, data, instr */ int size; int line_size; int num_lines; int associativity; } PAPI_mh_cache_info_t; typedef struct _papi_mh_level_info { PAPI_mh_tlb_info_t tlb[2]; PAPI_mh_cache_info_t cache[2]; } PAPI_mh_level_t; typedef struct _papi_mh_info { /* mh for mem hierarchy maybe? */ int levels; PAPI_mh_level_t level[PAPI_MAX_MEM_HIERARCHY_LEVELS]; } PAPI_mh_info_t; /* * Three data structures are modified in PAPI 3 * These modifications are * required to support the functionality of: * PAPI_get_hardware_info() and * PAPI_get_executable_info() * These structures are reproduced below. * They MUST stay synchronized with their counterparts in papi.h * To avoid namespace collisions, these structures have been renamed * to PAPIvi_xxx, and must also be renamed in your code.
*/ typedef struct _papi3_hw_info { int ncpu; /* Number of CPUs in an SMP Node */ int nnodes; /* Number of Nodes in the entire system */ int totalcpus; /* Total number of CPUs in the entire system */ int vendor; /* Vendor number of CPU */ char vendor_string[PAPI_MAX_STR_LEN]; /* Vendor string of CPU */ int model; /* Model number of CPU */ char model_string[PAPI_MAX_STR_LEN]; /* Model string of CPU */ float revision; /* Revision of CPU */ float mhz; /* Cycle time of this CPU, *may* be estimated at init time with a quick timing routine */ PAPI_mh_info_t mem_hierarchy; } PAPIvi_hw_info_t; typedef struct _papi3_preload_option { char lib_preload_env[PAPI_MAX_STR_LEN]; /* Model string of CPU */ char lib_preload_sep; char lib_dir_env[PAPI_MAX_STR_LEN]; char lib_dir_sep; } PAPIvi_preload_option_t; typedef struct _papi3_program_info { char fullname[PAPI_MAX_STR_LEN]; /* path+name */ char name[PAPI_MAX_STR_LEN]; /* name */ PAPI_address_map_t address_info; PAPIvi_preload_option_t preload_info; } PAPIvi_exe_info_t; /* * The Low Level API * Functions in this API are classified in 4 basic categories: * Modified: 13 functions * New: 8 functions * Unchanged: 32 functions * Deprecated: 9 functions * * Each of these categories is discussed further below. */ /* * Modified functions are further divided into 4 subcategories: * Dereferencing changes: 6 functions * These functions simply substitute an EventSet value for * a pointer to an EventSet. In the case of PAPI_remove_event{s}() * there is also a name change. * Name changes: 1 function * This is a simple name change with no change in functionality. 
* Parameter changes: 4 functions * Several functions have changed functionality reflected in changed * parameters: * PAPI_{un}lock() supports multiple locks in PAPI 3 * PAPI_profil() supports multiple bucket sizes in PAPI 3 * PAPI_thread_init() removes an unused parameter in PAPI 3 * New functionality: 2 functions * These functions support new data in revised data structures * The code implemented here maps the old structures to the new * where possible. */ /* Modified Functions: Dereferencing changes */ #define PAPIvi_add_event(EventSet, Event) \ PAPI_add_event(&EventSet, Event) #define PAPIvi_add_events(EventSet, Events, number) \ PAPI_add_events(&EventSet, Events, number) #define PAPIvi_cleanup_eventset(EventSet) \ PAPI_cleanup_eventset(&EventSet) #define PAPIvi_remove_event(EventSet, EventCode) \ PAPI_rem_event(&EventSet, EventCode) #define PAPIvi_remove_events(EventSet, Events, number) \ PAPI_rem_events(&EventSet, Events, number) #define PAPIvi_set_multiplex(EventSet) \ PAPI_set_multiplex(&EventSet) /* Modified Functions: Name changes */ #define PAPIvi_is_initialized \ PAPI_initialized /* Modified Functions: Parameter changes */ #define PAPIvi_lock(lck) \ PAPI_lock() #define PAPIvi_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags) \ PAPI_profil((unsigned short *)buf, bufsiz, (unsigned long)offset, scale, EventSet, EventCode, threshold, flags) #define PAPIvi_thread_init(id_fn) \ PAPI_thread_init(id_fn, 0) #define PAPIvi_unlock(lck) \ PAPI_unlock() /* Modified Functions: New functionality */ static const PAPIvi_exe_info_t * PAPIvi_get_executable_info( void ) { static PAPIvi_exe_info_t prginfo3; const PAPI_exe_info_t *prginfo2 = PAPI_get_executable_info( ); if ( prginfo2 == NULL ) return ( NULL ); strcpy( prginfo3.fullname, prginfo2->fullname ); strcpy( prginfo3.name, prginfo2->name ); prginfo3.address_info.mapname[0] = 0; prginfo3.address_info.text_start = prginfo2->text_start; prginfo3.address_info.text_end = prginfo2->text_end;
prginfo3.address_info.data_start = prginfo2->data_start; prginfo3.address_info.data_end = prginfo2->data_end; prginfo3.address_info.bss_start = prginfo2->bss_start; prginfo3.address_info.bss_end = prginfo2->bss_end; strcpy( prginfo3.preload_info.lib_preload_env, prginfo2->lib_preload_env ); return ( &prginfo3 ); } static const PAPIvi_hw_info_t * PAPIvi_get_hardware_info( void ) { static PAPIvi_hw_info_t papi3_hw_info; const PAPI_hw_info_t *papi2_hw_info = PAPI_get_hardware_info( ); const PAPI_mem_info_t *papi2_mem_info = PAPI_get_memory_info( ); /* Copy the basic hardware info (same in both structures) */ memcpy( &papi3_hw_info, papi2_hw_info, sizeof ( PAPI_hw_info_t ) ); memset( &papi3_hw_info.mem_hierarchy, 0, sizeof ( PAPI_mh_info_t ) ); /* check for a unified tlb */ if ( papi2_mem_info->total_tlb_size && papi2_mem_info->itlb_size == 0 && papi2_mem_info->dtlb_size == 0 ) { papi3_hw_info.mem_hierarchy.level[0].tlb[0].type = PAPI_MH_TYPE_UNIFIED; papi3_hw_info.mem_hierarchy.level[0].tlb[0].num_entries = papi2_mem_info->total_tlb_size; } else { if ( papi2_mem_info->itlb_size ) { papi3_hw_info.mem_hierarchy.level[0].tlb[0].type = PAPI_MH_TYPE_INST; papi3_hw_info.mem_hierarchy.level[0].tlb[0].num_entries = papi2_mem_info->itlb_size; papi3_hw_info.mem_hierarchy.level[0].tlb[0].associativity = papi2_mem_info->itlb_assoc; } if ( papi2_mem_info->dtlb_size ) { papi3_hw_info.mem_hierarchy.level[0].tlb[1].type = PAPI_MH_TYPE_DATA; papi3_hw_info.mem_hierarchy.level[0].tlb[1].num_entries = papi2_mem_info->dtlb_size; papi3_hw_info.mem_hierarchy.level[0].tlb[1].associativity = papi2_mem_info->dtlb_assoc; } } /* check for a unified level 1 cache */ if ( papi2_mem_info->total_L1_size ) papi3_hw_info.mem_hierarchy.levels = 1; if ( papi2_mem_info->total_L1_size && papi2_mem_info->L1_icache_size == 0 && papi2_mem_info->L1_dcache_size == 0 ) { papi3_hw_info.mem_hierarchy.level[0].cache[0].type = PAPI_MH_TYPE_UNIFIED; papi3_hw_info.mem_hierarchy.level[0].cache[0].size =
papi2_mem_info->total_L1_size << 10; } else { if ( papi2_mem_info->L1_icache_size ) { papi3_hw_info.mem_hierarchy.level[0].cache[0].type = PAPI_MH_TYPE_INST; papi3_hw_info.mem_hierarchy.level[0].cache[0].size = papi2_mem_info->L1_icache_size << 10; papi3_hw_info.mem_hierarchy.level[0].cache[0].associativity = papi2_mem_info->L1_icache_assoc; papi3_hw_info.mem_hierarchy.level[0].cache[0].num_lines = papi2_mem_info->L1_icache_lines; papi3_hw_info.mem_hierarchy.level[0].cache[0].line_size = papi2_mem_info->L1_icache_linesize; } if ( papi2_mem_info->L1_dcache_size ) { papi3_hw_info.mem_hierarchy.level[0].cache[1].type = PAPI_MH_TYPE_DATA; papi3_hw_info.mem_hierarchy.level[0].cache[1].size = papi2_mem_info->L1_dcache_size << 10; papi3_hw_info.mem_hierarchy.level[0].cache[1].associativity = papi2_mem_info->L1_dcache_assoc; papi3_hw_info.mem_hierarchy.level[0].cache[1].num_lines = papi2_mem_info->L1_dcache_lines; papi3_hw_info.mem_hierarchy.level[0].cache[1].line_size = papi2_mem_info->L1_dcache_linesize; } } /* check for level 2 cache info */ if ( papi2_mem_info->L2_cache_size ) { papi3_hw_info.mem_hierarchy.levels = 2; papi3_hw_info.mem_hierarchy.level[1].cache[0].type = PAPI_MH_TYPE_UNIFIED; papi3_hw_info.mem_hierarchy.level[1].cache[0].size = papi2_mem_info->L2_cache_size << 10; papi3_hw_info.mem_hierarchy.level[1].cache[0].associativity = papi2_mem_info->L2_cache_assoc; papi3_hw_info.mem_hierarchy.level[1].cache[0].num_lines = papi2_mem_info->L2_cache_lines; papi3_hw_info.mem_hierarchy.level[1].cache[0].line_size = papi2_mem_info->L2_cache_linesize; } /* check for level 3 cache info */ if ( papi2_mem_info->L3_cache_size ) { papi3_hw_info.mem_hierarchy.levels = 3; papi3_hw_info.mem_hierarchy.level[2].cache[0].type = PAPI_MH_TYPE_UNIFIED; papi3_hw_info.mem_hierarchy.level[2].cache[0].size = papi2_mem_info->L3_cache_size << 10; papi3_hw_info.mem_hierarchy.level[2].cache[0].associativity = papi2_mem_info->L3_cache_assoc; 
papi3_hw_info.mem_hierarchy.level[2].cache[0].num_lines = papi2_mem_info->L3_cache_lines; papi3_hw_info.mem_hierarchy.level[2].cache[0].line_size = papi2_mem_info->L3_cache_linesize; } return ( &papi3_hw_info ); } /* * New functions are either supported or unsupported. * Of the three supported functions, two replaced deprecated functions * to describe events, and one is simply a convenience function. * The five unsupported new functions include three related to thread * functionality, a convenience function to return the number of events * in an event set, and a function to query information about shared libraries. */ /* New Supported Functions */ static int PAPIvi_enum_event( int *EventCode, int modifier ) { int i = *EventCode; const PAPI_preset_info_t *presets = PAPI_query_all_events_verbose( ); i &= PAPI_PRESET_AND_MASK; while ( ++i < PAPI_MAX_PRESET_EVENTS ) { if ( ( !modifier ) || ( presets[i].avail ) ) { *EventCode = i | PAPI_PRESET_MASK; if ( presets[i].event_name != NULL ) return ( PAPI_OK ); else return ( PAPI_ENOEVNT ); } } return ( PAPI_ENOEVNT ); } static int PAPIvi_get_event_info( int EventCode, PAPI_event_info_t * info ) { int i; const PAPI_preset_info_t *info2 = PAPI_query_all_events_verbose( ); i = EventCode & PAPI_PRESET_AND_MASK; if ( ( i >= PAPI_MAX_PRESET_EVENTS ) || ( info2[i].event_name == NULL ) ) return ( PAPI_ENOTPRESET ); info->event_code = info2[i].event_code; info->count = info2[i].avail; if ( info2[i].flags & PAPI_DERIVED ) { info->count++; strcpy( info->derived, "DERIVED" ); } if ( info2[i].event_name == NULL ) info->symbol[0] = 0; else strcpy( info->symbol, info2[i].event_name ); if ( info2[i].event_label == NULL ) info->short_descr[0] = 0; else strcpy( info->short_descr, info2[i].event_label ); if ( info2[i].event_descr == NULL ) info->long_descr[0] = 0; else strcpy( info->long_descr, info2[i].event_descr ); if ( info2[i].event_note == NULL ) info->note[0] = 0; else strcpy( info->note, info2[i].event_note ); return ( PAPI_OK ); } /* 
static int PAPI_get_multiplex(int EventSet) { PAPI_option_t popt; int retval; popt.multiplex.eventset = EventSet; retval = PAPI_get_opt(PAPI_GET_MULTIPLEX, &popt); if (retval < 0) retval = 0; return retval; } */ /* New Unsupported Functions */ #define PAPIvi_get_shared_lib_info \ PAPI_get_shared_lib_info #define PAPIvi_get_thr_specific(tag, ptr) \ PAPI_get_thr_specific(tag, ptr) #define PAPIvi_num_events(EventSet) \ PAPI_num_events(EventSet) #define PAPIvi_register_thread \ PAPI_register_thread #define PAPIvi_set_thr_specific(tag, ptr) \ PAPI_set_thr_specific(tag, ptr) /* * Over half of the functions in the Low Level API remain unchanged * These are included in the macro list in case they do change in future * revisions, and to simplify the naming conventions for writing * version independent PAPI code. */ #define PAPIvi_accum(EventSet, values) \ PAPI_accum(EventSet, values) #define PAPIvi_create_eventset(EventSet) \ PAPI_create_eventset(EventSet) #define PAPIvi_destroy_eventset(EventSet) \ PAPI_destroy_eventset(EventSet) #define PAPIvi_event_code_to_name(EventCode, out) \ PAPI_event_code_to_name(EventCode, out) #define PAPIvi_event_name_to_code(in, out) \ PAPI_event_name_to_code(in, out) #define PAPIvi_get_dmem_info(option) \ PAPI_get_dmem_info(option) #define PAPIvi_get_opt(option, ptr) \ PAPI_get_opt(option, ptr) #define PAPIvi_get_real_cyc \ PAPI_get_real_cyc #define PAPIvi_get_real_usec \ PAPI_get_real_usec #define PAPIvi_get_virt_cyc \ PAPI_get_virt_cyc #define PAPIvi_get_virt_usec \ PAPI_get_virt_usec #define PAPIvi_library_init(version) \ PAPI_library_init(version) #define PAPIvi_list_events(EventSet, Events, number) \ PAPI_list_events(EventSet, Events, number) #define PAPIvi_multiplex_init \ PAPI_multiplex_init #define PAPIvi_num_hwctrs \ PAPI_num_hwctrs #define PAPIvi_overflow(EventSet, EventCode, threshold, flags, handler) \ PAPI_overflow(EventSet, EventCode, threshold, flags, handler) #define PAPIvi_perror( s ) \ PAPI_perror( s ) #define 
PAPIvi_query_event(EventCode) \ PAPI_query_event(EventCode) #define PAPIvi_read(EventSet, values) \ PAPI_read(EventSet, values) #define PAPIvi_reset(EventSet) \ PAPI_reset(EventSet) #define PAPIvi_set_debug(level) \ PAPI_set_debug(level) #define PAPIvi_set_domain(domain) \ PAPI_set_domain(domain) #define PAPIvi_set_granularity(granularity) \ PAPI_set_granularity(granularity) #define PAPIvi_set_opt(option, ptr) \ PAPI_set_opt(option, ptr) #define PAPIvi_shutdown \ PAPI_shutdown #define PAPIvi_sprofil(prof, profcnt, EventSet, EventCode, threshold, flags) \ PAPI_sprofil(prof, profcnt, EventSet, EventCode, threshold, flags) #define PAPIvi_start(EventSet) \ PAPI_start(EventSet) #define PAPIvi_state(EventSet, status) \ PAPI_state(EventSet, status) #define PAPIvi_stop(EventSet, values) \ PAPI_stop(EventSet, values) #define PAPIvi_strerror(err) \ PAPI_strerror(err) #define PAPIvi_thread_id \ PAPI_thread_id #define PAPIvi_write(EventSet, values) \ PAPI_write(EventSet, values) /* * Of the nine functions deprecated from PAPI 2 to PAPI 3, * three (PAPI_add_pevent, PAPI_restore, and PAPI_save) were * never implemented, and four dealt with describing events. * Two remain: * PAPI_get_overflow_address() must still be used in version specific overflow handlers * PAPI_profil_hw() was rarely used, and only on platforms supporting hardware overflow. * The prototypes of these functions are shown below for completeness. 
*/
/*
int PAPI_add_pevent(int *EventSet, int code, void *inout);
void *PAPI_get_overflow_address(void *context);
int PAPI_profil_hw(unsigned short *buf, unsigned bufsiz, unsigned long offset, \
	unsigned scale, int EventSet, int EventCode, int threshold, int flags);
const PAPI_preset_info_t *PAPI_query_all_events_verbose(void);
int PAPI_describe_event(char *name, int *EventCode, char *description);
int PAPI_label_event(int EventCode, char *label);
int PAPI_query_event_verbose(int EventCode, PAPI_preset_info_t *info);
int PAPI_restore(void);
int PAPI_save(void);
*/

/*
 * The High Level API
 * There are 8 functions in this API.
 * 6 are unchanged, and 2 are new.
 * Of the new functions, one is emulated and one is unsupported.
 */

/* Unchanged Functions */
#define PAPIvi_accum_counters(values, array_len) \
	PAPI_accum_counters(values, array_len)
#define PAPIvi_num_counters \
	PAPI_num_counters
#define PAPIvi_read_counters(values, array_len) \
	PAPI_read_counters(values, array_len)
#define PAPIvi_start_counters(Events, array_len) \
	PAPI_start_counters(Events, array_len)
#define PAPIvi_stop_counters(values, array_len) \
	PAPI_stop_counters(values, array_len)
#define PAPIvi_flops(rtime, ptime, flpops, mflops) \
	PAPI_flops(rtime, ptime, flpops, mflops)

/* New Supported Functions (PAPIvi_flips is emulated with PAPI_flops) */
#define PAPIvi_flips(rtime, ptime, flpins, mflips) \
	PAPI_flops(rtime, ptime, flpins, mflips)

/* New Unsupported Functions */
#define PAPIvi_ipc(rtime, ptime, ins, ipc) \
	PAPI_ipc(rtime, ptime, ins, ipc)

/*******************************************************************************
* If PAPI_VERSION is defined, and the MAJOR version number is 3,
* then papi.h is for PAPI 3.
* The preprocessor block below contains definitions and macros needed to
* allow version independent linking to the PAPI 3 library.
* Other than a handful of definitions to support calls to PAPI_{get,set}_opt(),
* this layer simply converts version independent names to PAPI 3 library calls.
********************************************************************************/ #elif (PAPI_VERSION_MAJOR(PAPI_VERSION) == 3) /* * The following option definitions reflect the fact that PAPI 2 had separate * definitions for options to PAPI_set_opt and PAPI_get_opt, while PAPI 3 has * only a single set for both. By using the older naming convention, you can * create platform independent code for these calls. */ #define PAPI_SET_DEBUG PAPI_DEBUG #define PAPI_GET_DEBUG PAPI_DEBUG #define PAPI_SET_MULTIPLEX PAPI_MULTIPLEX #define PAPI_GET_MULTIPLEX PAPI_MULTIPLEX #define PAPI_SET_DEFDOM PAPI_DEFDOM #define PAPI_GET_DEFDOM PAPI_DEFDOM #define PAPI_SET_DOMAIN PAPI_DOMAIN #define PAPI_GET_DOMAIN PAPI_DOMAIN #define PAPI_SET_DEFGRN PAPI_DEFGRN #define PAPI_GET_DEFGRN PAPI_DEFGRN #define PAPI_SET_GRANUL PAPI_GRANUL #define PAPI_GET_GRANUL PAPI_GRANUL #define PAPI_SET_INHERIT PAPI_INHERIT #define PAPI_GET_INHERIT PAPI_INHERIT #define PAPI_GET_NUMCTRS PAPI_NUMCTRS #define PAPI_SET_NUMCTRS PAPI_NUMCTRS #define PAPI_SET_PROFIL PAPI_PROFIL #define PAPI_GET_PROFIL PAPI_PROFIL /* * These macros are simple pass-throughs to PAPI 3 structures */ #define PAPIvi_hw_info_t PAPI_hw_info_t #define PAPIvi_exe_info_t PAPI_exe_info_t /* * The following macros are simple pass-throughs to PAPI 3 library calls */ /* The Low Level API */ #define PAPIvi_accum(EventSet, values) \ PAPI_accum(EventSet, values) #define PAPIvi_add_event(EventSet, Event) \ PAPI_add_event(EventSet, Event) #define PAPIvi_add_events(EventSet, Events, number) \ PAPI_add_events(EventSet, Events, number) #define PAPIvi_cleanup_eventset(EventSet) \ PAPI_cleanup_eventset(EventSet) #define PAPIvi_create_eventset(EventSet) \ PAPI_create_eventset(EventSet) #define PAPIvi_destroy_eventset(EventSet) \ PAPI_destroy_eventset(EventSet) #define PAPIvi_enum_event(EventCode, modifier) \ PAPI_enum_event(EventCode, modifier) #define PAPIvi_event_code_to_name(EventCode, out) \ PAPI_event_code_to_name(EventCode, out) #define 
PAPIvi_event_name_to_code(in, out) \ PAPI_event_name_to_code(in, out) #define PAPIvi_get_dmem_info(option) \ PAPI_get_dmem_info(option) #define PAPIvi_get_event_info(EventCode, info) \ PAPI_get_event_info(EventCode, info) #define PAPIvi_get_executable_info \ PAPI_get_executable_info #define PAPIvi_get_hardware_info \ PAPI_get_hardware_info #define PAPIvi_get_multiplex(EventSet) \ PAPI_get_multiplex(EventSet) #define PAPIvi_get_opt(option, ptr) \ PAPI_get_opt(option, ptr) #define PAPIvi_get_real_cyc \ PAPI_get_real_cyc #define PAPIvi_get_real_usec \ PAPI_get_real_usec #define PAPIvi_get_shared_lib_info \ PAPI_get_shared_lib_info #define PAPIvi_get_thr_specific(tag, ptr) \ PAPI_get_thr_specific(tag, ptr) #define PAPIvi_get_virt_cyc \ PAPI_get_virt_cyc #define PAPIvi_get_virt_usec \ PAPI_get_virt_usec #define PAPIvi_is_initialized \ PAPI_is_initialized #define PAPIvi_library_init(version) \ PAPI_library_init(version) #define PAPIvi_list_events(EventSet, Events, number) \ PAPI_list_events(EventSet, Events, number) #define PAPIvi_lock(lck) \ PAPI_lock(lck) #define PAPIvi_multiplex_init \ PAPI_multiplex_init #define PAPIvi_num_hwctrs \ PAPI_num_hwctrs #define PAPIvi_num_events(EventSet) \ PAPI_num_events(EventSet) #define PAPIvi_overflow(EventSet, EventCode, threshold, flags, handler) \ PAPI_overflow(EventSet, EventCode, threshold, flags, handler) #define PAPIvi_perror( s ) \ PAPI_perror( s ) #define PAPIvi_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags) \ PAPI_profil(buf, bufsiz, offset, scale, EventSet, EventCode, threshold, flags) #define PAPIvi_query_event(EventCode) \ PAPI_query_event(EventCode) #define PAPIvi_read(EventSet, values) \ PAPI_read(EventSet, values) #define PAPIvi_register_thread \ PAPI_register_thread #define PAPIvi_remove_event(EventSet, EventCode) \ PAPI_remove_event(EventSet, EventCode) #define PAPIvi_remove_events(EventSet, Events, number) \ PAPI_remove_events(EventSet, Events, number) #define PAPIvi_reset(EventSet) \ 
PAPI_reset(EventSet) #define PAPIvi_set_debug(level) \ PAPI_set_debug(level) #define PAPIvi_set_domain(domain) \ PAPI_set_domain(domain) #define PAPIvi_set_granularity(granularity) \ PAPI_set_granularity(granularity) #define PAPIvi_set_multiplex(EventSet) \ PAPI_set_multiplex(EventSet) #define PAPIvi_set_opt(option, ptr) \ PAPI_set_opt(option, ptr) #define PAPIvi_set_thr_specific(tag, ptr) \ PAPI_set_thr_specific(tag, ptr) #define PAPIvi_shutdown \ PAPI_shutdown #define PAPIvi_sprofil(prof, profcnt, EventSet, EventCode, threshold, flags) \ PAPI_sprofil(prof, profcnt, EventSet, EventCode, threshold, flags) #define PAPIvi_start(EventSet) \ PAPI_start(EventSet) #define PAPIvi_state(EventSet, status) \ PAPI_state(EventSet, status) #define PAPIvi_stop(EventSet, values) \ PAPI_stop(EventSet, values) #define PAPIvi_strerror(err) \ PAPI_strerror(err) #define PAPIvi_thread_id \ PAPI_thread_id #define PAPIvi_thread_init(id_fn) \ PAPI_thread_init(id_fn) #define PAPIvi_unlock(lck) \ PAPI_unlock(lck) #define PAPIvi_write(EventSet, values) \ PAPI_write(EventSet, values) /* The High Level API */ #define PAPIvi_accum_counters(values, array_len) \ PAPI_accum_counters(values, array_len) #define PAPIvi_num_counters \ PAPI_num_counters #define PAPIvi_read_counters(values, array_len) \ PAPI_read_counters(values, array_len) #define PAPIvi_start_counters(Events, array_len) \ PAPI_start_counters(Events, array_len) #define PAPIvi_stop_counters(values, array_len) \ PAPI_stop_counters(values, array_len) #define PAPIvi_flips(rtime, ptime, flpins, mflips) \ PAPI_flips(rtime, ptime, flpins, mflips) #define PAPIvi_flops(rtime, ptime, flpops, mflops) \ PAPI_flops(rtime, ptime, flpops, mflops) #define PAPIvi_ipc(rtime, ptime, ins, ipc) \ PAPI_ipc(rtime, ptime, ins, ipc) /******************************************************************************* * If PAPI_VERSION is defined, and the MAJOR version number is not 3, then we * generate an error message. 
* This block allows us to support future versions with a
* version independent syntax.
********************************************************************************/
#else
#error Compiling against a not yet released PAPI version
#endif

#endif /* _PAPIVI */
papi-5.3.0/src/examples/0000700003276200002170000000000012247131121014537 5ustar ralphundrgrad
papi-5.3.0/src/examples/PAPI_get_executable_info.c0000600003276200002170000000341412247131121021513 0ustar ralphundrgrad
/*****************************************************************************
 * This is an example using the low level function PAPI_get_executable_info  *
 * to get the executable address space information. This function returns a  *
 * pointer to a structure containing address information about the current   *
 * program.                                                                  *
 ******************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h" /* This needs to be included every time you use PAPI */

int main()
{
   int i,tmp=0;
   int retval;
   const PAPI_exe_info_t *prginfo = NULL;

   /****************************************************************************
    * This part initializes the library and compares the version number of the *
    * header file, to the version of the library, if these don't match then it *
    * is likely that PAPI won't work correctly. If there is an error, retval   *
    * keeps track of the version number.                                       *
    ****************************************************************************/
   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
   {
      printf("Library initialization error! \n");
      exit(1);
   }

   for(i=0;i<1000;i++)
      tmp=tmp+i;

   /* PAPI_get_executable_info returns a NULL if there is an error */
   if ((prginfo = PAPI_get_executable_info()) == NULL)
   {
      printf("PAPI_get_executable_info error! 
\n");
      exit(1);
   }

   printf("Start text address of user program is at %p\n",
          prginfo->address_info.text_start);
   printf("End text address of user program is at %p\n",
          prginfo->address_info.text_end);

   exit(0);
}
papi-5.3.0/src/examples/PAPI_state.c0000600003276200002170000000503512247131121016641 0ustar ralphundrgrad
/*****************************************************************************
 * We use PAPI_state to get the counting state of an EventSet. This function *
 * returns the state of the entire EventSet.                                 *
 *****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h" /* This needs to be included every time you use PAPI */

#define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__);  exit(retval); }

/* printstate is defined after main, so declare it before its first use */
int printstate(int status);

int main()
{
   int retval;
   int status = 0;
   int EventSet = PAPI_NULL;

   /****************************************************************************
    * This part initializes the library and compares the version number of the *
    * header file, to the version of the library, if these don't match then it *
    * is likely that PAPI won't work correctly. If there is an error, retval   *
    * keeps track of the version number.                                       *
    ****************************************************************************/
   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
   {
      printf("Library initialization error! 
\n");
      exit(-1);
   }

   /* Creating the EventSet */
   if((retval = PAPI_create_eventset(&EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Add Total Instructions Executed to our EventSet */
   if ((retval=PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK)
      ERROR_RETURN(retval);

   if ((retval=PAPI_state(EventSet, &status)) != PAPI_OK)
      ERROR_RETURN(retval);

   printstate(status);

   /* Start counting */
   if ((retval=PAPI_start(EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Capture the return value so ERROR_RETURN reports the actual error */
   if ((retval=PAPI_state(EventSet, &status)) != PAPI_OK)
      ERROR_RETURN(retval);

   printstate(status);

   /* free the resources used by PAPI */
   PAPI_shutdown();

   exit(0);
}

int printstate(int status)
{
   if(status & PAPI_STOPPED)
      printf("Eventset is currently stopped or inactive \n");
   if(status & PAPI_RUNNING)
      printf("Eventset is currently running \n");
   if(status & PAPI_PAUSED)
      printf("Eventset is currently Paused \n");
   if(status & PAPI_NOT_INIT)
      printf(" Eventset defined but not initialized \n");
   if(status & PAPI_OVERFLOWING)
      printf(" Eventset has overflowing enabled \n");
   if(status & PAPI_PROFILING)
      printf(" Eventset has profiling enabled \n");
   if(status & PAPI_MULTIPLEXING)
      printf(" Eventset has multiplexing enabled \n");
   return 0;
}
papi-5.3.0/src/examples/locks_pthreads.c0000600003276200002170000000632412247131121017717 0ustar ralphundrgrad
/****************************************************************************
 * This program shows how to use PAPI_register_thread, PAPI_lock,           *
 * PAPI_unlock, PAPI_set_thr_specific, PAPI_get_thr_specific.               *
 * Warning: Don't use PAPI_lock and PAPI_unlock on platforms on which the   *
 * locking mechanisms are not implemented. 
* ****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h> /* for usleep */
#include "papi.h" /* This needs to be included every time you use PAPI */

#define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__);  exit(retval); }

#define LOOPS 100000
#define SLEEP_VALUE 20000

int count;
int rank;

void *Master(void *arg)
{
   int i, retval, tmp;
   int *pointer, *pointer2;

   tmp = 20;
   pointer = &tmp;

   /* register the thread */
   if ( (retval=PAPI_register_thread())!= PAPI_OK )
      ERROR_RETURN(retval);

   /* save the pointer for later use */
   if ( (retval=PAPI_set_thr_specific(1,pointer))!= PAPI_OK )
      ERROR_RETURN(retval);

   /* change the value of tmp */
   tmp = 15;

   usleep(SLEEP_VALUE);
   PAPI_lock(PAPI_USR1_LOCK);
   /* Make sure Slaves are not sleeping */
   for (i = 0; i < LOOPS; i++) {
      count = 2 * count - i;
   }
   PAPI_unlock(PAPI_USR1_LOCK);

   /* retrieve the pointer saved by PAPI_set_thr_specific */
   if ( (retval=PAPI_get_thr_specific(1, (void **)&pointer2)) != PAPI_OK )
      ERROR_RETURN(retval);

   /* the output value should be 15 */
   printf("Thread specific data is %d \n", *pointer2);

   pthread_exit(NULL);
}

void *Slave(void *arg)
{
   int i;

   PAPI_lock(PAPI_USR2_LOCK);
   PAPI_lock(PAPI_USR1_LOCK);
   for (i = 0; i < LOOPS; i++) {
      count += i;
   }
   PAPI_unlock(PAPI_USR1_LOCK);
   PAPI_unlock(PAPI_USR2_LOCK);
   pthread_exit(NULL);
}

int main(int argc, char **argv)
{
   pthread_t master;
   pthread_t slave1;
   int result_m, result_s, rc, i;
   int retval;

   /* Setup a random number so compilers can't optimize it out */
   count = rand();
   result_m = count;
   rank = 0;

   for (i = 0; i < LOOPS; i++) {
      result_m = 2 * result_m - i;
   }
   result_s = result_m;

   for (i = 0; i < LOOPS; i++) {
      result_s += i;
   }

   if ((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT)
   {
      printf("Library initialization error! 
\n");
      exit(-1);
   }

   /* PAPI_thread_init expects an unsigned long (*)(void) id function */
   if ((retval = PAPI_thread_init((unsigned long (*)(void)) pthread_self)) != PAPI_OK)
      ERROR_RETURN(retval);

   if ((retval = PAPI_set_debug(PAPI_VERB_ECONT)) != PAPI_OK)
      ERROR_RETURN(retval);

   PAPI_lock(PAPI_USR2_LOCK);
   rc = pthread_create(&master, NULL, Master, NULL);
   if (rc) {
      retval = PAPI_ESYS;
      ERROR_RETURN(retval);
   }
   rc = pthread_create(&slave1, NULL, Slave, NULL);
   if (rc) {
      retval = PAPI_ESYS;
      ERROR_RETURN(retval);
   }
   pthread_join(master, NULL);
   printf("Master: Expected: %d  Received: %d\n", result_m, count);
   if (result_m != count)
      ERROR_RETURN(1);
   PAPI_unlock(PAPI_USR2_LOCK);

   pthread_join(slave1, NULL);
   printf("Slave: Expected: %d  Received: %d\n", result_s, count);

   if (result_s != count)
      ERROR_RETURN(1);

   exit(0);
}
papi-5.3.0/src/examples/Makefile.IRIX640000600003276200002170000000074212247131121017130 0ustar ralphundrgrad
PAPIINC = ..
PAPILIB = ../libpapi.a
CC = gcc
CFLAGS = -I$(PAPIINC)
LDFLAGS = $(PAPILIB)

TARGETS = PAPI_set_domain sprofile multiplex PAPI_state PAPI_reset PAPI_profil PAPI_perror PAPI_get_virt_cyc PAPI_get_real_cyc PAPI_get_opt PAPI_hw_info PAPI_get_executable_info PAPI_ipc PAPI_flops PAPI_flips PAPI_overflow PAPI_add_remove_event high_level PAPI_add_remove_events

all: $(TARGETS)

$(TARGETS): $$@.c
	$(CC) $? -o $@ $(CFLAGS) $(LDFLAGS)

clean:
	rm -f *.o $(TARGETS)
papi-5.3.0/src/examples/PAPI_ipc.c0000600003276200002170000000324012247131121016270 0ustar ralphundrgrad
/*****************************************************************************
 * This example demonstrates the usage of the high level function PAPI_ipc   *
 * which measures the number of instructions executed per cpu cycle          *
 *****************************************************************************/

/*****************************************************************************
 * The first call to PAPI_ipc initializes the PAPI library, sets up the      *
 * counters to monitor PAPI_TOT_INS and PAPI_TOT_CYC events, and starts the  *
 * counters. 
Subsequent calls will read the counters and return total real * * time, total process time, total instructions, and the instructions per * * cycle rate since the last call to PAPI_ipc. * *****************************************************************************/ #include #include #include "papi.h" main() { float real_time, proc_time,ipc; long long ins; float real_time_i, proc_time_i, ipc_i; long long ins_i; int retval; if((retval=PAPI_ipc(&real_time_i,&proc_time_i,&ins_i,&ipc_i)) < PAPI_OK) { printf("Could not initialise PAPI_ipc \n"); printf("retval: %d\n", retval); exit(1); } your_slow_code(); if((retval=PAPI_ipc( &real_time, &proc_time, &ins, &ipc)) #include #include "papi.h" /* This needs to be included every time you use PAPI */ int main() { int retval; int EventSet = PAPI_NULL; char error_str[PAPI_MAX_STR_LEN]; /**************************************************************************** * This part initializes the library and compares the version number of the * * header file, to the version of the library, if these don't match then it * * is likely that PAPI won't work correctly.If there is an error, retval * * keeps track of the version number. * ****************************************************************************/ if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT ) { exit(1); } if ((retval = PAPI_create_eventset(&EventSet)) != PAPI_OK) { fprintf(stderr, "PAPI error %d: %s\n",retval,PAPI_strerror(retval)); exit(1); } /* Add Total Instructions Executed to our EventSet */ if ((retval = PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK) { PAPI_perror( "PAPI_add_event" ); exit(1); } /* Start counting */ if ((retval = PAPI_start(EventSet)) != PAPI_OK) { PAPI_perror( "PAPI_start" ); exit(1); } /* We are trying to start the counter which has already been started, and this will give an error which will be passed to PAPI_perror via retval and the function will then display the error string on the screen. 
*/
   if ((retval = PAPI_start(EventSet)) != PAPI_OK)
   {
      PAPI_perror( "PAPI_start" );
   }

   /* The function PAPI_strerror returns the corresponding error string
      from the error code */
   if ((retval = PAPI_start(EventSet)) != PAPI_OK)
   {
      printf("%s\n",PAPI_strerror(retval));
   }

   /* finish using PAPI and free all related resources
      (this is optional, you don't have to use it) */
   PAPI_shutdown ();

   exit(0);
}
papi-5.3.0/src/examples/PAPI_overflow.c0000600003276200002170000001046612247131121017370 0ustar ralphundrgrad
/*****************************************************************************
 * This example shows how to use PAPI_overflow to set up an event set to     *
 * begin registering overflows.                                              *
 ******************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h" /* This needs to be included every time you use PAPI */
#include

#define OVER_FMT "handler(%d ) Overflow at %p! bit=0x%llx \n"
#define THRESHOLD 100000
#define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__);  exit(retval); }

int total = 0; /* we use total to track the number of overflows that occurred */

/* This is the handler called by PAPI_overflow */
void handler(int EventSet, void *address, long long overflow_vector, void *context)
{
   fprintf(stderr, OVER_FMT, EventSet, address, overflow_vector);
   total++;
}

int main ()
{
   int EventSet = PAPI_NULL;
   /* must be set to null before calling PAPI_create_eventset */

   char errstring[PAPI_MAX_STR_LEN];
   long long (values[2])[2];
   int retval, i;
   double tmp = 0;
   int PAPI_event; /* a place holder for an event preset */
   char event_name[PAPI_MAX_STR_LEN];

   /****************************************************************************
    * This part initializes the library and compares the version number of the *
    * header file, to the version of the library, if these don't match then it *
    * is likely that PAPI won't work correctly. If there is an error, retval   *
    * keeps track of the version number. 
* ****************************************************************************/
   if ((retval = PAPI_library_init (PAPI_VER_CURRENT)) != PAPI_VER_CURRENT)
   {
      printf("Library initialization error! \n");
      exit(1);
   }

   /* Here we create the eventset */
   if ((retval=PAPI_create_eventset (&EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   PAPI_event = PAPI_TOT_INS;

   /* Here we are querying for the existence of the PAPI presets */
   if (PAPI_query_event (PAPI_TOT_INS) != PAPI_OK)
   {
      PAPI_event = PAPI_TOT_CYC;

      /* Check the fallback event, not PAPI_TOT_INS again */
      if ((retval=PAPI_query_event (PAPI_TOT_CYC)) != PAPI_OK)
         ERROR_RETURN(retval);

      printf ("PAPI_TOT_INS not available on this platform.");
      printf (" so subst PAPI_event with PAPI_TOT_CYC !\n\n");
   }

   /* PAPI_event_code_to_name is used to convert a PAPI preset from
      its integer value to its string name. */
   if ((retval = PAPI_event_code_to_name (PAPI_event, event_name)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* add event to the event set */
   if ((retval = PAPI_add_event (EventSet, PAPI_event)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* register overflow and set up threshold */
   /* The threshold "THRESHOLD" was set to 100000 */
   if ((retval = PAPI_overflow (EventSet, PAPI_event, THRESHOLD, 0,
                                handler)) != PAPI_OK)
      ERROR_RETURN(retval);

   printf ("Here are the addresses at which overflows occurred and overflow vectors \n");
   printf ("--------------------------------------------------------------\n");

   /* Start counting */
   if ( (retval=PAPI_start (EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   for (i = 0; i < 2000000; i++)
   {
      tmp = 1.01 + tmp;
      tmp++;
   }

   /* Stops the counters and reads the counter values into the values array */
   if ( (retval=PAPI_stop (EventSet, values[0])) != PAPI_OK)
      ERROR_RETURN(retval);

   printf ("The total no of overflows was %d\n", total);

   /* clear the overflow status */
   if ((retval = PAPI_overflow (EventSet, PAPI_event, 0, 0,
                                handler)) != PAPI_OK)
      ERROR_RETURN(retval);

   /************************************************************************
    * PAPI_cleanup_eventset can only be used after the 
counter has been   *
    * stopped; it then removes all events in the eventset.                 *
    ************************************************************************/
   if ( (retval=PAPI_cleanup_eventset (EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Free all memory and data structures, EventSet must be empty. */
   if ( (retval=PAPI_destroy_eventset(&EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* free the resources used by PAPI */
   PAPI_shutdown();

   exit(0);
}
papi-5.3.0/src/examples/add_event/0000700003276200002170000000000012247131121016470 5ustar ralphundrgrad
papi-5.3.0/src/examples/add_event/Papi_add_env_event.c0000600003276200002170000001130412247131121022407 0ustar ralphundrgrad
/*
 * This example shows how to use PAPI_library_init, PAPI_create_eventset,
 * PAPI_add_event, PAPI_start and PAPI_stop.  These 5 functions
 * will allow a user to do most of the performance information gathering
 * that they would need.  PAPI_read could also be used if you don't want
 * to stop the EventSet from running but only check the counts.
 *
 * Also, we will use PAPI_perror for error information.
 *
 * In addition, a new call was created called PAPI_add_env_event
 * that allows a user to set up an environment variable that names
 * the event to be monitored. This allows different events
 * to be monitored at runtime without recompiling. The syntax
 * is as follows:
 * PAPI_add_env_event(int *EventSet, int *Event, char *env_variable);
 * EventSet is a pointer to the EventSet handle used with PAPI_add_event
 * Event is the default event to monitor if the environment variable
 * does not exist and differs from PAPI_add_event as it is
 * a pointer. 
 * env_variable is the name of the environment variable to look for
 * the event code, this can be a name, number or hex, for example
 * PAPI_L1_DCM could be defined in the environment variable as
 * all of the following: PAPI_L1_DCM, 0x80000000, or -2147483648
 *
 * To use only add_event you would change the calls to
 * PAPI_add_env_event(int *EventSet, int *Event, char *env_variable);
 * to PAPI_add_event(int EventSet, int Event);
 *
 * We will also use PAPI_event_code_to_name since the event may have
 * changed.
 * Author: Kevin London
 * email: london@cs.utk.edu
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for strlen */
#include "papi.h" /* This needs to be included anytime you use PAPI */

int PAPI_add_env_event(int *EventSet, int *Event, char *env_variable);

int main(){
  int retval,i;
  int EventSet=PAPI_NULL;
  int event_code=PAPI_TOT_INS; /* By default monitor total instructions */
  char event_name[PAPI_MAX_STR_LEN];
  /* b and c are zero-initialized so the work loop reads defined values */
  float a[1000],b[1000]={0},c[1000]={0};
  long long values;

  /* This initializes the library and checks the version number of the
   * header file, to the version of the library, if these don't match
   * then it is likely that PAPI won't work correctly.
   */
  if ((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT ){
	/* This call prints what the error means.
	 * If retval == PAPI_ESYS then it might be beneficial
	 * to call perror as well to see what system call failed
	 */
	PAPI_perror("PAPI_library_init");
	exit(-1);
  }

  /* Create space for the EventSet */
  if ( (retval=PAPI_create_eventset( &EventSet ))!=PAPI_OK){
	PAPI_perror("PAPI_create_eventset");
	exit(-1);
  }

  /* After this call if the environment variable PAPI_EVENT is set,
   * event_code may contain something different than total instructions.
   */
  if ( (retval=PAPI_add_env_event(&EventSet, &event_code, "PAPI_EVENT"))!=PAPI_OK){
	PAPI_perror("PAPI_add_env_event");
	exit(-1);
  }

  /* Now let's start counting */
  if ( (retval = PAPI_start(EventSet)) != PAPI_OK ){
	PAPI_perror("PAPI_start");
	exit(-1);
  }

  /* Some work to take up some time, the PAPI_start/PAPI_stop (and/or
   * PAPI_read) should surround what you want to monitor.
   */
  for ( i=0;i<1000;i++){
	a[i] = b[i]-c[i];
	c[i] = a[i]*1.2;
  }

  if ( (retval = PAPI_stop(EventSet, &values) ) != PAPI_OK ){
	PAPI_perror("PAPI_stop");
	exit(-1);
  }

  if ( (retval=PAPI_event_code_to_name( event_code, event_name))!=PAPI_OK){
	PAPI_perror("PAPI_event_code_to_name");
	exit(-1);
  }

  printf("Ending values for %s: %lld\n", event_name,values);

  /* Remove PAPI instrumentation, this is necessary on platforms
   * that need to release shared memory segments and is always
   * good practice.
   */
  PAPI_shutdown();
  exit(0);
}

int PAPI_add_env_event(int *EventSet, int *EventCode, char *env_variable){
  int real_event=*EventCode;
  char *eventname;
  int retval;

  if ( env_variable != NULL ){
    if ( (eventname=getenv(env_variable)) ) {
	if ( eventname[0] == 'P' ) { /* Use the PAPI name */
	   retval=PAPI_event_name_to_code(eventname, &real_event );
	   if ( retval != PAPI_OK ) real_event = *EventCode;
	}
	else{
	   /* %x accepts an optional 0x prefix; '#' is not a valid scanf flag */
	   if ( strlen(eventname)>1 && eventname[1]=='x')
		sscanf(eventname, "%x", (unsigned int *)&real_event);
	   else
		real_event = atoi(eventname);
	}
    }
  }

  /* Try the requested event first, then fall back to the default */
  if ( (retval = PAPI_add_event( *EventSet, real_event))!= PAPI_OK ){
	if ( real_event != *EventCode ) {
		if ( (retval = PAPI_add_event( *EventSet, *EventCode)) == PAPI_OK ){
			real_event = *EventCode;
		}
	}
  }
  *EventCode = real_event;
  return retval;
}
papi-5.3.0/src/examples/PAPI_get_opt.c0000600003276200002170000000604312247131121017162 0ustar ralphundrgrad
/*****************************************************************************
 * This is an example using the low level function PAPI_get_opt to query the *
 * option settings of the PAPI library or a specific eventset created by the *
 * PAPI_create_eventset 
function. PAPI_set_opt, on the other hand, is used to  *
 * set PAPI library or event set options.                                    *
 *****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for memset */
#include "papi.h" /* This needs to be included every time you use PAPI */

#define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__);  exit(retval); }

int poorly_tuned_function()
{
   float tmp = 0; /* initialized so the loop reads a defined value */
   int i;

   for(i=1; i<2000; i++)
   {
      tmp=(tmp+100)/i;
   }
   return 0;
}

int main()
{
   int num, retval, EventSet = PAPI_NULL;
   PAPI_option_t options;
   long long values[2];

   /****************************************************************************
    * This part initializes the library and compares the version number of the *
    * header file, to the version of the library, if these don't match then it *
    * is likely that PAPI won't work correctly. If there is an error, retval   *
    * keeps track of the version number.                                       *
    ****************************************************************************/
   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
   {
      printf("Library initialization error! \n");
      exit(1);
   }

   /* PAPI_get_opt returns a negative number if there is an error */

   /* This call returns the maximum available hardware counters */
   if((num = PAPI_get_opt(PAPI_MAX_HWCTRS,NULL)) <= 0)
      ERROR_RETURN(num);

   printf("This machine has %d counters.\n",num);

   if ((retval=PAPI_create_eventset(&EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Set the domain of this EventSet to count user and kernel modes
      for this process. 
*/ memset(&options,0x0,sizeof(options)); options.domain.eventset = EventSet; /* Default domain is PAPI_DOM_USER */ options.domain.domain = PAPI_DOM_ALL; /* this sets the options for the domain */ if ((retval=PAPI_set_opt(PAPI_DOMAIN, &options)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Instructions Executed event to the EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Cycles Executed event to the EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_CYC)) != PAPI_OK) ERROR_RETURN(retval); /* Start counting */ if((retval=PAPI_start(EventSet)) != PAPI_OK) ERROR_RETURN(retval); poorly_tuned_function(); /* Stop counting */ if((retval=PAPI_stop(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); printf(" Total instructions: %lld Total Cycles: %lld \n", values[0], values[1]); /* clean up */ PAPI_shutdown(); exit(0); } papi-5.3.0/src/examples/PAPI_add_remove_event.c0000600003276200002170000000720612247131121021031 0ustar ralphundrgrad/***************************************************************************** * This example shows how to use PAPI_add_event, PAPI_start, PAPI_read, * * PAPI_stop and PAPI_remove_event. 
 ******************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h" /* This needs to be included every time you use PAPI */

#define NUM_EVENTS 2
#define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); }

int main()
{
   /* must be initialized to PAPI_NULL before calling PAPI_create_eventset */
   int EventSet = PAPI_NULL;
   int tmp, i;
   /* This is where we store the values we read from the eventset */
   long long values[NUM_EVENTS];
   /* We use number to keep track of the number of events in the EventSet */
   int retval, number;
   char errstring[PAPI_MAX_STR_LEN];

   /***************************************************************************
   * This part initializes the library and compares the version number of the *
   * header file, to the version of the library, if these don't match then it *
   * is likely that PAPI won't work correctly. If there is an error, retval   *
   * keeps track of the version number.                                       *
   ***************************************************************************/

   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
      ERROR_RETURN(retval);

   /* Creating the eventset */
   if ( (retval = PAPI_create_eventset(&EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Add Total Instructions Executed to the EventSet */
   if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Add Total Cycles event to the EventSet */
   if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_CYC)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* get the number of events in the event set */
   number = 0;
   if ( (retval = PAPI_list_events(EventSet, NULL, &number)) != PAPI_OK)
      ERROR_RETURN(retval);

   printf("There are %d events in the event set\n", number);

   /* Start counting */
   if ( (retval = PAPI_start(EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* you can place your code here */
   tmp=0;
   for (i = 0; i < 2000000; i++)
   {
      tmp = i + tmp;
   }

   /* read the counter values and store them in
the values array */ if ( (retval=PAPI_read(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); printf("The total instructions executed for the first loop are %lld \n", values[0] ); printf("The total cycles executed for the first loop are %lld \n",values[1]); /* our slow code again */ tmp=0; for (i = 0; i < 2000000; i++) { tmp = i + tmp; } /* Stop counting and store the values into the array */ if ( (retval = PAPI_stop(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); printf("Total instructions executed are %lld \n", values[0] ); printf("Total cycles executed are %lld \n",values[1]); /* Remove event: We are going to take the PAPI_TOT_INS from the eventset */ if( (retval = PAPI_remove_event(EventSet, PAPI_TOT_INS)) != PAPI_OK) ERROR_RETURN(retval); printf("Removing PAPI_TOT_INS from the eventset\n"); /* Now we list how many events are left on the event set */ number = 0; if ((retval=PAPI_list_events(EventSet, NULL, &number))!= PAPI_OK) ERROR_RETURN(retval); printf("There is only %d event left in the eventset now\n", number); /* free the resources used by PAPI */ PAPI_shutdown(); exit(0); } papi-5.3.0/src/examples/Makefile0000600003276200002170000000251112247131121016200 0ustar ralphundrgradPAPIINC = .. 
PAPILIB = ../libpapi.a
CC = gcc
CFLAGS += -I$(PAPIINC)
OS = $(shell uname)

TARGETS_NTHD = PAPI_set_domain sprofile multiplex PAPI_state PAPI_reset PAPI_profil PAPI_perror PAPI_get_virt_cyc PAPI_get_real_cyc PAPI_get_opt PAPI_hw_info PAPI_get_executable_info PAPI_ipc PAPI_flops PAPI_flips PAPI_overflow PAPI_add_remove_event high_level PAPI_add_remove_events
TARGETS_PTHREAD = locks_pthreads overflow_pthreads

ifeq ($(OS), SunOS)
LDFLAGS = $(PAPILIB) -lcpc
LDFLAGS_PTHREAD = $(PAPILIB) -lpthread -lcpc
TARGETS = $(TARGETS_NTHD) $(TARGETS_PTHREAD)
else
ifeq ($(OS), AIX)
CC = xlc
LDFLAGS = $(PAPILIB) -lpmapi
LDFLAGS_PTHREAD = $(PAPILIB) -lpthread -lpmapi
TARGETS = $(TARGETS_NTHD) $(TARGETS_PTHREAD)
else
ifeq ($(OS), OSF1)
LDFLAGS = $(PAPILIB) -lrt
LDFLAGS_PTHREAD = $(PAPILIB) -lpthread -lrt
TARGETS = $(TARGETS_NTHD)
else
ifeq ($(OS), Linux)
TARGETS = $(TARGETS_NTHD) $(TARGETS_PTHREAD)
else
TARGETS = $(TARGETS_NTHD)
endif
LDFLAGS = $(PAPILIB)
LDFLAGS_PTHREAD = $(PAPILIB) -lpthread
endif
endif
endif

all: $(TARGETS)

$(TARGETS_NTHD): %:%.o
	$(CC) -o $@ $(CFLAGS) $^ $(LDFLAGS)

$(TARGETS_PTHREAD): %:%.o
	$(CC) -o $@ $(CFLAGS) $^ $(LDFLAGS_PTHREAD)

clean:
	$(RM) *.o $(TARGETS)
papi-5.3.0/src/examples/README0000600003276200002170000000144312247131121015423 0ustar ralphundrgrad
/*
* File:    papi/src/examples/README
* Author:  Min Zhou
*          min@cs.utk.edu
* Mods:
*
*/

This directory contains:

Makefile          example Makefile for platforms that support GNU make
Makefile.AIX      example Makefile for AIX
Makefile.IRIX64   example Makefile for IRIX64
Makefile.OSF1     example Makefile for OSF1
*.c               various example programs
run_examples.sh   shell script to test the example programs

NOTE: not all of the example programs can be run successfully on every
platform, because some events are not available everywhere. For example,
PAPI_FP_INS is a derived event on power3 and UltraSparc III, so
overflow_pthreads cannot be run successfully on these platforms. But these
programs should help you understand how to use the PAPI functions.
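
Since some examples are expected to fail on a given platform, it can help to
run them all and list the ones that fail instead of reading through the
combined output. The following is only a minimal sketch, assuming that an
example signals an error through a non-zero exit status (as the ERROR_RETURN
macro used in the examples does). The function name run_all and the summary
format are illustrative assumptions and are not part of PAPI or of
run_examples.sh.

```shell
#!/bin/sh
# run_all DIR: run every executable regular file in DIR (default: current
# directory), skipping *.sh scripts, and report which ones exit non-zero.
# The name run_all and the summary line are hypothetical, for illustration.
run_all() {
    dir=${1:-.}
    passed=0
    failed=0
    for prog in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        if [ ! -f "$prog" ] || [ ! -x "$prog" ]; then
            continue
        fi
        case "$prog" in
            *.sh) continue ;;    # do not re-run the driver scripts
        esac
        if "$prog" > /dev/null 2>&1; then
            passed=$((passed + 1))
        else
            status=$?            # exit status of the program that just failed
            echo "FAILED (exit status $status): $prog"
            failed=$((failed + 1))
        fi
    done
    echo "passed: $passed, failed: $failed"
}
```

After building the examples, calling "run_all ." from this directory would
print one FAILED line per example that could not run, followed by the totals.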
papi-5.3.0/src/examples/PAPI_get_virt_cyc.c0000600003276200002170000000331112247131121020175 0ustar ralphundrgrad
/******************************************************************************
 * This is an example to show how to use low level function PAPI_get_virt_cyc *
 * and PAPI_get_virt_usec.                                                    *
 ******************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h" /* This needs to be included every time you use PAPI */

int i;
double tmp;

int your_slow_code()
{
   for(i=1; i<200000; i++)
   {
      tmp= (tmp+i)/2;
   }
   return 0;
}

int main()
{
   long long s,s1, e, e1;
   int retval;

   /***************************************************************************
   * This part initializes the library and compares the version number of the *
   * header file, to the version of the library, if these don't match then it *
   * is likely that PAPI won't work correctly. If there is an error, retval   *
   * keeps track of the version number.                                       *
   ***************************************************************************/

   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
   {
      printf("Library initialization error! \n");
      exit(1);
   }

   /* Here you get initial cycles and time */
   /* No error checking is done here because this function call is always
      successful */
   s = PAPI_get_virt_cyc();

   your_slow_code();

   /* Here you get final cycles and time */
   e = PAPI_get_virt_cyc();

   s1= PAPI_get_virt_usec();

   your_slow_code();

   e1= PAPI_get_virt_usec();

   /* PAPI_get_virt_usec reports microseconds, so the label is (us) */
   printf("Virtual cycles : %lld\nVirtual time(us): %lld\n",e-s,e1-s1);

   /* clean up */
   PAPI_shutdown();

   exit(0);
}
papi-5.3.0/src/examples/PAPI_hw_info.c0000600003276200002170000000325112247131121017150 0ustar ralphundrgrad
/****************************************************************************
 * This is a simple low level example for getting information on the system *
 * hardware.
This function PAPI_get_hardware_info(), returns a pointer to a * * structure of type PAPI_hw_info_t, which contains number of CPUs, nodes, * * vendor number/name for CPU, CPU revision, clock speed. * ****************************************************************************/ #include #include #include "papi.h" /* This needs to be included every time you use PAPI */ int main() { const PAPI_hw_info_t *hwinfo = NULL; int retval; /*************************************************************************** * This part initializes the library and compares the version number of the* * header file, to the version of the library, if these don't match then it * * is likely that PAPI won't work correctly.If there is an error, retval * * keeps track of the version number. * ***************************************************************************/ if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT ) { printf("Library initialization error! \n"); exit(1); } /* Get hardware info*/ if ((hwinfo = PAPI_get_hardware_info()) == NULL) { printf("PAPI_get_hardware_info error! \n"); exit(1); } /* when there is an error, PAPI_get_hardware_info returns NULL */ printf("%d CPU at %f Mhz.\n",hwinfo->totalcpus,hwinfo->mhz); printf(" model string is %s \n", hwinfo->model_string); /* clean up */ PAPI_shutdown(); exit(0); } papi-5.3.0/src/examples/PAPI_flips.c0000600003276200002170000000505612247131121016641 0ustar ralphundrgrad/***************************************************************************** * This example demonstrates the usage of the high level function PAPI_flips * * which measures the number of floating point instructions executed and the * * MegaFlop rate(defined as the number of floating point instructions per * * microsecond). To use PAPI_flips you need to have floating point * * instructions event supported by the platform. 
 *****************************************************************************/

/*****************************************************************************
 * The first call to PAPI_flips initializes the PAPI library, sets up the    *
 * counters to monitor PAPI_FP_INS and PAPI_TOT_CYC events, and starts the   *
 * counters. Subsequent calls will read the counters and return total real   *
 * time, total process time, total floating point instructions, and the      *
 * Mflips/s rate since the last call to PAPI_flips.                          *
 *****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h"

int main()
{
   float real_time, proc_time,mflips;
   long long flpins;
   float ireal_time, iproc_time, imflips;
   long long iflpins;
   int retval;

   /***********************************************************************
    * If PAPI_FP_INS is a derived event on your platform, then your        *
    * platform must have at least three counters to support PAPI_flips,    *
    * because PAPI needs one counter to count cycles. So on UltraSparcIII, *
    * even though the platform supports PAPI_FP_INS, there are only two    *
    * available hardware counters and PAPI_FP_INS is a derived event on    *
    * this platform, so PAPI_flips returns an error.
* ***********************************************************************/ if((retval=PAPI_flips(&ireal_time,&iproc_time,&iflpins,&imflips)) < PAPI_OK) { printf("Could not initialise PAPI_flips \n"); printf("Your platform may not support floating point instruction event.\n"); printf("retval: %d\n", retval); exit(1); } your_slow_code(); if((retval=PAPI_flips( &real_time, &proc_time, &flpins, &mflips)) #include #include "papi.h" main() { float real_time, proc_time,mflops; long long flpops; float ireal_time, iproc_time, imflops; long long iflpops; int retval; /*********************************************************************** * if PAPI_FP_OPS is a derived event in your platform, then your * * platform must have at least three counters to support PAPI_flops, * * because PAPI needs one counter to cycles. So in UltraSparcIII, even * * the platform supports PAPI_FP_OPS, but UltraSparcIII only has two * * available hardware counters and PAPI_FP_OPS is a derived event in * * this platform, so PAPI_flops returns an error. 
* ***********************************************************************/ if((retval=PAPI_flops(&ireal_time,&iproc_time,&iflpops,&imflops)) < PAPI_OK) { printf("Could not initialise PAPI_flops \n"); printf("Your platform may not support floating point operation event.\n"); printf("retval: %d\n", retval); exit(1); } your_slow_code(); if((retval=PAPI_flops( &real_time, &proc_time, &flpops, &mflops)) #include #include #include #include #include "papi.h" /* This needs to be included every time you use PAPI */ #define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); } int poorly_tuned_function() { float tmp; int i; for(i=1; i<2000; i++) { tmp=(tmp+100)/i; } return 0; } int main() { int num, retval, EventSet = PAPI_NULL; long long values[2]; PAPI_option_t options; int fd; /**************************************************************************** * This part initializes the library and compares the version number of the * * header file, to the version of the library, if these don't match then it * * is likely that PAPI won't work correctly.If there is an error, retval * * keeps track of the version number. * ****************************************************************************/ if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT ) { printf("Library initialization error! \n"); exit(1); } /* Set the domain of this EventSet to counter user mode. 
The domain will be valid for all the eventset created after this function call unless you call PAPI_set_domain again */ if ((retval=PAPI_set_domain(PAPI_DOM_USER)) != PAPI_OK) ERROR_RETURN(retval); if ((retval=PAPI_create_eventset(&EventSet)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Instructions Executed event to the EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Cycles Executed event to the EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_CYC)) != PAPI_OK) ERROR_RETURN(retval); /* Start counting */ if((retval=PAPI_start(EventSet)) != PAPI_OK) ERROR_RETURN(retval); poorly_tuned_function(); /* add some system calls */ fd = open("/dev/zero", O_RDONLY); if (fd == -1) { perror("open(/dev/zero)"); exit(1); } close(fd); /* Stop counting */ if((retval=PAPI_stop(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); printf(" Total instructions: %lld Total Cycles: %lld \n", values[0], values[1]); /* Set the domain of this EventSet to counter user and kernel modes */ if ((retval=PAPI_set_domain(PAPI_DOM_ALL)) != PAPI_OK) ERROR_RETURN(retval); EventSet = PAPI_NULL; if ((retval=PAPI_create_eventset(&EventSet)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Instructions Executed to our EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Instructions Executed to our EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_CYC)) != PAPI_OK) ERROR_RETURN(retval); /* Start counting */ if((retval=PAPI_start(EventSet)) != PAPI_OK) ERROR_RETURN(retval); poorly_tuned_function(); /* add some system calls */ fd = open("/dev/zero", O_RDONLY); if (fd == -1) { perror("open(/dev/zero)"); exit(1); } close(fd); /* Stop counting */ if((retval=PAPI_stop(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); printf(" Total instructions: %lld Total Cycles: %lld \n", values[0], values[1]); /* clean up */ PAPI_shutdown(); exit(0); } 
papi-5.3.0/src/examples/PAPI_get_real_cyc.c0000600003276200002170000000331212247131121020135 0ustar ralphundrgrad
/******************************************************************************
 * This is an example to show how to use low level function PAPI_get_real_cyc *
 * and PAPI_get_real_usec.                                                    *
 ******************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h" /* This needs to be included every time you use PAPI */

int your_slow_code()
{
   int i, tmp = 0;   /* tmp is initialized so the loop does not read an indeterminate value */

   for(i=1; i<20000; i++)
   {
      tmp=(tmp+100)/i;
   }
   return 0;
}

int main()
{
   long long s,s1, e, e1;
   int retval;

   /***************************************************************************
   * This part initializes the library and compares the version number of the *
   * header file, to the version of the library, if these don't match then it *
   * is likely that PAPI won't work correctly. If there is an error, retval   *
   * keeps track of the version number.                                       *
   ***************************************************************************/

   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
   {
      printf("Library initialization error! \n");
      exit(1);
   }

   /* Here you get initial cycles and time */
   /* No error checking is done here because this function call is always
      successful */
   s = PAPI_get_real_cyc();

   your_slow_code();

   /* Here you get final cycles and time */
   e = PAPI_get_real_cyc();

   s1= PAPI_get_real_usec();

   your_slow_code();

   e1= PAPI_get_real_usec();

   /* PAPI_get_real_usec reports microseconds, so the label is (us) */
   printf("Wallclock cycles : %lld\nWallclock time(us): %lld\n",e-s,e1-s1);

   /* clean up */
   PAPI_shutdown();

   exit(0);
}
papi-5.3.0/src/examples/run_examples.sh0000700003276200002170000000071012247131121017576 0ustar ralphundrgrad
#!/bin/sh

# File: run_examples.sh
# CVS: $Id$
# Author: Min Zhou
# min@cs.utk.edu

CTESTS=`find .
-perm -u+x -type f`; ALLTESTS="$CTESTS"; x=0; CWD=`pwd` echo "Platform:" uname -a echo "" echo "The following test cases will be run:"; echo $ALLTESTS; echo ""; echo "Running C Example Programs"; echo "" for i in $CTESTS; do if [ -x $i ]; then if [ "$i" != "./run_examples.sh" ]; then echo "Running $i: "; ./$i fi; fi; echo ""; done papi-5.3.0/src/examples/high_level.c0000600003276200002170000001241212247131121017013 0ustar ralphundrgrad/***************************************************************************** * This example code shows how to use most of PAPI's High level functions * * to start,count,read and stop on an event set. We use two preset events * * here: * * PAPI_TOT_INS: Total instructions executed in a period of time * * PAPI_TOT_CYC: Total cpu cycles in a period of time * ******************************************************************************/ #include #include #include "papi.h" #define NUM_EVENTS 2 #define THRESHOLD 10000 #define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); } /* stupid codes to be monitored */ void computation_mult() { double tmp=1.0; int i=1; for( i = 1; i < THRESHOLD; i++ ) { tmp = tmp*i; } } /* stupid codes to be monitored */ void computation_add() { int tmp = 0; int i=0; for( i = 0; i < THRESHOLD; i++ ) { tmp = tmp + i; } } int main() { /*Declaring and initializing the event set with the presets*/ int Events[2] = {PAPI_TOT_INS, PAPI_TOT_CYC}; /*The length of the events array should be no longer than the value returned by PAPI_num_counters.*/ /*declaring place holder for no of hardware counters */ int num_hwcntrs = 0; int retval; char errstring[PAPI_MAX_STR_LEN]; /*This is going to store our list of results*/ long long values[NUM_EVENTS]; /*************************************************************************** * This part initializes the library and compares the version number of the* * header file, to the version of the library, if these don't match then 
it
   * is likely that PAPI won't work correctly. If there is an error, retval   *
   * keeps track of the version number.                                       *
   ***************************************************************************/

   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
   {
      /* errstring is never filled in, so report the return value only */
      fprintf(stderr, "Library initialization error: %d\n", retval);
      exit(1);
   }

   /**************************************************************************
    * PAPI_num_counters returns the number of hardware counters the platform *
    * has or a negative number if there is an error                          *
    **************************************************************************/
   if ((num_hwcntrs = PAPI_num_counters()) < PAPI_OK)
   {
      printf("There are no counters available. \n");
      exit(1);
   }

   printf("There are %d counters in this system\n",num_hwcntrs);

   /**************************************************************************
    * PAPI_start_counters initializes the PAPI library (if necessary) and    *
    * starts counting the events named in the events array. This function    *
    * implicitly stops and initializes any counters running as a result of   *
    * a previous call to PAPI_start_counters.
* **************************************************************************/ if ( (retval = PAPI_start_counters(Events, NUM_EVENTS)) != PAPI_OK) ERROR_RETURN(retval); printf("\nCounter Started: \n"); /* Your code goes here*/ computation_add(); /********************************************************************** * PAPI_read_counters reads the counter values into values array * **********************************************************************/ if ( (retval=PAPI_read_counters(values, NUM_EVENTS)) != PAPI_OK) ERROR_RETURN(retval); printf("Read successfully\n"); printf("The total instructions executed for addition are %lld \n",values[0]); printf("The total cycles used are %lld \n", values[1] ); printf("\nNow we try to use PAPI_accum to accumulate values\n"); /* Do some computation here */ computation_add(); /************************************************************************ * What PAPI_accum_counters does is it adds the running counter values * * to what is in the values array. The hardware counters are reset and * * left running after the call. 
    ************************************************************************/

   if ( (retval=PAPI_accum_counters(values, NUM_EVENTS)) != PAPI_OK)
      ERROR_RETURN(retval);

   printf("We did an additional %d additions!\n", THRESHOLD);
   printf("The total instructions executed for addition are %lld \n",
          values[0] );
   printf("The total cycles used are %lld \n", values[1] );

   /***********************************************************************
    * Stop counting events (this reads the counters as well as stops them) *
    ***********************************************************************/

   printf("\nNow we try to do some multiplications\n");
   computation_mult();

   /******************* PAPI_stop_counters **********************************/
   if ((retval=PAPI_stop_counters(values, NUM_EVENTS)) != PAPI_OK)
      ERROR_RETURN(retval);

   printf("The total instructions executed for multiplication are %lld \n",
          values[0] );
   printf("The total cycles used are %lld \n", values[1] );
   exit(0);
}
papi-5.3.0/src/examples/PAPI_add_remove_events.c0000600003276200002170000000573212247131121021216 0ustar ralphundrgrad
/******************************************************************************
 * This is a simple low level function demonstration on using PAPI_add_events *
 * to add an array of events to a created eventset, we are going to use these *
 * events to monitor a set of instructions, start the counters, read the      *
 * counters and then cleanup the eventset when done. In this example we use   *
 * the presets PAPI_TOT_INS and PAPI_TOT_CYC. PAPI_add_events, PAPI_start,    *
 * PAPI_stop, PAPI_cleanup_eventset, PAPI_destroy_eventset and                *
 * PAPI_create_eventset all return PAPI_OK (which is 0) when successful.
 ******************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "papi.h" /* This needs to be included every time you use PAPI */

#define NUM_EVENT 2
#define THRESHOLD 100000
#define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); }

int main(){

   int i,retval;
   int tmp = 0;   /* initialized so the loop below does not read an indeterminate value */
   /* must be initialized to PAPI_NULL before calling PAPI_create_eventset */
   int EventSet = PAPI_NULL;
   int event_codes[NUM_EVENT]={PAPI_TOT_INS,PAPI_TOT_CYC};
   long long values[NUM_EVENT];

   /***************************************************************************
   * This part initializes the library and compares the version number of the *
   * header file, to the version of the library, if these don't match then it *
   * is likely that PAPI won't work correctly. If there is an error, retval   *
   * keeps track of the version number.                                       *
   ***************************************************************************/

   if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT )
   {
      /* the original printed an uninitialized errstring buffer here */
      fprintf(stderr, "Library initialization error: %d\n", retval);
      exit(1);
   }

   /* Creating event set */
   if ((retval=PAPI_create_eventset(&EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Add the array of events PAPI_TOT_INS and PAPI_TOT_CYC to the eventset */
   if ((retval=PAPI_add_events(EventSet, event_codes, NUM_EVENT)) != PAPI_OK)
      ERROR_RETURN(retval);

   /* Start counting */
   if ( (retval=PAPI_start(EventSet)) != PAPI_OK)
      ERROR_RETURN(retval);

   /*** this is where your computation goes *********/
   for(i=0;i<1000;i++)
   {
      tmp = tmp+i;
   }

   /* Stop counting; this reads from the counters as well as stopping them. */
   if ( (retval=PAPI_stop(EventSet,values)) != PAPI_OK)
      ERROR_RETURN(retval);

   printf("\nThe total instructions executed are %lld, total cycles %lld\n",
          values[0],values[1]);

   if ( (retval=PAPI_remove_events(EventSet,event_codes, NUM_EVENT))!=PAPI_OK)
      ERROR_RETURN(retval);

   /* Free all memory and data structures, EventSet must be empty.
*/ if ( (retval=PAPI_destroy_eventset(&EventSet)) != PAPI_OK) ERROR_RETURN(retval); /* free the resources used by PAPI */ PAPI_shutdown(); exit(0); } papi-5.3.0/src/examples/multiplex.c0000600003276200002170000001024512247131121016732 0ustar ralphundrgrad/**************************************************************************** * Multiplexing allows more counters to be used than what is supported by * * the platform, thus allowing a larger number of events to be counted * * simultaneously. When a microprocessor has a very limited number of * * counters that can be counted simultaneously, a large application with * * many hours of run time may require days of profiling in order to gather * * enough information to base a performance analysis. Multiplexing overcomes* * this limitation by the usage of the counters over timesharing. * * This is an example demonstrating how to use PAPI_set_multiplex to * * convert a standard event set to a multiplexed event set. * ****************************************************************************/ #include #include #include #include "papi.h" #define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); } #define NUM_ITERS 10000000 #define MAX_TO_ADD 6 double c = 0.11; void do_flops(int n) { int i; double a = 0.5; double b = 6.2; for (i=0; i < n; i++) c += a * b; return; } /* Tests that we can really multiplex a lot. */ int multiplex(void) { int retval, i, EventSet = PAPI_NULL, j = 0; long long *values; PAPI_event_info_t pset; int events[MAX_TO_ADD], number; /* Initialize the library */ retval = PAPI_library_init(PAPI_VER_CURRENT); if (retval != PAPI_VER_CURRENT) { printf("Library initialization error! 
\n"); exit(1); } /* initialize multiplex support */ retval = PAPI_multiplex_init(); if (retval != PAPI_OK) ERROR_RETURN(retval); retval = PAPI_create_eventset(&EventSet); if (retval != PAPI_OK) ERROR_RETURN(retval); /* convert the event set to a multiplex event set */ retval = PAPI_set_multiplex(EventSet); if (retval != PAPI_OK) ERROR_RETURN(retval); /* retval = PAPI_add_event(EventSet, PAPI_TOT_INS); if ((retval != PAPI_OK) && (retval != PAPI_ECNFLCT)) ERROR_RETURN(retval); printf("Adding %s\n", "PAPI_TOT_INS"); */ for (i = 0; i < PAPI_MAX_PRESET_EVENTS; i++) { retval = PAPI_get_event_info(i | PAPI_PRESET_MASK, &pset); if (retval != PAPI_OK) ERROR_RETURN(retval); if ((pset.count) && (pset.event_code != PAPI_TOT_CYC)) { printf("Adding %s\n", pset.symbol); retval = PAPI_add_event(EventSet, pset.event_code); if ((retval != PAPI_OK) && (retval != PAPI_ECNFLCT)) ERROR_RETURN(retval); if (retval == PAPI_OK) printf("Added %s\n", pset.symbol); else printf("Could not add %s due to resource limitation.\n", pset.symbol); if (retval == PAPI_OK) { if (++j >= MAX_TO_ADD) break; } } } values = (long long *) malloc(MAX_TO_ADD * sizeof(long long)); if (values == NULL) { printf("Not enough memory available. 
\n"); exit(1); } if ((retval=PAPI_start(EventSet)) != PAPI_OK) ERROR_RETURN(retval); do_flops(NUM_ITERS); retval = PAPI_stop(EventSet, values); if (retval != PAPI_OK) ERROR_RETURN(retval); /* get the number of events in the event set */ number=MAX_TO_ADD; if ( (retval = PAPI_list_events(EventSet, events, &number)) != PAPI_OK) ERROR_RETURN(retval); /* print the read result */ for (i = 0; i < MAX_TO_ADD; i++) { retval = PAPI_get_event_info(events[i], &pset); if (retval != PAPI_OK) ERROR_RETURN(retval); printf("Event name: %s value: %lld \n", pset.symbol, values[i]); } retval = PAPI_cleanup_eventset(EventSet); if (retval != PAPI_OK) ERROR_RETURN(retval); retval = PAPI_destroy_eventset(&EventSet); if (retval != PAPI_OK) ERROR_RETURN(retval); /* free the resources used by PAPI */ PAPI_shutdown(); return (0); } int main(int argc, char **argv) { printf("Using %d iterations\n\n", NUM_ITERS); printf("Does PAPI_multiplex_init() handle lots of events?\n"); multiplex(); exit(0); } papi-5.3.0/src/examples/Makefile.AIX0000600003276200002170000000127112247131121016622 0ustar ralphundrgradPAPIINC = .. PAPILIB = ../libpapi.a CC = xlc CFLAGS = -I$(PAPIINC) LDFLAGS = $(PAPILIB) -lpmapi LDFLAGS_PTHREAD = $(PAPILIB) -lpthread -lpmapi TARGETS = PAPI_set_domain sprofile multiplex PAPI_state PAPI_reset PAPI_profil PAPI_perror PAPI_get_virt_cyc PAPI_get_real_cyc PAPI_get_opt PAPI_hw_info PAPI_get_executable_info PAPI_ipc PAPI_flops PAPI_flips PAPI_overflow PAPI_add_remove_event high_level PAPI_add_remove_events TARGETS_PTHREAD = locks_pthreads overflow_pthreads all: $(TARGETS) $(TARGETS_PTHREAD) $(TARGETS): $$@.c $(CC) $? -o $@ $(CFLAGS) $(LDFLAGS) $(TARGETS_PTHREAD): $$@.c $(CC) $? 
-o $@ $(CFLAGS) $(LDFLAGS_PTHREAD) clean: rm -f *.o $(TARGETS) $(TARGETS_PTHREAD) papi-5.3.0/src/examples/sprofile.c0000600003276200002170000001155312247131121016535 0ustar ralphundrgrad/* This program shows how to use PAPI_sprofil */ #include #include #include #include #include #include #include #include "papi.h" /* This needs to be included every time you use PAPI */ #define NUM_FLOPS 20000000 #define NUM_ITERS 100000 #define THRESHOLD 100000 #define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); } #if (defined(linux) && defined(__ia64__)) || (defined(_AIX)) #define DO_FLOPS1 (caddr_t)(*(void **)do_flops1) #define DO_FLOPS2 (caddr_t)(*(void **)do_flops2) #else #define DO_FLOPS1 (caddr_t)(do_flops1) #define DO_FLOPS2 (caddr_t)(do_flops2) #endif void do_flops2(int); volatile double t1 = 0.8, t2 = 0.9; void do_flops1(int n) { int i; double c = 22222.11; for (i = 0; i < n; i++) c -= t1 * t2; } void do_both(int n) { int i; const int flops2 = NUM_FLOPS / n; const int flops1 = NUM_FLOPS / n; for (i = 0; i < n; i++) { do_flops1(flops1); do_flops2(flops2); } } int main(int argc, char **argv) { int i , PAPI_event; int EventSet = PAPI_NULL; unsigned short *profbuf; unsigned short *profbuf2; unsigned short *profbuf3; unsigned long length; caddr_t start, end; long long values[2]; const PAPI_exe_info_t *prginfo = NULL; PAPI_sprofil_t sprof[3]; int retval; /* initializaion */ if ((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT) { printf("Library initialization error! 
\n"); exit(1); } if ((prginfo = PAPI_get_executable_info()) == NULL) ERROR_RETURN(1); start = prginfo->address_info.text_start; end = prginfo->address_info.text_end; length = (end - start)/sizeof(unsigned short) * sizeof(unsigned short); printf("start= %p end =%p \n", start, end); profbuf = (unsigned short *) malloc(length); if (profbuf == NULL) ERROR_RETURN(PAPI_ESYS); memset(profbuf, 0x00, length ); profbuf2 = (unsigned short *) malloc(length); if (profbuf2 == NULL) ERROR_RETURN(PAPI_ESYS); memset(profbuf2, 0x00, length ); profbuf3 = (unsigned short *) malloc(1 * sizeof(unsigned short)); if (profbuf3 == NULL) ERROR_RETURN(PAPI_ESYS); memset(profbuf3, 0x00, 1 * sizeof(unsigned short)); /* First half */ sprof[0].pr_base = profbuf; sprof[0].pr_size = length / 2; sprof[0].pr_off = DO_FLOPS2; fprintf(stderr, "do_flops is at %p %lx\n", &do_flops2, strtoul(sprof[0].pr_off,NULL,0)); sprof[0].pr_scale = 65536; /* constant needed by PAPI_sprofil */ /* Second half */ sprof[1].pr_base = profbuf2; sprof[1].pr_size = length / 2; sprof[1].pr_off = DO_FLOPS1; fprintf(stderr, "do_flops1 is at %p %lx\n", &do_flops1, strtoul(sprof[1].pr_off,NULL,0)); sprof[1].pr_scale = 65536; /* constant needed by PAPI_sprofil */ /* Overflow bin */ sprof[2].pr_base = profbuf3; sprof[2].pr_size = 1; sprof[2].pr_off = 0; sprof[2].pr_scale = 0x2; /* constant needed by PAPI_sprofil */ /* Creating the eventset */ if ( (retval = PAPI_create_eventset(&EventSet)) != PAPI_OK) ERROR_RETURN(retval); PAPI_event = PAPI_TOT_CYC; /* Add Total Instructions Executed to our EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_event)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Instructions Executed to our EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_INS)) != PAPI_OK) ERROR_RETURN(retval); /* set profile flag */ if ((retval = PAPI_sprofil(sprof, 3, EventSet, PAPI_event, THRESHOLD, PAPI_PROFIL_POSIX)) != PAPI_OK) ERROR_RETURN(retval); if ((retval = PAPI_start(EventSet)) != PAPI_OK) 
        ERROR_RETURN(retval);

    do_both(NUM_ITERS);

    if ((retval = PAPI_stop(EventSet, values)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* to clear the profile flag before removing the events */
    if ((retval = PAPI_sprofil(sprof, 3, EventSet, PAPI_event, 0, PAPI_PROFIL_POSIX)) != PAPI_OK)
        ERROR_RETURN(retval);

    /* free the resources held by PAPI */
    PAPI_shutdown();

    printf("Test case: PAPI_sprofil()\n");
    printf("---------Buffer 1--------\n");
    for (i = 0; i < length / 2; i++)
    {
        /* DO_FLOPS2 is an address, not a numeric string, so cast it instead of parsing it */
        if (profbuf[i])
            printf("0x%lx\t%d\n", (unsigned long) (DO_FLOPS2) + 2 * i, profbuf[i]);
    }
    printf("---------Buffer 2--------\n");
    for (i = 0; i < length / 2; i++)
    {
        if (profbuf2[i])
            printf("0x%lx\t%d\n", (unsigned long) (DO_FLOPS1) + 2 * i, profbuf2[i]);
    }
    printf("-------------------------\n");
    printf("%u samples fell outside the regions.\n", *profbuf3);

    exit(0);
}

/* Declare a and b to be volatile. This is to try to keep the compiler
   from optimizing the loop */
volatile double a = 0.5, b = 2.2;

void do_flops2(int n)
{
    int i;
    double c = 0.11;

    for (i = 0; i < n; i++)
        c += a * b;
}
papi-5.3.0/src/examples/overflow_pthreads.c0000600003276200002170000001141112247131121020440 0ustar ralphundrgrad
/* This file performs the following test: overflow dispatch with pthreads

   - This example tests the dispatch of overflow calls from PAPI. The event
     set is counted in the default counting domain and default granularity,
     depending on the platform. Usually this is the user domain
     (PAPI_DOM_USER) and thread context (PAPI_GRN_THR).

     The Eventset contains:
     + PAPI_TOT_INS (overflow monitor)
     + PAPI_TOT_CYC

   Each thread will do the following:
   - enable overflow
   - Start eventset 1
   - Do flops
   - Stop eventset 1
   - disable overflow
*/

#include
#include
#include
#include "papi.h"

#define THRESHOLD 200000
#define OVER_FMT "handler(%d ) Overflow at %p! 
bit=0x%llx \n" #define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); } int total = 0; void do_flops(int n) { int i; double c = 0.11; double a = 0.5; double b = 6.2; for (i=0; i < n; i++) c += a * b; } /* overflow handler */ void handler(int EventSet, void *address, long long overflow_vector, void *context) { fprintf(stderr, OVER_FMT, EventSet, address, overflow_vector); total++; } void *Thread(void *arg) { int retval; int EventSet1=PAPI_NULL; long long values[2]; long long elapsed_us, elapsed_cyc; fprintf(stderr,"Thread %lx running PAPI\n",PAPI_thread_id()); /* create the event set */ if ( (retval = PAPI_create_eventset(&EventSet1))!=PAPI_OK) ERROR_RETURN(retval); /* query whether the event exists */ if ((retval=PAPI_query_event(PAPI_TOT_INS)) != PAPI_OK) ERROR_RETURN(retval); if ((retval=PAPI_query_event(PAPI_TOT_CYC)) != PAPI_OK) ERROR_RETURN(retval); /* add events to the event set */ if ( (retval = PAPI_add_event(EventSet1, PAPI_TOT_INS))!= PAPI_OK) ERROR_RETURN(retval); if ( (retval = PAPI_add_event(EventSet1, PAPI_TOT_CYC)) != PAPI_OK) ERROR_RETURN(retval); elapsed_us = PAPI_get_real_usec(); elapsed_cyc = PAPI_get_real_cyc(); retval = PAPI_overflow(EventSet1, PAPI_TOT_CYC, THRESHOLD, 0, handler); if(retval !=PAPI_OK) ERROR_RETURN(retval); /* start counting */ if((retval = PAPI_start(EventSet1))!=PAPI_OK) ERROR_RETURN(retval); do_flops(*(int *)arg); if ((retval = PAPI_stop(EventSet1, values))!=PAPI_OK) ERROR_RETURN(retval); elapsed_us = PAPI_get_real_usec() - elapsed_us; elapsed_cyc = PAPI_get_real_cyc() - elapsed_cyc; /* disable overflowing */ retval = PAPI_overflow(EventSet1, PAPI_TOT_CYC, 0, 0, handler); if(retval !=PAPI_OK) ERROR_RETURN(retval); /* remove the event from the eventset */ retval = PAPI_remove_event(EventSet1, PAPI_TOT_INS); if (retval != PAPI_OK) ERROR_RETURN(retval); retval = PAPI_remove_event(EventSet1, PAPI_TOT_CYC); if (retval != PAPI_OK) ERROR_RETURN(retval); printf("Thread %#x 
PAPI_TOT_INS : \t%lld\n",(int)PAPI_thread_id(), values[0]); printf(" PAPI_TOT_CYC: \t%lld\n", values[1]); printf(" Real usec : \t%lld\n", elapsed_us); printf(" Real cycles : \t%lld\n", elapsed_cyc); pthread_exit(NULL); } int main(int argc, char **argv) { pthread_t thread_one; pthread_t thread_two; int flops1, flops2; int rc,retval; pthread_attr_t attr; long long elapsed_us, elapsed_cyc; /* papi library initialization */ if ((retval=PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT) { printf("Library initialization error! \n"); exit(1); } /* thread initialization */ retval=PAPI_thread_init((unsigned long(*)(void))(pthread_self)); if (retval != PAPI_OK) ERROR_RETURN(retval); /* return the number of microseconds since some arbitrary starting point */ elapsed_us = PAPI_get_real_usec(); /* return the number of cycles since some arbitrary starting point */ elapsed_cyc = PAPI_get_real_cyc(); /* pthread attribution init */ pthread_attr_init(&attr); pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM); /* create the first thread */ flops1 = 1000000; rc = pthread_create(&thread_one, &attr, Thread, (void *)&flops1); if (rc) ERROR_RETURN(rc); /* create the second thread */ flops2 = 4000000; rc = pthread_create(&thread_two, &attr, Thread, (void *)&flops2); if (rc) ERROR_RETURN(rc); /* wait for the threads to finish */ pthread_attr_destroy(&attr); pthread_join(thread_one, NULL); pthread_join(thread_two, NULL); /* compute the elapsed cycles and microseconds */ elapsed_cyc = PAPI_get_real_cyc() - elapsed_cyc; elapsed_us = PAPI_get_real_usec() - elapsed_us; printf("Master real usec : \t%lld\n", elapsed_us); printf("Master real cycles : \t%lld\n", elapsed_cyc); /* clean up */ PAPI_shutdown(); exit(0); } papi-5.3.0/src/examples/PAPI_profil.c0000600003276200002170000001134612247131121017016 0ustar ralphundrgrad/**************************************************************************** * PAPI_profil - generate PC histogram data * 
****************************************************************************/ #include #include #include #include "papi.h" /* This needs to be included every time you use PAPI */ #define FLOPS 1000000 #define THRESHOLD 100000 #define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); } int code_to_monitor() { int i; double tmp=1.1; for(i=0; i < FLOPS; i++) { tmp=i+tmp; tmp++; } i = (int) tmp; return i; } int main() { unsigned long length; caddr_t start, end; PAPI_sprofil_t * prof; int EventSet = PAPI_NULL; /*must be initialized to PAPI_NULL before calling PAPI_create_event*/ int PAPI_event,i,tmp = 0; char event_name[PAPI_MAX_STR_LEN]; /*These are going to be used as buffers */ unsigned short *profbuf; long long values[2]; const PAPI_exe_info_t *prginfo = NULL; int retval; /**************************************************************************** * This part initializes the library and compares the version number of the * * header file, to the version of the library, if these don't match then it * * is likely that PAPI won't work correctly.If there is an error, retval * * keeps track of the version number. * ****************************************************************************/ if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT ) { printf("Library initialization error! \n"); exit(1); } if ((prginfo = PAPI_get_executable_info()) == NULL) { fprintf(stderr, "Error in get executable information \n"); exit(1); } start = prginfo->address_info.text_start; end = prginfo->address_info.text_end; length = (end - start); /* for PAPI_PROFIL_BUCKET_16 and scale = 65536, profile buffer length == program address length. Larger bucket sizes would increase the buffer length. Smaller scale factors would decrease it. Handle with care... 
*/ profbuf = (unsigned short *)malloc(length); if (profbuf == NULL) { fprintf(stderr, "Not enough memory \n"); exit(1); } memset(profbuf,0x00,length); /* Creating the eventset */ if ( (retval = PAPI_create_eventset(&EventSet)) != PAPI_OK) ERROR_RETURN(retval); PAPI_event = PAPI_TOT_INS; /* Add Total Instructions Executed to our EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_event)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Cycles Executed to our EventSet */ if ( (retval = PAPI_add_event(EventSet, PAPI_TOT_CYC)) != PAPI_OK) ERROR_RETURN(retval); /* enable the collection of profiling information */ if ((retval = PAPI_profil(profbuf, length, start, 65536, EventSet, PAPI_event, THRESHOLD, PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16)) != PAPI_OK) ERROR_RETURN(retval); /* let's rock and roll */ if ((retval=PAPI_start(EventSet)) != PAPI_OK) ERROR_RETURN(retval); code_to_monitor(); if ((retval=PAPI_stop(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); /* disable the collection of profiling information by setting threshold to 0 */ if ((retval = PAPI_profil(profbuf, length, start, 65536, EventSet, PAPI_event, 0, PAPI_PROFIL_POSIX)) != PAPI_OK) ERROR_RETURN(retval); printf("-----------------------------------------------------------\n"); printf("Text start: %p, Text end: %p, \n", prginfo->address_info.text_start,prginfo->address_info.text_end); printf("Data start: %p, Data end: %p\n", prginfo->address_info.data_start,prginfo->address_info.data_end); printf("BSS start : %p, BSS end: %p\n", prginfo->address_info.bss_start,prginfo->address_info.bss_end); printf("------------------------------------------\n"); printf("Test type : \tPAPI_PROFIL_POSIX\n"); printf("------------------------------------------\n\n\n"); printf("PAPI_profil() hash table.\n"); printf("address\t\tflat \n"); for (i = 0; i < (int) length/2; i++) { if (profbuf[i]) printf("0x%lx\t%d \n", (unsigned long) start + (unsigned long) (2 * i), profbuf[i]); } 
printf("-----------------------------------------\n"); retval = 0; for (i = 0; i < (int) length/2; i++) retval = retval || (profbuf[i]); if (retval) printf("Test succeeds! \n"); else printf( "No information in buffers\n"); /* clean up */ PAPI_shutdown(); exit(0); } papi-5.3.0/src/examples/PAPI_reset.c0000600003276200002170000000513612247131121016645 0ustar ralphundrgrad/***************************************************************************** * PAPI_reset - resets the hardware event counters used by an EventSet. * *****************************************************************************/ #include #include #include "papi.h" /* This needs to be included every time you use PAPI */ #define ERROR_RETURN(retval) { fprintf(stderr, "Error %d %s:line %d: \n", retval,__FILE__,__LINE__); exit(retval); } int poorly_tuned_function() { float tmp; int i; for(i=1; i<2000; i++) { tmp=(tmp+100)/i; } return 0; } int main() { int EventSet = PAPI_NULL; /*must be initialized to PAPI_NULL before calling PAPI_create_event*/ int retval; unsigned int event_code=PAPI_TOT_INS; /* By default monitor total instructions */ char errstring[PAPI_MAX_STR_LEN]; long long values[1]; /**************************************************************************** * This part initializes the library and compares the version number of the * * header file, to the version of the library, if these don't match then it * * is likely that PAPI won't work correctly.If there is an error, retval * * keeps track of the version number. * ****************************************************************************/ if((retval = PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT ) { printf("Library initialization error! 
\n"); exit(1); } /* Creating the eventset */ if ( (retval=PAPI_create_eventset(&EventSet)) != PAPI_OK) ERROR_RETURN(retval); /* Add Total Instructions Executed to our EventSet */ if ((retval=PAPI_add_event(EventSet, event_code)) != PAPI_OK) ERROR_RETURN(retval); /* Start counting */ if((retval=PAPI_start(EventSet)) != PAPI_OK) ERROR_RETURN(retval); poorly_tuned_function(); /* Stop counting */ if((retval=PAPI_stop(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); printf("The first time read value is %lld\n",values[0]); /* This zeroes out the counters on the eventset that was created */ if((retval=PAPI_reset(EventSet)) != PAPI_OK) ERROR_RETURN(retval); /* Start counting */ if((retval=PAPI_start(EventSet)) != PAPI_OK) ERROR_RETURN(retval); poorly_tuned_function(); /* Stop counting */ if((retval=PAPI_stop(EventSet, values)) != PAPI_OK) ERROR_RETURN(retval); printf("The second time read value is %lld\n",values[0]); /* free the resources used by PAPI */ PAPI_shutdown(); exit(0); } papi-5.3.0/src/papi_libpfm_events.h0000600003276200002170000000333112247131124016745 0ustar ralphundrgrad#ifndef _PAPI_LIBPFM_EVENTS_H #define _PAPI_LIBPFM_EVENTS_H #include "papi.h" /* For PAPI_event_info_t */ #include "papi_vector.h" /* For papi_vector_t */ /* * File: papi_libpfm_events.h */ /* Prototypes for libpfm name library access */ int _papi_libpfm_error( int pfm_error ); int _papi_libpfm_setup_presets( char *name, int type, int cidx ); int _papi_libpfm_ntv_enum_events( unsigned int *EventCode, int modifier ); int _papi_libpfm_ntv_name_to_code( char *ntv_name, unsigned int *EventCode ); int _papi_libpfm_ntv_code_to_name( unsigned int EventCode, char *name, int len ); int _papi_libpfm_ntv_code_to_descr( unsigned int EventCode, char *name, int len ); int _papi_libpfm_ntv_code_to_bits( unsigned int EventCode, hwd_register_t * bits ); int _papi_libpfm_ntv_code_to_bits_perfctr( unsigned int EventCode, hwd_register_t * bits ); int _papi_libpfm_shutdown(void); int 
_papi_libpfm_init(papi_vector_t *my_vector, int cidx);

int _pfm_decode_native_event( unsigned int EventCode, unsigned int *event, unsigned int *umask );
unsigned int _pfm_convert_umask( unsigned int event, unsigned int umask );
int prepare_umask( unsigned int foo, unsigned int *values );
int _papi_libpfm_ntv_code_to_info(unsigned int EventCode, PAPI_event_info_t *info);

/* Gross perfctr/perf_events compatibility hack */
/* need to think up a better way to handle this */
#ifndef __PERFMON_PERF_EVENT_H__
struct perf_event_attr {
    int config;
    int type;
};

/* no trailing semicolon, so the macro can be used inside expressions */
#define PERF_TYPE_RAW 4

#endif /* !__PERFMON_PERF_EVENT_H__ */

extern int _papi_libpfm_setup_counters( struct perf_event_attr *attr, hwd_register_t *ni_bits );

#endif // _PAPI_LIBPFM_EVENTS_H
papi-5.3.0/src/solaris-lock.h0000600003276200002170000000022612247131126015503 0ustar ralphundrgrad
extern rwlock_t lock[PAPI_MAX_LOCK];

#define _papi_hwd_lock(lck) rw_wrlock(&lock[lck]);

#define _papi_hwd_unlock(lck) rw_unlock(&lock[lck]);
papi-5.3.0/src/linux-bgq-common.h0000600003276200002170000000303212247131124016271 0ustar ralphundrgrad
/****************************/
/* THIS IS OPEN SOURCE CODE */
/****************************/

/**
 * @file linux-bgq-common.h
 * CVS: $Id$
 * @author Heike Jagode
 * jagode@eecs.utk.edu
 * Mods: < your name here >
 * < your email address >
 * BGPM component
 *
 * Tested version of bgpm (early access)
 *
 * @brief
 * This file is part of the source code for a component that enables PAPI-C to
 * access hardware monitoring counters for BG/Q through the bgpm library.
 */

#include "papi.h"
/* Header required by BGPM */
#include "bgpm/include/bgpm.h"

extern int _papi_hwi_publish_error( char *error );

// Define gymnastics to create a compile time AT string.
#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)
#define _AT_ __FILE__ ":" TOSTRING(__LINE__)

/* return EXIT_FAILURE; \*/

#define MAX_COUNTERS ( PEVT_LAST_EVENT + 1 )
//#define DEBUG_BGQ

/************************* COMMON PROTOTYPES *********************************
 *******************************************************************************/

/* common prototypes for BGQ substrate and BGPM components */
int _check_BGPM_error( int err, char* bgpmfunc );
long_long _common_getEventValue( unsigned event_id, int EventGroup );
int _common_deleteRecreate( int *EventGroup_ptr );
int _common_rebuildEventgroup( int count, int *EventGroup_local, int *EventGroup_ptr );
int _common_set_overflow_BGPM( int EventGroup, int evt_idx, int threshold,
                               void (*handler)(int, uint64_t, uint64_t, const ucontext_t *) );
papi-5.3.0/src/linux-bgp-memory.c0000600003276200002170000000250712247131124016311 0ustar ralphundrgrad
/*
 * File: linux-bgp-memory.c
 * Author: Dave Hermsmeier
 * dlherms@us.ibm.com
 */

#include "papi.h"
#include "papi_internal.h"
#ifdef __LINUX__
#include
#endif
#include
#include

/*
 * Prototypes...
*/ int init_bgp( PAPI_mh_info_t * pMem_Info ); /* * Get Memory Information * * Fills in memory information - effectively set to all 0x00's */ extern int _bgp_get_memory_info( PAPI_hw_info_t * pHwInfo, int pCPU_Type ) { int retval = 0; switch ( pCPU_Type ) { default: //fprintf(stderr,"Default CPU type in %s (%d)\n",__FUNCTION__,__LINE__); retval = init_bgp( &pHwInfo->mem_hierarchy ); break; } return retval; } /* * Get DMem Information for BG/P * * NOTE: Currently, all values set to -1 */ extern int _bgp_get_dmem_info( PAPI_dmem_info_t * pDmemInfo ) { pDmemInfo->size = PAPI_EINVAL; pDmemInfo->resident = PAPI_EINVAL; pDmemInfo->high_water_mark = PAPI_EINVAL; pDmemInfo->shared = PAPI_EINVAL; pDmemInfo->text = PAPI_EINVAL; pDmemInfo->library = PAPI_EINVAL; pDmemInfo->heap = PAPI_EINVAL; pDmemInfo->locked = PAPI_EINVAL; pDmemInfo->stack = PAPI_EINVAL; pDmemInfo->pagesize = PAPI_EINVAL; return PAPI_OK; } /* * Cache configuration for BG/P */ int init_bgp( PAPI_mh_info_t * pMem_Info ) { memset( pMem_Info, 0x0, sizeof ( *pMem_Info ) ); return PAPI_OK; } papi-5.3.0/src/papi_lock.h0000600003276200002170000000151512247131124015042 0ustar ralphundrgrad#ifndef _PAPI_DEFINES_H #define _PAPI_DEFINES_H /* Thread related: locks */ #define INTERNAL_LOCK PAPI_NUM_LOCK+0 /* papi_internal.c */ #define MULTIPLEX_LOCK PAPI_NUM_LOCK+1 /* multiplex.c */ #define THREADS_LOCK PAPI_NUM_LOCK+2 /* threads.c */ #define HIGHLEVEL_LOCK PAPI_NUM_LOCK+3 /* papi_hl.c */ #define MEMORY_LOCK PAPI_NUM_LOCK+4 /* papi_memory.c */ #define COMPONENT_LOCK PAPI_NUM_LOCK+5 /* per-component */ #define GLOBAL_LOCK PAPI_NUM_LOCK+6 /* papi.c for global variable (static and non) initialization/shutdown */ #define CPUS_LOCK PAPI_NUM_LOCK+7 /* cpus.c */ #define NAMELIB_LOCK PAPI_NUM_LOCK+8 /* papi_pfm4_events.c */ #define NUM_INNER_LOCK 9 #define PAPI_MAX_LOCK (NUM_INNER_LOCK + PAPI_NUM_LOCK) #include OSLOCK #endif papi-5.3.0/src/solaris-common.c0000600003276200002170000005222312247131126016042 0ustar 
ralphundrgrad#include "papi.h" #include "papi_internal.h" #include "papi_vector.h" #include "papi_memory.h" #include "solaris-common.h" #include #if 0 /* once the bug in dladdr is fixed by SUN, (now dladdr caused deadlock when used with pthreads) this function can be used again */ int _solaris_update_shlib_info( papi_mdi_t *mdi ) { char fname[80], name[PAPI_HUGE_STR_LEN]; prmap_t newp; int count, t_index; FILE *map_f; void *vaddr; Dl_info dlip; PAPI_address_map_t *tmp = NULL; sprintf( fname, "/proc/%d/map", getpid( ) ); map_f = fopen( fname, "r" ); if ( !map_f ) { PAPIERROR( "fopen(%s) returned < 0", fname ); return ( PAPI_OK ); } /* count the entries we need */ count = 0; t_index = 0; while ( fread( &newp, sizeof ( prmap_t ), 1, map_f ) > 0 ) { vaddr = ( void * ) ( 1 + ( newp.pr_vaddr ) ); // map base address if ( dladdr( vaddr, &dlip ) > 0 ) { count++; if ( ( newp.pr_mflags & MA_EXEC ) && ( newp.pr_mflags & MA_READ ) ) { if ( !( newp.pr_mflags & MA_WRITE ) ) t_index++; } strcpy( name, dlip.dli_fname ); if ( strcmp( _papi_hwi_system_info.exe_info.address_info.name, basename( name ) ) == 0 ) { if ( ( newp.pr_mflags & MA_EXEC ) && ( newp.pr_mflags & MA_READ ) ) { if ( !( newp.pr_mflags & MA_WRITE ) ) { _papi_hwi_system_info.exe_info.address_info.text_start = ( caddr_t ) newp.pr_vaddr; _papi_hwi_system_info.exe_info.address_info.text_end = ( caddr_t ) ( newp.pr_vaddr + newp.pr_size ); } else { _papi_hwi_system_info.exe_info.address_info.data_start = ( caddr_t ) newp.pr_vaddr; _papi_hwi_system_info.exe_info.address_info.data_end = ( caddr_t ) ( newp.pr_vaddr + newp.pr_size ); } } } } } rewind( map_f ); tmp = ( PAPI_address_map_t * ) papi_calloc( t_index - 1, sizeof ( PAPI_address_map_t ) ); if ( tmp == NULL ) { PAPIERROR( "Error allocating shared library address map" ); return ( PAPI_ENOMEM ); } t_index = -1; while ( fread( &newp, sizeof ( prmap_t ), 1, map_f ) > 0 ) { vaddr = ( void * ) ( 1 + ( newp.pr_vaddr ) ); // map base address if ( dladdr( vaddr, &dlip ) > 0 ) 
{ // valid name strcpy( name, dlip.dli_fname ); if ( strcmp( _papi_hwi_system_info.exe_info.address_info.name, basename( name ) ) == 0 ) continue; if ( ( newp.pr_mflags & MA_EXEC ) && ( newp.pr_mflags & MA_READ ) ) { if ( !( newp.pr_mflags & MA_WRITE ) ) { t_index++; tmp[t_index].text_start = ( caddr_t ) newp.pr_vaddr; tmp[t_index].text_end = ( caddr_t ) ( newp.pr_vaddr + newp.pr_size ); strncpy( tmp[t_index].name, dlip.dli_fname, PAPI_HUGE_STR_LEN - 1 ); tmp[t_index].name[PAPI_HUGE_STR_LEN - 1] = '\0'; } else { if ( t_index < 0 ) continue; tmp[t_index].data_start = ( caddr_t ) newp.pr_vaddr; tmp[t_index].data_end = ( caddr_t ) ( newp.pr_vaddr + newp.pr_size ); } } } } fclose( map_f ); if ( _papi_hwi_system_info.shlib_info.map ) papi_free( _papi_hwi_system_info.shlib_info.map ); _papi_hwi_system_info.shlib_info.map = tmp; _papi_hwi_system_info.shlib_info.count = t_index + 1; return PAPI_OK; } #endif int _papi_hwi_init_os(void) { struct utsname uname_buffer; uname(&uname_buffer); strncpy(_papi_os_info.name,uname_buffer.sysname,PAPI_MAX_STR_LEN); strncpy(_papi_os_info.version,uname_buffer.release,PAPI_MAX_STR_LEN); _papi_os_info.itimer_sig = PAPI_INT_MPX_SIGNAL; _papi_os_info.itimer_num = PAPI_INT_ITIMER; _papi_os_info.itimer_ns = PAPI_INT_MPX_DEF_US * 1000; _papi_os_info.itimer_res_ns = 1; return PAPI_OK; } #if 0 int _ultra_hwd_update_shlib_info( papi_mdi_t *mdi ) { /*??? 
system call takes very long */ char cmd_line[PAPI_HUGE_STR_LEN + PAPI_HUGE_STR_LEN], fname[L_tmpnam]; char line[256]; char address[16], size[10], flags[64], objname[256]; PAPI_address_map_t *tmp = NULL; FILE *f = NULL; int t_index = 0, i; struct map_record { long address; int size; int flags; char objname[256]; struct map_record *next; } *tmpr, *head, *curr; tmpnam( fname ); SUBDBG( "Temporary name %s\n", fname ); sprintf( cmd_line, "/bin/pmap %d > %s", ( int ) getpid( ), fname ); if ( system( cmd_line ) != 0 ) { PAPIERROR( "Could not run %s to get shared library address map", cmd_line ); return ( PAPI_OK ); } f = fopen( fname, "r" ); if ( f == NULL ) { PAPIERROR( "fopen(%s) returned < 0", fname ); remove( fname ); return ( PAPI_OK ); } /* ignore the first line */ fgets( line, 256, f ); head = curr = NULL; while ( fgets( line, 256, f ) != NULL ) { /* discard the last line */ if ( strncmp( line, " total", 6 ) != 0 ) { sscanf( line, "%s %s %s %s", address, size, flags, objname ); if ( objname[0] == '/' ) { tmpr = ( struct map_record * ) papi_malloc( sizeof ( struct map_record ) ); if ( tmpr == NULL ) return ( -1 ); tmpr->next = NULL; if ( curr ) { curr->next = tmpr; curr = tmpr; } if ( head == NULL ) { curr = head = tmpr; } SUBDBG( "%s\n", objname ); if ( ( strstr( flags, "read" ) && strstr( flags, "exec" ) ) || ( strstr( flags, "r" ) && strstr( flags, "x" ) ) ) { if ( !( strstr( flags, "write" ) || strstr( flags, "w" ) ) ) { /* text segment */ t_index++; tmpr->flags = 1; } else { tmpr->flags = 0; } sscanf( address, "%lx", &tmpr->address ); sscanf( size, "%d", &tmpr->size ); tmpr->size *= 1024; strcpy( tmpr->objname, objname ); } } } } tmp = ( PAPI_address_map_t * ) papi_calloc( t_index - 1, sizeof ( PAPI_address_map_t ) ); if ( tmp == NULL ) { PAPIERROR( "Error allocating shared library address map" ); return ( PAPI_ENOMEM ); } t_index = -1; tmpr = curr = head; i = 0; while ( curr != NULL ) { if ( strcmp( _papi_hwi_system_info.exe_info.address_info.name, basename( 
curr->objname ) ) == 0 ) { if ( curr->flags ) { _papi_hwi_system_info.exe_info.address_info.text_start = ( caddr_t ) curr->address; _papi_hwi_system_info.exe_info.address_info.text_end = ( caddr_t ) ( curr->address + curr->size ); } else { _papi_hwi_system_info.exe_info.address_info.data_start = ( caddr_t ) curr->address; _papi_hwi_system_info.exe_info.address_info.data_end = ( caddr_t ) ( curr->address + curr->size ); } } else { if ( curr->flags ) { t_index++; tmp[t_index].text_start = ( caddr_t ) curr->address; tmp[t_index].text_end = ( caddr_t ) ( curr->address + curr->size ); strncpy( tmp[t_index].name, curr->objname, PAPI_HUGE_STR_LEN - 1 ); tmp[t_index].name[PAPI_HUGE_STR_LEN - 1] = '\0'; } else { if ( t_index < 0 ) continue; tmp[t_index].data_start = ( caddr_t ) curr->address; tmp[t_index].data_end = ( caddr_t ) ( curr->address + curr->size ); } } tmpr = curr->next; /* free the temporary allocated memory */ papi_free( curr ); curr = tmpr; } /* end of while */ remove( fname ); fclose( f ); if ( _papi_hwi_system_info.shlib_info.map ) papi_free( _papi_hwi_system_info.shlib_info.map ); _papi_hwi_system_info.shlib_info.map = tmp; _papi_hwi_system_info.shlib_info.count = t_index + 1; return ( PAPI_OK ); } #endif /* From niagara2 code */ int _solaris_update_shlib_info( papi_mdi_t *mdi ) { char *file = "/proc/self/map"; char *resolve_pattern = "/proc/self/path/%s"; char lastobject[PRMAPSZ]; char link[PAPI_HUGE_STR_LEN]; char path[PAPI_HUGE_STR_LEN]; prmap_t mapping; int fd, count = 0, total = 0, position = -1, first = 1; caddr_t t_min, t_max, d_min, d_max; PAPI_address_map_t *pam, *cur; #ifdef DEBUG SUBDBG( "ENTERING FUNCTION >>%s<< at %s:%d\n", __func__, __FILE__, __LINE__ ); #endif fd = open( file, O_RDONLY ); if ( fd == -1 ) { return PAPI_ESYS; } memset( lastobject, 0, PRMAPSZ ); #ifdef DEBUG SUBDBG( " -> %s: Preprocessing memory maps from procfs\n", __func__ ); #endif /* Search through the list of mappings in order to identify a) how many mappings are available 
       and b) how many unique mappings are available. */
    while ( read( fd, &mapping, sizeof ( prmap_t ) ) > 0 ) {
#ifdef DEBUG
        SUBDBG( " -> %s: Found a new memory map entry\n", __func__ );
#endif
        /* Another entry found; increment the total count of entries. */
        total++;

        /* Is the mapping accessible and not anonymous? */
        if ( mapping.pr_mflags & ( MA_READ | MA_WRITE | MA_EXEC ) &&
             !( mapping.pr_mflags & MA_ANON ) ) {
            /* Test if a new library has been found. If a new library has been
               found a new entry needs to be counted. */
            if ( strcmp( lastobject, mapping.pr_mapname ) != 0 ) {
                strncpy( lastobject, mapping.pr_mapname, PRMAPSZ );
                count++;
#ifdef DEBUG
                SUBDBG( " -> %s: Memory mapping entry valid for %s\n", __func__,
                        mapping.pr_mapname );
#endif
            }
        }
    }
#ifdef DEBUG
    SUBDBG( " -> %s: Preprocessing done, starting to analyze\n", __func__ );
#endif

    /* Start from the beginning, now fill in the found mappings */
    if ( lseek( fd, 0, SEEK_SET ) == -1 ) {
        close( fd );
        return PAPI_ESYS;
    }

    memset( lastobject, 0, PRMAPSZ );

    /* Allocate memory */
    pam = ( PAPI_address_map_t * ) papi_calloc( count, sizeof ( PAPI_address_map_t ) );
    if ( pam == NULL ) {
        /* do not walk the mappings without a buffer to store them in */
        close( fd );
        return PAPI_ENOMEM;
    }

    while ( read( fd, &mapping, sizeof ( prmap_t ) ) > 0 ) {
        if ( mapping.pr_mflags & MA_ANON ) {
#ifdef DEBUG
            SUBDBG ( " -> %s: Anonymous mapping (MA_ANON) found for %s, skipping\n",
                     __func__, mapping.pr_mapname );
#endif
            continue;
        }

        /* Check for a new entry */
        if ( strcmp( mapping.pr_mapname, lastobject ) != 0 ) {
#ifdef DEBUG
            SUBDBG( " -> %s: Analyzing mapping for %s\n", __func__,
                    mapping.pr_mapname );
#endif
            cur = &( pam[++position] );
            strncpy( lastobject, mapping.pr_mapname, PRMAPSZ );
            snprintf( link, PAPI_HUGE_STR_LEN, resolve_pattern, lastobject );
            memset( path, 0, PAPI_HUGE_STR_LEN );
            readlink( link, path, PAPI_HUGE_STR_LEN );
            strncpy( cur->name, path, PAPI_HUGE_STR_LEN );
#ifdef DEBUG
            SUBDBG( " -> %s: Resolved name for %s: %s\n", __func__,
                    mapping.pr_mapname, cur->name );
#endif
        }

        if ( mapping.pr_mflags & MA_READ ) {
            /* Data (MA_WRITE) or text (MA_READ) segment?
*/ if ( mapping.pr_mflags & MA_WRITE ) { cur->data_start = ( caddr_t ) mapping.pr_vaddr; cur->data_end = ( caddr_t ) ( mapping.pr_vaddr + mapping.pr_size ); if ( strcmp ( cur->name, _papi_hwi_system_info.exe_info.fullname ) == 0 ) { _papi_hwi_system_info.exe_info.address_info.data_start = cur->data_start; _papi_hwi_system_info.exe_info.address_info.data_end = cur->data_end; } if ( first ) d_min = cur->data_start; if ( first ) d_max = cur->data_end; if ( cur->data_start < d_min ) { d_min = cur->data_start; } if ( cur->data_end > d_max ) { d_max = cur->data_end; } } else if ( mapping.pr_mflags & MA_EXEC ) { cur->text_start = ( caddr_t ) mapping.pr_vaddr; cur->text_end = ( caddr_t ) ( mapping.pr_vaddr + mapping.pr_size ); if ( strcmp ( cur->name, _papi_hwi_system_info.exe_info.fullname ) == 0 ) { _papi_hwi_system_info.exe_info.address_info.text_start = cur->text_start; _papi_hwi_system_info.exe_info.address_info.text_end = cur->text_end; } if ( first ) t_min = cur->text_start; if ( first ) t_max = cur->text_end; if ( cur->text_start < t_min ) { t_min = cur->text_start; } if ( cur->text_end > t_max ) { t_max = cur->text_end; } } } first = 0; } close( fd ); /* During the walk of shared objects the upper and lower bound of the segments could be discovered. The bounds are stored in the PAPI info structure. The information is important for the profiling functions of PAPI. 
*/ /* This variant would pass the addresses of all text and data segments _papi_hwi_system_info.exe_info.address_info.text_start = t_min; _papi_hwi_system_info.exe_info.address_info.text_end = t_max; _papi_hwi_system_info.exe_info.address_info.data_start = d_min; _papi_hwi_system_info.exe_info.address_info.data_end = d_max; */ #ifdef DEBUG SUBDBG( " -> %s: Analysis of memory maps done, results:\n", __func__ ); SUBDBG( " -> %s: text_start=%#x, text_end=%#x, text_size=%lld\n", __func__, _papi_hwi_system_info.exe_info.address_info.text_start, _papi_hwi_system_info.exe_info.address_info.text_end, _papi_hwi_system_info.exe_info.address_info.text_end - _papi_hwi_system_info.exe_info.address_info.text_start ); SUBDBG( " -> %s: data_start=%#x, data_end=%#x, data_size=%lld\n", __func__, _papi_hwi_system_info.exe_info.address_info.data_start, _papi_hwi_system_info.exe_info.address_info.data_end, _papi_hwi_system_info.exe_info.address_info.data_end - _papi_hwi_system_info.exe_info.address_info.data_start ); #endif /* Store the map read and the total count of shlibs found */ _papi_hwi_system_info.shlib_info.map = pam; _papi_hwi_system_info.shlib_info.count = count; #ifdef DEBUG SUBDBG( "LEAVING FUNCTION >>%s<< at %s:%d\n", __func__, __FILE__, __LINE__ ); #endif return PAPI_OK; } #if 0 int _niagara2_get_system_info( papi_mdi_t *mdi ) { // Used for evaluating return values int retval = 0; // Check for process settings pstatus_t *proc_status; psinfo_t *proc_info; // Used for string truncating char *c_ptr; // For retrieving the executable full name char exec_name[PAPI_HUGE_STR_LEN]; // For retrieving processor information __sol_processor_information_t cpus; #ifdef DEBUG SUBDBG( "ENTERING FUNCTION >>%s<< at %s:%d\n", __func__, __FILE__, __LINE__ ); #endif /* Get and set pid */ pid = getpid( ); /* Check for microstate accounting */ proc_status = __sol_get_proc_status( pid ); if ( proc_status->pr_flags & PR_MSACCT == 0 || proc_status->pr_flags & PR_MSFORK == 0 ) { /* Solaris 10 
           should have microstate accounting always activated */
        return PAPI_ECMP;
    }

    /* Fill _papi_hwi_system_info.exe_info.fullname */
    proc_info = __sol_get_proc_info( pid );

    // If there are arguments, trim the string to the executable name.
    if ( proc_info->pr_argc > 1 ) {
        c_ptr = strchr( proc_info->pr_psargs, ' ' );
        /* terminate the string at the first space (write through the pointer) */
        if ( c_ptr != NULL )
            *c_ptr = '\0';
    }

    /* If the path can be qualified, use the full path, otherwise the trimmed
       name. */
    if ( realpath( proc_info->pr_psargs, exec_name ) != NULL ) {
        strncpy( _papi_hwi_system_info.exe_info.fullname, exec_name,
                 PAPI_HUGE_STR_LEN );
    } else {
        strncpy( _papi_hwi_system_info.exe_info.fullname, proc_info->pr_psargs,
                 PAPI_HUGE_STR_LEN );
    }

    /* Fill _papi_hwi_system_info.exe_info.address_info */
    // Taken from the old component
    strncpy( _papi_hwi_system_info.exe_info.address_info.name,
             basename( _papi_hwi_system_info.exe_info.fullname ),
             PAPI_HUGE_STR_LEN );
    __CHECK_ERR_PAPI( _niagara2_update_shlib_info( &_papi_hwi_system_info ) );

    /* Fill _papi_hwi_system_info.hw_info */
    // Taken from the old component
    _papi_hwi_system_info.hw_info.ncpu = sysconf( _SC_NPROCESSORS_ONLN );
    _papi_hwi_system_info.hw_info.nnodes = 1;
    _papi_hwi_system_info.hw_info.vendor = PAPI_VENDOR_SUN;
    strcpy( _papi_hwi_system_info.hw_info.vendor_string, "SUN" );
    _papi_hwi_system_info.hw_info.totalcpus = sysconf( _SC_NPROCESSORS_CONF );
    _papi_hwi_system_info.hw_info.model = 1;
    strcpy( _papi_hwi_system_info.hw_info.model_string, cpc_cciname( cpc ) );

    /* The field sparc-version is no longer in prtconf -pv */
    _papi_hwi_system_info.hw_info.revision = 1;

    /* Clock speed */
    _papi_hwi_system_info.hw_info.mhz = ( float ) __sol_get_processor_clock( );
    _papi_hwi_system_info.hw_info.clock_mhz = __sol_get_processor_clock( );
    _papi_hwi_system_info.hw_info.cpu_max_mhz = __sol_get_processor_clock( );
    _papi_hwi_system_info.hw_info.cpu_min_mhz = __sol_get_processor_clock( );

    /* Fill _niagara2_vector.cmp_info.mem_hierarchy */
    _niagara2_get_memory_info( &_papi_hwi_system_info.hw_info, 0 );

    /*
Fill _papi_hwi_system_info.sub_info */ strcpy( _niagara2_vector.cmp_info.name, "SunNiagara2" ); strcpy( _niagara2_vector.cmp_info.version, "ALPHA" ); strcpy( _niagara2_vector.cmp_info.support_version, "libcpc2" ); strcpy( _niagara2_vector.cmp_info.kernel_version, "libcpc2" ); /* libcpc2 uses SIGEMT using real hardware signals, no sw emu */ #ifdef DEBUG SUBDBG( "LEAVING FUNCTION >>%s<< at %s:%d\n", __func__, __FILE__, __LINE__ ); #endif return PAPI_OK; } #endif int _solaris_get_system_info( papi_mdi_t *mdi ) { int retval; pid_t pid; char maxargs[PAPI_MAX_STR_LEN] = ""; psinfo_t psi; int fd; int hz, version; char cpuname[PAPI_MAX_STR_LEN], pname[PAPI_HUGE_STR_LEN]; /* Check counter access */ if ( cpc_version( CPC_VER_CURRENT ) != CPC_VER_CURRENT ) return PAPI_ECMP; SUBDBG( "CPC version %d successfully opened\n", CPC_VER_CURRENT ); if ( cpc_access( ) == -1 ) return PAPI_ECMP; /* Global variable cpuver */ cpuver = cpc_getcpuver( ); SUBDBG( "Got %d from cpc_getcpuver()\n", cpuver ); if ( cpuver == -1 ) return PAPI_ECMP; #ifdef DEBUG { if ( ISLEVEL( DEBUG_SUBSTRATE ) ) { const char *name; int i; name = cpc_getcpuref( cpuver ); if ( name ) { SUBDBG( "CPC CPU reference: %s\n", name ); } else { SUBDBG( "Could not get a CPC CPU reference\n" ); } for ( i = 0; i < cpc_getnpic( cpuver ); i++ ) { SUBDBG( "\n%6s %-40s %8s\n", "Reg", "Symbolic name", "Code" ); cpc_walk_names( cpuver, i, "%6d %-40s %02x\n", print_walk_names ); } SUBDBG( "\n" ); } } #endif /* Initialize other globals */ if ( ( retval = build_tables( ) ) != PAPI_OK ) return retval; preset_search_map = preset_table; if ( cpuver <= CPC_ULTRA2 ) { SUBDBG( "cpuver (==%d) <= CPC_ULTRA2 (==%d)\n", cpuver, CPC_ULTRA2 ); pcr_shift[0] = CPC_ULTRA_PCR_PIC0_SHIFT; pcr_shift[1] = CPC_ULTRA_PCR_PIC1_SHIFT; } else if ( cpuver <= LASTULTRA3 ) { SUBDBG( "cpuver (==%d) <= CPC_ULTRA3x (==%d)\n", cpuver, LASTULTRA3 ); pcr_shift[0] = CPC_ULTRA_PCR_PIC0_SHIFT; pcr_shift[1] = CPC_ULTRA_PCR_PIC1_SHIFT; 
_solaris_vector.cmp_info.hardware_intr = 1; _solaris_vector.cmp_info.hardware_intr_sig = SIGEMT; } else return PAPI_ECMP; /* Path and args */ pid = getpid( ); if ( pid == -1 ) return ( PAPI_ESYS ); /* Turn on microstate accounting for this process and any LWPs. */ sprintf( maxargs, "/proc/%d/ctl", ( int ) pid ); if ( ( fd = open( maxargs, O_WRONLY ) ) == -1 ) return ( PAPI_ESYS ); { int retval; struct { long cmd; long flags; } cmd; cmd.cmd = PCSET; cmd.flags = PR_MSACCT | PR_MSFORK; retval = write( fd, &cmd, sizeof ( cmd ) ); close( fd ); SUBDBG( "Write PCSET returned %d\n", retval ); if ( retval != sizeof ( cmd ) ) return ( PAPI_ESYS ); } /* Get executable info */ sprintf( maxargs, "/proc/%d/psinfo", ( int ) pid ); if ( ( fd = open( maxargs, O_RDONLY ) ) == -1 ) return ( PAPI_ESYS ); /* A short or failed read would leave psi partly uninitialized, so treat it as an error */ if ( read( fd, &psi, sizeof ( psi ) ) != ( ssize_t ) sizeof ( psi ) ) { close( fd ); return ( PAPI_ESYS ); } close( fd ); /* Cut off any arguments to exe */ { char *tmp; tmp = strchr( psi.pr_psargs, ' ' ); if ( tmp != NULL ) *tmp = '\0'; } if ( realpath( psi.pr_psargs, pname ) ) strncpy( _papi_hwi_system_info.exe_info.fullname, pname, PAPI_HUGE_STR_LEN ); else strncpy( _papi_hwi_system_info.exe_info.fullname, psi.pr_psargs, PAPI_HUGE_STR_LEN ); /* Please don't use pr_fname here, because it can only store fewer than 16 characters */ strcpy( _papi_hwi_system_info.exe_info.address_info.name, basename( _papi_hwi_system_info.exe_info.fullname ) ); SUBDBG( "Full Executable is %s\n", _papi_hwi_system_info.exe_info.fullname ); /* Executable regions, reading /proc/pid/maps file */ retval = _ultra_hwd_update_shlib_info( &_papi_hwi_system_info ); /* Hardware info */ _papi_hwi_system_info.hw_info.ncpu = sysconf( _SC_NPROCESSORS_ONLN ); _papi_hwi_system_info.hw_info.nnodes = 1; _papi_hwi_system_info.hw_info.totalcpus = sysconf( _SC_NPROCESSORS_CONF ); retval = scan_prtconf( cpuname, PAPI_MAX_STR_LEN, &hz, &version ); if ( retval == -1 ) return PAPI_ECMP; strcpy( _papi_hwi_system_info.hw_info.model_string, cpc_getcciname( cpuver ) ); _papi_hwi_system_info.hw_info.model =
cpuver; strcpy( _papi_hwi_system_info.hw_info.vendor_string, "SUN" ); _papi_hwi_system_info.hw_info.vendor = PAPI_VENDOR_SUN; _papi_hwi_system_info.hw_info.revision = version; _papi_hwi_system_info.hw_info.mhz = ( ( float ) hz / 1.0e6 ); SUBDBG( "hw_info.mhz = %f\n", _papi_hwi_system_info.hw_info.mhz ); _papi_hwi_system_info.hw_info.cpu_max_mhz = _papi_hwi_system_info.hw_info.mhz; _papi_hwi_system_info.hw_info.cpu_min_mhz = _papi_hwi_system_info.hw_info.mhz; /* Number of PMCs */ retval = cpc_getnpic( cpuver ); if ( retval < 0 ) return PAPI_ECMP; _solaris_vector.cmp_info.num_cntrs = retval; _solaris_vector.cmp_info.fast_real_timer = 1; _solaris_vector.cmp_info.fast_virtual_timer = 1; _solaris_vector.cmp_info.default_domain = PAPI_DOM_USER; _solaris_vector.cmp_info.available_domains = PAPI_DOM_USER | PAPI_DOM_KERNEL; /* Setup presets */ retval = _papi_hwi_setup_all_presets( preset_search_map, NULL ); if ( retval ) return ( retval ); return ( PAPI_OK ); } long long _solaris_get_real_usec( void ) { return ( ( long long ) gethrtime( ) / ( long long ) 1000 ); } long long _solaris_get_real_cycles( void ) { return ( _ultra_hwd_get_real_usec( ) * ( long long ) _papi_hwi_system_info.hw_info.cpu_max_mhz ); } long long _solaris_get_virt_usec( void ) { return ( ( long long ) gethrvtime( ) / ( long long ) 1000 ); } papi-5.3.0/src/freebsd-memory.h0000600003276200002170000000017012247131121016012 0ustar ralphundrgradint _freebsd_get_memory_info( PAPI_hw_info_t *hw_info, int id); int _papi_freebsd_get_dmem_info(PAPI_dmem_info_t *d); papi-5.3.0/src/papi_debug.h0000600003276200002170000001476212247131124015210 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ /** * @file papi_debug.h * @author Philip Mucci * mucci@cs.utk.edu * @author Dan Terpstra * terpstra.utk.edu * @author Kevin London * london@cs.utk.edu * @author Haihang You * you@cs.utk.edu */ #ifndef _PAPI_DEBUG_H #define _PAPI_DEBUG_H #ifdef NO_VARARG_MACRO 
#include #endif #include /* Debug Levels */ #define DEBUG_SUBSTRATE 0x002 #define DEBUG_API 0x004 #define DEBUG_INTERNAL 0x008 #define DEBUG_THREADS 0x010 #define DEBUG_MULTIPLEX 0x020 #define DEBUG_OVERFLOW 0x040 #define DEBUG_PROFILE 0x080 #define DEBUG_MEMORY 0x100 #define DEBUG_LEAK 0x200 #define DEBUG_ALL (DEBUG_SUBSTRATE|DEBUG_API|DEBUG_INTERNAL|DEBUG_THREADS|DEBUG_MULTIPLEX|DEBUG_OVERFLOW|DEBUG_PROFILE|DEBUG_MEMORY|DEBUG_LEAK) /* Please get rid of the DBG macro from your code */ extern int _papi_hwi_debug; extern unsigned long int ( *_papi_hwi_thread_id_fn ) ( void ); #ifdef DEBUG #ifdef __GNUC__ #define FUNC __FUNCTION__ #elif defined(__func__) #define FUNC __func__ #else #define FUNC "?" #endif #define DEBUGLABEL(a) if (_papi_hwi_thread_id_fn) fprintf(stderr, "%s:%s:%s:%d:%d:0x%lx ",a,__FILE__, FUNC, __LINE__,(int)getpid(),_papi_hwi_thread_id_fn()); else fprintf(stderr, "%s:%s:%s:%d:%d ",a,__FILE__, FUNC, __LINE__, (int)getpid()) #define ISLEVEL(a) (_papi_hwi_debug&a) #define DEBUGLEVEL(a) ((a&DEBUG_SUBSTRATE)?"SUBSTRATE":(a&DEBUG_API)?"API":(a&DEBUG_INTERNAL)?"INTERNAL":(a&DEBUG_THREADS)?"THREADS":(a&DEBUG_MULTIPLEX)?"MULTIPLEX":(a&DEBUG_OVERFLOW)?"OVERFLOW":(a&DEBUG_PROFILE)?"PROFILE":(a&DEBUG_MEMORY)?"MEMORY":(a&DEBUG_LEAK)?"LEAK":"UNKNOWN") #ifndef NO_VARARG_MACRO /* Has variable arg macro support */ #define PAPIDEBUG(level,format, args...) { if(_papi_hwi_debug&level){DEBUGLABEL(DEBUGLEVEL(level));fprintf(stderr,format, ## args);}} /* Macros */ #define SUBDBG(format, args...) (PAPIDEBUG(DEBUG_SUBSTRATE,format, ## args)) #define APIDBG(format, args...) (PAPIDEBUG(DEBUG_API,format, ## args)) #define INTDBG(format, args...) (PAPIDEBUG(DEBUG_INTERNAL,format, ## args)) #define THRDBG(format, args...) (PAPIDEBUG(DEBUG_THREADS,format, ## args)) #define MPXDBG(format, args...) (PAPIDEBUG(DEBUG_MULTIPLEX,format, ## args)) #define OVFDBG(format, args...) (PAPIDEBUG(DEBUG_OVERFLOW,format, ## args)) #define PRFDBG(format, args...) 
(PAPIDEBUG(DEBUG_PROFILE,format, ## args)) #define MEMDBG(format, args...) (PAPIDEBUG(DEBUG_MEMORY,format, ## args)) #define LEAKDBG(format, args...) (PAPIDEBUG(DEBUG_LEAK,format, ## args)) #endif #else #ifndef NO_VARARG_MACRO /* Has variable arg macro support */ #define SUBDBG(format, args...) { ; } #define APIDBG(format, args...) { ; } #define INTDBG(format, args...) { ; } #define THRDBG(format, args...) { ; } #define MPXDBG(format, args...) { ; } #define OVFDBG(format, args...) { ; } #define PRFDBG(format, args...) { ; } #define MEMDBG(format, args...) { ; } #define LEAKDBG(format, args...) { ; } #define PAPIDEBUG(level, format, args...) { ; } #endif #endif /* * Debug functions for platforms without vararg macro support */ #ifdef NO_VARARG_MACRO static void PAPIDEBUG( int level, char *format, va_list args ) { #ifdef DEBUG if ( ISLEVEL( level ) ) { vfprintf( stderr, format, args ); } else #endif return; } static void _SUBDBG( char *format, ... ) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_SUBSTRATE, format, args ); va_end(args); #endif } #ifdef DEBUG #define SUBDBG do { \ if (DEBUG_SUBSTRATE & _papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_SUBSTRATE ) ); \ } \ } while(0); _SUBDBG #else #define SUBDBG _SUBDBG #endif static void _APIDBG( char *format, ... ) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_API, format, args ); va_end(args); #endif } #ifdef DEBUG #define APIDBG do { \ if (DEBUG_API&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_API ) ); \ } \ } while(0); _APIDBG #else #define APIDBG _APIDBG #endif static void _INTDBG( char *format, ... ) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_INTERNAL, format, args ); va_end(args); #endif } #ifdef DEBUG #define INTDBG do { \ if (DEBUG_INTERNAL&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_INTERNAL ) ); \ } \ } while(0); _INTDBG #else #define INTDBG _INTDBG #endif static void _THRDBG( char *format, ... 
) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_THREADS, format, args ); va_end(args); #endif } #ifdef DEBUG #define THRDBG do { \ if (DEBUG_THREADS&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_THREADS ) ); \ } \ } while(0); _THRDBG #else #define THRDBG _THRDBG #endif static void _MPXDBG( char *format, ... ) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_MULTIPLEX, format, args ); va_end(args); #endif } #ifdef DEBUG #define MPXDBG do { \ if (DEBUG_MULTIPLEX&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_MULTIPLEX ) ); \ } \ } while(0); _MPXDBG #else #define MPXDBG _MPXDBG #endif static void _OVFDBG( char *format, ... ) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_OVERFLOW, format, args ); va_end(args); #endif } #ifdef DEBUG #define OVFDBG do { \ if (DEBUG_OVERFLOW&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_OVERFLOW ) ); \ } \ } while(0); _OVFDBG #else #define OVFDBG _OVFDBG #endif static void _PRFDBG( char *format, ... ) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_PROFILE, format, args ); va_end(args); #endif } #ifdef DEBUG #define PRFDBG do { \ if (DEBUG_PROFILE&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_PROFILE ) ); \ } \ } while(0); _PRFDBG #else #define PRFDBG _PRFDBG #endif static void _MEMDBG( char *format, ... ) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_MEMORY, format , args); va_end(args); #endif } #ifdef DEBUG #define MEMDBG do { \ if (DEBUG_MEMORY&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_MEMORY ) ); \ } \ } while(0); _MEMDBG #else #define MEMDBG _MEMDBG #endif static void _LEAKDBG( char *format, ... 
) { #ifdef DEBUG va_list args; va_start(args, format); PAPIDEBUG( DEBUG_LEAK, format , args); va_end(args); #endif } #ifdef DEBUG #define LEAKDBG do { \ if (DEBUG_LEAK&_papi_hwi_debug) {\ DEBUGLABEL( DEBUGLEVEL ( DEBUG_LEAK ) ); \ } \ } while(0); _LEAKDBG #else #define LEAKDBG _LEAKDBG #endif /* ifdef NO_VARARG_MACRO */ #endif #endif /* PAPI_DEBUG_H */ papi-5.3.0/src/Rules.bgpm0000600003276200002170000000111012247131120014654 0ustar ralphundrgrad# $Id: Rules.bgpm,v 1.1 2011/03/11 23:06:54 jagode Exp $ ifneq ($(USE_DEBUG),) BGPM_LIBNAME = bgpm_debug DEBUG_BGPM = "-DDEBUG_BGPM" else BGPM_LIBNAME = bgpm endif BGPM_OBJS=$(shell $(AR) t $(BGPM_INSTALL_DIR)/bgpm/lib/lib$(BGPM_LIBNAME).a && $(AR) 2>/dev/null) MISCOBJS = $(BGPM_OBJS) $(MISCSRCS:.c=.o) include Makefile.inc CFLAGS += -I$(BGPM_INSTALL_DIR) -I$(BGPM_INSTALL_DIR)/spi/include/kernel/cnk $(DEBUG_BGPM) LDFLAGS += $(BGPM_INSTALL_DIR)/bgpm/lib/lib$(BGPM_LIBNAME).a -lrt -lstdc++ $(BGPM_OBJS): $(AR) xv $(BGPM_INSTALL_DIR)/bgpm/lib/lib$(BGPM_LIBNAME).a papi-5.3.0/src/aix-memory.c0000600003276200002170000000570112247131120015160 0ustar ralphundrgrad/* * File: aix-memory.c * Author: Kevin London * london@cs.utk.edu * * Mods: * */ #include "papi.h" #include "papi_internal.h" #include "aix.h" int _aix_get_memory_info( PAPI_hw_info_t * mem_info, int type ) { PAPI_mh_level_t *L = mem_info->mem_hierarchy.level; /* Not quite sure what bit 30 indicates. 
I'm assuming it flags a unified tlb */ if ( _system_configuration.tlb_attrib & ( 1 << 30 ) ) { L[0].tlb[0].type = PAPI_MH_TYPE_UNIFIED; L[0].tlb[0].num_entries = _system_configuration.itlb_size; /* Assumption: a unified tlb reuses the itlb fields, as the unified cache below reuses the icache fields */ L[0].tlb[0].associativity = _system_configuration.itlb_asc; } else { L[0].tlb[0].type = PAPI_MH_TYPE_INST; L[0].tlb[0].num_entries = _system_configuration.itlb_size; L[0].tlb[0].associativity = _system_configuration.itlb_asc; L[0].tlb[1].type = PAPI_MH_TYPE_DATA; L[0].tlb[1].num_entries = _system_configuration.dtlb_size; L[0].tlb[1].associativity = _system_configuration.dtlb_asc; } /* Not quite sure what bit 30 indicates. I'm assuming it flags a unified cache */ if ( _system_configuration.cache_attrib & ( 1 << 30 ) ) { L[0].cache[0].type = PAPI_MH_TYPE_UNIFIED; L[0].cache[0].size = _system_configuration.icache_size; L[0].cache[0].associativity = _system_configuration.icache_asc; L[0].cache[0].line_size = _system_configuration.icache_line; } else { L[0].cache[0].type = PAPI_MH_TYPE_INST; L[0].cache[0].size = _system_configuration.icache_size; L[0].cache[0].associativity = _system_configuration.icache_asc; L[0].cache[0].line_size = _system_configuration.icache_line; L[0].cache[1].type = PAPI_MH_TYPE_DATA; L[0].cache[1].size = _system_configuration.dcache_size; L[0].cache[1].associativity = _system_configuration.dcache_asc; L[0].cache[1].line_size = _system_configuration.dcache_line; } L[1].cache[0].type = PAPI_MH_TYPE_UNIFIED; L[1].cache[0].size = _system_configuration.L2_cache_size; L[1].cache[0].associativity = _system_configuration.L2_cache_asc; /* is there a line size for Level 2 cache? */ /* it looks like we've always got at least 2 levels of info */ /* what about level 3 cache? */ mem_info->mem_hierarchy.levels = 2; return PAPI_OK; } int _aix_get_dmem_info( PAPI_dmem_info_t * d ) { /* This function has been reimplemented to conform to current interface. It has not been tested. Nor has it been confirmed for completeness.
dkt 05-10-06 */ struct procsinfo pi; pid_t mypid = getpid( ); pid_t pid; int found = 0; pid = 0; while ( 1 ) { if ( getprocs( &pi, sizeof ( pi ), 0, 0, &pid, 1 ) != 1 ) break; if ( mypid == pi.pi_pid ) { found = 1; break; } } if ( !found ) return ( PAPI_ESYS ); d->size = pi.pi_size; d->resident = pi.pi_drss + pi.pi_trss; d->high_water_mark = PAPI_EINVAL; d->shared = PAPI_EINVAL; d->text = pi.pi_trss; /* this is a guess */ d->library = PAPI_EINVAL; d->heap = PAPI_EINVAL; d->locked = PAPI_EINVAL; d->stack = PAPI_EINVAL; d->pagesize = getpagesize( ); return ( PAPI_OK ); } papi-5.3.0/src/linux-common.c0000600003276200002170000003673312247131124015533 0ustar ralphundrgrad/* * File: linux-common.c */ #include #include #include #include #include #include #include #include #include #include #include #include "papi.h" #include "papi_internal.h" #include "papi_vector.h" #include "linux-memory.h" #include "linux-common.h" #include "linux-timer.h" #include "x86_cpuid_info.h" PAPI_os_info_t _papi_os_info; /* The locks used by Linux */ #if defined(USE_PTHREAD_MUTEXES) pthread_mutex_t _papi_hwd_lock_data[PAPI_MAX_LOCK]; #else volatile unsigned int _papi_hwd_lock_data[PAPI_MAX_LOCK]; #endif static int _linux_init_locks(void) { int i; for ( i = 0; i < PAPI_MAX_LOCK; i++ ) { #if defined(USE_PTHREAD_MUTEXES) pthread_mutex_init(&_papi_hwd_lock_data[i],NULL); #else _papi_hwd_lock_data[i] = MUTEX_OPEN; #endif } return PAPI_OK; } int _linux_detect_hypervisor(char *virtual_vendor_name) { int retval=0; #if defined(__i386__)||defined(__x86_64__) retval=_x86_detect_hypervisor(virtual_vendor_name); #else (void) virtual_vendor_name; #endif return retval; } #define _PATH_SYS_SYSTEM "/sys/devices/system" #define _PATH_SYS_CPU0 _PATH_SYS_SYSTEM "/cpu/cpu0" static char pathbuf[PATH_MAX] = "/"; static char * search_cpu_info( FILE * f, char *search_str, char *line ) { /* This function courtesy of Rudolph Berrendorf! */ /* See the home page for the German version of PAPI. 
*/ char *s; while ( fgets( line, 256, f ) != NULL ) { if ( strstr( line, search_str ) != NULL ) { /* ignore all characters in line up to : */ for ( s = line; *s && ( *s != ':' ); ++s ); if ( *s ) return s; } } return NULL; } static void decode_vendor_string( char *s, int *vendor ) { if ( strcasecmp( s, "GenuineIntel" ) == 0 ) *vendor = PAPI_VENDOR_INTEL; else if ( ( strcasecmp( s, "AMD" ) == 0 ) || ( strcasecmp( s, "AuthenticAMD" ) == 0 ) ) *vendor = PAPI_VENDOR_AMD; else if ( strcasecmp( s, "IBM" ) == 0 ) *vendor = PAPI_VENDOR_IBM; else if ( strcasecmp( s, "Cray" ) == 0 ) *vendor = PAPI_VENDOR_CRAY; else if ( strcasecmp( s, "ARM" ) == 0 ) *vendor = PAPI_VENDOR_ARM; else if ( strcasecmp( s, "MIPS" ) == 0 ) *vendor = PAPI_VENDOR_MIPS; else if ( strcasecmp( s, "SiCortex" ) == 0 ) *vendor = PAPI_VENDOR_MIPS; else *vendor = PAPI_VENDOR_UNKNOWN; } static FILE * xfopen( const char *path, const char *mode ) { FILE *fd = fopen( path, mode ); if ( !fd ) err( EXIT_FAILURE, "error: %s", path ); return fd; } static FILE * path_vfopen( const char *mode, const char *path, va_list ap ) { vsnprintf( pathbuf, sizeof ( pathbuf ), path, ap ); return xfopen( pathbuf, mode ); } static int path_sibling( const char *path, ... ) { int c; long n; int result = 0; char s[2]; FILE *fp; va_list ap; va_start( ap, path ); fp = path_vfopen( "r", path, ap ); va_end( ap ); while ( ( c = fgetc( fp ) ) != EOF ) { if ( isxdigit( c ) ) { s[0] = ( char ) c; s[1] = '\0'; for ( n = strtol( s, NULL, 16 ); n > 0; n /= 2 ) { if ( n % 2 ) result++; } } } fclose( fp ); return result; } static int path_exist( const char *path, ... 
) { va_list ap; va_start( ap, path ); vsnprintf( pathbuf, sizeof ( pathbuf ), path, ap ); va_end( ap ); return access( pathbuf, F_OK ) == 0; } int _linux_get_cpu_info( PAPI_hw_info_t *hwinfo, int *cpuinfo_mhz ) { int tmp, retval = PAPI_OK; char maxargs[PAPI_HUGE_STR_LEN], *t, *s; float mhz = 0.0; FILE *f; if ( ( f = fopen( "/proc/cpuinfo", "r" ) ) == NULL ) { PAPIERROR( "fopen(/proc/cpuinfo) errno %d", errno ); return PAPI_ESYS; } /* All of this information may be overwritten by the component */ /* MHZ */ rewind( f ); s = search_cpu_info( f, "clock", maxargs ); if ( !s ) { rewind( f ); s = search_cpu_info( f, "cpu MHz", maxargs ); } if ( s ) { sscanf( s + 1, "%f", &mhz ); } *cpuinfo_mhz = mhz; /* Vendor Name and Vendor Code */ rewind( f ); s = search_cpu_info( f, "vendor_id", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; strcpy( hwinfo->vendor_string, s + 2 ); } else { rewind( f ); s = search_cpu_info( f, "vendor", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; strcpy( hwinfo->vendor_string, s + 2 ); } else { rewind( f ); s = search_cpu_info( f, "system type", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; s = strtok( s + 2, " " ); strcpy( hwinfo->vendor_string, s ); } else { rewind( f ); s = search_cpu_info( f, "platform", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; s = strtok( s + 2, " " ); if ( ( strcasecmp( s, "pSeries" ) == 0 ) || ( strcasecmp( s, "PowerMac" ) == 0 ) ) { strcpy( hwinfo->vendor_string, "IBM" ); } } else { rewind( f ); s = search_cpu_info( f, "CPU implementer", maxargs ); if ( s ) { strcpy( hwinfo->vendor_string, "ARM" ); } } } } } if ( strlen( hwinfo->vendor_string ) ) { decode_vendor_string( hwinfo->vendor_string, &hwinfo->vendor ); } /* Revision */ rewind( f ); s = search_cpu_info( f, "stepping", maxargs ); if ( s ) { sscanf( s + 1, "%d", &tmp ); hwinfo->revision = ( float ) tmp; hwinfo->cpuid_stepping = tmp; } else { rewind( f ); s = search_cpu_info( f,
"revision", maxargs ); if ( s ) { sscanf( s + 1, "%d", &tmp ); hwinfo->revision = ( float ) tmp; hwinfo->cpuid_stepping = tmp; } } /* Model Name */ rewind( f ); s = search_cpu_info( f, "model name", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; strcpy( hwinfo->model_string, s + 2 ); } else { rewind( f ); s = search_cpu_info( f, "family", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; strcpy( hwinfo->model_string, s + 2 ); } else { rewind( f ); s = search_cpu_info( f, "cpu model", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; strtok( s + 2, " " ); s = strtok( NULL, " " ); strcpy( hwinfo->model_string, s ); } else { rewind( f ); s = search_cpu_info( f, "cpu", maxargs ); if ( s && ( t = strchr( s + 2, '\n' ) ) ) { *t = '\0'; /* get just the first token */ s = strtok( s + 2, " " ); strcpy( hwinfo->model_string, s ); } } } } /* Family */ rewind( f ); s = search_cpu_info( f, "family", maxargs ); if ( s ) { sscanf( s + 1, "%d", &tmp ); hwinfo->cpuid_family = tmp; } else { rewind( f ); s = search_cpu_info( f, "cpu family", maxargs ); if ( s ) { sscanf( s + 1, "%d", &tmp ); hwinfo->cpuid_family = tmp; } } /* CPU Model */ rewind( f ); s = search_cpu_info( f, "model", maxargs ); if ( s ) { sscanf( s + 1, "%d", &tmp ); hwinfo->model = tmp; hwinfo->cpuid_model = tmp; } /* The following members are set using the same methodology */ /* used in lscpu. */ /* Total number of CPUs */ /* The following line assumes totalcpus was initialized to zero! 
*/ while ( path_exist( _PATH_SYS_SYSTEM "/cpu/cpu%d", hwinfo->totalcpus ) ) hwinfo->totalcpus++; /* Number of threads per core */ if ( path_exist( _PATH_SYS_CPU0 "/topology/thread_siblings" ) ) hwinfo->threads = path_sibling( _PATH_SYS_CPU0 "/topology/thread_siblings" ); /* Number of cores per socket */ if ( path_exist( _PATH_SYS_CPU0 "/topology/core_siblings" ) && hwinfo->threads > 0 ) hwinfo->cores = path_sibling( _PATH_SYS_CPU0 "/topology/core_siblings" ) / hwinfo->threads; /* Number of NUMA nodes */ /* The following line assumes nnodes was initialized to zero! */ while ( path_exist( _PATH_SYS_SYSTEM "/node/node%d", hwinfo->nnodes ) ) hwinfo->nnodes++; /* Number of CPUs per node */ hwinfo->ncpu = hwinfo->nnodes > 1 ? hwinfo->totalcpus / hwinfo->nnodes : hwinfo->totalcpus; /* Number of sockets */ if ( hwinfo->threads > 0 && hwinfo->cores > 0 ) hwinfo->sockets = hwinfo->totalcpus / hwinfo->cores / hwinfo->threads; #if 0 int *nodecpu; /* cpumap data is not currently part of the _papi_hw_info struct */ nodecpu = malloc( (unsigned int) hwinfo->nnodes * sizeof(int) ); if ( nodecpu ) { int i; for ( i = 0; i < hwinfo->nnodes; ++i ) { nodecpu[i] = path_sibling( _PATH_SYS_SYSTEM "/node/node%d/cpumap", i ); } } else { PAPIERROR( "malloc failed for variable not currently used" ); } #endif /* Fixup missing Megahertz Value */ /* This is missing from cpuinfo on ARM and MIPS */ if (*cpuinfo_mhz < 1.0) { rewind( f ); s = search_cpu_info( f, "BogoMIPS", maxargs ); if ((!s) || (sscanf( s + 1, "%f", &mhz ) != 1)) { INTDBG("Mhz detection failed. 
Please edit file %s at line %d.\n", __FILE__,__LINE__); } if (hwinfo->vendor == PAPI_VENDOR_MIPS) { /* MIPS has 2x clock multiplier */ *cpuinfo_mhz = 2*(((int)mhz)+1); /* Also update version info on MIPS */ rewind( f ); s = search_cpu_info( f, "cpu model", maxargs ); /* Guard against a missing "cpu model" line or a missing " V" marker before dereferencing */ if ( s && ( s = strstr(s+1," V") ) ) { s += 2; strtok(s," "); sscanf(s, "%f ", &hwinfo->revision ); } } else { /* In general bogomips is proportional to number of CPUs */ if (hwinfo->totalcpus) { if (mhz!=0) *cpuinfo_mhz = mhz / hwinfo->totalcpus; } } } fclose( f ); return retval; } int _linux_get_mhz( int *sys_min_mhz, int *sys_max_mhz ) { FILE *fff; int result; /* Try checking for the minimum frequency. Despite the parameter names, cpufreq reports these values in kHz; the caller converts to MHz. */ /* Assume cpu0 exists */ fff=fopen("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq","r"); if (fff==NULL) return PAPI_EINVAL; result=fscanf(fff,"%d",sys_min_mhz); fclose(fff); if (result!=1) return PAPI_EINVAL; fff=fopen("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq","r"); if (fff==NULL) return PAPI_EINVAL; result=fscanf(fff,"%d",sys_max_mhz); fclose(fff); if (result!=1) return PAPI_EINVAL; return PAPI_OK; } int _linux_get_system_info( papi_mdi_t *mdi ) { int retval; char maxargs[PAPI_HUGE_STR_LEN]; pid_t pid; int cpuinfo_mhz,sys_min_khz,sys_max_khz; /* Software info */ /* Path and args */ pid = getpid( ); if ( pid < 0 ) { PAPIERROR( "getpid() returned < 0" ); return PAPI_ESYS; } mdi->pid = pid; sprintf( maxargs, "/proc/%d/exe", ( int ) pid ); /* Leave room for the terminator: readlink() does not NUL-terminate its result */ retval = readlink( maxargs, mdi->exe_info.fullname, PAPI_HUGE_STR_LEN - 1 ); if ( retval < 0 ) { PAPIERROR( "readlink(%s) returned < 0", maxargs ); return PAPI_ESYS; } mdi->exe_info.fullname[retval] = '\0'; /* Careful, basename can modify its argument */ strcpy( maxargs, mdi->exe_info.fullname ); strcpy( mdi->exe_info.address_info.name, basename( maxargs ) ); SUBDBG( "Executable is %s\n", mdi->exe_info.address_info.name ); SUBDBG( "Full Executable is %s\n", mdi->exe_info.fullname ); /* Executable regions, may require reading /proc/pid/maps file */ retval = _linux_update_shlib_info( mdi ); SUBDBG( "Text: Start %p, End %p, length %d\n",
mdi->exe_info.address_info.text_start, mdi->exe_info.address_info.text_end, ( int ) ( mdi->exe_info.address_info.text_end - mdi->exe_info.address_info.text_start ) ); SUBDBG( "Data: Start %p, End %p, length %d\n", mdi->exe_info.address_info.data_start, mdi->exe_info.address_info.data_end, ( int ) ( mdi->exe_info.address_info.data_end - mdi->exe_info.address_info.data_start ) ); SUBDBG( "Bss: Start %p, End %p, length %d\n", mdi->exe_info.address_info.bss_start, mdi->exe_info.address_info.bss_end, ( int ) ( mdi->exe_info.address_info.bss_end - mdi->exe_info.address_info.bss_start ) ); /* PAPI_preload_option information */ strcpy( mdi->preload_info.lib_preload_env, "LD_PRELOAD" ); mdi->preload_info.lib_preload_sep = ' '; strcpy( mdi->preload_info.lib_dir_env, "LD_LIBRARY_PATH" ); mdi->preload_info.lib_dir_sep = ':'; /* Hardware info */ retval = _linux_get_cpu_info( &mdi->hw_info, &cpuinfo_mhz ); if ( retval ) return retval; /* Handle MHz */ retval = _linux_get_mhz( &sys_min_khz, &sys_max_khz ); if ( retval ) { mdi->hw_info.cpu_max_mhz=cpuinfo_mhz; mdi->hw_info.cpu_min_mhz=cpuinfo_mhz; /* mdi->hw_info.mhz=cpuinfo_mhz; mdi->hw_info.clock_mhz=cpuinfo_mhz; */ } else { mdi->hw_info.cpu_max_mhz=sys_max_khz/1000; mdi->hw_info.cpu_min_mhz=sys_min_khz/1000; /* mdi->hw_info.mhz=sys_max_khz/1000; mdi->hw_info.clock_mhz=sys_max_khz/1000; */ } /* Set Up Memory */ retval = _linux_get_memory_info( &mdi->hw_info, mdi->hw_info.model ); if ( retval ) return retval; SUBDBG( "Found %d %s(%d) %s(%d) CPUs at %d Mhz.\n", mdi->hw_info.totalcpus, mdi->hw_info.vendor_string, mdi->hw_info.vendor, mdi->hw_info.model_string, mdi->hw_info.model, mdi->hw_info.cpu_max_mhz); /* Get virtualization info */ mdi->hw_info.virtualized=_linux_detect_hypervisor(mdi->hw_info.virtual_vendor_string); return PAPI_OK; } int _papi_hwi_init_os(void) { int major=0,minor=0,sub=0; char *ptr; struct utsname uname_buffer; /* Initialize the locks */ _linux_init_locks(); /* Get the kernel info */ uname(&uname_buffer); 
SUBDBG("Native kernel version %s\n",uname_buffer.release); strncpy(_papi_os_info.name,uname_buffer.sysname,PAPI_MAX_STR_LEN); #ifdef ASSUME_KERNEL strncpy(_papi_os_info.version,ASSUME_KERNEL,PAPI_MAX_STR_LEN); SUBDBG("Assuming kernel version %s\n",_papi_os_info.version); #else strncpy(_papi_os_info.version,uname_buffer.release,PAPI_MAX_STR_LEN); #endif ptr=strtok(_papi_os_info.version,"."); if (ptr!=NULL) major=atoi(ptr); ptr=strtok(NULL,"."); if (ptr!=NULL) minor=atoi(ptr); ptr=strtok(NULL,"."); if (ptr!=NULL) sub=atoi(ptr); _papi_os_info.os_version=LINUX_VERSION(major,minor,sub); _papi_os_info.itimer_sig = PAPI_INT_MPX_SIGNAL; _papi_os_info.itimer_num = PAPI_INT_ITIMER; _papi_os_info.itimer_ns = PAPI_INT_MPX_DEF_US * 1000; _papi_os_info.itimer_res_ns = 1; _papi_os_info.clock_ticks = sysconf( _SC_CLK_TCK ); /* Get Linux-specific system info */ _linux_get_system_info( &_papi_hwi_system_info ); return PAPI_OK; } int _linux_detect_nmi_watchdog() { int watchdog_detected=0,watchdog_value=0; FILE *fff; fff=fopen("/proc/sys/kernel/nmi_watchdog","r"); if (fff!=NULL) { if (fscanf(fff,"%d",&watchdog_value)==1) { if (watchdog_value>0) watchdog_detected=1; } fclose(fff); } return watchdog_detected; } papi_os_vector_t _papi_os_vector = { .get_memory_info = _linux_get_memory_info, .get_dmem_info = _linux_get_dmem_info, .get_real_cycles = _linux_get_real_cycles, .update_shlib_info = _linux_update_shlib_info, .get_system_info = _linux_get_system_info, #if defined(HAVE_CLOCK_GETTIME) .get_real_usec = _linux_get_real_usec_gettime, #elif defined(HAVE_GETTIMEOFDAY) .get_real_usec = _linux_get_real_usec_gettimeofday, #else .get_real_usec = _linux_get_real_usec_cycles, #endif #if defined(USE_PROC_PTTIMER) .get_virt_usec = _linux_get_virt_usec_pttimer, #elif defined(HAVE_CLOCK_GETTIME_THREAD) .get_virt_usec = _linux_get_virt_usec_gettime, #elif defined(HAVE_PER_THREAD_TIMES) .get_virt_usec = _linux_get_virt_usec_times, #elif defined(HAVE_PER_THREAD_GETRUSAGE) .get_virt_usec =
_linux_get_virt_usec_rusage, #endif #if defined(HAVE_CLOCK_GETTIME) .get_real_nsec = _linux_get_real_nsec_gettime, #endif #if defined(HAVE_CLOCK_GETTIME_THREAD) .get_virt_nsec = _linux_get_virt_nsec_gettime, #endif }; papi-5.3.0/src/darwin-context.h0000600003276200002170000000000012247131121016030 0ustar ralphundrgradpapi-5.3.0/src/aix.h0000600003276200002170000000613512247131120013661 0ustar ralphundrgrad#ifndef _PAPI_AIX_H /* _PAPI_AIX */ #define _PAPI_AIX_H /****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ /* * File: pmapi-ppc64.h * Author: Maynard Johnson * maynardj@us.ibm.com * Mods: * */ #include #include #include #include #include #include #include #include #if defined( _AIXVERSION_510) || defined(_AIXVERSION_520) #include #include #endif #include #include #include #include #include #include #include #include #include "pmapi.h" #define ANY_THREAD_GETS_SIGNAL #define POWER_MAX_COUNTERS MAX_COUNTERS #define MAX_COUNTER_TERMS MAX_COUNTERS #define MAX_MPX_COUNTERS 32 #define INVALID_EVENT -2 #define POWER_MAX_COUNTERS_MAPPING 8 extern _text; extern _etext; extern _edata; extern _end; extern _data; /* globals */ #ifdef PM_INITIALIZE #ifdef _AIXVERSION_510 #define PMINFO_T pm_info2_t #define PMEVENTS_T pm_events2_t #else #define PMINFO_T pm_info_t #define PMEVENTS_T pm_events_t #endif PMINFO_T pminfo; #else #define PMINFO_T pm_info_t #define PMEVENTS_T pm_events_t /*pm_info_t pminfo;*/ #endif #include "aix-context.h" /* define the vector structure at the bottom of this file */ #define PM_INIT_FLAGS PM_VERIFIED|PM_UNVERIFIED|PM_CAVEAT|PM_GET_GROUPS #ifdef PM_INITIALIZE typedef pm_info2_t hwd_pminfo_t; typedef pm_events2_t hwd_pmevents_t; #else typedef pm_info_t hwd_pminfo_t; typedef pm_events_t hwd_pmevents_t; #endif #include "ppc64_events.h" typedef struct ppc64_pmapi_control { /* Buffer to pass to the kernel to control the counters */ pm_prog_t counter_cmd; int group_id; /* Space to read the counters */ pm_data_t state; 
} ppc64_pmapi_control_t; typedef struct ppc64_reg_alloc { int ra_position; unsigned int ra_group[GROUP_INTS]; int ra_counter_cmd[MAX_COUNTERS]; } ppc64_reg_alloc_t; typedef struct ppc64_pmapi_context { /* this structure is a work in progress */ ppc64_pmapi_control_t cntrl; } ppc64_pmapi_context_t; /* Override void* definitions from PAPI framework layer */ /* typedefs to conform to hardware independent PAPI code. */ #undef hwd_control_state_t #undef hwd_reg_alloc_t #undef hwd_context_t typedef ppc64_pmapi_control_t hwd_control_state_t; typedef ppc64_reg_alloc_t hwd_reg_alloc_t; typedef ppc64_pmapi_context_t hwd_context_t; /* typedef struct hwd_groups { // group number from the pmapi pm_groups_t struct //int group_id; // Buffer containing counter cmds for this group unsigned char counter_cmd[POWER_MAX_COUNTERS]; } hwd_groups_t; */ /* prototypes */ extern int _aix_set_granularity( hwd_control_state_t * this_state, int domain ); extern int _papi_hwd_init_preset_search_map( hwd_pminfo_t * info ); extern int _aix_get_memory_info( PAPI_hw_info_t * mem_info, int type ); extern int _aix_get_dmem_info( PAPI_dmem_info_t * d ); /* Machine dependent info structure */ extern pm_groups_info_t pmgroups; #endif /* _PAPI_AIX */ papi-5.3.0/src/papi_memory.h0000600003276200002170000000355012247131124015423 0ustar ralphundrgrad#ifndef _PAPI_MALLOC #define _PAPI_MALLOC #include #define DEBUG_FILE_LEN 20 typedef struct pmem { void *ptr; int size; #ifdef DEBUG char file[DEBUG_FILE_LEN]; int line; #endif struct pmem *next; struct pmem *prev; } pmem_t; #ifndef IN_MEM_FILE #ifdef PAPI_NO_MEMORY_MANAGEMENT #define papi_malloc(a) malloc(a) #define papi_free(a) free(a) #define papi_realloc(a,b) realloc(a,b) #define papi_calloc(a,b) calloc(a,b) #define papi_valid_free(a) 1 #define papi_strdup(a) strdup(a) #define papi_mem_cleanup_all() ; #define papi_mem_print_info(a) ; #define papi_mem_print_stats() ; #define papi_mem_overhead(a) ; #define papi_mem_check_all_overflow() ; #else #define 
papi_malloc(a) _papi_malloc(__FILE__,__LINE__, a) #define papi_free(a) _papi_free(__FILE__,__LINE__, a) #define papi_realloc(a,b) _papi_realloc(__FILE__,__LINE__,a,b) #define papi_calloc(a,b) _papi_calloc(__FILE__,__LINE__,a,b) #define papi_valid_free(a) _papi_valid_free(__FILE__,__LINE__,a) #define papi_strdup(a) _papi_strdup(__FILE__,__LINE__,a) #define papi_mem_cleanup_all _papi_mem_cleanup_all #define papi_mem_print_info(a) _papi_mem_print_info(a) #define papi_mem_print_stats _papi_mem_print_stats #define papi_mem_overhead(a) _papi_mem_overhead(a) #define papi_mem_check_all_overflow _papi_mem_check_all_overflow #endif #endif void *_papi_malloc( char *, int, size_t ); void _papi_free( char *, int, void * ); void *_papi_realloc( char *, int, void *, size_t ); void *_papi_calloc( char *, int, size_t, size_t ); int _papi_valid_free( char *, int, void * ); char *_papi_strdup( char *, int, const char *s ); void _papi_mem_cleanup_all( ); void _papi_mem_print_info( void *ptr ); void _papi_mem_print_stats( ); int _papi_mem_overhead( int ); int _papi_mem_check_all_overflow( ); #define PAPI_MEM_LIB_OVERHEAD 1 /* PAPI Library Overhead */ #define PAPI_MEM_OVERHEAD 2 /* Memory Overhead */ #endif papi-5.3.0/src/ctests/0000700003276200002170000000000012247131324014233 5ustar ralphundrgradpapi-5.3.0/src/ctests/native.c0000600003276200002170000001321312247131121015662 0ustar ralphundrgrad/* * File: native.c * Mods: Maynard Johnson * maynardj@us.ibm.com */ /* This test defines an array of native event names, either at compile time or at run time (some x86 platforms). It then: - add the table of events to an event set; - starts counting - does a little work - stops counting; - reports the results. 
*/ #include "papi_test.h" static int EventSet = PAPI_NULL; extern int TESTS_QUIET; /* Declared in test_utils.c */ #if (defined(PPC32)) /* Select 4 events common to both ppc750 and ppc7450 */ static char *native_name[] = { "CPU_CLK", "FLOPS", "TOT_INS", "BR_MSP", NULL }; #elif defined(_POWER4) || defined(_PPC970) /* arbitrarily code events from group 28: pm_fpu3 - Floating point events by unit */ static char *native_name[] = { "PM_FPU0_FDIV", "PM_FPU1_FDIV", "PM_FPU0_FRSP_FCONV", "PM_FPU1_FRSP_FCONV", "PM_FPU0_FMA", "PM_FPU1_FMA", "PM_INST_CMPL", "PM_CYC", NULL }; #elif defined(_POWER5p) /* arbitrarily code events from group 33: pm_fpustall - Floating Point Unit stalls */ static char *native_name[] = { "PM_FPU_FULL_CYC", "PM_CMPLU_STALL_FDIV", "PM_CMPLU_STALL_FPU", "PM_RUN_INST_CMPL", "PM_RUN_CYC", NULL }; #elif defined(_POWER5) /* arbitrarily code events from group 78: pm_fpu1 - Floating Point events */ static char *native_name[] = { "PM_FPU_FDIV", "PM_FPU_FMA", "PM_FPU_FMOV_FEST", "PM_FPU_FEST", "PM_INST_CMPL", "PM_RUN_CYC", NULL }; #elif defined(POWER3) static char *native_name[] = { "PM_IC_MISS", "PM_FPU1_CMPL", "PM_LD_MISS_L1", "PM_LD_CMPL", "PM_FPU0_CMPL", "PM_CYC", "PM_TLB_MISS", NULL }; #elif defined(__ia64__) #ifdef ITANIUM2 static char *native_name[] = { "CPU_CYCLES", "L1I_READS", "L1D_READS_SET0", "IA64_INST_RETIRED", NULL }; #else static char *native_name[] = { "DEPENDENCY_SCOREBOARD_CYCLE", "DEPENDENCY_ALL_CYCLE", "UNSTALLED_BACKEND_CYCLE", "MEMORY_CYCLE", NULL }; #endif #elif ((defined(linux) && (defined(__i386__) || (defined __x86_64__))) ) static char *p3_native_name[] = { "DATA_MEM_REFS", "DCU_LINES_IN", NULL }; static char *core_native_name[] = { "UnhltCore_Cycles", "Instr_Retired", NULL }; static char *k7_native_name[] = { "TOT_CYC", "IC_MISSES", "DC_ACCESSES", "DC_MISSES", NULL }; // static char *k8_native_name[] = { "FP_ADD_PIPE", "FP_MULT_PIPE", "FP_ST_PIPE", "FP_NONE_RET", NULL }; static char *k8_native_name[] = { "DISPATCHED_FPU:OPS_ADD", 
"DISPATCHED_FPU:OPS_MULTIPLY", "DISPATCHED_FPU:OPS_STORE", "CYCLES_NO_FPU_OPS_RETIRED", NULL }; static char *p4_native_name[] = { "retired_mispred_branch_type:CONDITIONAL", "resource_stall:SBFULL", "tc_ms_xfer:CISC", "instr_retired:BOGUSNTAG:BOGUSTAG", "BSQ_cache_reference:RD_2ndL_HITS", NULL }; static char **native_name = p3_native_name; #elif defined(mips) && defined(sgi) static char *native_name[] = { "Primary_instruction_cache_misses", "Primary_data_cache_misses", NULL }; #elif defined(mips) && defined(linux) static char *native_name[] = { "CYCLES", NULL }; #elif defined(sun) && defined(sparc) static char *native_name[] = { "Cycle_cnt", "Instr_cnt", NULL }; #elif defined(_BGL) static char *native_name[] = { "BGL_UPC_PU0_PREF_STREAM_HIT", "BGL_PAPI_TIMEBASE", "BGL_UPC_PU1_PREF_STREAM_HIT", NULL }; #elif defined(__bgp__) static char *native_name[] = { "PNE_BGP_PU0_JPIPE_LOGICAL_OPS", "PNE_BGP_PU0_JPIPE_LOGICAL_OPS", "PNE_BGP_PU2_IPIPE_INSTRUCTIONS", NULL }; #else #error "Architecture not supported in test file." 
#endif int main( int argc, char **argv ) { int i, retval, native; const PAPI_hw_info_t *hwinfo; long long values[8]; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( ( hwinfo = PAPI_get_hardware_info( ) ) == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", PAPI_EMISC ); printf( "Architecture %s, %d\n", hwinfo->model_string, hwinfo->model ); #if ((defined(linux) && (defined(__i386__) || (defined __x86_64__))) ) if ( !strncmp( hwinfo->model_string, "Intel Pentium 4", 15 ) ) { native_name = p4_native_name; } else if ( !strncmp( hwinfo->model_string, "AMD K7", 6 ) ) { native_name = k7_native_name; } else if ( !strncmp( hwinfo->model_string, "AMD K8", 6 ) ) { native_name = k8_native_name; } else if ( !strncmp( hwinfo->model_string, "Intel Core", 17 ) || !strncmp( hwinfo->model_string, "Intel Core 2", 17 ) ) { native_name = core_native_name; } #endif for ( i = 0; native_name[i] != NULL; i++ ) { retval = PAPI_event_name_to_code( native_name[i], &native ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_name_to_code", retval ); printf( "Adding %s\n", native_name[i] ); if ( ( retval = PAPI_add_event( EventSet, native ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_both( 1000 ); if ( ( retval = PAPI_stop( EventSet, values ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { for ( i = 0; native_name[i] != NULL; i++ ) { fprintf( stderr, "%-40s: ", native_name[i] ); fprintf( stderr, LLDFMT, values[i] ); fprintf( stderr, "\n" ); } } retval = PAPI_cleanup_eventset( EventSet ); if ( retval 
!= PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup", retval ); retval = PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/kufrin.c0000600003276200002170000001136312247131121015676 0ustar ralphundrgrad/* * File: multiplex1_pthreads.c * Author: Rick Kufrin * rkufrin@ncsa.uiuc.edu * Mods: Philip Mucci * mucci@cs.utk.edu */ /* This file really bangs on the multiplex pthread functionality */ #include #include "papi_test.h" int *events; int numevents = 0; int max_events=0; double loop( long n ) { long i; double a = 0.0012; for ( i = 0; i < n; i++ ) { a += 0.01; } return a; } void * thread( void *arg ) { ( void ) arg; /*unused */ int eventset = PAPI_NULL; long long *values; int ret = PAPI_register_thread( ); if ( ret != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", ret ); ret = PAPI_create_eventset( &eventset ); if ( ret != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", ret ); values=calloc(max_events,sizeof(long long)); printf( "Event set %d created\n", eventset ); /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 
0 is always the cpu component */ ret = PAPI_assign_eventset_component( eventset, 0 ); if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", ret ); } ret = PAPI_set_multiplex( eventset ); if ( ret == PAPI_ENOSUPP) { test_skip( __FILE__, __LINE__, "Multiplexing not supported", 1 ); } else if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", ret ); } ret = PAPI_add_events( eventset, events, numevents ); if ( ret < PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_add_events", ret ); } ret = PAPI_start( eventset ); if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", ret ); } do_stuff( ); ret = PAPI_stop( eventset, values ); if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", ret ); } ret = PAPI_cleanup_eventset( eventset ); if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", ret ); } ret = PAPI_destroy_eventset( &eventset ); if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", ret ); } ret = PAPI_unregister_thread( ); if ( ret != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", ret ); return ( NULL ); } int main( int argc, char **argv ) { int nthreads = 8, ret, i; PAPI_event_info_t info; pthread_t *threads; const PAPI_hw_info_t *hw_info; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( !TESTS_QUIET ) { if ( argc > 1 ) { int tmp = atoi( argv[1] ); if ( tmp >= 1 ) nthreads = tmp; } } ret = PAPI_library_init( PAPI_VER_CURRENT ); if ( ret != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", ret ); } hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); if ( strcmp( hw_info->model_string, "POWER6" ) == 0 ) { ret = PAPI_set_domain( PAPI_DOM_ALL ); if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_set_domain", ret ); } } ret = PAPI_thread_init( ( unsigned long ( * )( void ) ) pthread_self ); if ( ret 
!= PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_thread_init", ret ); } ret = PAPI_multiplex_init( ); if ( ret != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_multiplex_init", ret ); } if ((max_events = PAPI_get_cmp_opt(PAPI_MAX_MPX_CTRS,NULL,0)) <= 0) { test_fail( __FILE__, __LINE__, "PAPI_get_cmp_opt", max_events ); } if ((events = calloc(max_events,sizeof(int))) == NULL) { test_fail( __FILE__, __LINE__, "calloc", PAPI_ESYS ); } /* Fill up the event set with as many non-derived events as we can */ i = PAPI_PRESET_MASK; do { if ( PAPI_get_event_info( i, &info ) == PAPI_OK ) { if ( info.count == 1 ) { events[numevents++] = ( int ) info.event_code; printf( "Added %s\n", info.symbol ); } else { printf( "Skipping derived event %s\n", info.symbol ); } } } while ( ( PAPI_enum_event( &i, PAPI_PRESET_ENUM_AVAIL ) == PAPI_OK ) && ( numevents < max_events ) ); printf( "Found %d events\n", numevents ); do_stuff( ); printf( "Creating %d threads:\n", nthreads ); threads = ( pthread_t * ) malloc( ( size_t ) nthreads * sizeof ( pthread_t ) ); if ( threads == NULL ) { test_fail( __FILE__, __LINE__, "malloc", PAPI_ENOMEM ); } /* Create the threads */ for ( i = 0; i < nthreads; i++ ) { ret = pthread_create( &threads[i], NULL, thread, NULL ); if ( ret != 0 ) { test_fail( __FILE__, __LINE__, "pthread_create", PAPI_ESYS ); } } /* Wait for thread completion */ for ( i = 0; i < nthreads; i++ ) { ret = pthread_join( threads[i], NULL ); if ( ret != 0 ) { test_fail( __FILE__, __LINE__, "pthread_join", PAPI_ESYS ); } } printf( "Done." ); test_pass( __FILE__, NULL, 0 ); pthread_exit( NULL ); exit( 0 ); } papi-5.3.0/src/ctests/zero_pthreads.c0000600003276200002170000001203712247131121017250 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for 2 slave pthreads - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. 
These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC Each of 2 slave pthreads: - Get cyc. - Get us. - Start counters - Do flops - Stop and read counters - Get us. - Get cyc. Master pthread: - Get us. - Get cyc. - Fork threads - Wait for threads to exit - Get us. - Get cyc. */ #include #include "papi_test.h" void * Thread( void *arg ) { int retval, num_tests = 1; int EventSet1 = PAPI_NULL; int PAPI_event, mask1; int num_events1; long long **values; long long elapsed_us, elapsed_cyc; char event_name[PAPI_MAX_STR_LEN]; retval = PAPI_register_thread( ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); } printf( "Thread %#x started\n", ( int ) pthread_self( ) ); /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } values = allocate_test_space( num_tests, num_events1 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( *( int * ) arg ); retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; remove_test_events( &EventSet1, mask1 ); if ( !TESTS_QUIET ) { printf( "Thread %#x %-12s : \t%lld\n", ( int ) pthread_self( ), event_name, values[0][1] ); printf( "Thread %#x PAPI_TOT_CYC : \t%lld\n", (int) pthread_self(), values[0][0] ); printf( "Thread %#x 
Real usec : \t%lld\n", ( int ) pthread_self( ), elapsed_us ); printf( "Thread %#x Real cycles : \t%lld\n", (int) pthread_self(), elapsed_cyc ); } free_test_space( values, num_tests ); retval = PAPI_unregister_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return NULL; } int main( int argc, char **argv ) { pthread_t e_th, f_th, g_th, h_th; int flops1, flops2, flops3, flops4; int retval, rc; pthread_attr_t attr; long long elapsed_us, elapsed_cyc; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ); if ( retval != PAPI_OK ) { if ( retval == PAPI_ECMP ) { test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); } else { test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } } elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); #endif flops1 = 1000000; rc = pthread_create( &e_th, &attr, Thread, ( void * ) &flops1 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } flops2 = 2000000; rc = pthread_create( &f_th, &attr, Thread, ( void * ) &flops2 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } flops3 = 4000000; rc = pthread_create( &g_th, &attr, Thread, ( void * ) &flops3 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } flops4 = 8000000; rc = pthread_create( &h_th, &attr, Thread, ( void * ) 
&flops4 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } pthread_attr_destroy( &attr ); flops1 = 500000; Thread( &flops1 ); pthread_join( h_th, NULL ); pthread_join( g_th, NULL ); pthread_join( f_th, NULL ); pthread_join( e_th, NULL ); elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; if ( !TESTS_QUIET ) { printf( "Master real usec : \t%lld\n", elapsed_us ); printf( "Master real cycles : \t%lld\n", elapsed_cyc ); } test_pass( __FILE__, NULL, 0 ); pthread_exit( NULL ); exit( 0 ); } papi-5.3.0/src/ctests/clockres_pthreads.c0000600003276200002170000000453412247131121020101 0ustar ralphundrgrad#include #include "papi_test.h" void * pthread_main( void *arg ) { ( void ) arg; int retval = PAPI_register_thread( ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); } clockcore( ); retval = PAPI_unregister_thread( ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); } return NULL; } int main( int argc, char **argv ) { pthread_t t1, t2, t3, t4; pthread_attr_t attr; int retval; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); if (( retval = PAPI_library_init( PAPI_VER_CURRENT)) != PAPI_VER_CURRENT) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } retval = PAPI_thread_init( ( unsigned long ( * )(void) ) (pthread_self) ); if ( retval != PAPI_OK ) { if ( retval == PAPI_ECMP ) { test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); } else { test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } } if ( !TESTS_QUIET ) { printf( "Test case: Clock latency and resolution.\n" ); printf( "Note: Virtual timers are proportional to # CPUs.\n" ); printf( "------------------------------------------------\n" ); } pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM
retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) { test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); } #endif if (pthread_create( &t1, &attr, pthread_main, NULL )) { test_fail(__FILE__, __LINE__, "cannot create thread", retval); } if (pthread_create( &t2, &attr, pthread_main, NULL )) { test_fail(__FILE__, __LINE__, "cannot create thread", retval); } if (pthread_create( &t3, &attr, pthread_main, NULL )) { test_fail(__FILE__, __LINE__, "cannot create thread", retval); } if (pthread_create( &t4, &attr, pthread_main, NULL )) { test_fail(__FILE__, __LINE__, "cannot create thread", retval); } pthread_main( NULL ); pthread_join( t1, NULL ); pthread_join( t2, NULL ); pthread_join( t3, NULL ); pthread_join( t4, NULL ); test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/locks_pthreads.c0000600003276200002170000000463612247131121017412 0ustar ralphundrgrad/* This file checks to make sure the locking mechanisms work correctly on the platform. * Platforms where the locking mechanisms are not implemented or are incorrectly implemented * will fail. 
-KSL */ #include #include "papi_test.h" volatile long long count = 0; volatile long long tmpcount = 0; volatile int num_iters = 0; void lockloop( int iters, volatile long long *mycount ) { int i; for ( i = 0; i < iters; i++ ) { PAPI_lock( PAPI_USR1_LOCK ); *mycount = *mycount + 1; PAPI_unlock( PAPI_USR1_LOCK ); } } void * Slave( void *arg ) { long long duration; ( void ) arg; sleep( 1 ); duration = PAPI_get_real_usec( ); lockloop( 10000, &tmpcount ); duration = PAPI_get_real_usec( ) - duration; /* First one here sets the number */ PAPI_lock( PAPI_USR2_LOCK ); if ( num_iters == 0 ) { printf( "10000 iterations took %lld us.\n", duration ); num_iters = ( int ) ( 10 * ( TIME_LIMIT_IN_US / duration ) ); printf( "Running %d iterations\n", num_iters ); } PAPI_unlock( PAPI_USR2_LOCK ); lockloop( num_iters, &count ); pthread_exit( NULL ); } int main( int argc, char **argv ) { pthread_t slaves[MAX_THREADS]; int rc, i, nthr; int retval; const PAPI_hw_info_t *hwinfo = NULL; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( hwinfo = PAPI_get_hardware_info( ) ) == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ); if ( retval != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } if ( hwinfo->ncpu > MAX_THREADS ) nthr = MAX_THREADS; else nthr = hwinfo->ncpu; printf( "Creating %d threads\n", nthr ); for ( i = 0; i < nthr; i++ ) { rc = pthread_create( &slaves[i], NULL, Slave, NULL ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } } for ( i = 0; i < nthr; i++ ) { pthread_join( slaves[i], NULL ); } printf( "Expected: %lld Received: %lld\n", ( long long ) nthr * num_iters,
count ); if ( nthr * num_iters != count ) test_fail( __FILE__, __LINE__, "Thread Locks", 1 ); test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/forkexec.c0000600003276200002170000000240612247131121016204 0ustar ralphundrgrad/* * File: forkexec.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: PAPI_library_init(); PAPI_shutdown() fork() / \ parent child wait() execlp() PAPI_library_init() */ #include "papi_test.h" #include int main( int argc, char **argv ) { int retval; int status; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( argc > 1 ) && ( strcmp( argv[1], "xxx" ) == 0 ) ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "execed PAPI_library_init", retval ); } else { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); PAPI_shutdown( ); if ( fork( ) == 0 ) { if ( execlp( argv[0], argv[0], "xxx", NULL ) == -1 ) test_fail( __FILE__, __LINE__, "execlp", PAPI_ESYS ); } else { wait( &status ); if ( WEXITSTATUS( status ) != 0 ) test_fail( __FILE__, __LINE__, "fork", WEXITSTATUS( status ) ); } } test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/timer_overflow.c0000600003276200002170000000263612247131121017446 0ustar ralphundrgrad/* * File: timer_overflow.c * Author: Kevin London * london@cs.utk.edu * Mods: * */ /* This file looks for possible timer overflows.
*/ #include "papi_test.h" #define TIMER_THRESHOLD 100 extern int TESTS_QUIET; int main( int argc, char **argv ) { int sleep_time = TIMER_THRESHOLD; int retval, i; long long timer; if ( argc > 1 ) { if ( !strcmp( argv[1], "TESTS_QUIET" ) ) tests_quiet( argc, argv ); else { sleep_time = atoi( argv[1] ); if ( sleep_time <= 0 ) sleep_time = TIMER_THRESHOLD; } } if ( TESTS_QUIET ) { /* Skip the test in TESTS_QUIET so that the main script doesn't * run this as it takes a long time to check for overflow */ printf( "%-40s SKIPPED\nLine # %d\n", __FILE__, __LINE__ ); printf( "timer_overflow takes a long time to run, run separately.\n" ); exit( 0 ); } printf( "This test will take about: %f minutes.\n", ( float ) ( 20 * ( sleep_time / 60.0 ) ) ); if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); timer = PAPI_get_real_usec( ); for ( i = 0; i <= 20; i++ ) { if ( timer < 0 ) break; sleep( ( unsigned int ) sleep_time ); timer = PAPI_get_real_usec( ); } if ( timer < 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_real_usec: overflow", 1 ); else test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/api.c0000600003276200002170000004060712247131121015154 0ustar ralphundrgrad/* * File: api.c * CVS: $Id$ * Author: Brian Sheely * bsheely@eecs.utk.edu * * Description: This test is designed to provide unit testing and complete * coverage for all functions which comprise the "Low Level API" * and the "High Level API" as defined in papi.h.
*/ #include "papi.h" #include "papi_test.h" int main( int argc, char **argv ) { const int NUM_COUNTERS = 1; int Events[] = { PAPI_TOT_INS }; long long values[NUM_COUNTERS]; float rtime, ptime, ipc, mflips, mflops; long long ins, flpins, flpops; int retval; tests_quiet( argc, argv ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /****** High Level API ******/ if ( !TESTS_QUIET ) printf( "Testing PAPI_num_components... " ); /* get the number of components available on the system */ retval = PAPI_num_components( ); if ( !TESTS_QUIET ) printf( "%d\n", retval ); if ( retval == 0) { if ( !TESTS_QUIET ) printf( "No components found, skipping high level tests\n"); } else { if ( !TESTS_QUIET ) printf( "Testing PAPI_num_counters... " ); /* get the number of hardware counters available on the system */ retval = PAPI_num_counters( ); if ( retval != PAPI_get_cmp_opt( PAPI_MAX_HWCTRS, NULL, 0 ) ) test_fail_exit( __FILE__, __LINE__, "PAPI_num_counters", retval ); else if ( !TESTS_QUIET ) printf( "%d\n", retval ); if ( !TESTS_QUIET ) printf( "Testing PAPI_start_counters... " ); retval = PAPI_start_counters( NULL, NUM_COUNTERS ); // pass invalid 1st argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_start_counters", retval ); retval = PAPI_start_counters( Events, 0 ); // pass invalid 2nd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_start_counters", retval ); retval = PAPI_start_counters( Events, NUM_COUNTERS ); // start counting hardware events if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_start_counters", retval ); else if ( !TESTS_QUIET ) printf( "started PAPI_TOT_INS\n" ); if ( !TESTS_QUIET ) printf( "Testing PAPI_stop_counters... 
" ); retval = PAPI_stop_counters( NULL, NUM_COUNTERS ); // pass invalid 1st argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_stop_counters", retval ); retval = PAPI_stop_counters( values, 0 ); // pass invalid 2nd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_stop_counters", retval ); retval = PAPI_stop_counters( values, NUM_COUNTERS ); // stop counters and return current counts if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_stop_counters", retval ); else if ( !TESTS_QUIET ) printf( "stopped counting PAPI_TOT_INS\n" ); //NOTE: There are currently no checks on whether or not counter values are correct retval = PAPI_start_counters( Events, NUM_COUNTERS ); // start counting hardware events again if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_start_counters", retval ); if ( !TESTS_QUIET ) printf( "Testing PAPI_read_counters... " ); retval = PAPI_read_counters( NULL, NUM_COUNTERS ); // pass invalid 1st argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_read_counters", retval ); retval = PAPI_read_counters( values, 0 ); // pass invalid 2nd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_read_counters", retval ); retval = PAPI_read_counters( values, NUM_COUNTERS ); // copy current counts to array and reset counters if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_read_counters", retval ); else if ( !TESTS_QUIET ) printf( "read PAPI_TOT_INS counts and reset counter\n" ); //NOTE: There are currently no checks on whether or not counter values are correct if ( !TESTS_QUIET ) printf( "Testing PAPI_accum_counters... 
" ); retval = PAPI_accum_counters( NULL, NUM_COUNTERS ); // pass invalid 1st argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_accum_counters", retval ); retval = PAPI_accum_counters( values, 0 ); // pass invalid 2nd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_accum_counters", retval ); retval = PAPI_accum_counters( values, NUM_COUNTERS ); // add current counts to array and reset counters if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_accum_counters", retval ); else if ( !TESTS_QUIET ) printf( "added PAPI_TOT_INS counts and reset counter\n" ); //NOTE: There are currently no checks on whether or not counter values are correct retval = PAPI_stop_counters( values, NUM_COUNTERS ); // stop counting hardware events if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_stop_counters", retval ); if ( !TESTS_QUIET ) printf( "Testing PAPI_ipc... " ); retval = PAPI_ipc( NULL, &ptime, &ins, &ipc ); // pass invalid 1st argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_ipc", retval ); retval = PAPI_ipc( &rtime, NULL, &ins, &ipc ); // pass invalid 2nd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_ipc", retval ); retval = PAPI_ipc( &rtime, &ptime, NULL, &ipc ); // pass invalid 3rd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_ipc", retval ); retval = PAPI_ipc( &rtime, &ptime, &ins, NULL ); // pass invalid 4th argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_ipc", retval ); retval = PAPI_ipc( &rtime, &ptime, &ins, &ipc ); // get instructions per cycle, real and processor time if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_ipc", retval ); else if ( !TESTS_QUIET ) printf( "got instructions per cycle, real and processor time\n" ); //NOTE: There are currently no checks on whether or not returned values are correct //NOTE: PAPI_flips and 
PAPI_flops fail if any other low-level calls have been made! PAPI_shutdown( ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( !TESTS_QUIET ) printf( "Testing PAPI_flips... " ); retval = PAPI_flips( NULL, &ptime, &flpins, &mflips ); // pass invalid 1st argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flips", retval ); retval = PAPI_flips( &rtime, NULL, &flpins, &mflips ); // pass invalid 2nd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flips", retval ); retval = PAPI_flips( &rtime, &ptime, NULL, &mflips ); // pass invalid 3rd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flips", retval ); retval = PAPI_flips( &rtime, &ptime, &flpins, NULL ); // pass invalid 4th argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flips", retval ); retval = PAPI_flips( &rtime, &ptime, &flpins, &mflips ); // get Mflips/s, real and processor time if ( retval == PAPI_ENOEVNT ) test_warn( __FILE__, __LINE__, "PAPI_flips", retval); else if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_flips", retval ); else if ( !TESTS_QUIET ) printf( "got Mflips/s, real and processor time\n" ); //NOTE: There are currently no checks on whether or not returned values are correct PAPI_shutdown( ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( !TESTS_QUIET ) printf( "Testing PAPI_flops... 
" ); retval = PAPI_flops( NULL, &ptime, &flpops, &mflops ); // pass invalid 1st argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flops", retval ); retval = PAPI_flops( &rtime, NULL, &flpops, &mflops ); // pass invalid 2nd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flops", retval ); retval = PAPI_flops( &rtime, &ptime, NULL, &mflops ); // pass invalid 3rd argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flops", retval ); retval = PAPI_flops( &rtime, &ptime, &flpops, NULL ); // pass invalid 4th argument if ( retval != PAPI_EINVAL ) test_fail_exit( __FILE__, __LINE__, "PAPI_flops", retval ); retval = PAPI_flops( &rtime, &ptime, &flpops, &mflops ); // get Mflops/s, real and processor time if ( retval == PAPI_ENOEVNT ) test_warn( __FILE__, __LINE__, "PAPI_flops", retval); else if ( retval != PAPI_OK ) { test_fail_exit( __FILE__, __LINE__, "PAPI_flops", retval ); } else if ( !TESTS_QUIET ) { printf( "got Mflops/s, real and processor time\n" ); } //NOTE: There are currently no checks on whether or not returned values are correct } /***************************/ /****** Low Level API ******/ /***************************/ /* int PAPI_accum(int EventSet, long long * values); // accumulate and reset hardware events from an event set int PAPI_add_event(int EventSet, int Event); // add single PAPI preset or native hardware event to an event set int PAPI_add_events(int EventSet, int *Events, int number); // add array of PAPI preset or native hardware events to an event set int PAPI_assign_eventset_component(int EventSet, int cidx); // assign a component index to an existing but empty eventset int PAPI_attach(int EventSet, unsigned long tid); // attach specified event set to a specific process or thread id int PAPI_cleanup_eventset(int EventSet); // remove all PAPI events from an event set int PAPI_create_eventset(int *EventSet); // create a new empty PAPI event set int 
PAPI_detach(int EventSet); // detach specified event set from a previously specified process or thread id
int PAPI_destroy_eventset(int *EventSet); // deallocates memory associated with an empty PAPI event set
int PAPI_enum_event(int *EventCode, int modifier); // return the event code for the next available preset or native event
int PAPI_event_code_to_name(int EventCode, char *out); // translate an integer PAPI event code into an ASCII PAPI preset or native name
int PAPI_event_name_to_code(char *in, int *out); // translate an ASCII PAPI preset or native name into an integer PAPI event code
int PAPI_get_dmem_info(PAPI_dmem_info_t *dest); // get dynamic memory usage information
int PAPI_get_event_info(int EventCode, PAPI_event_info_t * info); // get the name and descriptions for a given preset or native event code
const PAPI_exe_info_t *PAPI_get_executable_info(void); // get the executable's address space information
const PAPI_hw_info_t *PAPI_get_hardware_info(void); // get information about the system hardware
const PAPI_component_info_t *PAPI_get_component_info(int cidx); // get information about the component features
int PAPI_get_multiplex(int EventSet); // get the multiplexing status of specified event set
int PAPI_get_opt(int option, PAPI_option_t * ptr); // query the option settings of the PAPI library or a specific event set
int PAPI_get_cmp_opt(int option, PAPI_option_t * ptr,int cidx); // query the component specific option settings of a specific event set
long long PAPI_get_real_cyc(void); // return the total number of cycles since some arbitrary starting point
long long PAPI_get_real_nsec(void); // return the total number of nanoseconds since some arbitrary starting point
long long PAPI_get_real_usec(void); // return the total number of microseconds since some arbitrary starting point
const PAPI_shlib_info_t *PAPI_get_shared_lib_info(void); // get information about the shared libraries used by the process
int PAPI_get_thr_specific(int tag, void **ptr);
// return a pointer to a thread specific stored data structure int PAPI_get_overflow_event_index(int Eventset, long long overflow_vector, int *array, int *number); // # decomposes an overflow_vector into an event index array long long PAPI_get_virt_cyc(void); // return the process cycles since some arbitrary starting point long long PAPI_get_virt_nsec(void); // return the process nanoseconds since some arbitrary starting point long long PAPI_get_virt_usec(void); // return the process microseconds since some arbitrary starting point int PAPI_is_initialized(void); // return the initialized state of the PAPI library int PAPI_library_init(int version); // initialize the PAPI library int PAPI_list_events(int EventSet, int *Events, int *number); // list the events that are members of an event set int PAPI_list_threads(unsigned long *tids, int *number); // list the thread ids currently known to PAPI int PAPI_lock(int); // lock one of two PAPI internal user mutex variables int PAPI_multiplex_init(void); // initialize multiplex support in the PAPI library int PAPI_num_hwctrs(void); // return the number of hardware counters for the cpu int PAPI_num_cmp_hwctrs(int cidx); // return the number of hardware counters for a specified component int PAPI_num_hwctrs(void); // for backward compatibility int PAPI_num_events(int EventSet); // return the number of events in an event set int PAPI_overflow(int EventSet, int EventCode, int threshold, int flags, PAPI_overflow_handler_t handler); // set up an event set to begin registering overflows int PAPI_perror( char *msg); // convert PAPI error codes to strings int PAPI_profil(void *buf, unsigned bufsiz, caddr_t offset, unsigned scale, int EventSet, int EventCode, int threshold, int flags); // generate PC histogram data where hardware counter overflow occurs int PAPI_query_event(int EventCode); // query if a PAPI event exists int PAPI_read(int EventSet, long long * values); // read hardware events from an event set with no reset int 
PAPI_read_ts(int EventSet, long long * values, long long *cyc); int PAPI_register_thread(void); // inform PAPI of the existence of a new thread int PAPI_remove_event(int EventSet, int EventCode); // remove a hardware event from a PAPI event set int PAPI_remove_events(int EventSet, int *Events, int number); // remove an array of hardware events from a PAPI event set int PAPI_reset(int EventSet); // reset the hardware event counts in an event set int PAPI_set_debug(int level); // set the current debug level for PAPI int PAPI_set_cmp_domain(int domain, int cidx); // set the component specific default execution domain for new event sets int PAPI_set_domain(int domain); // set the default execution domain for new event sets int PAPI_set_cmp_granularity(int granularity, int cidx); // set the component specific default granularity for new event sets int PAPI_set_granularity(int granularity); //set the default granularity for new event sets int PAPI_set_multiplex(int EventSet); // convert a standard event set to a multiplexed event set int PAPI_set_opt(int option, PAPI_option_t * ptr); // change the option settings of the PAPI library or a specific event set int PAPI_set_thr_specific(int tag, void *ptr); // save a pointer as a thread specific stored data structure void PAPI_shutdown(void); // finish using PAPI and free all related resources int PAPI_sprofil(PAPI_sprofil_t * prof, int profcnt, int EventSet, int EventCode, int threshold, int flags); // generate hardware counter profiles from multiple code regions int PAPI_start(int EventSet); // start counting hardware events in an event set int PAPI_state(int EventSet, int *status); // return the counting state of an event set int PAPI_stop(int EventSet, long long * values); // stop counting hardware events in an event set and return current events char *PAPI_strerror(int); // return a pointer to the error message corresponding to a specified error code unsigned long PAPI_thread_id(void); // get the thread identifier of the 
current thread
int PAPI_thread_init(unsigned long (*id_fn) (void)); // initialize thread support in the PAPI library
int PAPI_unlock(int); // unlock one of two PAPI internal user mutex variables
int PAPI_unregister_thread(void); // inform PAPI that a previously registered thread is disappearing
int PAPI_write(int EventSet, long long * values); // write counter values into counters
*/
	test_pass( __FILE__, NULL, 0 );
	exit( 1 );
}
papi-5.3.0/src/ctests/krentel_pthreads.c
/*
 * Test PAPI with multiple threads.
 */

#include <pthread.h>
#include <sys/time.h>
#include "papi_test.h"

#define EVENT PAPI_TOT_CYC

int program_time = 5;
int threshold = 20000000;
int num_threads = 3;

long count[MAX_THREADS];
long iter[MAX_THREADS];
struct timeval last[MAX_THREADS];

pthread_key_t key;

struct timeval start;

void
my_handler( int EventSet, void *pc, long long ovec, void *context )
{
	( void ) EventSet;
	( void ) pc;
	( void ) ovec;
	( void ) context;

	long num = ( long ) pthread_getspecific( key );

	if ( num < 0 || num > num_threads )
		test_fail( __FILE__, __LINE__, "getspecific failed", 1 );
	count[num]++;
}

void
print_rate( long num )
{
	struct timeval now;
	long st_secs;
	double last_secs;

	gettimeofday( &now, NULL );
	st_secs = now.tv_sec - start.tv_sec;
	last_secs = ( double ) ( now.tv_sec - last[num].tv_sec )
		+ ( ( double ) ( now.tv_usec - last[num].tv_usec ) ) / 1000000.0;
	if ( last_secs <= 0.001 )
		last_secs = 0.001;

	printf( "[%ld] time = %ld, count = %ld, iter = %ld, "
			"rate = %.1f/Kiter\n",
			num, st_secs, count[num], iter[num],
			( 1000.0 * ( double ) count[num] ) / ( double ) iter[num] );

	count[num] = 0;
	iter[num] = 0;
	last[num] = now;
}

void
do_cycles( long num, int len )
{
	struct timeval start, now;
	double x, sum;

	gettimeofday( &start, NULL );

	for ( ;; ) {
		sum = 1.0;
		for ( x = 1.0; x < 250000.0; x += 1.0 )
			sum += x;
		if ( sum < 0.0 )
			printf( "==>> SUM IS NEGATIVE !! 
<<==\n" ); iter[num]++; gettimeofday( &now, NULL ); if ( now.tv_sec >= start.tv_sec + len ) break; } } void launch_timer( int *EventSet ) { if ( PAPI_create_eventset( EventSet ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset failed", 1 ); if ( PAPI_add_event( *EventSet, EVENT ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event failed", 1 ); if ( PAPI_overflow( *EventSet, EVENT, threshold, 0, my_handler ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow failed", 1 ); if ( PAPI_start( *EventSet ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start failed", 1 ); } void * my_thread( void *v ) { long num = ( long ) v; int n; int EventSet = PAPI_NULL; long long value; int retval = PAPI_register_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); pthread_setspecific( key, v ); count[num] = 0; iter[num] = 0; last[num] = start; launch_timer( &EventSet ); printf( "launched timer in thread %ld\n", num ); for ( n = 1; n <= program_time; n++ ) { do_cycles( num, 1 ); print_rate( num ); } PAPI_stop( EventSet, &value ); PAPI_remove_event( EventSet, EVENT ); PAPI_destroy_eventset( &EventSet ); retval = PAPI_unregister_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( NULL ); } int main( int argc, char **argv ) { pthread_t td; long n; tests_quiet( argc, argv ); /*Set TESTS_QUIET variable */ if ( argc < 2 || sscanf( argv[1], "%d", &program_time ) < 1 ) program_time = 6; if ( argc < 3 || sscanf( argv[2], "%d", &threshold ) < 1 ) threshold = 20000000; if ( argc < 4 || sscanf( argv[3], "%d", &num_threads ) < 1 ) num_threads = 3; printf( "program_time = %d, threshold = %d, num_threads = %d\n\n", program_time, threshold, num_threads ); if ( PAPI_library_init( PAPI_VER_CURRENT ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init failed", 1 ); if ( PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ) != 
PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_thread_init failed", 1 ); if ( pthread_key_create( &key, NULL ) != 0 ) test_fail( __FILE__, __LINE__, "pthread key create failed", 1 ); gettimeofday( &start, NULL ); for ( n = 1; n <= num_threads; n++ ) { if ( pthread_create( &td, NULL, my_thread, ( void * ) n ) != 0 ) test_fail( __FILE__, __LINE__, "pthread create failed", 1 ); } my_thread( ( void * ) 0 ); printf( "done\n" ); test_pass( __FILE__, NULL, 0 ); pthread_exit( NULL ); return ( 0 ); } papi-5.3.0/src/ctests/pernode.c0000600003276200002170000000601012247131121016025 0ustar ralphundrgrad/* This file performs the following test: - make an event set with PAPI_TOT_INS and PAPI_TOT_CYC. - enable per node counting - enable full domain counting - sleeps for 5 seconds - print the results */ #include #include #include #include #include #include #include #include "papi_test.h" int main( ) { int ncpu, nctr, i, actual_domain; int retval; int EventSet = PAPI_NULL; long long *values; long long elapsed_us, elapsed_cyc; PAPI_option_t options; retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { fprintf( stderr, "Library mismatch: code %d, library %d\n", retval, PAPI_VER_CURRENT ); exit( 1 ); } if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) exit( 1 ); /* Set the domain as high as it will go. 
*/ options.domain.eventset = EventSet; options.domain.domain = PAPI_DOM_ALL; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) exit( 1 ); actual_domain = options.domain.domain; /* This should only happen to an empty eventset */ options.granularity.eventset = EventSet; options.granularity.granularity = PAPI_GRN_SYS_CPU; retval = PAPI_set_opt( PAPI_GRANUL, &options ); if ( retval != PAPI_OK ) exit( 1 ); /* Malloc the output array */ ncpu = PAPI_get_opt( PAPI_MAX_CPUS, NULL ); nctr = PAPI_get_opt( PAPI_MAX_HWCTRS, NULL ); values = ( long long * ) malloc( ncpu * nctr * sizeof ( long long ) ); memset( values, 0x0, ( ncpu * nctr * sizeof ( long long ) ) ); /* Add the counters */ if ( PAPI_add_event( EventSet, PAPI_TOT_CYC ) != PAPI_OK ) exit( 1 ); if ( PAPI_add_event( EventSet, PAPI_TOT_INS ) != PAPI_OK ) exit( 1 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) exit( 1 ); sleep( 5 ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) exit( 1 ); elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; printf( "Test case: per node\n" ); printf( "-------------------\n\n" ); printf( "This machine has %d cpus, each with %d counters.\n", ncpu, nctr ); printf( "Test case asked for: PAPI_DOM_ALL\n" ); printf( "Test case got: " ); if ( actual_domain & PAPI_DOM_USER ) printf( "PAPI_DOM_USER " ); if ( actual_domain & PAPI_DOM_KERNEL ) printf( "PAPI_DOM_KERNEL " ); if ( actual_domain & PAPI_DOM_OTHER ) printf( "PAPI_DOM_OTHER " ); printf( "\n" ); for ( i = 0; i < ncpu; i++ ) { printf( "CPU %d\n", i ); printf( "PAPI_TOT_CYC: \t%lld\n", values[0 + i * nctr] ); printf( "PAPI_TOT_INS: \t%lld\n", values[1 + i * nctr] ); } printf ( "\n-------------------------------------------------------------------------\n" ); printf( "Real usec : \t%lld\n", elapsed_us ); printf( "Real cycles : \t%lld\n", elapsed_cyc ); printf ( 
"-------------------------------------------------------------------------\n" ); free( values ); PAPI_shutdown( ); exit( 0 ); } papi-5.3.0/src/ctests/hwinfo.c0000600003276200002170000000374512247131121015677 0ustar ralphundrgrad/* This file performs the following test: valid fields in hw_info */ #include "papi_test.h" int main( int argc, char **argv ) { int retval, i, j; const PAPI_hw_info_t *hwinfo = NULL; const PAPI_mh_info_t *mh; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); retval = papi_print_header ( "Test case hwinfo.c: Check output of PAPI_get_hardware_info.\n", &hwinfo ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); mh = &hwinfo->mem_hierarchy; validate_string( ( char * ) hwinfo->vendor_string, "vendor_string" ); validate_string( ( char * ) hwinfo->model_string, "model_string" ); if ( hwinfo->vendor == PAPI_VENDOR_UNKNOWN ) test_fail( __FILE__, __LINE__, "Vendor unknown", 0 ); if ( hwinfo->cpu_max_mhz == 0.0 ) test_fail( __FILE__, __LINE__, "Mhz unknown", 0 ); if ( hwinfo->ncpu < 1 ) test_fail( __FILE__, __LINE__, "ncpu < 1", 0 ); if ( hwinfo->totalcpus < 1 ) test_fail( __FILE__, __LINE__, "totalcpus < 1", 0 ); /* if ( PAPI_get_opt( PAPI_MAX_HWCTRS, NULL ) < 1 ) test_fail( __FILE__, __LINE__, "get_opt(MAX_HWCTRS) < 1", 0 ); if ( PAPI_get_opt( PAPI_MAX_MPX_CTRS, NULL ) < 1 ) test_fail( __FILE__, __LINE__, "get_opt(MAX_MPX_CTRS) < 1", 0 );*/ if ( mh->levels < 0 ) test_fail( __FILE__, __LINE__, "max mh level < 0", 0 ); printf( "Max level of TLB or Cache: %d\n", mh->levels ); for ( i = 0; i < mh->levels; i++ ) { for ( j = 0; j < PAPI_MH_MAX_LEVELS; j++ ) { const PAPI_mh_cache_info_t *c = &mh->level[i].cache[j]; const PAPI_mh_tlb_info_t *t = &mh->level[i].tlb[j]; printf( "Level %d, TLB %d: %d, %d, %d\n", i, j, t->type, t->num_entries, t->associativity ); printf( 
"Level %d, Cache %d: %d, %d, %d, %d, %d\n", i, j, c->type, c->size, c->line_size, c->num_lines, c->associativity ); } } test_pass( __FILE__, 0, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/burn.c0000600003276200002170000000010312247131121015334 0ustar ralphundrgrad#include "papi_test.h" int main( ) { do_stuff( ); exit( 0 ); } papi-5.3.0/src/ctests/tenth.c0000600003276200002170000001465312247131121015527 0ustar ralphundrgrad/* * File: tenth.c * Mods: Maynard Johnson * maynardj@us.ibm.com */ #define ITERS 100 /* This file performs the following test: start, stop and timer functionality for PAPI_L1_TCM derived event - They are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). - Get us. - Start counters - Do flops - Stop and read counters - Get us. */ #if defined(sun) && defined(sparc) #define CACHE_LEVEL "PAPI_L2_TCM" #define EVT1 PAPI_L2_TCM #define EVT2 PAPI_L2_TCA #define EVT3 PAPI_L2_TCH #define EVT1_STR "PAPI_L2_TCM: " #define EVT2_STR "PAPI_L2_TCA: " #define EVT3_STR "PAPI_L2_TCH: " #define MASK1 MASK_L2_TCM #define MASK2 MASK_L2_TCA #define MASK3 MASK_L2_TCH #else #if defined(__powerpc__) #define CACHE_LEVEL "PAPI_L1_DCA" #define EVT1 PAPI_L1_DCA #define EVT2 PAPI_L1_DCW #define EVT3 PAPI_L1_DCR #define EVT1_STR "PAPI_L1_DCA: " #define EVT2_STR "PAPI_L1_DCW: " #define EVT3_STR "PAPI_L1_DCR: " #define MASK1 MASK_L1_DCA #define MASK2 MASK_L1_DCW #define MASK3 MASK_L1_DCR #else #define CACHE_LEVEL "PAPI_L1_TCM" #define EVT1 PAPI_L1_TCM #define EVT2 PAPI_L1_ICM #define EVT3 PAPI_L1_DCM #define EVT1_STR "PAPI_L1_TCM: " #define EVT2_STR "PAPI_L1_ICM: " #define EVT3_STR "PAPI_L1_DCM: " #define MASK1 MASK_L1_TCM #define MASK2 MASK_L1_ICM #define MASK3 MASK_L1_DCM #endif #endif #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int retval, num_tests = 30, tmp; int EventSet1 = PAPI_NULL; int 
EventSet2 = PAPI_NULL; int EventSet3 = PAPI_NULL; int mask1 = MASK1; int mask2 = MASK2; int mask3 = MASK3; int num_events1; int num_events2; int num_events3; long long **values; int i, j; long long min[3]; long long max[3]; long long sum[3]; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* Make sure that required resources are available */ /* Skip (don't fail!) if they are not */ retval = PAPI_query_event( EVT1 ); if ( retval != PAPI_OK ) test_skip( __FILE__, __LINE__, EVT1_STR, retval ); retval = PAPI_query_event( EVT2 ); if ( retval != PAPI_OK ) test_skip( __FILE__, __LINE__, EVT2_STR, retval ); retval = PAPI_query_event( EVT3 ); if ( retval != PAPI_OK ) test_skip( __FILE__, __LINE__, EVT3_STR, retval ); EventSet1 = add_test_events( &num_events1, &mask1, 1 ); EventSet2 = add_test_events( &num_events2, &mask2, 1 ); EventSet3 = add_test_events( &num_events3, &mask3, 1 ); values = allocate_test_space( num_tests, 1 ); /* Warm me up */ do_l1misses( ITERS ); do_misses( 1, 1024 * 1024 * 4 ); for ( i = 0; i < 10; i++ ) { retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_l1misses( ITERS ); do_misses( 1, 1024 * 1024 * 4 ); retval = PAPI_stop( EventSet1, values[( i * 3 ) + 0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_start( EventSet2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_l1misses( ITERS ); do_misses( 1, 1024 * 1024 * 4 ); retval = PAPI_stop( EventSet2, values[( i * 3 ) + 1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_start( EventSet3 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_l1misses( ITERS ); do_misses( 1, 1024 * 1024 * 4 ); retval = PAPI_stop( EventSet3, 
values[( i * 3 ) + 2] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } remove_test_events( &EventSet1, mask1 ); remove_test_events( &EventSet2, mask2 ); remove_test_events( &EventSet3, mask3 ); for ( j = 0; j < 3; j++ ) { min[j] = 65535; max[j] = sum[j] = 0; } for ( i = 0; i < 10; i++ ) { for ( j = 0; j < 3; j++ ) { if ( min[j] > values[( i * 3 ) + j][0] ) min[j] = values[( i * 3 ) + j][0]; if ( max[j] < values[( i * 3 ) + j][0] ) max[j] = values[( i * 3 ) + j][0]; sum[j] += values[( i * 3 ) + j][0]; } } if ( !TESTS_QUIET ) { printf( "Test case 10: start, stop for derived event %s.\n", CACHE_LEVEL ); printf( "--------------------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", ITERS ); printf( "Repeated 10 times\n" ); printf ( "-------------------------------------------------------------------------\n" ); /* for (i=0;i<10;i++) { printf("Test type : %12s%13s%13s\n", "1", "2", "3"); printf(TAB3, EVT1_STR, values[(i*3)+0][0], (long long)0, (long long)0); printf(TAB3, EVT2_STR, (long long)0, values[(i*3)+1][0], (long long)0); printf(TAB3, EVT3_STR, (long long)0, (long long)0, values[(i*3)+2][0]); printf ("-------------------------------------------------------------------------\n"); } */ printf( "Test type : %12s%13s%13s\n", "min", "max", "sum" ); printf( TAB3, EVT1_STR, min[0], max[0], sum[0] ); printf( TAB3, EVT2_STR, min[1], max[1], sum[1] ); printf( TAB3, EVT3_STR, min[2], max[2], sum[2] ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Verification:\n" ); #if defined(sun) && defined(sparc) printf( TAB1, "Sum 1 approximately equals sum 2 - sum 3 or", ( sum[1] - sum[2] ) ); #else printf( TAB1, "Sum 1 
approximately equals sum 2 + sum 3 or", ( sum[1] + sum[2] ) );
#endif
	}

	{
		long long tmin, tmax;

#if defined(sun) && defined(sparc)
		tmax = ( long long ) ( sum[1] - sum[2] );
#else
		tmax = ( long long ) ( sum[1] + sum[2] );
#endif

		printf( "percent error: %f\n",
				( float ) ( abs( ( int ) ( tmax - sum[0] ) ) * 100 / sum[0] ) );
		tmin = ( long long ) ( ( double ) tmax * 0.8 );
		tmax = ( long long ) ( ( double ) tmax * 1.2 );
		if ( sum[0] > tmax || sum[0] < tmin )
			test_fail( __FILE__, __LINE__, CACHE_LEVEL, 1 );
	}

	test_pass( __FILE__, values, num_tests );
	exit( 1 );
}
papi-5.3.0/src/ctests/attach_target.c
#include "papi_test.h"

int
main(int argc, char **argv)
{
	int c, i = NUM_FLOPS;

	if (argc > 1) {
		c = atoi(argv[1]);
		if (c >= 0) {
			i = c;
		}
	}

	do_flops(i);
	exit(0);
}
papi-5.3.0/src/ctests/data_range.c
/*
 * File:    data_range.c
 * Author:  Dan Terpstra
 *          terpstra@cs.utk.edu
 * Mods:
 *
 */

/* This file performs the following test: */
/* exercise the Itanium data address range interface */

#include "papi_test.h"

#define NUM 16384

static void init_array( void );
static int do_malloc_work( long loop );
static int do_static_work( long loop );
static void measure_load_store( caddr_t start, caddr_t end );
static void measure_event( int index, PAPI_option_t * option );

int *parray1, *parray2, *parray3;
int array1[NUM], array2[NUM], array3[NUM];
char event_name[2][PAPI_MAX_STR_LEN];
int PAPI_event[2];
int EventSet = PAPI_NULL;

int
main( int argc, char **argv )
{
	int retval;
	const PAPI_exe_info_t *prginfo = NULL;
	const PAPI_hw_info_t *hw_info;

	/* Set TESTS_QUIET variable */
	tests_quiet( argc, argv );

#if !defined(ITANIUM2) && !defined(ITANIUM3)
	test_skip( __FILE__, __LINE__, "Currently only works on itanium2", 0 );
	exit( 1 );
#endif

	tests_quiet( argc, argv );	/* Set TESTS_QUIET variable */

	init_array( );
	printf( "Malloc'd array pointers: %p %p %p\n",
&parray1, &parray2, &parray3 ); printf( "Malloc'd array addresses: %p %p %p\n", parray1, parray2, parray3 ); printf( "Static array addresses: %p %p %p\n", &array1, &array2, &array3 ); tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); prginfo = PAPI_get_executable_info( ); if ( prginfo == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", 1 ); #if defined(linux) && defined(__ia64__) sprintf( event_name[0], "loads_retired" ); sprintf( event_name[1], "stores_retired" ); PAPI_event_name_to_code( event_name[0], &PAPI_event[0] ); PAPI_event_name_to_code( event_name[1], &PAPI_event[1] ); #else test_skip( __FILE__, __LINE__, "only works for Itanium", PAPI_ENOSUPP ); #endif if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); retval = PAPI_cleanup_eventset( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); /***************************************************************************************/ printf ( "\n\nMeasure loads and stores on the pointers to the allocated arrays\n" ); printf( "Expected loads: %d; Expected stores: 0\n", NUM * 2 ); printf ( "These loads result from accessing the pointers to compute array addresses.\n" ); printf ( "They will likely disappear with higher levels of optimization.\n" ); measure_load_store( ( caddr_t ) & parray1, ( caddr_t ) ( &parray1 + 1 ) ); measure_load_store( ( caddr_t ) & parray2, ( caddr_t ) ( &parray2 + 1 ) ); measure_load_store( ( caddr_t ) & parray3, ( caddr_t ) ( 
&parray3 + 1 ) ); /***************************************************************************************/ printf ( "\n\nMeasure loads and stores on the allocated arrays themselves\n" ); printf( "Expected loads: %d; Expected stores: %d\n", NUM, NUM ); measure_load_store( ( caddr_t ) parray1, ( caddr_t ) ( parray1 + NUM ) ); measure_load_store( ( caddr_t ) parray2, ( caddr_t ) ( parray2 + NUM ) ); measure_load_store( ( caddr_t ) parray3, ( caddr_t ) ( parray3 + NUM ) ); /***************************************************************************************/ printf( "\n\nMeasure loads and stores on the static arrays\n" ); printf ( "These values will differ from the expected values by the size of the offsets.\n" ); printf( "Expected loads: %d; Expected stores: %d\n", NUM, NUM ); measure_load_store( ( caddr_t ) array1, ( caddr_t ) ( array1 + NUM ) ); measure_load_store( ( caddr_t ) array2, ( caddr_t ) ( array2 + NUM ) ); measure_load_store( ( caddr_t ) array3, ( caddr_t ) ( array3 + NUM ) ); /***************************************************************************************/ retval = PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy", retval ); free( parray1 ); free( parray2 ); free( parray3 ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } static void measure_load_store( caddr_t start, caddr_t end ) { PAPI_option_t option; int retval; /* set up the optional address structure for starting and ending data addresses */ option.addr.eventset = EventSet; option.addr.start = start; option.addr.end = end; if ( ( retval = PAPI_set_opt( PAPI_DATA_ADDRESS, &option ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt(PAPI_DATA_ADDRESS)", retval ); measure_event( 0, &option ); measure_event( 1, &option ); } static void measure_event( int index, PAPI_option_t * option ) { int retval; long long value; if ( ( retval = PAPI_add_event( EventSet, PAPI_event[index] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, 
"PAPI_add_event", retval ); if ( index == 0 ) { /* if ((retval = PAPI_get_opt(PAPI_DATA_ADDRESS, option)) != PAPI_OK) test_fail(__FILE__, __LINE__, "PAPI_get_opt(PAPI_DATA_ADDRESS)", retval); */ printf ( "Requested Start Address: %p; Start Offset: 0x%5x; Actual Start Address: %p\n", option->addr.start, option->addr.start_off, option->addr.start - option->addr.start_off ); printf ( "Requested End Address: %p; End Offset: 0x%5x; Actual End Address: %p\n", option->addr.end, option->addr.end_off, option->addr.end + option->addr.end_off ); } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_malloc_work( NUM ); do_static_work( NUM ); retval = PAPI_stop( EventSet, &value ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } printf( "%s: %lld\n", event_name[index], value ); if ( ( retval = PAPI_remove_event( EventSet, PAPI_event[index] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event", retval ); } static void init_array( void ) { parray1 = ( int * ) malloc( NUM * sizeof ( int ) ); if ( parray1 == NULL ) test_fail( __FILE__, __LINE__, "No memory available!\n", 0 ); memset( parray1, 0x0, NUM * sizeof ( int ) ); parray2 = ( int * ) malloc( NUM * sizeof ( int ) ); if ( parray2 == NULL ) test_fail( __FILE__, __LINE__, "No memory available!\n", 0 ); memset( parray2, 0x0, NUM * sizeof ( int ) ); parray3 = ( int * ) malloc( NUM * sizeof ( int ) ); if ( parray3 == NULL ) test_fail( __FILE__, __LINE__, "No memory available!\n", 0 ); memset( parray3, 0x0, NUM * sizeof ( int ) ); } static int do_static_work( long loop ) { int i; int sum = 0; for ( i = 0; i < loop; i++ ) { array1[i] = i; sum += array1[i]; } for ( i = 0; i < loop; i++ ) { array2[i] = i; sum += array2[i]; } for ( i = 0; i < loop; i++ ) { array3[i] = i; sum += array3[i]; } return sum; } static int do_malloc_work( long loop ) { int i; int sum = 0; for ( i = 0; i < loop; i++ ) { parray1[i] = i; sum += 
parray1[i];
	}
	for ( i = 0; i < loop; i++ ) {
		parray2[i] = i;
		sum += parray2[i];
	}
	for ( i = 0; i < loop; i++ ) {
		parray3[i] = i;
		sum += parray3[i];
	}
	return sum;
}
papi-5.3.0/src/ctests/profile_pthreads.c
/* This file performs the following test: profile for pthreads */

#include <pthread.h>
#include "papi_test.h"

extern int TESTS_QUIET;	/* Declared in test_utils.c */

#define THR 1000000
#define FLOPS 100000000

unsigned int length;
caddr_t my_start, my_end;

void *
Thread( void *arg )
{
	int retval, num_tests = 1, i;
	int EventSet1 = PAPI_NULL, mask1, PAPI_event;
	int num_events1;
	long long **values;
	long long elapsed_us, elapsed_cyc;
	unsigned short *profbuf;
	char event_name[PAPI_MAX_STR_LEN];

	retval = PAPI_register_thread( );
	if ( retval != PAPI_OK )
		test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval );

	profbuf = ( unsigned short * ) malloc( length * sizeof ( unsigned short ) );
	if ( profbuf == NULL )
		exit( 1 );
	memset( profbuf, 0x00, length * sizeof ( unsigned short ) );

	/* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or
	   PAPI_TOT_INS, depending on the availability of the event on the
	   platform */
	EventSet1 = add_two_nonderived_events( &num_events1, &PAPI_event, &mask1 );

	values = allocate_test_space( num_tests, num_events1 );

	if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK )
		test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval );

	elapsed_us = PAPI_get_real_usec( );
	elapsed_cyc = PAPI_get_real_cyc( );

	retval = PAPI_profil( profbuf, length, my_start, 65536,
						  EventSet1, PAPI_event, THR, PAPI_PROFIL_POSIX );
	if ( retval )
		test_fail( __FILE__, __LINE__, "PAPI_profil", retval );

	if ( ( retval = PAPI_start( EventSet1 ) ) != PAPI_OK )
		test_fail( __FILE__, __LINE__, "PAPI_start", retval );

	do_flops( *( int * ) arg );

	if ( ( retval = PAPI_stop( EventSet1, values[0] ) ) != PAPI_OK )
		test_fail( __FILE__, __LINE__, "PAPI_stop", retval );

	elapsed_us =
PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; /* to remove the profile flag */ retval = PAPI_profil( profbuf, length, my_start, 65536, EventSet1, PAPI_event, 0, PAPI_PROFIL_POSIX ); if ( retval ) test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); remove_test_events( &EventSet1, mask1 ); if ( !TESTS_QUIET ) { if ( mask1 == 0x3 ) { printf( "Thread %#x PAPI_TOT_INS : \t%lld\n", ( int ) pthread_self( ), ( values[0] )[0] ); } else { printf( "Thread %#x PAPI_FP_INS : \t%lld\n", ( int ) pthread_self( ), ( values[0] )[0] ); } printf( "Thread %#x PAPI_TOT_CYC: \t%lld\n", ( int ) pthread_self( ), ( values[0] )[1] ); printf( "Thread %#x Real usec : \t%lld\n", ( int ) pthread_self( ), elapsed_us ); printf( "Thread %#x Real cycles : \t%lld\n", ( int ) pthread_self( ), elapsed_cyc ); printf( "Test case: PAPI_profil() for pthreads\n" ); printf( "----Profile buffer for Thread %#x---\n", ( int ) pthread_self( ) ); for ( i = 0; i < ( int ) length; i++ ) { if ( profbuf[i] ) printf( "0x%lx\t%d\n", ( unsigned long ) ( my_start + 2 * i ), profbuf[i] ); } } for ( i = 0; i < ( int ) length; i++ ) if ( profbuf[i] ) break; if ( i >= ( int ) length ) test_fail( __FILE__, __LINE__, "No information in buffers", 1 ); free_test_space( values, num_tests ); retval = PAPI_unregister_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( NULL ); } int main( int argc, char **argv ) { pthread_t id[NUM_THREADS]; int flops[NUM_THREADS]; int i, rc, retval; pthread_attr_t attr; long long elapsed_us, elapsed_cyc; const PAPI_exe_info_t *prginfo = NULL; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ) ) != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, 
"PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } if ( ( prginfo = PAPI_get_executable_info( ) ) == NULL ) { retval = 1; test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", retval ); } my_start = prginfo->address_info.text_start; my_end = prginfo->address_info.text_end; length = ( unsigned int ) ( my_end - my_start ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); #endif for ( i = 0; i < NUM_THREADS; i++ ) { flops[i] = FLOPS * ( i + 1 ); rc = pthread_create( &id[i], &attr, Thread, ( void * ) &flops[i] ); if ( rc ) return ( FAILURE ); } for ( i = 0; i < NUM_THREADS; i++ ) pthread_join( id[i], NULL ); pthread_attr_destroy( &attr ); elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; if ( !TESTS_QUIET ) { printf( "Master real usec : \t%lld\n", elapsed_us ); printf( "Master real cycles : \t%lld\n", elapsed_cyc ); } test_pass( __FILE__, NULL, 0 ); pthread_exit( NULL ); exit( 1 ); } papi-5.3.0/src/ctests/reset.c0000600003276200002170000002155612247131121015527 0ustar ralphundrgrad/* This file performs the following test: start, read, stop and again functionality - It attempts to use the following three counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). 
+ PAPI_FP_INS or PAPI_TOT_INS if PAPI_FP_INS doesn't exist + PAPI_TOT_CYC 1 - Start counters - Do flops - Stop counters 2 - Start counters - Do flops - Stop counters (should duplicate above) 3 - Reset counters (should be redundant if stop works properly) - Start counters - Do flops - Stop counters 4 - Start counters - Do flops/2 - Read counters (flops/2; counters keep counting) 5 - Do flops/2 - Read counters (2*flops/2; counters keep counting) 6 - Do flops/2 - Read counters (3*flops/2; counters keep counting) - Accum counters (2*(3*flops/2); counters clear and counting) 7 - Do flops/2 - Read counters (flops/2; counters keep counting) 8 - Reset (counters set to zero; still counting) - Do flops/2 - Stop counters (flops/2; counters stopped) 9 - Reset (counters set to zero and stopped) - Read counters (should be zero) */ #include "papi_test.h" int main( int argc, char **argv ) { int retval, num_tests = 9, num_events, tmp, i; long long **values; int EventSet = PAPI_NULL; int PAPI_event, mask; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet = add_two_events( &num_events, &PAPI_event, &mask ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); values = allocate_test_space( num_tests, num_events ); /*===== Test 1: Start/Stop =======================*/ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) {
test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 2 Start/Stop =======================*/ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[1] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 3: Reset/Start/Stop =======================*/ retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[2] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 4: Start/Read =======================*/ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[3] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /*===== Test 5: Read =======================*/ do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[4] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /*===== Test 6: Read/Accum =======================*/ do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[5] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } retval = PAPI_accum( EventSet, values[5] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_accum", retval ); } /*===== Test 7: Read =======================*/ do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[6] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, 
"PAPI_read", retval ); } /*===== Test 8 Reset/Stop =======================*/ retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); } do_flops( NUM_FLOPS / 2 ); retval = PAPI_stop( EventSet, values[7] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 9: Reset/Read =======================*/ retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); } retval = PAPI_read( EventSet, values[8] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } remove_test_events( &EventSet, mask ); printf( "Test case: Start/Stop/Read/Accum/Reset.\n" ); printf( "----------------------------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); sprintf( add_event_str, "%s:", event_name ); printf( " PAPI_TOT_CYC %s\n", event_name ); printf( "1. start,ops,stop %10lld %10lld\n", values[0][0], values[0][1] ); printf( "2. start,ops,stop %10lld %10lld\n", values[1][0], values[1][1] ); printf( "3. reset,start,ops,stop %10lld %10lld\n", values[2][0], values[2][1] ); printf( "4. start,ops/2,read %10lld %10lld\n", values[3][0], values[3][1] ); printf( "5. ops/2,read %10lld %10lld\n", values[4][0], values[4][1] ); printf( "6. ops/2,accum %10lld %10lld\n", values[5][0], values[5][1] ); printf( "7. ops/2,read %10lld %10lld\n", values[6][0], values[6][1] ); printf( "8. reset,ops/2,stop %10lld %10lld\n", values[7][0], values[7][1] ); printf( "9. 
reset,read %10lld %10lld\n", values[8][0], values[8][1] ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Row 1 approximately equals rows 2 and 3 \n" ); printf( "Row 4 approximately equals 1/2 of row 3\n" ); printf( "Row 5 approximately equals twice row 4\n" ); printf( "Row 6 approximately equals 6 times row 4\n" ); printf( "Rows 7 and 8 approximately equal row 4\n" ); printf( "Row 9 equals 0\n" ); printf( "%% difference between %s 1 & 2: %.2f\n", "PAPI_TOT_CYC", 100.0 * ( float ) values[0][0] / ( float ) values[1][0] ); printf( "%% difference between %s 1 & 2: %.2f\n", add_event_str, 100.0 * ( float ) values[0][1] / ( float ) values[1][1] ); for ( i = 0; i <= 1; i++ ) { if ( !approx_equals ( ( double ) values[0][i], ( double ) values[1][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[1][i], ( double ) values[2][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[2][i], ( double ) values[3][i] * 2.0 ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[2][i], ( double ) values[4][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[5][i], ( double ) values[3][i] * 6.0 ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[6][i], ( double ) values[3][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[7][i], ( double ) values[3][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( values[8][i] != 0LL ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? 
"PAPI_TOT_CYC" : add_event_str ), 1 ); } test_pass( __FILE__, values, num_tests ); return 0; } papi-5.3.0/src/ctests/first.c0000600003276200002170000001507112247131121015527 0ustar ralphundrgrad/* This file performs the following test: start, read, stop and again functionality - It attempts to use the following three counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS or PAPI_TOT_INS if PAPI_FP_INS doesn't exist + PAPI_TOT_CYC - Start counters - Do flops - Read counters - Reset counters - Do flops - Read counters - Do flops - Read counters - Do flops - Stop and read counters - Read counters */ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int retval, num_tests = 5, num_events, tmp; long long **values; int EventSet = PAPI_NULL; int PAPI_event, mask; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; long long min, max; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet = add_two_events( &num_events, &PAPI_event, &mask ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); /* Allocate space for results */ values = allocate_test_space( num_tests, num_events ); /* Start PAPI */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, 
"PAPI_start", retval ); } /* Benchmark code */ do_flops( NUM_FLOPS ); /* read results 0 */ retval = PAPI_read( EventSet, values[0] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /* Reset */ retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); } /* Benchmark some more */ do_flops( NUM_FLOPS ); /* Read Results 1 */ retval = PAPI_read( EventSet, values[1] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /* Benchmark some more */ do_flops( NUM_FLOPS ); /* Read results 2 */ retval = PAPI_read( EventSet, values[2] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /* Benchmark some more */ do_flops( NUM_FLOPS ); /* Read results 3 */ retval = PAPI_stop( EventSet, values[3] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /* Read results 4 */ retval = PAPI_read( EventSet, values[4] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /* remove results. We never stop??? 
*/ remove_test_events( &EventSet, mask ); if ( !TESTS_QUIET ) { printf( "Test case 1: Non-overlapping start, stop, read.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); printf( "Test type : 1 2 3 4 5\n" ); sprintf( add_event_str, "%s:", event_name ); printf( TAB5, add_event_str, values[0][1], values[1][1], values[2][1], values[3][1], values[4][1] ); printf( TAB5, "PAPI_TOT_CYC:", values[0][0], values[1][0], values[2][0], values[3][0], values[4][0] ); printf( "-------------------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Row 1 Column 1 at least %d\n", NUM_FLOPS ); printf( "%% difference between %s 1 & 2: %.2f\n", add_event_str, 100.0 * ( float ) values[0][1] / ( float ) values[1][1] ); printf( "%% difference between %s 1 & 2: %.2f\n", "PAPI_TOT_CYC", 100.0 * ( float ) values[0][0] / ( float ) values[1][0] ); printf( "Column 1 approximately equals column 2\n" ); printf( "Column 3 approximately equals 2 * column 2\n" ); printf( "Column 4 approximately equals 3 * column 2\n" ); printf( "Column 4 exactly equals column 5\n" ); } /* Validation */ /* Check cycles constraints */ min = ( long long ) ( ( double ) values[1][0] * .8 ); max = ( long long ) ( ( double ) values[1][0] * 1.2 ); /* Check constraint Col1=Col2 */ if ( values[0][0] > max || values[0][0] < min ) { test_fail( __FILE__, __LINE__, "Cycle Col1!=Col2", 1 ); } /* Check constraint col3 == 2*col2 */ if ( (values[2][0] > ( 2 * max )) || (values[2][0] < ( 2 * min )) ) { test_fail( __FILE__, __LINE__, "Cycle Col3!=2*Col2", 1 ); } /* Check constraint col4 == 
3*col2 */ if ( (values[3][0] > ( 3 * max )) || (values[3][0] < ( 3 * min )) ) { test_fail( __FILE__, __LINE__, "Cycle Col4!=3*Col2", 1 ); } /* Check constraint col4 == col5 */ if ( values[3][0] != values[4][0] ) { test_fail( __FILE__, __LINE__, "Cycle Col4!=Col5", 1 ); } /* Check FLOP constraints */ min = ( long long ) ( ( double ) values[1][1] * .9 ); max = ( long long ) ( ( double ) values[1][1] * 1.1 ); /* Check constraint Col1=Col2 */ if ( values[0][1] > max || values[0][1] < min ) { test_fail( __FILE__, __LINE__, "FLOP Col1!=Col2", 1 ); } /* Check constraint col3 == 2*col2 */ if ( (values[2][1] > ( 2 * max )) || (values[2][1] < ( 2 * min )) ) { test_fail( __FILE__, __LINE__, "FLOP Col3!=2*Col2", 1 ); } /* Check constraint col4 == 3*col2 */ if ( (values[3][1] > ( 3 * max )) || (values[3][1] < ( 3 * min )) ) { test_fail( __FILE__, __LINE__, "FLOP Col4!=3*Col2", 1 ); } /* Check constraint col4 == col5 */ if (values[3][1] != values[4][1]) { test_fail( __FILE__, __LINE__, "FLOP Col4!=Col5", 1 ); } /* Check flops are sane */ if (values[0][1] < ( long long ) NUM_FLOPS ) { test_fail( __FILE__, __LINE__, "FLOP sanity", 1 ); } test_pass( __FILE__, values, num_tests ); return 0; } papi-5.3.0/src/ctests/case2.c0000600003276200002170000000432212247131121015372 0ustar ralphundrgrad/* From Dave McNamara at PSRV. Thanks! */ /* If an event is countable but you've exhausted the counter resources and you try to add an event, it seems subsequent PAPI_start and/or PAPI_stop will cause a Seg. Violation. I got around this by calling PAPI to get the # of countable events, then making sure that I didn't try to add more than this number of events. I still have a problem if someone adds Level 2 cache misses and then adds FLOPS 'cause I didn't count FLOPS as actually requiring 2 counters.
*/ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { double c, a = 0.999, b = 1.001; int n = 1000; int EventSet = PAPI_NULL; int retval; int j = 0, i; long long g1[3]; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( PAPI_query_event( PAPI_BR_CN ) == PAPI_OK ) j++; if ( j == 1 && ( retval = PAPI_add_event( EventSet, PAPI_BR_CN ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } i = j; if ( PAPI_query_event( PAPI_TOT_CYC ) == PAPI_OK ) j++; if ( j == ( i + 1 ) && ( retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } i = j; if ( PAPI_query_event( PAPI_TOT_INS ) == PAPI_OK ) j++; if ( j == ( i + 1 ) && ( retval = PAPI_add_event( EventSet, PAPI_TOT_INS ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } if ( j ) { if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); for ( i = 0; i < n; i++ ) { c = a * b; } if (!TESTS_QUIET) fprintf(stdout,"c=%lf\n",c); if ( ( retval = PAPI_stop( EventSet, g1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/case1.c0000600003276200002170000000400412247131121015366 0ustar ralphundrgrad/* From Dave McNamara at PSRV. Thanks! */ /* If you try to add an event that doesn't exist, you get the correct error message, yet you get subsequent Seg. Faults when you try to do PAPI_start and PAPI_stop. 
I would expect some bizarre behavior if I had no events added to the event set and then tried to PAPI_start but if I had successfully added one event, then the 2nd one get an error when I tried to add it, is it possible for PAPI_start to work but just count the first event? */ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { double c, a = 0.999, b = 1.001; int n = 1000; int EventSet = PAPI_NULL; int retval; int i, j = 0; long long g1[2]; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( PAPI_query_event( PAPI_L2_TCM ) == PAPI_OK ) j++; if ( j == 1 && ( retval = PAPI_add_event( EventSet, PAPI_L2_TCM ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); j--; /* The event was not added */ } i = j; if ( PAPI_query_event( PAPI_L2_DCM ) == PAPI_OK ) j++; if ( j == ( i + 1 ) && ( retval = PAPI_add_event( EventSet, PAPI_L2_DCM ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); j--; /* The event was not added */ } if ( j ) { if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); for ( i = 0; i < n; i++ ) { c = a * b; } if (!TESTS_QUIET) fprintf(stdout,"c=%lf\n",c); if ( ( retval = PAPI_stop( EventSet, g1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/flops.c0000600003276200002170000000542612247131121015526 0ustar ralphundrgrad/* * A simple example for the use of PAPI, the number of flops you should * get is about INDEX^3 on machines that consider add and multiply one flop * such 
as SGI, and 2*(INDEX^3) that don't consider it 1 flop such as INTEL * -Kevin London */ #include "papi_test.h" #define INDEX 1000 char format_string[] = { "Real_time: %f Proc_time: %f Total flpins: %lld MFLOPS: %f\n" }; extern int TESTS_QUIET; /* Declared in test_utils.c */ extern void dummy( void * ); float matrixa[INDEX][INDEX], matrixb[INDEX][INDEX], mresult[INDEX][INDEX]; int main( int argc, char **argv ) { float real_time, proc_time, mflops; long long flpins; int retval; int i, j, k, fip = 0; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) fip = 1; else if ( PAPI_query_event( PAPI_FP_OPS ) == PAPI_OK ) fip = 2; else { if ( !TESTS_QUIET ) printf ( "PAPI_FP_INS and PAPI_FP_OPS are not defined for this platform.\n" ); } PAPI_shutdown( ); if ( fip > 0 ) { /* Initialize the Matrix arrays */ for ( i = 0; i < INDEX; i++ ) { for ( j = 0; j < INDEX; j++) { mresult[j][i] = 0.0; matrixa[j][i] = matrixb[j][i] = ( float ) rand( ) * ( float ) 1.1; } } /* Setup PAPI library and begin collecting data from the counters */ if ( fip == 1 ) { if ( ( retval = PAPI_flips( &real_time, &proc_time, &flpins, &mflops ) ) < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flips", retval ); } else { if ( ( retval = PAPI_flops( &real_time, &proc_time, &flpins, &mflops ) ) < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flops", retval ); } /* Matrix-Matrix multiply */ for ( i = 0; i < INDEX; i++ ) for ( j = 0; j < INDEX; j++ ) for ( k = 0; k < INDEX; k++ ) mresult[i][j] = mresult[i][j] + matrixa[i][k] * matrixb[k][j]; /* Collect the data into the variables passed in */ if ( fip == 1 ) { if ( ( retval = PAPI_flips( &real_time, &proc_time, &flpins, &mflops ) ) < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flips", retval ); } else { if ( ( retval = PAPI_flops( &real_time, &proc_time, &flpins, 
&mflops ) ) < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flops", retval ); } dummy( ( void * ) mresult ); if ( !TESTS_QUIET ) { if ( fip == 1 ) { printf( "Real_time: %f Proc_time: %f Total flpins: ", real_time, proc_time ); } else { printf( "Real_time: %f Proc_time: %f Total flpops: ", real_time, proc_time ); } printf( LLDFMT, flpins ); printf( " MFLOPS: %f\n", mflops ); } } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/forkexec4.c0000600003276200002170000000276612247131121016301 0ustar ralphundrgrad/* * File: forkexec4.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: PAPI_library_init() ** unlike forkexec2/forkexec3, no shutdown here ** fork() / \ parent child wait() PAPI_library_init() execlp() PAPI_library_init() */ #include "papi_test.h" #include <sys/wait.h> int main( int argc, char **argv ) { int retval; int status; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( argc > 1 ) && ( strcmp( argv[1], "xxx" ) == 0 ) ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "execed PAPI_library_init", retval ); } else { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); if ( fork( ) == 0 ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "forked PAPI_library_init", retval ); if ( execlp( argv[0], argv[0], "xxx", NULL ) == -1 ) test_fail( __FILE__, __LINE__, "execlp", PAPI_ESYS ); } else { wait( &status ); if ( WEXITSTATUS( status ) != 0 ) test_fail( __FILE__, __LINE__, "fork", WEXITSTATUS( status ) ); } } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/zero_attach.c0000600003276200002170000001422612247131121016704 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for attached processes.
- It attempts to use the following two counters. It may use fewer depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC - Get us. - Start counters - Do flops - Stop and read counters - Get us. */ #include "papi_test.h" #include <sys/ptrace.h> #ifdef _AIX #define _LINUX_SOURCE_COMPAT #endif #if defined(__FreeBSD__) # define PTRACE_ATTACH PT_ATTACH # define PTRACE_CONT PT_CONTINUE #endif int wait_for_attach_and_loop( void ) { kill( getpid( ), SIGSTOP ); do_flops( NUM_FLOPS ); kill( getpid( ), SIGSTOP ); return 0; } int main( int argc, char **argv ) { int status, retval, num_tests = 1, tmp; int EventSet1 = PAPI_NULL; int PAPI_event, mask1; int num_events1; long long **values; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; const PAPI_component_info_t *cmpinfo; pid_t pid; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Initialize the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail_exit( __FILE__, __LINE__, "PAPI_library_init", retval ); } if ( ( cmpinfo = PAPI_get_component_info( 0 ) ) == NULL ) { test_fail_exit( __FILE__, __LINE__, "PAPI_get_component_info", 0 ); } if ( cmpinfo->attach == 0 ) { test_skip( __FILE__, __LINE__, "Platform does not support attaching", 0 ); } pid = fork( ); if ( pid < 0 ) { test_fail( __FILE__, __LINE__, "fork()", PAPI_ESYS ); } if ( pid == 0 ) { exit( wait_for_attach_and_loop( ) ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_ATTACH, pid, NULL, NULL ) == -1 ) {
perror( "ptrace(PTRACE_ATTACH)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } retval = PAPI_attach( EventSet1, ( unsigned long ) pid ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_attach", retval ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); /* num_events1 is greater than num_events2 so don't worry. */ values = allocate_test_space( num_tests, num_events1 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); /* Wait for the SIGSTOP. */ if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } } retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* Wait for the SIGSTOP.
*/ if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } } elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us; elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } remove_test_events( &EventSet1, mask1 ); if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFEXITED( status ) == 0 ) test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFEXITED", 0 ); printf( "Test case: 3rd party attach start, stop.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); printf( "Test type : \t 1\n" ); sprintf( add_event_str, "%-12s : \t", event_name ); printf( TAB1, add_event_str, values[0][1] ); printf( TAB1, "PAPI_TOT_CYC : \t", values[0][0] ); printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf( TAB1, "Virt usec : \t", 
elapsed_virt_us ); printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc ); printf( "-------------------------------------------------------------------------\n" ); printf( "Verification: none\n" ); test_pass( __FILE__, values, num_tests ); return 0; } papi-5.3.0/src/ctests/multiattach.c0000600003276200002170000002434612247131121016724 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for multiple attached processes. - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC - Get us. - Start counters - Do flops - Stop and read counters - Get us. */ #include "papi_test.h" #include #include #ifdef _AIX #define _LINUX_SOURCE_COMPAT #endif #if defined(__FreeBSD__) # define PTRACE_ATTACH PT_ATTACH # define PTRACE_CONT PT_CONTINUE #endif int wait_for_attach_and_loop( int num ) { kill( getpid( ), SIGSTOP ); do_flops( NUM_FLOPS * num ); kill( getpid( ), SIGSTOP ); return ( 0 ); } int main( int argc, char **argv ) { int status, retval, num_tests = 2, tmp; int EventSet1 = PAPI_NULL, EventSet2 = PAPI_NULL; int PAPI_event, PAPI_event2, mask1, mask2; int num_events1, num_events2; long long **values; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; const PAPI_component_info_t *cmpinfo; pid_t pid, pid2; double ratio1,ratio2; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Initialize the library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail_exit( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* get the component info and check if we support attach */ if ( ( cmpinfo = PAPI_get_component_info( 0 ) ) == NULL ) { 
test_fail_exit( __FILE__, __LINE__, "PAPI_get_component_info", 0 ); } if ( cmpinfo->attach == 0 ) { test_skip( __FILE__, __LINE__, "Platform does not support attaching", 0 ); } /* fork off first child */ pid = fork( ); if ( pid < 0 ) { test_fail_exit( __FILE__, __LINE__, "fork()", PAPI_ESYS ); } if ( pid == 0 ) { exit( wait_for_attach_and_loop( 1 ) ); } /* fork off second child, does twice as much */ pid2 = fork( ); if ( pid2 < 0 ) { test_fail_exit( __FILE__, __LINE__, "fork()", PAPI_ESYS ); } if ( pid2 == 0 ) { exit( wait_for_attach_and_loop( 2 ) ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); EventSet2 = add_two_events( &num_events2, &PAPI_event2, &mask2 ); if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_ATTACH, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_ATTACH)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( ptrace( PTRACE_ATTACH, pid2, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_ATTACH)" ); return 1; } if ( waitpid( pid2, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } } retval = PAPI_attach( EventSet1, ( unsigned long ) pid ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_attach", retval ); } retval = PAPI_attach( EventSet2, ( unsigned long ) pid2 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_attach", retval ); } retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); /* 
num_events1 is greater than num_events2 so don't worry. */ values = allocate_test_space( num_tests, num_events1 ); /* Gather before values */ elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); /* Wait for the SIGSTOP. */ if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } if ( ptrace( PTRACE_CONT, pid2, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid2, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } } /* start first child */ retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* start second child */ retval = PAPI_start( EventSet2 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* Wait for the SIGSTOP. 
*/ if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } if ( ptrace( PTRACE_CONT, pid2, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid2, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } } elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us; elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; /* stop first child */ retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) { printf( "Warning: PAPI_stop returned error %d, probably ok.\n", retval ); } /* stop second child */ retval = PAPI_stop( EventSet2, values[1] ); if ( retval != PAPI_OK ) { printf( "Warning: PAPI_stop returned error %d, probably ok.\n", retval ); } remove_test_events( &EventSet1, mask1 ); remove_test_events( &EventSet2, mask2 ); if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( ptrace( PTRACE_CONT, pid2, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFEXITED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFEXITED", 0 ); } if ( waitpid( pid2, &status, 0 ) == -1 
) { perror( "waitpid()" ); exit( 1 ); } if ( WIFEXITED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFEXITED", 0 ); } /* This code isn't necessary as we know the child has exited, */ /* it *may* return an error if the component so chooses. You */ /* should use read() instead. */ printf( "Test case: multiple 3rd party attach start, stop.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); sprintf( add_event_str, "(PID %jd) %-12s : \t", ( intmax_t ) pid, event_name ); printf( TAB1, add_event_str, values[0][1] ); sprintf( add_event_str, "(PID %jd) PAPI_TOT_CYC : \t", ( intmax_t ) pid ); printf( TAB1, add_event_str, values[0][0] ); sprintf( add_event_str, "(PID %jd) %-12s : \t", ( intmax_t ) pid2, event_name ); printf( TAB1, add_event_str,values[1][1] ); sprintf( add_event_str, "(PID %jd) PAPI_TOT_CYC : \t", ( intmax_t ) pid2 ); printf( TAB1, add_event_str, values[1][0] ); printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf( TAB1, "Virt usec : \t", elapsed_virt_us ); printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc ); printf ( "-------------------------------------------------------------------------\n" ); printf("Verification: pid %d results should be twice pid %d\n",pid2,pid ); ratio1=(double)values[1][0]/(double)values[0][0]; ratio2=(double)values[1][1]/(double)values[0][1]; printf("\t%lld/%lld = %lf\n",values[1][0],values[0][0],ratio1); if ((ratio1 >2.15 ) || (ratio1 < 1.85)) { printf("Ratio out of range, should be ~2.0 not %lf\n",ratio1); test_fail( __FILE__, __LINE__, 
"Error: Counter ratio not two", 0 ); } printf("\t%lld/%lld = %lf\n",values[1][1],values[0][1],ratio2); if ((ratio2 >2.75 ) || (ratio2 < 1.25)) { printf("Ratio out of range, should be ~2.0, not %lf\n",ratio2); test_fail( __FILE__, __LINE__, "Known issue: Counter ratio not two", 0 ); } test_pass( __FILE__, values, num_tests ); return 0; } papi-5.3.0/src/ctests/all_events.c0000600003276200002170000000405412247131121016533 0ustar ralphundrgrad/* This file tries to add,start,stop all events in a component it * is meant not to test the accuracy of the mapping but to make sure * that all events in the component will at least start (Helps to * catch typos. * * Author: Kevin London * london@cs.utk.edu */ #include "papi_test.h" int main( int argc, char **argv ) { int retval, i; int EventSet = PAPI_NULL, count = 0, err_count = 0; long long values; PAPI_event_info_t info; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); for ( i = 0; i < PAPI_MAX_PRESET_EVENTS; i++ ) { if ( PAPI_get_event_info( PAPI_PRESET_MASK | i, &info ) != PAPI_OK ) continue; if ( !( info.count ) ) continue; printf( "Adding %-14s", info.symbol ); retval = PAPI_add_event( EventSet, ( int ) info.event_code ); if ( retval != PAPI_OK ) { PAPI_perror( "PAPI_add_event" ); err_count++; } else { retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { PAPI_perror( "PAPI_start" ); err_count++; } else { retval = PAPI_stop( EventSet, &values ); if ( retval != PAPI_OK ) { PAPI_perror( "PAPI_stop" ); err_count++; } else { printf( "successful\n" ); count++; } } retval = PAPI_remove_event( EventSet, ( int ) info.event_code ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event", retval ); } } retval = 
PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); printf( "Successfully added, started and stopped %d events.\n", count ); if ( err_count ) printf( "Failed to add, start or stop %d events.\n", err_count ); if ( count > 0 ) test_pass( __FILE__, NULL, 0 ); else test_fail( __FILE__, __LINE__, "No events added", 1 ); exit( 1 ); } papi-5.3.0/src/ctests/overflow_twoevents.c0000600003276200002170000002014712247131121020361 0ustar ralphundrgrad/* * File: overflow_twoevents.c * CVS: $Id$ * Author: min@cs.utk.edu * Min Zhou * Mods: Philip Mucci * mucci@cs.utk.edu */ /* This file performs the following test: overflow dispatch on 2 counters. */ #include "papi_test.h" #define OVER_FMT "handler(%d) Overflow at %p! vector=0x%llx\n" #define OUT_FMT "%-12s : %18lld%18lld%18lld\n" #define VEC_FMT " at vector 0x%llx, event %-12s : %6d\n" typedef struct { long long mask; int count; } ocount_t; /* there are two experiments: batch and interleaf; for each experiment there are three possible vectors, one counter overflows, the other counter overflows, both overflow */ ocount_t overflow_counts[2][3] = { {{0, 0}, {0, 0}, {0, 0}}, {{0, 0}, {0, 0}, {0, 0}} }; int total_unknown = 0; void handler( int mode, void *address, long long overflow_vector, void *context ) { ( void ) context; /*unused */ int i; if ( !TESTS_QUIET ) { fprintf( stderr, OVER_FMT, mode, address, overflow_vector ); } /* Look for the overflow_vector entry */ for ( i = 0; i < 3; i++ ) { if ( overflow_counts[mode][i].mask == overflow_vector ) { overflow_counts[mode][i].count++; return; } } /* Didn't find it so add it. */ for ( i = 0; i < 3; i++ ) { if ( overflow_counts[mode][i].mask == ( long long ) 0 ) { overflow_counts[mode][i].mask = overflow_vector; overflow_counts[mode][i].count = 1; return; } } /* Unknown entry!?! 
*/ total_unknown++; } void handler_batch( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) EventSet; /*unused */ handler( 0, address, overflow_vector, context ); } void handler_interleaf( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) EventSet; /*unused */ handler( 1, address, overflow_vector, context ); } int main( int argc, char **argv ) { int EventSet = PAPI_NULL; long long ( values[3] )[2]; int retval; int PAPI_event, k, idx[4]; char event_name[3][PAPI_MAX_STR_LEN]; int num_events1; int threshold = THRESHOLD; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* decide which of PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS to add, depending on the availability and derived status of the event on this platform */ if ( ( PAPI_event = find_nonderived_event( ) ) == 0 ) test_fail( __FILE__, __LINE__, "no PAPI_event", 0 ); if ( ( retval = PAPI_add_event( EventSet, PAPI_event ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( ( retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); if ( ( retval = PAPI_stop( EventSet, values[0] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* Set both overflows after adding both events (batch) */ if ( ( retval = PAPI_overflow( EventSet, PAPI_event, threshold, 0, handler_batch ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); if ( ( retval = PAPI_overflow( EventSet, PAPI_TOT_CYC, threshold, 0, handler_batch ) ) != PAPI_OK 
) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); num_events1 = 1; retval = PAPI_get_overflow_event_index( EventSet, 1, &idx[0], &num_events1 ); if ( retval != PAPI_OK ) printf( "PAPI_get_overflow_event_index error: %s\n", PAPI_strerror( retval ) ); num_events1 = 1; retval = PAPI_get_overflow_event_index( EventSet, 2, &idx[1], &num_events1 ); if ( retval != PAPI_OK ) printf( "PAPI_get_overflow_event_index error: %s\n", PAPI_strerror( retval ) ); if ( ( retval = PAPI_cleanup_eventset( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); /* Add each event and set its overflow (interleaved) */ if ( ( retval = PAPI_add_event( EventSet, PAPI_event ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( ( retval = PAPI_overflow( EventSet, PAPI_event, threshold, 0, handler_interleaf ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); if ( ( retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( ( retval = PAPI_overflow( EventSet, PAPI_TOT_CYC, threshold, 0, handler_interleaf ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); if ( ( retval = PAPI_stop( EventSet, values[2] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); num_events1 = 1; retval = PAPI_get_overflow_event_index( EventSet, 1, &idx[2], &num_events1 ); if ( retval != PAPI_OK ) printf( "PAPI_get_overflow_event_index error: %s\n", PAPI_strerror( retval ) ); num_events1 = 1; retval = PAPI_get_overflow_event_index( EventSet, 
2, &idx[3], &num_events1 ); if ( retval != PAPI_OK ) printf( "PAPI_get_overflow_event_index error: %s\n", PAPI_strerror( retval ) ); if ( ( retval = PAPI_cleanup_eventset( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name[0] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); if ( ( retval = PAPI_event_code_to_name( PAPI_TOT_CYC, event_name[1] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); strcpy( event_name[2], "Unknown" ); printf ( "Test case: Overflow dispatch of both events in set with 2 events.\n" ); printf ( "---------------------------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", threshold ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-----------------------------------------------\n" ); printf( "Test type : %18s%18s%18s\n", "1 (no overflow)", "2 (batch)", "3 (interleaf)" ); printf( OUT_FMT, event_name[0], ( values[0] )[0], ( values[1] )[0], ( values[2] )[0] ); printf( OUT_FMT, event_name[1], ( values[0] )[1], ( values[1] )[1], ( values[2] )[1] ); printf( "\n" ); printf( "Predicted overflows at event %-12s : %6d\n", event_name[0], ( int ) ( ( values[0] )[0] / threshold ) ); printf( "Predicted overflows at event %-12s : %6d\n", event_name[1], ( int ) ( ( values[0] )[1] / threshold ) ); printf( "\nBatch overflows (add, add, over, over):\n" ); for ( k = 0; k < 2; k++ ) { if ( overflow_counts[0][k].mask ) { printf( VEC_FMT, ( long long ) overflow_counts[0][k].mask, event_name[idx[k]], overflow_counts[0][k].count ); } } printf( "\nInterleaved overflows (add, over, add, over):\n" ); for ( k = 0; k < 2; k++ ) { if ( overflow_counts[1][k].mask ) printf( VEC_FMT, ( long long ) overflow_counts[1][k].mask, event_name[idx[k + 2]], overflow_counts[1][k].count ); } printf( "\nCases 2+3 Unknown overflows: %d\n", total_unknown ); printf( 
"-----------------------------------------------\n" ); if ( overflow_counts[0][0].count == 0 || overflow_counts[0][1].count == 0 ) test_fail( __FILE__, __LINE__, "a batch counter had no overflows", 1 ); if ( overflow_counts[1][0].count == 0 || overflow_counts[1][1].count == 0 ) test_fail( __FILE__, __LINE__, "an interleaved counter had no overflows", 1 ); if ( total_unknown > 0 ) test_fail( __FILE__, __LINE__, "Unknown counter had overflows", 1 ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/cmpinfo.c0000600003276200002170000000554612247131121016041 0ustar ralphundrgrad/* * File: cmpinfo.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ #include #include #include "papi_test.h" int main( int argc, char **argv ) { int retval; const PAPI_component_info_t *cmpinfo; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( cmpinfo = PAPI_get_component_info( 0 ) ) == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_component_info", retval ); printf( "name: %s\n", cmpinfo->name ); printf( "component_version: %s\n", cmpinfo->version ); printf( "support_version: %s\n", cmpinfo->support_version ); printf( "kernel_version: %s\n", cmpinfo->kernel_version ); printf( "num_cntrs: %d\n", cmpinfo->num_cntrs ); printf( "num_mpx_cntrs: %d\n", cmpinfo->num_mpx_cntrs ); printf( "num_preset_events: %d\n", cmpinfo->num_preset_events ); /* Number of counters the component supports */ printf( "num_native_events: %d\n", cmpinfo->num_native_events ); /* Number of counters the component supports */ printf( "default_domain: %#x (%s)\n", cmpinfo->default_domain, stringify_all_domains( cmpinfo->default_domain ) ); printf( "available_domains: %#x (%s)\n", cmpinfo->available_domains, stringify_all_domains( cmpinfo->available_domains ) ); /* Available domains */ printf( "default_granularity: %#x (%s)\n", 
cmpinfo->default_granularity, stringify_granularity( cmpinfo->default_granularity ) ); /* The default granularity when this component is used */ printf( "available_granularities: %#x (%s)\n", cmpinfo->available_granularities, stringify_all_granularities( cmpinfo->available_granularities ) ); /* Available granularities */ printf( "hardware_intr_sig: %d\n", cmpinfo->hardware_intr_sig ); printf( "hardware_intr: %d\n", cmpinfo->hardware_intr ); /* Has hw overflow intr, so it need not be emulated in software */ printf( "precise_intr: %d\n", cmpinfo->precise_intr ); /* Performance interrupts happen precisely */ printf( "posix1b_timers: %d\n", cmpinfo->posix1b_timers ); /* Uses POSIX 1b interval timers instead of setitimer */ printf( "kernel_profile: %d\n", cmpinfo->kernel_profile ); /* Has kernel profile support (buffered interrupts) */ printf( "kernel_multiplex: %d\n", cmpinfo->kernel_multiplex ); /* In kernel multiplexing */ printf( "fast_counter_read: %d\n", cmpinfo->fast_counter_read ); /* Has a fast counter read */ printf( "fast_real_timer: %d\n", cmpinfo->fast_real_timer ); /* Has a fast real timer */ printf( "fast_virtual_timer: %d\n", cmpinfo->fast_virtual_timer ); /* Has a fast virtual timer */ printf( "attach: %d\n", cmpinfo->attach ); /* Supports attach */ printf( "attach_must_ptrace: %d\n", cmpinfo->attach_must_ptrace ); /* Attach must first ptrace and stop the process */ test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/val_omp.c0000600003276200002170000001217312247131121016035 0ustar ralphundrgrad/* This file performs the following test: each OMP thread measures flops for its provided tasks, and compares this to expected flop counts, each thread having been provided with a random amount of work, such that the time and order that they complete their measurements varies. Specifically tested is the case where the value returned for some threads actually corresponds to that for another thread reading its counter values at the same time. 
- It is based on zero_omp.c but ignores much of its functionality. - It attempts to use the following two counters. It may use fewer depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC Each thread inside the Thread routine: - Do prework (MAX_FLOPS - flops) - Get cyc. - Get us. - Start counters - Do flops - Stop and read counters - Get us. - Get cyc. - Return flops */ #include "papi_test.h" #ifdef _OPENMP #include <omp.h> #else #error "This compiler does not understand OPENMP" #endif const int MAX_FLOPS = NUM_FLOPS; extern int TESTS_QUIET; /* Declared in test_utils.c */ const PAPI_hw_info_t *hw_info = NULL; long long Thread( int n ) { int retval, num_tests = 1; int EventSet1 = PAPI_NULL; int PAPI_event, mask1; int num_events1; long long flops; long long **values; long long elapsed_us, elapsed_cyc; char event_name[PAPI_MAX_STR_LEN]; /* printf("Thread(n=%d) %#x started\n", n, omp_get_thread_num()); */ num_events1 = 2; /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); values = allocate_test_space( num_tests, num_events1 ); do_flops( MAX_FLOPS - n ); /* prework for balance */ elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( n ); retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); flops = ( values[0] )[0]; elapsed_us = PAPI_get_real_usec( ) - 
elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; remove_test_events( &EventSet1, mask1 ); if ( !TESTS_QUIET ) { /*printf("Thread %#x %-12s : \t%lld\t%d\n", omp_get_thread_num(), event_name, (values[0])[0], n); */ #if 0 printf( "Thread %#x PAPI_TOT_CYC: \t%lld\n", omp_get_thread_num( ), values[0][0] ); printf( "Thread %#x Real usec : \t%lld\n", omp_get_thread_num( ), elapsed_us ); printf( "Thread %#x Real cycles : \t%lld\n", omp_get_thread_num( ), elapsed_cyc ); #endif } /* It is illegal for the threads to exit in OpenMP */ /* test_pass(__FILE__,0,0); */ free_test_space( values, num_tests ); PAPI_unregister_thread( ); /* printf("Thread %#x finished\n", omp_get_thread_num()); */ return flops; } int main( int argc, char **argv ) { int tid, retval; int maxthr = omp_get_max_threads( ); int flopper = 0; long long *flops = calloc( maxthr, sizeof ( long long ) ); long long *flopi = calloc( maxthr, sizeof ( long long ) ); tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( maxthr < 2 ) test_skip( __FILE__, __LINE__, "omp_get_max_threads < 2", PAPI_EINVAL ); if ( ( flops == NULL ) || ( flopi == NULL ) ) test_fail( __FILE__, __LINE__, "calloc", PAPI_ENOMEM ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( omp_get_thread_num ) ); if ( retval != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } flopper = Thread( 65536 ) / 65536; printf( "flopper=%d\n", flopper ); for ( int i = 0; i < 100000; i++ ) #pragma omp parallel private(tid) { tid = omp_get_thread_num( ); flopi[tid] = rand( ) * 3; flops[tid] = Thread( ( flopi[tid] / flopper ) % MAX_FLOPS ); #pragma omp barrier 
#pragma omp master if ( flops[tid] < flopi[tid] ) { printf( "test iteration=%d\n", i ); for ( int j = 0; j < omp_get_num_threads( ); j++ ) { printf( "Thread %#x Value %6lld %c %6lld", j, flops[j], ( flops[j] < flopi[j] ) ? '<' : '=', flopi[j] ); for ( int k = 0; k < omp_get_num_threads( ); k++ ) if ( ( k != j ) && ( flops[k] == flops[j] ) ) printf( " == Thread %#x!", k ); printf( "\n" ); } test_fail( __FILE__, __LINE__, "value returned for thread", PAPI_EBUG ); } } test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/inherit.c0000600003276200002170000000605012247131121016037 0ustar ralphundrgrad#include #include #if defined(_AIX) || defined (__FreeBSD__) || defined (__APPLE__) #include /* ARGH! */ #else #include #endif #include "papi_test.h" int main( int argc, char **argv ) { int retval, pid, status, EventSet = PAPI_NULL; long long int values[] = {0,0}; PAPI_option_t opt; tests_quiet( argc, argv ); if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail_exit( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( ( retval = PAPI_assign_eventset_component( EventSet, 0 ) ) != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); memset( &opt, 0x0, sizeof ( PAPI_option_t ) ); opt.inherit.inherit = PAPI_INHERIT_ALL; opt.inherit.eventset = EventSet; if ( ( retval = PAPI_set_opt( PAPI_INHERIT, &opt ) ) != PAPI_OK ) { if ( retval == PAPI_ECMP) { test_skip( __FILE__, __LINE__, "Inherit not supported by current component.\n", retval ); } else { test_fail_exit( __FILE__, __LINE__, "PAPI_set_opt", retval ); } } if ( ( retval = PAPI_query_event( PAPI_TOT_CYC ) ) != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_query_event", retval ); if ( ( retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ) ) != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, 
"PAPI_add_event", retval ); retval = PAPI_query_event( PAPI_FP_INS ); if ( retval == PAPI_ENOEVNT ) { test_warn( __FILE__, __LINE__, "PAPI_FP_INS", retval); values[1] = NUM_FLOPS; /* fake a return value to pass the test */ } else if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_query_event", retval ); else if ( ( retval = PAPI_add_event( EventSet, PAPI_FP_INS ) ) != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_start", retval ); pid = fork( ); if ( pid == 0 ) { do_flops( NUM_FLOPS ); exit( 0 ); } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( ( retval = PAPI_stop( EventSet, values ) ) != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_stop", retval ); if (!TESTS_QUIET) { printf( "Test case inherit: parent starts, child works, parent stops.\n" ); printf( "------------------------------------------------------------\n" ); printf( "Test run : \t1\n" ); printf( "PAPI_FP_INS : \t%lld\n", values[1] ); printf( "PAPI_TOT_CYC: \t%lld\n", values[0] ); printf( "------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Row 1 at least %d\n", NUM_FLOPS ); printf( "Row 2 greater than row 1\n"); } if ( values[1] < NUM_FLOPS) { test_fail( __FILE__, __LINE__, "PAPI_FP_INS", 1 ); } if ( values[0] < values[1]) { test_fail( __FILE__, __LINE__, "PAPI_TOT_CYC < PAPI_FP_INS", 1 ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/earprofile.c0000600003276200002170000001223312247131121016525 0ustar ralphundrgrad/* * File: profile.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: Dan Terpstra * terpstra@cs.utk.edu * Mods: * */ /* This file performs the following test: profiling and program info option call - This tests the SVR4 profiling interface of PAPI. 
These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). The Eventset contains: + PAPI_FP_INS (to profile) + PAPI_TOT_CYC - Set up profile - Start eventset 1 - Do both (flops and reads) - Stop eventset 1 */ #include "papi_test.h" #include "prof_utils.h" #undef THRESHOLD #define THRESHOLD 1000 static void ear_no_profile( void ) { int retval; if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_l1misses( 10000 ); if ( ( retval = PAPI_stop( EventSet, values[0] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); printf( "Test type : \tNo profiling\n" ); printf( TAB1, event_name, ( values[0] )[0] ); printf( TAB1, "PAPI_TOT_CYC:", ( values[0] )[1] ); } static int do_profile( caddr_t start, unsigned long plength, unsigned scale, int thresh, int bucket ) { int i, retval; unsigned long blength; int num_buckets; char *profstr[2] = { "PAPI_PROFIL_POSIX", "PAPI_PROFIL_INST_EAR" }; int profflags[2] = { PAPI_PROFIL_POSIX, PAPI_PROFIL_POSIX | PAPI_PROFIL_INST_EAR }; int num_profs; do_stuff( ); num_profs = sizeof ( profflags ) / sizeof ( int ); ear_no_profile( ); blength = prof_size( plength, scale, bucket, &num_buckets ); prof_alloc( num_profs, blength ); for ( i = 0; i < num_profs; i++ ) { if ( !TESTS_QUIET ) printf( "Test type : \t%s\n", profstr[i] ); if ( ( retval = PAPI_profil( profbuf[i], blength, start, scale, EventSet, PAPI_event, thresh, profflags[i] | bucket ) ) != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); } if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet, values[1] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( TAB1, event_name, ( values[1] )[0] ); printf( TAB1, "PAPI_TOT_CYC:", ( 
values[1] )[1] ); } if ( ( retval = PAPI_profil( profbuf[i], blength, start, scale, EventSet, PAPI_event, 0, profflags[i] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); } prof_head( blength, bucket, num_buckets, "address\t\t\tPOSIX\tINST_DEAR\n" ); prof_out( start, num_profs, bucket, num_buckets, scale ); retval = prof_check( num_profs, bucket, num_buckets ); for ( i = 0; i < num_profs; i++ ) { free( profbuf[i] ); } return ( retval ); } int main( int argc, char **argv ) { int num_events, num_tests = 6; long length; int retval, retval2; const PAPI_hw_info_t *hw_info; const PAPI_exe_info_t *prginfo; caddr_t start, end; prof_init( argc, argv, &prginfo ); if ( ( hw_info = PAPI_get_hardware_info( ) ) == NULL ) { test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 0 ); } if ( ( strncasecmp( hw_info->model_string, "Itanium", strlen( "Itanium" ) ) != 0 ) && ( strncasecmp( hw_info->model_string, "32", strlen( "32" ) ) != 0 ) ) test_skip( __FILE__, __LINE__, "Test unsupported", PAPI_ENOIMPL ); if ( TESTS_QUIET ) { test_skip( __FILE__, __LINE__, "Test deprecated in quiet mode for PAPI 3.6", 0 ); } sprintf( event_name, "DATA_EAR_CACHE_LAT4" ); if ( ( retval = PAPI_event_name_to_code( event_name, &PAPI_event ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_name_to_code", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( ( retval = PAPI_add_event( EventSet, PAPI_event ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( ( retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); num_events = 2; values = allocate_test_space( num_tests, num_events ); /* use these lines to profile entire code address space */ start = prginfo->address_info.text_start; end = prginfo->address_info.text_end; length = end - start; if ( length < 0 ) test_fail( __FILE__, 
__LINE__, "Profile length < 0!", length );

    prof_print_address
        ( "Test earprofile: POSIX compatible event address register profiling.\n",
          prginfo );
    prof_print_prof_info( start, end, THRESHOLD, event_name );
    retval = do_profile( start, length, FULL_SCALE, THRESHOLD,
                         PAPI_PROFIL_BUCKET_16 );

    retval2 = PAPI_remove_event( EventSet, PAPI_event );
    if ( retval2 == PAPI_OK )
        retval2 = PAPI_remove_event( EventSet, PAPI_TOT_CYC );
    if ( retval2 != PAPI_OK )
        test_fail( __FILE__, __LINE__, "Can't remove events", retval2 );

    if ( retval )
        test_pass( __FILE__, values, num_tests );
    else
        test_fail( __FILE__, __LINE__, "No information in buffers", 1 );
    exit( 1 );
}
papi-5.3.0/src/ctests/multiplex1.c
/*
 * File:    multiplex.c
 * Author:  Philip Mucci
 *          mucci@cs.utk.edu
 * Mods:
 *
 */

/* This file tests the multiplex functionality, originally developed by
   John May of LLNL. */

#include "papi_test.h"

/* Event to use in all cases; initialized in init_papi() */

#define TOTAL_EVENTS 6

int solaris_preset_PAPI_events[TOTAL_EVENTS] = {
    PAPI_TOT_CYC, PAPI_BR_MSP, PAPI_L2_TCM, PAPI_L1_ICM, 0
};
int power6_preset_PAPI_events[TOTAL_EVENTS] = {
    PAPI_TOT_CYC, PAPI_FP_INS, PAPI_L1_DCM, PAPI_L1_ICM, 0
};
int preset_PAPI_events[TOTAL_EVENTS] = {
    PAPI_TOT_CYC, PAPI_FP_INS, PAPI_TOT_INS, PAPI_L1_DCM, PAPI_L1_ICM, 0
};
static int PAPI_events[TOTAL_EVENTS] = { 0, };
static int PAPI_events_len = 0;

#define CPP_TEST_FAIL(string, retval) test_fail(__FILE__, __LINE__, string, retval)

void
init_papi( int *out_events, int *len )
{
    int retval;
    int i, real_len = 0;
    int *in_events = preset_PAPI_events;
    const PAPI_hw_info_t *hw_info;

    /* Initialize the library */
    retval = PAPI_library_init( PAPI_VER_CURRENT );
    if ( retval != PAPI_VER_CURRENT )
        CPP_TEST_FAIL( "PAPI_library_init", retval );

    hw_info = PAPI_get_hardware_info( );
    if ( hw_info == NULL )
        test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 );

    if ( strstr( hw_info->model_string, "UltraSPARC"
) ) { in_events = solaris_preset_PAPI_events; } if ( strcmp( hw_info->model_string, "POWER6" ) == 0 ) { in_events = power6_preset_PAPI_events; retval = PAPI_set_domain( PAPI_DOM_ALL ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_set_domain", retval ); } retval = PAPI_multiplex_init( ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_multiplex_init", retval ); for ( i = 0; in_events[i] != 0; i++ ) { char out[PAPI_MAX_STR_LEN]; /* query and set up the right instruction to monitor */ retval = PAPI_query_event( in_events[i] ); if ( retval == PAPI_OK ) { out_events[real_len++] = in_events[i]; PAPI_event_code_to_name( in_events[i], out ); if ( real_len == *len ) break; } else { PAPI_event_code_to_name( in_events[i], out ); if ( !TESTS_QUIET ) printf( "%s does not exist\n", out ); } } if ( real_len < 1 ) CPP_TEST_FAIL( "No counters available", 0 ); *len = real_len; } /* Tests that PAPI_multiplex_init does not mess with normal operation. 
*/ int case1( ) { int retval, i, EventSet = PAPI_NULL; long long values[2]; PAPI_events_len = 2; init_papi( PAPI_events, &PAPI_events_len ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_create_eventset", retval ); for ( i = 0; i < PAPI_events_len; i++ ) { char out[PAPI_MAX_STR_LEN]; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); if ( !TESTS_QUIET ) printf( "Added %s\n", out ); } do_stuff( ); /* Capture the return code so a failure reports the real error, not a stale retval */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_stop", retval ); if ( !TESTS_QUIET ) { test_print_event_header( "case1:", EventSet ); printf( TAB2, "case1:", values[0], values[1] ); } retval = PAPI_cleanup_eventset( EventSet ); /* JT */ if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_cleanup_eventset", retval ); PAPI_shutdown( ); return ( SUCCESS ); } /* Tests that PAPI_set_multiplex() works before adding events */ int case2( ) { int retval, i, EventSet = PAPI_NULL; long long values[2]; PAPI_events_len = 2; init_papi( PAPI_events, &PAPI_events_len ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_create_eventset", retval ); /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 
0 is always the cpu component */ retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_assign_eventset_component", retval ); retval = PAPI_set_multiplex( EventSet ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_set_multiplex", retval ); for ( i = 0; i < PAPI_events_len; i++ ) { char out[PAPI_MAX_STR_LEN]; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); if ( !TESTS_QUIET ) printf( "Added %s\n", out ); } do_stuff( ); /* Capture the return code so a failure reports the real error, not a stale retval */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_stop", retval ); if ( !TESTS_QUIET ) { test_print_event_header( "case2:", EventSet ); printf( TAB2, "case2:", values[0], values[1] ); } retval = PAPI_cleanup_eventset( EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_cleanup_eventset", retval ); PAPI_shutdown( ); return ( SUCCESS ); } /* Tests that PAPI_set_multiplex() works after adding events */ int case3( ) { int retval, i, EventSet = PAPI_NULL; long long values[2]; PAPI_events_len = 2; init_papi( PAPI_events, &PAPI_events_len ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_create_eventset", retval ); for ( i = 0; i < PAPI_events_len; i++ ) { char out[PAPI_MAX_STR_LEN]; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); if ( !TESTS_QUIET ) printf( "Added %s\n", out ); } retval = PAPI_set_multiplex( EventSet ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_set_multiplex", retval ); do_stuff( ); if 
( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) CPP_TEST_FAIL( "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_stop", retval ); if ( !TESTS_QUIET ) { test_print_event_header( "case3:", EventSet ); printf( TAB2, "case3:", values[0], values[1] ); } retval = PAPI_cleanup_eventset( EventSet ); /* JT */ if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_cleanup_eventset", retval ); PAPI_shutdown( ); return ( SUCCESS ); } /* Tests that PAPI_add_event() works after PAPI_add_event()/PAPI_set_multiplex() */ int case4( ) { int retval, i, EventSet = PAPI_NULL; long long values[4]; char out[PAPI_MAX_STR_LEN]; PAPI_events_len = 2; init_papi( PAPI_events, &PAPI_events_len ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_create_eventset", retval ); i = 0; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); printf( "Added %s\n", out ); retval = PAPI_set_multiplex( EventSet ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_set_multiplex", retval ); i = 1; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); printf( "Added %s\n", out ); do_stuff( ); /* Capture the return code so a failure reports the real error, not a stale retval */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_stop", retval ); if ( !TESTS_QUIET ) { test_print_event_header( "case4:", EventSet ); printf( TAB2, "case4:", values[0], values[1] ); } retval = PAPI_cleanup_eventset( EventSet ); /* JT */ if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_cleanup_eventset", retval ); 
PAPI_shutdown( ); return ( SUCCESS ); } /* Tests that PAPI_read() works immediately after PAPI_start() */ int case5( ) { int retval, i, j, EventSet = PAPI_NULL; long long start_values[4] = { 0,0,0,0 }, values[4] = {0,0,0,0}; char out[PAPI_MAX_STR_LEN]; PAPI_events_len = 2; init_papi( PAPI_events, &PAPI_events_len ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_create_eventset", retval ); /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 0 is always the cpu component */ retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_assign_eventset_component", retval ); retval = PAPI_set_multiplex( EventSet ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_set_multiplex", retval ); /* Add 2 events... */ i = 0; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); printf( "Added %s\n", out ); i++; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); printf( "Added %s\n", out ); i++; do_stuff( ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_start", retval ); retval = PAPI_read( EventSet, start_values ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_read", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_stop", retval ); for (j=0;j #include #include #include #include #include "papiStdEventDefs.h" #include "papi.h" #include "linux-bgp-native-events.h" #define MAX_COUNTERS 256 #define NUMBER_COUNTERS_PER_ROW 8 /* * Prototypes... 
*/ void Do_Tests(void); void Do_Low_Level_Tests(void); void Do_High_Level_Tests(void); void Do_Multiplex_Tests(void); void Run_Cycle(const int pNumEvents); void Zero_Local_Counters(long long* pCounters); void FPUArith(void); void List_PAPI_Events(const int pEventSet, int* pEvents, int* xNumEvents); void Print_Native_Counters(); void Print_Native_Counters_via_Buffer(const BGP_UPC_Read_Counters_Struct_t* pBuffer); void Print_Native_Counters_for_PAPI_Counters(const int pEventSet); void Print_Native_Counters_for_PAPI_Counters_From_List(const int* pEvents, const int pNumEvents); void Print_PAPI_Counters(const int pEventSet, const long long* pCounters); void Print_PAPI_Counters_From_List(const int* pEventList, const int pNumEvents, const long long* pCounters); void Print_Counters(const int pEventSet); void Print_Node_Info(void); void Read_Native_Counters(const int pLength); void Print_PAPI_Events(const int pEventSet); void Print_Counter_Values(const long long* pCounters, const int pNumCounters); void DumpInHex(const char* pBuffer, int pSize); /* * Global variables... */ int PAPI_Events[MAX_COUNTERS]; long long PAPI_Counters[MAX_COUNTERS]; char Native_Buffer[BGP_UPC_MAXIMUM_LENGTH_READ_COUNTERS_STRUCTURE]; double x[32] ALIGN_L3_CACHE; const int NumEventsPerSet = MAX_COUNTERS; const int MaxPresetEventId = 104; const int MaxNativeEventId = 511; int main(int argc, char * argv[]) { _BGP_Personality_t personality; int pRank=0, pMode=-2, pCore=0, pEdge=1, xActiveCore=0, xActiveRank=0, xRC; /* * Check args, print test inputs. */ if ( argc > 1 ) sscanf(argv[1], "%d", &pRank); if ( argc > 2 ) sscanf(argv[2], "%d", &pMode); if ( argc > 3 ) sscanf(argv[3], "%d", &pCore); if ( argc > 4 ) sscanf(argv[4], "%d", &pEdge); /* * Check for valid rank... */ if ( pRank < 0 || pRank > 31 ) { printf("Invalid rank (%d) specified\n", pRank); exit(1); } /* * Check for valid mode... 
   * Mode = -2 means use what was initialized by CNK
   * Mode = -1 means to initialize with the default
   * Mode = 0-3 means to initialize with mode 0-3
   */
  if ( pMode < -2 || pMode > 3 ) {
    printf("Invalid mode (%d) specified\n", pMode);
    exit(1);
  }

  /*
   * Check for valid core...
   */
  if ( pCore < 0 || pCore > 3 ) {
    printf("Invalid core (%d) specified\n", pCore);
    exit(1);
  }

  /*
   * Check for valid edge...
   * Edge = 1 means initialize with the default edge
   * Edge = 0 means initialize with level high
   * Edge = 4 means initialize with edge rise
   * Edge = 8 means initialize with edge fall
   * Edge = 12 means initialize with level low
   */
  if ( pEdge != 0 && pEdge != 1 && pEdge != 4 && pEdge != 8 && pEdge != 12 ) {
    printf("Invalid edge (%d) specified\n", pEdge);
    exit(1);
  }

  /*
   * Initialize the UPC environment...
   * NOTE: Must do this from all 'ranks'...
   */
  // BGP_UPC_Initialize();
  /* PAPI_library_init() returns PAPI_VER_CURRENT on success; compare against
   * the macro rather than a hard-coded version number. */
  xRC = PAPI_library_init(PAPI_VER_CURRENT);
  if (xRC != PAPI_VER_CURRENT) {
    printf("PAPI_library_init failed: xRC=%d, ending...\n", xRC);
    exit(1);
  }

  /*
   * Only run if this is the specified rank...
   */
  xRC = Kernel_GetPersonality(&personality, sizeof(_BGP_Personality_t));
  if (xRC != 0) {
    printf(" Kernel_GetPersonality returned %d\n", xRC);
    exit(xRC);
  }
  xActiveRank = personality.Network_Config.Rank;
  xActiveCore = Kernel_PhysicalProcessorID();

  printf("Rank %d, core %d reporting...\n", xActiveRank, xActiveCore);

  if (xActiveRank != pRank) {
    printf("Rank %d is not to run... Exiting...\n", xActiveRank);
    exit(0);
  }

  if ( xActiveCore == pCore ) {
    printf("Program is to run on rank %d core %d, using mode= %d, edge= %d\n", pRank, xActiveCore, pMode, pEdge);
  }
  else {
    printf("Program is NOT to run on rank %d core %d... Exiting...\n", pRank, xActiveCore);
    exit(0);
  }

  /*
   * Main processing...
   */
  printf("************************************************************\n");
  printf("* Configuration parameters used: *\n");
  printf("* Rank = %d *\n", pRank);
  printf("* Mode = %d *\n", pMode);
  printf("* Core = %d *\n", pCore);
  printf("* Edge = %d *\n", pEdge);
  printf("************************************************************\n\n");

  printf("Print config after PAPI_library_init...\n");
  BGP_UPC_Print_Config();

  /*
   * If we are to initialize, do so with user mode and edge...
   * Otherwise, use what was initialized by CNK...
   */
  if (pMode > -2) {
    BGP_UPC_Initialize_Counter_Config(pMode, pEdge);
    printf("UPC unit(s) initialized with mode=%d, edge=%d...\n", pMode, pEdge);
  }

  printf("Before running the main test procedure...\n");
  BGP_UPC_Print_Config();
  BGP_UPC_Print_Counter_Values(BGP_UPC_READ_EXCLUSIVE);

  /*
   * Perform the main test procedure...
   */
  Do_Tests();

  /*
   * Print out final configuration and results...
   */
  printf("After running the main test procedure...\n");
  BGP_UPC_Print_Config();
  BGP_UPC_Print_Counter_Values(BGP_UPC_READ_EXCLUSIVE);

  exit(0);
}

/*
 * Do_Tests
 */
void Do_Tests(void) {
  printf("==> Do_Tests(): Beginning of the main body...\n");

  // NOTE: PAPI_library_init() has already been done for each participating node
  //       prior to calling this routine...

  Do_Low_Level_Tests();
  Do_High_Level_Tests();
  Do_Multiplex_Tests(); // NOTE: Not supported...
  PAPI_shutdown();

  printf("==> Do_Tests(): End of the main body...\n");
  fflush(stdout);

  return;
}

/*
 * Do_Low_Level_Tests
 */
void Do_Low_Level_Tests(void) {
  int xRC, xEventSet, xEventCode, xState;
  long long xLLValue;
  char xName[256];

  printf("==> Do_Low_Level_Tests(): Beginning of the main body...\n");

  /*
   * Low-level API tests...
   */
  xRC = PAPI_is_initialized();
  if (xRC == 1)
    printf("SUCCESS: PAPI has been low-level initialized by main()...\n");
  else {
    printf("FAILURE: PAPI has not been properly initialized by main(), xRC=%d, ending...\n", xRC);
    return;
  }

  /*
   * Print out the node information with respect to UPC units...
*/ Print_Node_Info(); /* * Zero the buffers for counters... */ Zero_Local_Counters(PAPI_Counters); BGP_UPC_Read_Counters_Struct_t* xTemp; xTemp = (BGP_UPC_Read_Counters_Struct_t*)(void*)Native_Buffer; Zero_Local_Counters(xTemp->counter); /* * Start of real tests... */ xLLValue = -1; xLLValue = PAPI_get_real_cyc(); printf("PAPI_get_real_cyc: xLLValue=%lld...\n", xLLValue); xLLValue = -1; xLLValue = PAPI_get_virt_cyc(); printf("PAPI_get_virt_cyc: xLLValue=%lld...\n", xLLValue); xLLValue = -1; xLLValue = PAPI_get_real_usec(); printf("PAPI_get_real_usec: xLLValue=%lld...\n", xLLValue); xLLValue = -1; xLLValue = PAPI_get_virt_usec(); printf("PAPI_get_virt_usec: xLLValue=%lld...\n", xLLValue); xRC = PAPI_num_hwctrs(); if (xRC == 256) printf("SUCCESS: PAPI_num_hwctrs returned 256 hardware counters...\n"); else printf("FAILURE: PAPI_num_hwctrs failed, returned xRC=%d...\n", xRC); *xName = 0; char* xEventName_1 = "PAPI_L3_LDM"; xRC = PAPI_event_code_to_name(PAPI_L3_LDM, xName); if (xRC == PAPI_OK) { xRC = strcmp(xName,xEventName_1); if (!xRC) printf("SUCCESS: PAPI_event_code_to_name for PAPI_L3_LDM...\n"); else printf("FAILURE: PAPI_event_code_to_name returned incorrect name, xName=%s\n", xName); } else printf("FAILURE: PAPI_event_code_to_name failed, xRC=%d...\n", xRC); *xName = 0; char* xEventName_2 = "PNE_BGP_PU1_IPIPE_INSTRUCTIONS"; xRC = PAPI_event_code_to_name(PNE_BGP_PU1_IPIPE_INSTRUCTIONS, xName); if (xRC == PAPI_OK) { xRC = strcmp(xName,xEventName_2); if (!xRC) printf("SUCCESS: PAPI_event_code_to_name for PNE_BGP_PU1_IPIPE_INSTRUCTIONS...\n"); else printf("FAILURE: PAPI_event_code_to_name returned incorrect name, xName=%s\n", xName); } else printf("FAILURE: PAPI_event_code_to_name failed, xRC=%d...\n", xRC); strcpy(xName,"PAPI_L3_LDM"); xRC = PAPI_event_name_to_code(xName, &xEventCode); if (xRC == PAPI_OK) if (xEventCode == 0x8000000E) printf("SUCCESS: PAPI_event_name_to_code for PAPI_L3_LDM...\n"); else printf("FAILURE: PAPI_event_name_to_code returned incorrect 
code, xEventCode=%d\n", xEventCode); else printf("FAILURE: PAPI_event_name_to_code failed, xRC=%d...\n", xRC); strcpy(xName,"PNE_BGP_PU1_IPIPE_INSTRUCTIONS"); xRC = PAPI_event_name_to_code(xName, &xEventCode); if (xRC == PAPI_OK) if (xEventCode == 0x40000027) printf("SUCCESS: PAPI_event_name_to_code for PNE_BGP_PU1_IPIPE_INSTRUCTIONS...\n"); else printf("FAILURE: PAPI_event_name_to_code returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_event_name_to_code failed, xRC=%d...\n", xRC); xEventCode = 0x80000000; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_OK) if (xEventCode == 0x80000001) printf("SUCCESS: PAPI_enum_event for 0x80000000 PAPI_PRESET_ENUM_ALL, returned 0x80000001...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000000 PAPI_PRESET_ENUM_ALL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x80000000 PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x80000002; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_OK) if (xEventCode == 0x80000003) printf("SUCCESS: PAPI_enum_event for 0x80000002 PAPI_PRESET_ENUM_ALL, returned 0x80000003...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000002 PAPI_PRESET_ENUM_ALL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x80000002 PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x80000067; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_OK) if (xEventCode == 0x80000068) printf("SUCCESS: PAPI_enum_event for 0x80000067 PAPI_PRESET_ENUM_ALL, returned 0x80000068...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000067 PAPI_PRESET_ENUM_ALL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x80000067 PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x80000068; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == 
PAPI_ENOEVNT) printf("SUCCESS: PAPI_enum_event for 0x80000068 PAPI_PRESET_ENUM_ALL, no next event...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000068 PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x40000000; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_OK) if (xEventCode == 0x40000001) printf("SUCCESS: PAPI_enum_event for 0x40000000 PAPI_PRESET_ENUM_ALL, returned 0x40000001...\n"); else printf("FAILURE: PAPI_enum_event for 0x40000000 PAPI_PRESET_ENUM_ALL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x40000000 PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x40000001; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_OK) if (xEventCode == 0x40000002) printf("SUCCESS: PAPI_enum_event for 0x40000001 PAPI_PRESET_ENUM_ALL, returned 0x40000002...\n"); else printf("FAILURE: PAPI_enum_event for 0x40000001 PAPI_PRESET_ENUM_ALL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x40000001 PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x400000FC; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_OK) if (xEventCode == 0x400000FF) printf("SUCCESS: PAPI_enum_event for 0x400000FC PAPI_PRESET_ENUM_ALL, returned 0x400000FF...\n"); else printf("FAILURE: PAPI_enum_event for 0x400000FC PAPI_PRESET_ENUM_ALL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x400000FC PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x400001FD; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_OK) if (xEventCode == 0x400001FF) printf("SUCCESS: PAPI_enum_event for 0x400001FD PAPI_ENUM_ALL, returned 0x400001FF...\n"); else printf("FAILURE: PAPI_enum_event for 0x400001FD PAPI_ENUM_ALL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x400001FD PAPI_ENUM_ALL 
failed, xRC=%d...\n", xRC); xEventCode = 0x400001FF; xRC = PAPI_enum_event(&xEventCode, PAPI_ENUM_ALL); if (xRC == PAPI_ENOEVNT) printf("SUCCESS: PAPI_enum_event for 0x400001FF PAPI_PRESET_ENUM_ALL, no next event...\n"); else printf("FAILURE: PAPI_enum_event for 0x400001FF PAPI_PRESET_ENUM_ALL failed, xRC=%d...\n", xRC); xEventCode = 0x80000000; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_OK) if (xEventCode == 0x80000001) printf("SUCCESS: PAPI_enum_event for 0x80000000 PAPI_PRESET_ENUM_AVAIL, returned 0x80000001...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000000 PAPI_PRESET_ENUM_AVAIL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x80000000 PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); xEventCode = 0x80000002; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_OK) if (xEventCode == 0x80000006) printf("SUCCESS: PAPI_enum_event for 0x80000002 PAPI_PRESET_ENUM_AVAIL, returned 0x80000006...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000002 PAPI_PRESET_ENUM_AVAIL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x80000002 PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); xEventCode = 0x80000067; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_OK) if (xEventCode == 0x80000068) printf("SUCCESS: PAPI_enum_event for 0x80000067 PAPI_PRESET_ENUM_AVAIL, returned 0x80000068...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000067 PAPI_PRESET_ENUM_AVAIL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x80000067 PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); xEventCode = 0x80000068; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_ENOEVNT) printf("SUCCESS: PAPI_enum_event for 0x80000068 PAPI_PRESET_ENUM_AVAIL, no next event...\n"); else printf("FAILURE: PAPI_enum_event for 0x80000068 
PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); xEventCode = 0x40000000; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_OK) if (xEventCode == 0x40000001) printf("SUCCESS: PAPI_enum_event for 0x40000000 PAPI_PRESET_ENUM_AVAIL, returned 0x40000001...\n"); else printf("FAILURE: PAPI_enum_event for 0x40000000 PAPI_PRESET_ENUM_AVAIL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x40000000 PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); xEventCode = 0x40000001; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_OK) if (xEventCode == 0x40000002) printf("SUCCESS: PAPI_enum_event for 0x40000001 PAPI_PRESET_ENUM_AVAIL, returned 0x40000002...\n"); else printf("FAILURE: PAPI_enum_event for 0x40000001 PAPI_PRESET_ENUM_AVAIL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x40000001 PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); printf("NOTE: Might get two messages indicating invalid event id specified for 253 and 254. These are OK...\n"); xEventCode = 0x400000FC; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_OK) if (xEventCode == 0x400000FF) printf("SUCCESS: PAPI_enum_event for 0x400000FC PAPI_PRESET_ENUM_AVAIL, returned 0x400000FF...\n"); else printf("FAILURE: PAPI_enum_event for 0x400000FC PAPI_PRESET_ENUM_AVAIL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x400000FC PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); printf("NOTE: Might get one message indicating invalid event id specified for 510. 
This is OK...\n"); xEventCode = 0x400001FD; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_OK) if (xEventCode == 0x400001FF) printf("SUCCESS: PAPI_enum_event for 0x400001FD PAPI_PRESET_ENUM_AVAIL, returned 0x400001FF...\n"); else printf("FAILURE: PAPI_enum_event for 0x400001FD PAPI_PRESET_ENUM_AVAIL returned incorrect code, xEventCode=%8.8x\n", xEventCode); else printf("FAILURE: PAPI_enum_event for 0x400001FD PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); xEventCode = 0x400001FF; xRC = PAPI_enum_event(&xEventCode, PAPI_PRESET_ENUM_AVAIL); if (xRC == PAPI_ENOEVNT) printf("SUCCESS: PAPI_enum_event for 0x400001FF PAPI_PRESET_ENUM_AVAIL, no next event...\n"); else printf("FAILURE: PAPI_enum_event for 0x400001FF PAPI_PRESET_ENUM_AVAIL failed, xRC=%d...\n", xRC); PAPI_dmem_info_t xDmemSpace; xRC = PAPI_get_dmem_info(&xDmemSpace); if (xRC == PAPI_OK) { DumpInHex((char*)&xDmemSpace, sizeof( PAPI_dmem_info_t)); printf("SUCCESS: PAPI_get_dmem_info...\n"); } else printf("FAILURE: PAPI_get_dmem_info failed, xRC=%d...\n", xRC); PAPI_event_info_t xInfoSpace; xRC = PAPI_get_event_info(PAPI_L3_LDM, &xInfoSpace); if (xRC == PAPI_OK) { DumpInHex((char*)&xInfoSpace, sizeof( PAPI_event_info_t)); printf("SUCCESS: PAPI_get_event_info for PAPI_L3_LDM...\n"); } else printf("FAILURE: PAPI_get_event_info failed for PAPI_L3_LDM, xRC=%d...\n", xRC); const PAPI_exe_info_t* xExeInfo = NULL; if ((xExeInfo = PAPI_get_executable_info()) != NULL) { DumpInHex((char*)xExeInfo, sizeof( PAPI_exe_info_t)); printf("SUCCESS: PAPI_get_executable_info...\n"); } else printf("FAILURE: PAPI_get_executable_info failed, returned null pointer...\n"); const PAPI_hw_info_t* xHwInfo = NULL; if ((xHwInfo = PAPI_get_hardware_info()) != NULL) { DumpInHex((char*)xHwInfo, sizeof( PAPI_hw_info_t)); printf("SUCCESS: PAPI_get_hardware_info...\n"); } else printf("FAILURE: PAPI_get_hardware_info failed, returned null pointer...\n"); const PAPI_shlib_info_t* xShLibInfo = NULL; if ((xShLibInfo = 
PAPI_get_shared_lib_info()) != NULL) { DumpInHex((char*)xShLibInfo, sizeof( PAPI_shlib_info_t)); printf("SUCCESS: PAPI_get_shared_lib_info...\n"); } else printf("FAILURE: PAPI_get_shared_lib_info failed, returned null pointer...\n"); xEventSet = PAPI_NULL; xRC = PAPI_create_eventset(&xEventSet); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_create_eventset created...\n"); else { printf("FAILURE: PAPI_create_eventset failed, xRC=%d...\n", xRC); return; } printf("==> No events should be in the event set...\n"); Print_Counters(xEventSet); xRC = PAPI_num_events(xEventSet); if (xRC == 0) printf("SUCCESS: PAPI_num_events returned 0...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, PAPI_L1_DCM); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_add_event PAPI_L1_DCM...\n"); else printf("FAILURE: PAPI_add_event PAPI_L1_DCM failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 1) printf("SUCCESS: PAPI_num_events returned 1...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, PNE_BGP_PU3_L2_MEMORY_WRITES); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_add_event PNE_BGP_PU3_L2_MEMORY_WRITES...\n"); else printf("FAILURE: PAPI_add_event PNE_BGP_PU3_L2_MEMORY_WRITES failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 2) printf("SUCCESS: PAPI_num_events returned 2...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, BGP_PU3_L2_MEMORY_WRITES); if (xRC == PAPI_EINVAL) printf("SUCCESS: PAPI_add_event BGP_PU3_L2_MEMORY_WRITES not allowed...\n"); else printf("FAILURE: PAPI_add_event BGP_PU3_L2_MEMORY_WRITES allowed, or failed incorrectly..., xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 2) printf("SUCCESS: PAPI_num_events returned 2...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, 
0x40000208); if (xRC == PAPI_ENOEVNT) printf("SUCCESS: PAPI_add_event 0x40000208 not allowed...\n"); else printf("FAILURE: PAPI_add_event 0x40000208 allowed, or failed incorrectly..., xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 2) printf("SUCCESS: PAPI_num_events returned 2...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, PAPI_L1_ICM); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_add_event PAPI_L1_ICM...\n"); else printf("FAILURE: PAPI_add_event PAPI_L1_ICM failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 3) printf("SUCCESS: PAPI_num_events returned 3...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, PAPI_L1_TCM); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_add_event PAPI_L1_TCM...\n"); else printf("FAILURE: PAPI_add_event PAPI_L1_TCM failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 4) printf("SUCCESS: PAPI_num_events returned 4...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, PAPI_L1_DCM); if (xRC == PAPI_ECNFLCT) printf("SUCCESS: PAPI_add_event, redundantly adding PAPI_L1_DCM not allowed...\n"); else printf("FAILURE: PAPI_add_event PAPI_L1_DCM failed incorrectly, xRC=%d...\n", xRC); xRC = PAPI_add_event(xEventSet, PNE_BGP_PU3_L2_MEMORY_WRITES); if (xRC == PAPI_ECNFLCT) printf("SUCCESS: PAPI_add_event, redundantly adding PNE_BGP_PU3_L2_MEMORY_WRITES not allowed...\n"); else printf("FAILURE: PAPI_add_event PNE_BGP_PU3_L2_MEMORY_WRITES failed incorrectly, xRC=%d...\n", xRC); printf("\n==> All events added... 
Perform a read now...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); printf("\n==> Perform a reset now...\n"); xRC = PAPI_reset(xEventSet); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_reset...\n"); else printf("FAILURE: PAPI_reset failed, xRC=%d...\n", xRC); printf("\n==> Perform another read now...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); printf("\n==> Should be 4 counters below, preset, native, preset, and preset. All counter values should be zero.\n"); Print_Counters(xEventSet); printf("\n==> Stop the UPC now...\n"); xRC = PAPI_stop(xEventSet, PAPI_Counters); if (xRC == PAPI_ENOTRUN) printf("SUCCESS: PAPI_stop, but not running...\n"); else printf("FAILURE: PAPI_stop failed incorrectly, xRC=%d...\n", xRC); printf("\n==> Start the UPC now...\n"); xRC = PAPI_start(xEventSet); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_start...\n"); else { printf("FAILURE: PAPI_start failed, xRC=%d...\n", xRC); return; } printf("\n==> Try to start it again...\n"); xRC = PAPI_start(xEventSet); if (xRC == PAPI_EISRUN) printf("SUCCESS: PAPI_start, but already running...\n"); else printf("FAILURE: PAPI_start failed incorrectly, xRC=%d...\n", xRC); FPUArith(); printf("\n==> Stop the UPC after the arithmetic was performed... The individual native counter values will be greater than the PAPI counters because the PAPI counters are read prior to the UPC(s) being stopped...\n"); xRC = PAPI_stop(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_stop...\n"); else { printf("FAILURE: PAPI_stop failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a read of the counters after performing arithmetic, UPC is stopped... 
Values should be the same as right after the prior PAPI_Stop()...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); Print_Counters(xEventSet); printf("\n==> Zero local counters. Perform a PAPI_accum, UPC is stopped... Native values should be zero, and the local PAPI counters the same as the previous read...\n"); Zero_Local_Counters(PAPI_Counters); xRC = PAPI_accum(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_accum...\n"); } else { printf("FAILURE: PAPI_accum failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a PAPI_read, UPC is stopped... All values should be zero...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_read...\n"); } else { printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a reset after performing arithmetic, UPC is stopped... All values should be zero...\n"); xRC = PAPI_reset(xEventSet); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_reset...\n"); } else { printf("FAILURE: PAPI_reset failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform another read of the counters after resetting the counters, UPC is stopped... All values should be zero...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); Print_Counters(xEventSet); printf("\n==> Perform another PAPI_accum after resetting the counters, UPC is stopped... 
All values should be zero...\n"); Zero_Local_Counters(PAPI_Counters); xRC = PAPI_accum(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_accum...\n"); } else { printf("FAILURE: PAPI_accum failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform another PAPI_read after accumulating and resetting the UPC, UPC is stopped... All values should be zero...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_read...\n"); } else { printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Start the UPC again...\n"); xRC = PAPI_start(xEventSet); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_start...\n"); else { printf("FAILURE: PAPI_start failed, xRC=%d...\n", xRC); return; } FPUArith(); printf("\n==> Get the state of the event set...\n"); xRC = PAPI_state(xEventSet, &xState); if (xRC == PAPI_OK) { if (xState == PAPI_RUNNING) { printf("SUCCESS: PAPI_state is RUNNING...\n"); } else { printf("FAILURE: PAPI_state failed, incorrect state, xState=%d...\n", xState); } } else { printf("FAILURE: PAPI_state failed, xRC=%d...\n", xRC); return; } printf("\n==> Perform a read of the counters, UPC is running... The individual native counter values will be greater than the PAPI counters because the PAPI counters are read prior to the reads for the individual counter values...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); Print_Counters(xEventSet); FPUArith(); printf("\n==> Perform another read of the counters, UPC is running... Values should be increasing...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); Print_Counters(xEventSet); FPUArith(); printf("\n==> Perform another read of the counters, UPC is running... 
Values should continue increasing...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); Print_Counters(xEventSet); printf("\n==> Perform a reset after performing arithmetic, UPC is still running... Native counter values should be less than prior read, but PAPI counter values should be identical to the prior read (local buffer was not changed)...\n"); xRC = PAPI_reset(xEventSet); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_reset...\n"); } else { printf("FAILURE: PAPI_reset failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Zero local counters. Perform a PAPI_accum, UPC is still running...\n"); Zero_Local_Counters(PAPI_Counters); xRC = PAPI_accum(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_accum...\n"); } else { printf("FAILURE: PAPI_accum failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); FPUArith(); printf("\n==> Accumulate local counters. Perform a PAPI_accum, UPC is still running... PAPI counters should show an increase from prior accumulate...\n"); xRC = PAPI_accum(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_accum...\n"); } else { printf("FAILURE: PAPI_accum failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); FPUArith(); printf("\n==> Accumulate local counters. Perform another PAPI_accum, UPC is still running... PAPI counters should show an increase from prior accumulate...\n"); xRC = PAPI_accum(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_accum...\n"); } else { printf("FAILURE: PAPI_accum failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Zero local counters. Perform a PAPI_accum, UPC is still running... 
PAPI counters should be less than the prior accumulate...\n"); Zero_Local_Counters(PAPI_Counters); xRC = PAPI_accum(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_accum...\n"); } else { printf("FAILURE: PAPI_accum failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a PAPI_read, UPC is still running... Native counters and PAPI counters should have both increased from prior accumulate...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_read...\n"); } else { printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a PAPI_write (not supported when UPC is running)...\n"); xRC = PAPI_write(xEventSet, PAPI_Counters); if (xRC == PAPI_ECMP) { printf("SUCCESS: PAPI_write, not allowed...\n"); } else { printf("FAILURE: PAPI_write failed, xRC=%d...\n", xRC); return; } printf("\n==> Stop the UPC... The individual native counter values will be greater than the PAPI counters because the PAPI counters are read prior to the UPC(s) being stopped...\n"); xRC = PAPI_stop(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_stop...\n"); else { printf("FAILURE: PAPI_stop failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a PAPI_read with the UPC stopped...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); printf("\n==> Should be same 4 counters below, with the same native and PAPI counters as after the PAPI_stop...\n"); Print_Counters(xEventSet); printf("\n==> Perform a PAPI_accum with the UPC stopped... 
Native counters should be zeroed, with the PAPI counters unchanged from prior read (with the UPC already stopped, the accumulate does not add any counter values to the local buffer)...\n"); Zero_Local_Counters(PAPI_Counters); xRC = PAPI_accum(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_accum...\n"); } else { printf("FAILURE: PAPI_accum failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a PAPI_read with the UPC stopped... Native and PAPI counters are zero...\n"); xRC = PAPI_read(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read...\n"); else printf("FAILURE: PAPI_read failed, xRC=%d...\n", xRC); Print_Counters(xEventSet); printf("\n==> Perform a reset, UPC is stopped... Native and PAPI counters are zero...\n"); xRC = PAPI_reset(xEventSet); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_reset...\n"); } else { printf("FAILURE: PAPI_reset failed, xRC=%d...\n", xRC); return; } Print_Counters(xEventSet); printf("\n==> Perform a PAPI_write, but only to local memory...\n"); xRC = PAPI_write(xEventSet, PAPI_Counters); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_write, but only to local memory...\n"); } else { printf("FAILURE: PAPI_write failed, xRC=%d...\n", xRC); return; } printf("\n==> Get the state of the event set...\n"); xRC = PAPI_state(xEventSet, &xState); if (xRC == PAPI_OK) { if (xState == PAPI_STOPPED) { printf("SUCCESS: PAPI_state is STOPPED...\n"); } else { printf("FAILURE: PAPI_state failed, incorrect state, xState=%d...\n", xState); } } else { printf("FAILURE: PAPI_state failed, xRC=%d...\n", xRC); return; } printf("\n==> Get the multiplex status of the eventset...\n"); xRC = PAPI_get_multiplex(xEventSet); if (xRC == PAPI_OK) { printf("SUCCESS: PAPI_get_multiplex (NOTE: The rest of the multiplex path is untested)...\n"); } else { printf("FAILURE: PAPI_get_multiplex failed, xRC=%d...\n", xRC); return; } printf("\n==> Remove the events, and clean up the event set...\n"); xRC = 
PAPI_remove_event(xEventSet, PNE_BGP_PU1_IPIPE_INSTRUCTIONS); if (xRC == PAPI_EINVAL) printf("SUCCESS: PAPI_remove_event could not find PNE_BGP_PU1_IPIPE_INSTRUCTIONS...\n"); else printf("FAILURE: PAPI_remove_event PNE_BGP_PU1_IPIPE_INSTRUCTIONS failed, xRC=%d...\n", xRC); xRC = PAPI_remove_event(xEventSet, PAPI_L3_LDM); if (xRC == PAPI_EINVAL) printf("SUCCESS: PAPI_remove_event could not find PAPI_L3_LDM...\n"); else printf("FAILURE: PAPI_remove_event PAPI_L3_LDM failed, xRC=%d...\n", xRC); xRC = PAPI_remove_event(xEventSet, PAPI_L1_TCM); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_remove_event PAPI_L1_TCM...\n"); else printf("FAILURE: PAPI_remove_event PAPI_L1_TCM failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 3) printf("SUCCESS: PAPI_num_events returned 3...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_remove_event(xEventSet, PAPI_L1_ICM); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_remove_event PAPI_L1_ICM...\n"); else printf("FAILURE: PAPI_remove_event PAPI_L1_ICM failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 2) printf("SUCCESS: PAPI_num_events returned 2...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_remove_event(xEventSet, PNE_BGP_PU3_L2_MEMORY_WRITES); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_remove_event PNE_BGP_PU3_L2_MEMORY_WRITES...\n"); else printf("FAILURE: PAPI_remove_event PNE_BGP_PU3_L2_MEMORY_WRITES failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 1) printf("SUCCESS: PAPI_num_events returned 1...\n"); else printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_remove_event(xEventSet, PAPI_L1_DCM); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_remove_event PAPI_L1_DCM...\n"); else printf("FAILURE: PAPI_remove_event PAPI_L1_DCM failed, xRC=%d...\n", xRC); xRC = PAPI_num_events(xEventSet); if (xRC == 0) printf("SUCCESS: PAPI_num_events returned 0...\n"); else 
printf("FAILURE: PAPI_num_events failed, returned xRC=%d...\n", xRC); xRC = PAPI_cleanup_eventset(xEventSet); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_cleanup_eventset...\n"); else printf("FAILURE: PAPI_cleanup_eventset failed, xRC=%d...\n", xRC); xRC = PAPI_destroy_eventset(&xEventSet); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_destroy_eventset...\n"); else printf("FAILURE: PAPI_destroy_eventset failed, xRC=%d...\n", xRC); printf("==> Do_Low_Level_Tests(): End of the main body...\n"); return; } /* * Do_High_Level_Tests */ void Do_High_Level_Tests(void) { uint xEventId, xEventCode; int xRC, xNumEvents; printf("==> Do_High_Level_Tests(): Beginning of the main body...\n"); xRC = PAPI_num_counters(); if (xRC == 256) printf("SUCCESS: PAPI_num_counters returned 256 hardware counters...\n"); else printf("FAILURE: PAPI_num_counters failed, returned xRC=%d...\n", xRC); xRC = PAPI_num_components(); if (xRC == 1) printf("SUCCESS: PAPI_num_components returned 1 component...\n"); else printf("FAILURE: PAPI_num_components failed, returned xRC=%d...\n", xRC); xEventId = 0; while (xEventId < MaxPresetEventId) { xNumEvents = 0; while (xEventId <= MaxPresetEventId && xNumEvents < NumEventsPerSet) { xEventCode = xEventId | 0x80000000; xRC = PAPI_query_event(xEventCode); if (xRC == PAPI_OK) { switch(xEventCode) { case 0x80000003: case 0x80000004: case 0x80000005: case 0x80000007: case 0x80000008: case 0x8000000A: case 0x8000000B: case 0x8000000C: case 0x8000000D: case 0x8000000F: case 0x80000010: case 0x80000011: case 0x80000012: case 0x80000013: case 0x80000014: case 0x80000015: case 0x80000016: case 0x80000017: case 0x80000018: case 0x80000019: case 0x8000001A: case 0x8000001B: case 0x8000001D: case 0x8000001E: case 0x8000001F: case 0x80000020: case 0x80000021: case 0x80000022: case 0x80000023: case 0x80000024: case 0x80000025: case 0x80000026: case 0x80000027: case 0x80000028: case 0x80000029: case 0x8000002A: case 0x8000002B: case 0x8000002C: case 0x8000002D: case 
0x8000002E: case 0x8000002F: case 0x80000031: case 0x80000032: case 0x80000033: case 0x80000037: case 0x80000038: case 0x80000039: case 0x8000003A: case 0x8000003D: case 0x80000042: case 0x80000045: case 0x80000046: case 0x80000048: case 0x8000004A: case 0x8000004B: case 0x8000004D: case 0x8000004E: case 0x80000050: case 0x80000051: case 0x80000053: case 0x80000054: case 0x80000056: case 0x80000057: case 0x80000059: case 0x8000005c: case 0x8000005f: case 0x80000061: case 0x80000062: case 0x80000063: case 0x80000064: case 0x80000065: printf("FAILURE: Do_High_Level_Tests, preset event code 0x%8.8x added to list of events to be started, but should not be allowed...\n", xEventCode); break; default: printf("SUCCESS: Do_High_Level_Tests, preset event code 0x%8.8x added to list of events to be started...\n", xEventCode); } PAPI_Events[xNumEvents] = xEventCode; xNumEvents++; } else { switch(xEventCode) { case 0x80000003: case 0x80000004: case 0x80000005: case 0x80000007: case 0x80000008: case 0x8000000A: case 0x8000000B: case 0x8000000C: case 0x8000000D: case 0x8000000F: case 0x80000010: case 0x80000011: case 0x80000012: case 0x80000013: case 0x80000014: case 0x80000015: case 0x80000016: case 0x80000017: case 0x80000018: case 0x80000019: case 0x8000001A: case 0x8000001B: case 0x8000001D: case 0x8000001E: case 0x8000001F: case 0x80000020: case 0x80000021: case 0x80000022: case 0x80000023: case 0x80000024: case 0x80000025: case 0x80000026: case 0x80000027: case 0x80000028: case 0x80000029: case 0x8000002A: case 0x8000002B: case 0x8000002C: case 0x8000002D: case 0x8000002E: case 0x8000002F: case 0x80000031: case 0x80000032: case 0x80000033: case 0x80000037: case 0x80000038: case 0x80000039: case 0x8000003A: case 0x8000003D: case 0x80000042: case 0x80000045: case 0x80000046: case 0x80000048: case 0x8000004A: case 0x8000004B: case 0x8000004D: case 0x8000004E: case 0x80000050: case 0x80000051: case 0x80000053: case 0x80000054: case 0x80000056: case 0x80000057: case 0x80000059: 
case 0x8000005c: case 0x8000005f: case 0x80000061: case 0x80000062: case 0x80000063: case 0x80000064: case 0x80000065: printf("SUCCESS: Do_High_Level_Tests, preset event code 0x%8.8x cannot be added to list of events to be started, xRC = %d...\n", xEventCode, xRC); break; default: printf("FAILURE: Do_High_Level_Tests, preset event code 0x%8.8x cannot be added to list of events to be started, xRC = %d...\n", xEventCode, xRC); } } xEventId++; } if (xNumEvents) Run_Cycle(xNumEvents); } xEventId = 0; while (xEventId < MaxNativeEventId) { xNumEvents = 0; while (xEventId <= MaxNativeEventId && xNumEvents < NumEventsPerSet) { xEventCode = xEventId | 0x40000000; xRC = PAPI_query_event(xEventCode); if (xRC == PAPI_OK) { switch(xEventCode) { case 0x4000005C: case 0x4000005D: case 0x4000005E: case 0x4000005F: case 0x40000060: case 0x40000061: case 0x40000062: case 0x40000063: case 0x40000064: case 0x4000007C: case 0x4000007D: case 0x4000007E: case 0x4000007F: case 0x40000080: case 0x40000081: case 0x40000082: case 0x40000083: case 0x40000084: case 0x400000D8: case 0x400000D9: case 0x400000DA: case 0x400000DB: case 0x400000DC: case 0x400000DD: case 0x400000FD: case 0x400000FE: case 0x40000198: case 0x40000199: case 0x4000019A: case 0x4000019B: case 0x4000019C: case 0x4000019D: case 0x4000019E: case 0x4000019F: case 0x400001A0: case 0x400001B8: case 0x400001B9: case 0x400001BA: case 0x400001BB: case 0x400001BC: case 0x400001BD: case 0x400001BE: case 0x400001BF: case 0x400001C0: case 0x400001D2: case 0x400001D3: case 0x400001D4: case 0x400001D5: case 0x400001D6: case 0x400001D7: case 0x400001E6: case 0x400001E7: case 0x400001E8: case 0x400001E9: case 0x400001EA: case 0x400001EB: case 0x400001FE: printf("FAILURE: Do_High_Level_Tests, native event code 0x%8.8x added to list of events to be started, but should not be allowed...\n", xEventCode); break; default: printf("SUCCESS: Do_High_Level_Tests, native event code 0x%8.8x added to list of events to be started...\n", xEventCode); } 
PAPI_Events[xNumEvents] = xEventCode; xNumEvents++; } else { switch(xEventCode) { case 0x4000005C: case 0x4000005D: case 0x4000005E: case 0x4000005F: case 0x40000060: case 0x40000061: case 0x40000062: case 0x40000063: case 0x40000064: case 0x4000007C: case 0x4000007D: case 0x4000007E: case 0x4000007F: case 0x40000080: case 0x40000081: case 0x40000082: case 0x40000083: case 0x40000084: case 0x400000D8: case 0x400000D9: case 0x400000DA: case 0x400000DB: case 0x400000DC: case 0x400000DD: case 0x400000FD: case 0x400000FE: case 0x40000198: case 0x40000199: case 0x4000019A: case 0x4000019B: case 0x4000019C: case 0x4000019D: case 0x4000019E: case 0x4000019F: case 0x400001A0: case 0x400001B8: case 0x400001B9: case 0x400001BA: case 0x400001BB: case 0x400001BC: case 0x400001BD: case 0x400001BE: case 0x400001BF: case 0x400001C0: case 0x400001D2: case 0x400001D3: case 0x400001D4: case 0x400001D5: case 0x400001D6: case 0x400001D7: case 0x400001E6: case 0x400001E7: case 0x400001E8: case 0x400001E9: case 0x400001EA: case 0x400001EB: case 0x400001FE: printf("SUCCESS: Do_High_Level_Tests, native event code 0x%8.8x cannot be added to list of events to be started, xRC = %d...\n", xEventCode, xRC); break; default: printf("FAILURE: Do_High_Level_Tests, native event code 0x%8.8x cannot be added to list of events to be started, xRC = %d...\n", xEventCode, xRC); } } xEventId++; } if (xNumEvents) Run_Cycle(xNumEvents); } float xRtime, xPtime, xMflips, xMflops, xIpc; long long xFlpins, xFlpops, xIns; long long values[3] = {PAPI_FP_INS, PAPI_FP_OPS, PAPI_TOT_CYC}; xRC = PAPI_flips(&xRtime, &xPtime, &xFlpins, &xMflips); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_flips started.\n"); else printf("FAILURE: PAPI_flips failed, returned xRC=%d...\n", xRC); FPUArith(); xRC = PAPI_flips(&xRtime, &xPtime, &xFlpins, &xMflips); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_flips Rtime=%e Ptime=%e, Flpins=%lld, Mflips=%e\n", xRtime, xPtime, xFlpins, xMflips); else printf("FAILURE: PAPI_flips failed, returned 
xRC=%d...\n", xRC); FPUArith(); FPUArith(); xRC = PAPI_flips(&xRtime, &xPtime, &xFlpins, &xMflips); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_flips Rtime=%e Ptime=%e, Flpins=%lld, Mflips=%e\n", xRtime, xPtime, xFlpins, xMflips); else printf("FAILURE: PAPI_flips failed, returned xRC=%d...\n", xRC); xRC = PAPI_stop_counters(values, 3); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_stop_counters stopped counters.\n"); else printf("FAILURE: PAPI_stop_counters failed, returned xRC=%d...\n", xRC); xRC = PAPI_flops(&xRtime, &xPtime, &xFlpops, &xMflops); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_flops started.\n"); else printf("FAILURE: PAPI_flops failed, returned xRC=%d...\n", xRC); FPUArith(); xRC = PAPI_flops(&xRtime, &xPtime, &xFlpops, &xMflops); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_flops Rtime=%e Ptime=%e Flpops=%lld Mflops=%e\n", xRtime, xPtime, xFlpops, xMflops); else printf("FAILURE: PAPI_flops failed, returned xRC=%d...\n", xRC); FPUArith(); FPUArith(); xRC = PAPI_flops(&xRtime, &xPtime, &xFlpops, &xMflops); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_flops Rtime=%e Ptime=%e Flpops=%lld Mflops=%e\n", xRtime, xPtime, xFlpops, xMflops); else printf("FAILURE: PAPI_flops failed, returned xRC=%d...\n", xRC); xRC = PAPI_stop_counters(values, 3); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_stop_counters stopped counters.\n"); else printf("FAILURE: PAPI_stop_counters failed, returned xRC=%d...\n", xRC); xRC = PAPI_ipc(&xRtime, &xPtime, &xIns, &xIpc); if (xRC == PAPI_ENOEVNT) printf("SUCCESS: PAPI_ipc, no event found...\n"); else printf("FAILURE: PAPI_ipc failed, returned xRC=%d...\n", xRC); printf("==> Do_High_Level_Tests(): End of the main body...\n"); return; } /* * Do_Multiplex_Tests */ void Do_Multiplex_Tests(void) { int xRC; printf("==> Do_Multiplex_Tests(): Beginning of the main body...\n"); xRC = PAPI_multiplex_init(); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_multiplex_init...\n"); else printf("FAILURE: PAPI_multiplex_init failed, returned xRC=%d...\n", xRC); 
printf("==> Do_Multiplex_Tests(): End of the main body...\n"); return; } void Run_Cycle(const int pNumEvents) { int xRC; // BGP_UPC_Zero_Counter_Values(); Zero_Local_Counters(PAPI_Counters); xRC = PAPI_start_counters(PAPI_Events, pNumEvents); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_start_counters...\n"); else printf("FAILURE: PAPI_start_counters failed, returned xRC=%d...\n", xRC); Print_Native_Counters(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); FPUArith(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); Print_PAPI_Counters_From_List(PAPI_Events, pNumEvents, PAPI_Counters); FPUArith(); xRC = PAPI_read_counters(PAPI_Counters, pNumEvents); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read_counters...\n"); else printf("FAILURE: PAPI_read_counters failed, returned xRC=%d...\n", xRC); Print_Native_Counters(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); FPUArith(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); Print_PAPI_Counters_From_List(PAPI_Events, pNumEvents, PAPI_Counters); FPUArith(); Zero_Local_Counters(PAPI_Counters); xRC = PAPI_accum_counters(PAPI_Counters, pNumEvents); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_accum_counters...\n"); else printf("FAILURE: PAPI_accum_counters failed, returned xRC=%d...\n", xRC); Print_Native_Counters(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); FPUArith(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); Print_PAPI_Counters_From_List(PAPI_Events, pNumEvents, PAPI_Counters); FPUArith(); xRC = PAPI_read_counters(PAPI_Counters, pNumEvents); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_read_counters...\n"); else printf("FAILURE: PAPI_read_counters failed, returned xRC=%d...\n", xRC); Print_Native_Counters(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); FPUArith(); 
Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); Print_PAPI_Counters_From_List(PAPI_Events, pNumEvents, PAPI_Counters); FPUArith(); xRC = PAPI_stop_counters(PAPI_Counters, pNumEvents); if (xRC == PAPI_OK) printf("SUCCESS: PAPI_stop_counters...\n"); else printf("FAILURE: PAPI_stop_counters failed, returned xRC=%d...\n", xRC); Print_Native_Counters(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); FPUArith(); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, pNumEvents); Print_PAPI_Counters_From_List(PAPI_Events, pNumEvents, PAPI_Counters); FPUArith(); return; } /* * Zero_Local_Counters */ void Zero_Local_Counters(long long* pCounters) { int i; for (i=0; i<255; i++) pCounters[i] = 0; return; } /* * FPU Arithmetic... */ void FPUArith(void) { int i; printf("\n==> Start: Performing arithmetic...\n"); register unsigned int zero = 0; register double *x_p = &x[0]; for ( i = 0; i < 32; i++ ) x[i] = 1.0; // Single Hummer Instructions: #if 1 asm volatile ("fabs 1,2"); asm volatile ("fmr 1,2"); asm volatile ("fnabs 1,2"); asm volatile ("fneg 1,2"); asm volatile ("fadd 1,2,3"); asm volatile ("fadds 1,2,3"); asm volatile ("fdiv 1,2,3"); asm volatile ("fdivs 1,2,3"); asm volatile ("fmul 1,2,3"); asm volatile ("fmuls 1,2,3"); asm volatile ("fres 1,2"); asm volatile ("frsqrte 1,2"); //asm volatile ("fsqrt 1,2"); // gives exception //asm volatile ("fsqrts 1,2"); // gives exception asm volatile ("fsub 1,2,3"); asm volatile ("fsubs 1,2,3"); asm volatile ("fmadd 3,4,5,6"); asm volatile ("fmadds 3,4,5,6"); asm volatile ("fmsub 3,4,5,6"); asm volatile ("fmsubs 3,4,5,6"); asm volatile ("fnmadd 3,4,5,6"); asm volatile ("fnmadds 3,4,5,6"); asm volatile ("fnmsub 3,4,5,6"); asm volatile ("fnmsubs 3,4,5,6"); //asm volatile ("fcfid 5,6"); // invalid instruction //asm volatile ("fctid 5,6"); // invalid instruction //asm volatile ("fctidz 5,6"); // invalid instruction asm volatile ("fctiw 5,6"); asm volatile ("fctiwz 5,6"); asm 
volatile ("frsp 5,6"); asm volatile ("fcmpo 0,1,2"); asm volatile ("fcmpu 0,1,2"); asm volatile ("fsel 0,1,2,3"); #endif #if 1 asm volatile("fpadd 9,10,11"); asm volatile("fpsub 9,10,11"); #endif #if 1 asm volatile("fpmul 23,24,25"); asm volatile("fxmul 26, 27, 28"); asm volatile("fxpmul 28, 29, 30"); asm volatile("fxsmul 2, 3, 4"); #endif #if 1 asm volatile("fpmadd 10,11,12,13"); asm volatile("fpmsub 18, 19, 20, 21"); asm volatile("fpnmadd 26, 27, 28, 29"); asm volatile("fpnmsub 16,17,18,19"); asm volatile("fxmadd 10,11,12,13"); asm volatile("fxmsub 18, 19, 20, 21"); asm volatile("fxnmadd 26, 27, 28, 29"); asm volatile("fxnmsub 16,17,18,19"); asm volatile("fxcpmadd 10,11,12,13"); asm volatile("fxcpmsub 18, 19, 20, 21"); asm volatile("fxcpnmadd 26, 27, 28, 29"); asm volatile("fxcpnmsub 16,17,18,19"); asm volatile("fxcsmadd 10,11,12,13"); asm volatile("fxcsmsub 18, 19, 20, 21"); asm volatile("fxcsnmadd 26, 27, 28, 29"); asm volatile("fxcsnmsub 16,17,18,19"); asm volatile("fxcpnpma 1,2,3,4"); asm volatile("fxcsnpma 5,6,7,8"); asm volatile("fxcpnsma 9,10,11,12"); asm volatile("fxcsnsma 3,4,5,6"); asm volatile("fxcxnpma 9,10,11,12"); asm volatile("fxcxnsma 8,9,10,11"); asm volatile("fxcxma 3,4,5,6"); asm volatile("fxcxnms 8,9,10,11"); #endif #if 1 asm volatile("fpre 12, 13"); asm volatile("fprsqrte 15, 16"); asm volatile("fpsel 17, 18, 19, 20"); asm volatile("fpctiw 1,2"); asm volatile("fpctiwz 3,4"); asm volatile("fprsp 5,6"); asm volatile("fscmp 1,2,3"); asm volatile("fpmr 1,2"); asm volatile("fpneg 1,2"); asm volatile("fpabs 1,2"); asm volatile("fpnabs 1,2"); asm volatile("fsmr 1,2"); asm volatile("fsneg 1,2"); asm volatile("fsabs 1,2"); asm volatile("fsnabs 1,2"); asm volatile("fxmr 1,2"); asm volatile("fsmfp 1,2"); asm volatile("fsmtp 1,2"); #endif #if 1 asm volatile("lfdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfsx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfsux 16,%0,%1" : "+b"(x_p) : 
"b"(zero)); asm volatile("lfsdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfsdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfssx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfssux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfpsx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfpsux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfxsx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfxsux 16,%0,%1" : "+b"(x_p) : "b"(zero)); #endif #if 1 asm volatile("lfpdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfpdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfxdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("lfxdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); #endif #if 1 asm volatile("stfdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfsx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfsux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfsdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfsdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfssx 16,%0,%1" : "+b"(x_p) : "b"(zero)); //asm volatile("stfssux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfpsx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfpsux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfxsx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfxsux 16,%0,%1" : "+b"(x_p) : "b"(zero)); #endif #if 1 asm volatile("stfpdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfpdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfxdx 16,%0,%1" : "+b"(x_p) : "b"(zero)); asm volatile("stfxdux 16,%0,%1" : "+b"(x_p) : "b"(zero)); #endif printf("==> End: Performing arithmetic...\n"); return; } /* * Print_Counters */ void Print_Counters(const int pEventSet) { printf("\n***** Start Print Counter Values *****\n"); // Print_Native_Counters_via_Buffer((BGP_UPC_Read_Counters_Struct_t*)Native_Buffer); // Print_Native_Counters(); Print_Native_Counters_for_PAPI_Counters(pEventSet); 
Print_PAPI_Counters(pEventSet, PAPI_Counters); printf("\n***** End Print Counter Values *****\n"); return; } /* * Print_Native_Counters */ void Print_Native_Counters() { printf("\n***** Start Print of Native Counter Values *****\n"); BGP_UPC_Print_Counter_Values(BGP_UPC_READ_EXCLUSIVE); printf("***** End Print of Native Counter Values *****\n"); return; } /* * Print_Native_Counters_for_PAPI_Counters */ void Print_Native_Counters_for_PAPI_Counters(const int pEventSet) { printf("\n***** Start Print of Native Counter Values for PAPI Counters *****\n"); int xNumEvents = PAPI_num_events(pEventSet); if (xNumEvents) { List_PAPI_Events(pEventSet, PAPI_Events, &xNumEvents); Print_Native_Counters_for_PAPI_Counters_From_List(PAPI_Events, xNumEvents); } else { printf("No events are present in the event set.\n"); } printf("***** End Print of Native Counter Values for PAPI Counters *****\n"); return; } /* * Print_Native_Counters_for_PAPI_Counters_From_List */ void Print_Native_Counters_for_PAPI_Counters_From_List(const int* pEvents, const int pNumEvents) { int i, j, xRC; char xName[256]; BGP_UPC_Event_Id_t xNativeEventId; PAPI_event_info_t xEventInfo; // BGP_UPC_Print_Counter_Values(); // DLH for (i=0; inumber_of_counters); printf("***** End Print of Native Counter Values *****\n"); return; } /* * Print_PAPI_Counters */ void Print_PAPI_Counters(const int pEventSet, const long long* pCounters) { int i; char xName[256]; printf("\n***** Start Print of PAPI Counter Values *****\n"); // printf("Print_PAPI_Counters: PAPI_Counters*=%p, pCounters*=%p\n", PAPI_Counters, pCounters); int pNumEvents = PAPI_num_events(pEventSet); printf("Number of Counters = %d\n", pNumEvents); if (pNumEvents) { printf(" Calculated Value Location Event Number Event Name\n"); printf("-------------------- -------- ------------ --------------------------------------------\n"); List_PAPI_Events(pEventSet, PAPI_Events, &pNumEvents); for (i=0; irank); printf("Core = %d\n", xTemp->core); printf("UPC Number = %d\n", 
xTemp->upc_number); printf("Number of Processes per UPC = %d\n", xTemp->number_processes_per_upc); printf("User Mode = %d\n", (int) xTemp->mode); printf("Location = %s\n", xTemp->location); printf("\n***** End Print of Node Information *****\n\n"); return; } /* * Read_Native_Counters */ void Read_Native_Counters(const int pLength) { int xRC = BGP_UPC_Read_Counter_Values(Native_Buffer, pLength, BGP_UPC_READ_EXCLUSIVE); if (xRC < 0) { printf("FAILURE: BGP_UPC_Read_Counter_Values failed, xRC=%d...\n", xRC); exit(1); } return; } /* * Print_PAPI_Events */ void Print_PAPI_Events(const int pEventSet) { int i; char xName[256]; int pNumEvents = PAPI_num_events(pEventSet); List_PAPI_Events(pEventSet, PAPI_Events, &pNumEvents); for (i=0; i * */ /* This file performs the following test: PAPI_library_init() PAPI_shutdown() fork() / \ parent child wait() PAPI_library_init() **unlike forkexec2, no shutdown here** execlp() PAPI_library_init() */ #include "papi_test.h" #include <sys/wait.h> int main( int argc, char **argv ) { int retval; int status; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( argc > 1 ) && ( strcmp( argv[1], "xxx" ) == 0 ) ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "execed PAPI_library_init", retval ); } else { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); PAPI_shutdown( ); if ( fork( ) == 0 ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "forked PAPI_library_init", retval ); if ( execlp( argv[0], argv[0], "xxx", NULL ) == -1 ) test_fail( __FILE__, __LINE__, "execlp", PAPI_ESYS ); } else { wait( &status ); if ( WEXITSTATUS( status ) != 0 ) test_fail( __FILE__, __LINE__, "fork", WEXITSTATUS( status ) ); } } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } 
papi-5.3.0/src/ctests/describe.c0000600003276200002170000000740512247131121016162 0ustar ralphundrgrad/* From Paul Drongowski at HP. Thanks. */ /* I have not been able to call PAPI_describe_event without incurring a segv, including the sample code on the man page. I noticed that PAPI_describe_event is not exercised by the PAPI test programs, so I haven't been able to check the function call using known good code. (Or steal your code for that matter. :-) */ /* PAPI_describe_event has been deprecated in PAPI 3, since its functionality exists in other API calls. Below shows several ways that this call was used, with replacement code compatible with PAPI 3. */ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int EventSet = PAPI_NULL; int retval; long long g1[2]; int eventcode = PAPI_TOT_INS; char eventname[PAPI_MAX_STR_LEN]; PAPI_event_info_t info, info1, info2; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( ( retval = PAPI_query_event( eventcode ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_query_event(PAPI_TOT_INS)", retval ); if ( ( retval = PAPI_add_event( EventSet, eventcode ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_INS)", retval ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); if ( ( retval = PAPI_stop( EventSet, g1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* Case 0, no info, should fail */ eventname[0] = '\0'; eventcode = 0; /* if ( ( retval = PAPI_describe_event(eventname,(int *)&eventcode,eventdesc) ) == PAPI_OK) test_fail(__FILE__,__LINE__,"PAPI_describe_event",retval); */ if ( ( 
retval = PAPI_get_event_info( eventcode, &info ) ) == PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info", retval ); /* Case 1, fill in name field. */ eventcode = PAPI_TOT_INS; eventname[0] = '\0'; /* if ( ( retval = PAPI_describe_event(eventname,(int *)&eventcode,eventdesc) ) != PAPI_OK) test_fail(__FILE__,__LINE__,"PAPI_describe_event",retval); */ if ( ( retval = PAPI_get_event_info( eventcode, &info1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info", retval ); if ( strcmp( info1.symbol, "PAPI_TOT_INS" ) != 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info symbol value is bogus", retval ); if ( strlen( info1.long_descr ) == 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info long_descr value is bogus", retval ); eventcode = 0; /* Case 2, fill in code field. */ /* if ( ( retval = PAPI_describe_event(eventname,(int *)&eventcode,eventdesc) ) != PAPI_OK) test_fail(__FILE__,__LINE__,"PAPI_describe_event",retval); */ strcpy( eventname, info1.symbol ); if ( ( retval = PAPI_event_name_to_code( eventname, ( int * ) &eventcode ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_name_to_code", retval ); if ( eventcode != PAPI_TOT_INS ) test_fail( __FILE__, __LINE__, "PAPI_event_name_to_code code value is bogus", retval ); if ( ( retval = PAPI_get_event_info( eventcode, &info2 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info", retval ); if ( strcmp( info2.symbol, "PAPI_TOT_INS" ) != 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info symbol value is bogus", retval ); if ( strlen( info2.long_descr ) == 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info long_descr value is bogus", retval ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/pthrtough2.c0000600003276200002170000000463312247131121016510 0ustar ralphundrgrad#include <pthread.h> #include <stdlib.h> #include <stdio.h> #include "papi_test.h" #define NITER 2000 void * Thread( void *data ) { int ret, evtset; ( void ) data; if ( ( ret = PAPI_register_thread( 
) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", ret ); evtset = PAPI_NULL; if ( ( ret = PAPI_create_eventset( &evtset ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", ret ); if ( ( ret = PAPI_destroy_eventset( &evtset ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", ret ); if ( ( ret = PAPI_unregister_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", ret ); return ( NULL ); } int main( int argc, char *argv[] ) { int j; pthread_t *th = NULL; pthread_attr_t attr; int ret; long nthr; tests_quiet( argc, argv ); /*Set TESTS_QUIET variable */ ret = PAPI_library_init( PAPI_VER_CURRENT ); if ( ret != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", ret ); if ( ( ret = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_thread_init", ret ); pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM ret = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( ret != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", ret ); #endif nthr = NITER; if ( !TESTS_QUIET ) { printf( "Creating %d threads for %d iterations each of:\n", ( int ) nthr, 1 ); printf( "\tregister\n" ); printf( "\tcreate_eventset\n" ); printf( "\tdestroy_eventset\n" ); printf( "\tunregister\n" ); } th = ( pthread_t * ) malloc( ( size_t ) nthr * sizeof ( pthread_t ) ); if ( th == NULL ) test_fail( __FILE__, __LINE__, "malloc", PAPI_ESYS ); for ( j = 0; j < nthr; j++ ) { ret = pthread_create( &th[j], &attr, &Thread, NULL ); if ( ret ) { printf( "Failed to create thread: %d\n", j ); if ( j < 10 ) test_fail( __FILE__, __LINE__, "pthread_create", PAPI_ESYS ); /* threads 0 .. j-1 were created, so j threads remain to be joined */ printf( "Continuing test with %d threads.\n", j ); nthr = j; th = ( pthread_t * ) realloc( th, ( size_t ) nthr * sizeof ( pthread_t ) ); break; } } for 
( j = 0; j < nthr; j++ ) { pthread_join( th[j], NULL ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/overflow2.c0000600003276200002170000001227712247131121016332 0ustar ralphundrgrad/* * File: overflow.c * CVS: $Id$ * Author: Nils Smeds [Based on tests/overflow.c by Philip Mucci] * smeds@pdc.kth.se * Mods: * */ /* This file performs the following test: overflow dispatch The Eventset contains: + PAPI_TOT_CYC (overflow monitor) + PAPI_FP_INS - Start eventset 1 - Do flops - Stop and measure eventset 1 - Set up overflow on eventset 1 - Start eventset 1 - Do flops - Stop eventset 1 */ #include "papi_test.h" #define OVER_FMT "handler(%d ) Overflow at %p! bit=0x%llx \n" #define OUT_FMT "%-12s : %16lld%16lld\n" int total = 0; /* total overflows */ extern int TESTS_QUIET; /* Declared in test_utils.c */ void handler( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) context; if ( !TESTS_QUIET ) { fprintf( stderr, OVER_FMT, EventSet, address, overflow_vector ); } total++; } int main( int argc, char **argv ) { int EventSet = PAPI_NULL; long long ( values[2] )[2]; long long min, max; int num_flops, retval; int PAPI_event, mythreshold; char event_name[PAPI_MAX_STR_LEN]; const PAPI_hw_info_t *hw_info = NULL; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); #if defined(POWER3) || defined(__sparc__) PAPI_event = PAPI_TOT_INS; #else /* query and set up the right instruction to monitor */ PAPI_event = find_nonderived_event( ); #endif if (( PAPI_event == PAPI_FP_OPS ) || ( PAPI_event == PAPI_FP_INS )) mythreshold = THRESHOLD; else #if defined(linux) mythreshold = ( int ) hw_info->cpu_max_mhz * 10000 * 2; #else mythreshold = THRESHOLD * 2; #endif retval = 
PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); retval = PAPI_add_event( EventSet, PAPI_event ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_overflow( EventSet, PAPI_event, mythreshold, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_overflow( EventSet, PAPI_event, 0, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); num_flops = NUM_FLOPS; #if defined(linux) || defined(__ia64__) || defined(_POWER4) num_flops *= 2; #endif if ( !TESTS_QUIET ) { if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf ( "Test case: Overflow dispatch of 1st event in set with 2 events.\n" ); printf ( "---------------------------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", mythreshold ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-----------------------------------------------\n" ); printf( "Test type : %16d%16d\n", 1, 2 ); printf( OUT_FMT, event_name, ( values[0] )[0], ( values[1] )[0] ); printf( OUT_FMT, "PAPI_TOT_CYC", ( values[0] )[1], ( values[1] 
)[1] ); printf( "Overflows : %16s%16d\n", "", total ); printf( "-----------------------------------------------\n" ); printf( "Verification:\n" ); /* if (PAPI_event == PAPI_FP_INS) printf("Row 1 approximately equals %d %d\n", num_flops, num_flops); */ /* Note that the second run prints output on stdout. On some systems * this is costly. PAPI_TOT_INS or PAPI_TOT_CYC are likely to be _very_ * different between the two runs. * printf("Column 1 approximately equals column 2\n"); */ printf( "Row 3 approximately equals %u +- %u %%\n", ( unsigned ) ( ( values[0] )[0] / ( long long ) mythreshold ), ( unsigned ) ( OVR_TOLERANCE * 100.0 ) ); } /* min = (long long)((values[0])[0]*(1.0-TOLERANCE)); max = (long long)((values[0])[0]*(1.0+TOLERANCE)); if ( (values[1])[0] > max || (values[1])[0] < min ) test_fail(__FILE__, __LINE__, event_name, 1); */ min = ( long long ) ( ( ( double ) values[0][0] * ( 1.0 - OVR_TOLERANCE ) ) / ( double ) mythreshold ); max = ( long long ) ( ( ( double ) values[0][0] * ( 1.0 + OVR_TOLERANCE ) ) / ( double ) mythreshold ); if ( total > max || total < min ) test_fail( __FILE__, __LINE__, "Overflows", 1 ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/pthrtough.c0000600003276200002170000000454012247131121016423 0ustar ralphundrgrad#include <pthread.h> #include <stdlib.h> #include <stdio.h> #include "papi_test.h" #define NITER 1000 void * Thread( void *data ) { int i, ret, evtset; ( void ) data; for ( i = 0; i < NITER; i++ ) { if ( ( ret = PAPI_register_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", ret ); evtset = PAPI_NULL; if ( ( ret = PAPI_create_eventset( &evtset ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", ret ); if ( ( ret = PAPI_destroy_eventset( &evtset ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", ret ); if ( ( ret = PAPI_unregister_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", ret ); } return ( NULL ); } int main( int argc, char 
*argv[] ) { int j; pthread_t *th = NULL; pthread_attr_t attr; int ret; long nthr; const PAPI_hw_info_t *hwinfo; tests_quiet( argc, argv ); /*Set TESTS_QUIET variable */ ret = PAPI_library_init( PAPI_VER_CURRENT ); if ( ret != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", ret ); if ( ( ret = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_thread_init", ret ); pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM ret=pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( ret != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", ret ); #endif if ( ( hwinfo = PAPI_get_hardware_info( ) ) == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 0 ); nthr = hwinfo->ncpu; if ( !TESTS_QUIET ) { printf( "Creating %ld threads for %d iterations each of:\n", nthr, NITER ); printf( "\tregister\n" ); printf( "\tcreate_eventset\n" ); printf( "\tdestroy_eventset\n" ); printf( "\tunregister\n" ); } th = ( pthread_t * ) malloc( ( size_t ) nthr * sizeof ( pthread_t ) ); if ( th == NULL ) test_fail( __FILE__, __LINE__, "malloc", PAPI_ESYS ); for ( j = 0; j < nthr; j++ ) { ret = pthread_create( &th[j], &attr, &Thread, NULL ); if ( ret ) test_fail( __FILE__, __LINE__, "pthread_create", PAPI_ESYS ); } for ( j = 0; j < nthr; j++ ) { pthread_join( th[j], NULL ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/Makefile0000600003276200002170000000122012247131121015663 0ustar ralphundrgrad# File: ctests/Makefile INCLUDE = -I.. 
-I../testlib PAPILIB=$(LIBRARY) CC = gcc CC_R = $(CC) -pthread CFLAGS = -g -O -Wall testlibdir=../testlib TESTLIB= $(testlibdir)/libtestlib.a include Makefile.recipies .PHONY : install install: default @echo "C tests (DATADIR) being installed in: \"$(DATADIR)\""; -mkdir -p $(DATADIR)/ctests -chmod go+rx $(DATADIR) -chmod go+rx $(DATADIR)/ctests -find . -perm -100 -type f -exec cp {} $(DATADIR)/ctests \; -chmod go+rx $(DATADIR)/ctests/* -find . -name "*.[ch]" -type f -exec cp {} $(DATADIR)/ctests \; -cp Makefile.target $(DATADIR)/ctests/Makefile -cat Makefile.recipies >> $(DATADIR)/ctests/Makefile papi-5.3.0/src/ctests/calibrate.c0000600003276200002170000003026512247131121016330 0ustar ralphundrgrad/* Calibrate.c A program to perform one or all of three tests to count flops. Test 1. Inner Product: 2*n operations for i = 1:n; a = a + x(i)*y(i); end Test 2. Matrix Vector Product: 2*n^2 operations for i = 1:n; for j = 1:n; x(i) = x(i) + a(i,j)*y(j); end; end; Test 3. Matrix Matrix Multiply: 2*n^3 operations for i = 1:n; for j = 1:n; for k = 1:n; c(i,j) = c(i,j) + a(i,k)*b(k,j); end; end; end; Supply the command line option -i, -v, or -m to select the inner product, matrix-vector, or matrix-matrix test respectively, or no test option to perform all three. Each test initializes PAPI and presents a header with processor information. Then it performs 500 iterations, printing result lines containing: n, measured counts, theoretical counts, (measured - theory), % error */ #include "papi_test.h" static void resultline( int i, int j, int EventSet, int fail ); static void headerlines( char *title, int TESTS_QUIET ); #define INDEX1 100 #define INDEX5 500 #define MAX_WARN 10 #define MAX_ERROR 80 #define MAX_DIFF 14 extern int TESTS_QUIET; static void print_help( char **argv ) { printf( "Usage: %s [-ivmdh] [-e event]\n", argv[0] ); printf( "Options:\n\n" ); printf( "\t-i Inner Product test.\n" ); printf( "\t-v Matrix-Vector multiply test.\n" ); printf( "\t-m Matrix-Matrix multiply test.\n" ); printf( "\t-d Double precision data. 
Default is float.\n" ); printf ( "\t-e event Use <event> as PAPI event instead of PAPI_FP_OPS\n" ); printf( "\t-f Suppress failures\n" ); printf( "\t-h Print this help message\n" ); printf( "\n" ); printf ( "This test measures floating point operations for the specified test.\n" ); printf( "Operations can be performed in single or double precision.\n" ); printf( "Default operation is all three tests in single precision.\n" ); } static float inner_single( int n, float *x, float *y ) { float aa = 0.0; int i; for ( i = 0; i <= n; i++ ) aa = aa + x[i] * y[i]; return ( aa ); } static double inner_double( int n, double *x, double *y ) { double aa = 0.0; int i; for ( i = 0; i <= n; i++ ) aa = aa + x[i] * y[i]; return ( aa ); } static void vector_single( int n, float *a, float *x, float *y ) { int i, j; for ( i = 0; i <= n; i++ ) for ( j = 0; j <= n; j++ ) y[i] = y[i] + a[i * n + j] * x[i]; } static void vector_double( int n, double *a, double *x, double *y ) { int i, j; for ( i = 0; i <= n; i++ ) for ( j = 0; j <= n; j++ ) y[i] = y[i] + a[i * n + j] * x[i]; } static void matrix_single( int n, float *c, float *a, float *b ) { int i, j, k; for ( i = 0; i <= n; i++ ) for ( j = 0; j <= n; j++ ) for ( k = 0; k <= n; k++ ) c[i * n + j] = c[i * n + j] + a[i * n + k] * b[k * n + j]; } static void matrix_double( int n, double *c, double *a, double *b ) { int i, j, k; for ( i = 0; i <= n; i++ ) for ( j = 0; j <= n; j++ ) for ( k = 0; k <= n; k++ ) c[i * n + j] = c[i * n + j] + a[i * n + k] * b[k * n + j]; } static void reset_flops( char *title, int EventSet ) { int retval; char err_str[PAPI_MAX_STR_LEN]; retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { sprintf( err_str, "%s: PAPI_start", title ); test_fail( __FILE__, __LINE__, err_str, retval ); } } int main( int argc, char *argv[] ) { extern void dummy( void * ); float aa, *a, *b, *c, *x, *y; double aad, *ad, *bd, *cd, *xd, *yd; int i, j, n; int inner = 0; int vector = 0; int matrix = 0; int double_precision = 0; int fail = 1; 
int retval = PAPI_OK; char papi_event_str[PAPI_MIN_STR_LEN] = "PAPI_FP_OPS"; int papi_event; int EventSet = PAPI_NULL; /* Parse the input arguments */ for ( i = 0; i < argc; i++ ) { if ( strstr( argv[i], "-i" ) ) inner = 1; else if ( strstr( argv[i], "-f" ) ) fail = 0; else if ( strstr( argv[i], "-v" ) ) vector = 1; else if ( strstr( argv[i], "-m" ) ) matrix = 1; else if ( strstr( argv[i], "-e" ) ) { if ( ( argv[i + 1] == NULL ) || ( strlen( argv[i + 1] ) == 0 ) ) { print_help( argv ); exit( 1 ); } strncpy( papi_event_str, argv[i + 1], sizeof ( papi_event_str ) ); i++; } else if ( strstr( argv[i], "-d" ) ) double_precision = 1; else if ( strstr( argv[i], "-h" ) ) { print_help( argv ); exit( 1 ); } } /* if no options specified, set all tests to TRUE */ if ( inner + vector + matrix == 0 ) inner = vector = matrix = 1; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( !TESTS_QUIET ) printf( "Initializing..." ); /* Initialize PAPI */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* Translate name */ retval = PAPI_event_name_to_code( papi_event_str, &papi_event ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_name_to_code", retval ); if ( PAPI_query_event( papi_event ) != PAPI_OK ) test_skip( __FILE__, __LINE__, "PAPI_query_event", PAPI_ENOEVNT ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( ( retval = PAPI_add_event( EventSet, papi_event ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); printf( "\n" ); retval = PAPI_OK; /* Inner Product test */ if ( inner ) { /* Allocate the linear arrays */ if (double_precision) { xd = malloc( INDEX5 * sizeof(double) ); yd = malloc( INDEX5 * sizeof(double) ); if ( !( xd && yd ) ) retval = PAPI_ENOMEM; } else { x = malloc( INDEX5 * sizeof(float) ); y = malloc( INDEX5 * sizeof(float) ); if ( 
!( x && y ) ) retval = PAPI_ENOMEM; } if ( retval == PAPI_OK ) { headerlines( "Inner Product Test", TESTS_QUIET ); /* step through the different array sizes */ for ( n = 0; n < INDEX5; n++ ) { if ( n < INDEX1 || ( ( n + 1 ) % 50 ) == 0 ) { /* Initialize the needed arrays at this size */ if ( double_precision ) { for ( i = 0; i <= n; i++ ) { xd[i] = ( double ) rand( ) * ( double ) 1.1; yd[i] = ( double ) rand( ) * ( double ) 1.1; } } else { for ( i = 0; i <= n; i++ ) { x[i] = ( float ) rand( ) * ( float ) 1.1; y[i] = ( float ) rand( ) * ( float ) 1.1; } } /* reset PAPI flops count */ reset_flops( "Inner Product Test", EventSet ); /* do the multiplication */ if ( double_precision ) { aad = inner_double( n, xd, yd ); dummy( ( void * ) &aad ); } else { aa = inner_single( n, x, y ); dummy( ( void * ) &aa ); } resultline( n, 1, EventSet, fail ); } } } if (double_precision) { free( xd ); free( yd ); } else { free( x ); free( y ); } } /* Matrix Vector test */ if ( vector && retval != PAPI_ENOMEM ) { /* Allocate the needed arrays */ if (double_precision) { ad = malloc( INDEX5 * INDEX5 * sizeof(double) ); xd = malloc( INDEX5 * sizeof(double) ); yd = malloc( INDEX5 * sizeof(double) ); if ( !( ad && xd && yd ) ) retval = PAPI_ENOMEM; } else { a = malloc( INDEX5 * INDEX5 * sizeof(float) ); x = malloc( INDEX5 * sizeof(float) ); y = malloc( INDEX5 * sizeof(float) ); if ( !( a && x && y ) ) retval = PAPI_ENOMEM; } if ( retval == PAPI_OK ) { headerlines( "Matrix Vector Test", TESTS_QUIET ); /* step through the different array sizes */ for ( n = 0; n < INDEX5; n++ ) { if ( n < INDEX1 || ( ( n + 1 ) % 50 ) == 0 ) { /* Initialize the needed arrays at this size */ if ( double_precision ) { for ( i = 0; i <= n; i++ ) { yd[i] = 0.0; xd[i] = ( double ) rand( ) * ( double ) 1.1; for ( j = 0; j <= n; j++ ) ad[i * n + j] = ( double ) rand( ) * ( double ) 1.1; } } else { for ( i = 0; i <= n; i++ ) { y[i] = 0.0; x[i] = ( float ) rand( ) * ( float ) 1.1; for ( j = 0; j <= n; j++ ) a[i * n + j] 
= ( float ) rand( ) * ( float ) 1.1; } } /* reset PAPI flops count */ reset_flops( "Matrix Vector Test", EventSet ); /* compute the resultant vector */ if ( double_precision ) { vector_double( n, ad, xd, yd ); dummy( ( void * ) yd ); } else { vector_single( n, a, x, y ); dummy( ( void * ) y ); } resultline( n, 2, EventSet, fail ); } } } if (double_precision) { free( ad ); free( xd ); free( yd ); } else { free( a ); free( x ); free( y ); } } /* Matrix Multiply test */ if ( matrix && retval != PAPI_ENOMEM ) { /* Allocate the needed arrays */ if (double_precision) { ad = malloc( INDEX5 * INDEX5 * sizeof(double) ); bd = malloc( INDEX5 * INDEX5 * sizeof(double) ); cd = malloc( INDEX5 * INDEX5 * sizeof(double) ); if ( !( ad && bd && cd ) ) retval = PAPI_ENOMEM; } else { a = malloc( INDEX5 * INDEX5 * sizeof(float) ); b = malloc( INDEX5 * INDEX5 * sizeof(float) ); c = malloc( INDEX5 * INDEX5 * sizeof(float) ); if ( !( a && b && c ) ) retval = PAPI_ENOMEM; } if ( retval == PAPI_OK ) { headerlines( "Matrix Multiply Test", TESTS_QUIET ); /* step through the different array sizes */ for ( n = 0; n < INDEX5; n++ ) { if ( n < INDEX1 || ( ( n + 1 ) % 50 ) == 0 ) { /* Initialize the needed arrays at this size */ if ( double_precision ) { for ( i = 0; i <= n * n + n; i++ ) { cd[i] = 0.0; ad[i] = ( double ) rand( ) * ( double ) 1.1; bd[i] = ( double ) rand( ) * ( double ) 1.1; } } else { for ( i = 0; i <= n * n + n; i++ ) { c[i] = 0.0; a[i] = ( float ) rand( ) * ( float ) 1.1; b[i] = ( float ) rand( ) * ( float ) 1.1; } } /* reset PAPI flops count */ reset_flops( "Matrix Multiply Test", EventSet ); /* compute the resultant matrix */ if ( double_precision ) { matrix_double( n, cd, ad, bd ); dummy( ( void * ) cd ); } else { matrix_single( n, c, a, b ); dummy( ( void * ) c ); } resultline( n, 3, EventSet, fail ); } } } if (double_precision) { free( ad ); free( bd ); free( cd ); } else { free( a ); free( b ); free( c ); } } /* exit with status code */ if ( retval == PAPI_ENOMEM ) 
test_fail( __FILE__, __LINE__, "malloc", retval ); else test_pass( __FILE__, NULL, 0 ); exit( 1 ); } /* Extract and display hardware information for this processor. (Re)Initialize PAPI_flops() and begin counting floating ops. */ static void headerlines( char *title, int TESTS_QUIET ) { const PAPI_hw_info_t *hwinfo = NULL; if ( !TESTS_QUIET ) { if ( papi_print_header( "", &hwinfo ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); printf( "\n%s:\n%8s %12s %12s %8s %8s\n", title, "i", "papi", "theory", "diff", "%error" ); printf ( "-------------------------------------------------------------------------\n" ); } } /* Read PAPI_flops. Format and display results. Compute error without using floating ops. */ #if defined(mips) #define FMA 1 #elif (defined(sparc) && defined(sun)) #define FMA 1 #else #define FMA 0 #endif static void resultline( int i, int j, int EventSet, int fail ) { float ferror = 0; long long flpins = 0; long long papi, theory; int diff, retval; char err_str[PAPI_MAX_STR_LEN]; retval = PAPI_stop( EventSet, &flpins ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); i++; /* convert to 1s base */ theory = 2; while ( j-- ) theory *= i; /* theoretical ops */ papi = flpins << FMA; diff = ( int ) ( papi - theory ); ferror = ( ( float ) abs( diff ) ) / ( ( float ) theory ) * 100; printf( "%8d %12lld %12lld %8d %10.4f\n", i, papi, theory, diff, ferror ); if ( ferror > MAX_WARN && abs( diff ) > MAX_DIFF && i > 20 ) { sprintf( err_str, "Calibrate: difference exceeds %d percent", MAX_WARN ); test_warn( __FILE__, __LINE__, err_str, 0 ); } if (fail) { if ( ferror > MAX_ERROR && abs( diff ) > MAX_DIFF && i > 20 ) { sprintf( err_str, "Calibrate: error exceeds %d percent", MAX_ERROR ); test_fail( __FILE__, __LINE__, err_str, PAPI_EMISC ); } } } papi-5.3.0/src/ctests/zero_flip.c0000600003276200002170000001171512247131121016372 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer 
functionality - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC - Get us. - Start counters - Do flops - Stop and read counters - Get us. */ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int retval, num_tests = 2, eventcnt, events[2], i, tmp; int EventSet1 = PAPI_NULL, EventSet2 = PAPI_NULL; int PAPI_event; long long values1[2], values2[2]; long long elapsed_us, elapsed_cyc; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* query and set up the right instruction to monitor */ if ( PAPI_query_event( PAPI_FP_OPS ) == PAPI_OK ) PAPI_event = PAPI_FP_OPS; else PAPI_event = PAPI_TOT_INS; retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); retval = PAPI_create_eventset( &EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* Add the events */ printf( "Adding: %s\n", event_name ); retval = PAPI_add_event( EventSet1, PAPI_event ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_add_event( EventSet1, PAPI_TOT_CYC ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); /* Add them reversed to EventSet2 */ retval = PAPI_create_eventset( &EventSet2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", 
retval ); eventcnt = 2; retval = PAPI_list_events( EventSet1, events, &eventcnt ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_list_events", retval ); for ( i = eventcnt - 1; i >= 0; i-- ) { retval = PAPI_event_code_to_name( events[i], event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); retval = PAPI_add_event( EventSet2, events[i] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet1, values1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_start( EventSet2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet2, values2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; retval = PAPI_cleanup_eventset( EventSet1 ); /* JT */ if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); retval = PAPI_cleanup_eventset( EventSet2 ); /* JT */ if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); if ( !TESTS_QUIET ) { printf( "Test case 0: start, stop.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = 
PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Test type : \t 1\t 2\n" ); sprintf( add_event_str, "%-12s : \t", event_name ); printf( TAB2, add_event_str, values1[0], values2[1] ); printf( TAB2, "PAPI_TOT_CYC : \t", values1[1], values2[0] ); printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Verification: none\n" ); } test_pass( __FILE__, NULL, num_tests ); exit( 1 ); } papi-5.3.0/src/ctests/multiplex2.c0000600003276200002170000001145712247131121016511 0ustar ralphundrgrad/* * File: multiplex.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file tests the multiplex functionality, originally developed by John May of LLNL. */ #include "papi_test.h" void init_papi( void ) { int retval; /* Initialize the library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* Tests that we can really multiplex a lot. 
*/ int case1( void ) { int retval, i, EventSet = PAPI_NULL, j = 0, k = 0, allvalid = 1; int max_mux, nev, *events; long long *values; PAPI_event_info_t pset; char evname[PAPI_MAX_STR_LEN]; init_papi( ); init_multiplex( ); #if 0 if ( PAPI_set_domain( PAPI_DOM_KERNEL ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); #endif retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); #if 0 if ( PAPI_set_domain( PAPI_DOM_KERNEL ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); #endif /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 0 is always the cpu component */ retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); #if 0 if ( PAPI_set_domain( PAPI_DOM_KERNEL ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); #endif retval = PAPI_set_multiplex( EventSet ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); max_mux = PAPI_get_opt( PAPI_MAX_MPX_CTRS, NULL ); if ( max_mux > 32 ) max_mux = 32; #if 0 if ( PAPI_set_domain( PAPI_DOM_KERNEL ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); #endif /* Fill up the event set with as many non-derived events as we can */ printf ( "\nFilling the event set with as many non-derived events as we can...\n" ); i = PAPI_PRESET_MASK; do { if ( PAPI_get_event_info( i, &pset ) == PAPI_OK ) { if ( pset.count && ( strcmp( pset.derived, "NOT_DERIVED" ) == 0 ) ) { retval = PAPI_add_event( EventSet, ( int ) pset.event_code ); if ( retval != PAPI_OK ) { printf("Failed trying to add %s\n",pset.symbol); break; } else { printf( "Added %s\n", pset.symbol ); j++; } } } } while ( ( 
PAPI_enum_event( &i, PAPI_PRESET_ENUM_AVAIL ) == PAPI_OK ) && ( j < max_mux ) ); events = ( int * ) malloc( ( size_t ) j * sizeof ( int ) ); if ( events == NULL ) test_fail( __FILE__, __LINE__, "malloc events", 0 ); values = ( long long * ) malloc( ( size_t ) j * sizeof ( long long ) ); if ( values == NULL ) test_fail( __FILE__, __LINE__, "malloc values", 0 ); do_stuff( ); #if 0 if ( PAPI_set_domain( PAPI_DOM_KERNEL ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); #endif retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); nev = j; retval = PAPI_list_events( EventSet, events, &nev ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_list_events", retval ); printf( "\nEvent Counts:\n" ); for ( i = 0, allvalid = 0; i < j; i++ ) { PAPI_event_code_to_name( events[i], evname ); printf( TAB1, evname, values[i] ); if ( values[i] == 0 ) allvalid++; } printf( "\n" ); if ( allvalid ) { printf( "Caution: %d counters had zero values\n", allvalid ); } if (allvalid==j) { test_fail( __FILE__, __LINE__, "All counters returned zero", 5 ); } for ( i = 0, allvalid = 0; i < j; i++ ) { for ( k = i + 1; k < j; k++ ) { if ( ( i != k ) && ( values[i] == values[k] ) ) { allvalid++; break; } } } if ( allvalid ) { printf( "Caution: %d counter pair(s) had identical values\n", allvalid ); } free( events ); free( values ); retval = PAPI_cleanup_eventset( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); return ( SUCCESS ); } int main( int argc, char **argv ) { tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ printf( "%s: Does PAPI_multiplex_init() handle lots of events?\n", argv[0] ); 
printf( "Using %d iterations\n", NUM_ITERS ); case1( ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/all_native_events.c0000600003276200002170000001227412247131121020104 0ustar ralphundrgrad/* * File: all_native_events.c * Author: Haihang You * you@cs.utk.edu * Mods: * */ /* This file reports hardware info and performs the following test: - Start and stop all native events. This is a good preliminary way to validate native event tables. In its current form this test also stresses the number of event sets the library can handle outstanding. */ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ extern unsigned char PENTIUM4; static int check_event( int event_code, char *name ) { int retval; long long values; int EventSet = PAPI_NULL; /* Is there an issue with older machines? */ /* Disable for now, add back once we can reproduce */ // if ( PENTIUM4 ) { // if ( strcmp( name, "REPLAY_EVENT:BR_MSP" ) == 0 ) { // return 1; // } // } retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); retval = PAPI_add_event( EventSet, event_code ); if ( retval != PAPI_OK ) { printf( "Error adding %s %d\n", name, retval ); return 0; } else { // printf( "Added %s successfully ", name ); } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { PAPI_perror( "PAPI_start" ); } else { retval = PAPI_stop( EventSet, &values ); if ( retval != PAPI_OK ) { PAPI_perror( "PAPI_stop" ); return 0; } else { printf( "Added and Stopped %s successfully.\n", name ); } } retval=PAPI_cleanup_eventset( EventSet ); if (retval != PAPI_OK ) { test_warn( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval); } retval=PAPI_destroy_eventset( &EventSet ); if (retval != PAPI_OK ) { test_warn( __FILE__, __LINE__, "PAPI_destroy_eventset", retval); } return ( 1 ); } int main( int argc, char **argv ) { int i, k, add_count = 0, err_count = 0, unc_count = 0, offcore_count = 0; int retval; 
PAPI_event_info_t info, info1; const PAPI_hw_info_t *hwinfo = NULL; const PAPI_component_info_t* cmpinfo; char *Intel_i7; int event_code; int numcmp, cid; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } retval = papi_print_header( "Test case ALL_NATIVE_EVENTS: Available " "native events and hardware " "information.\n", &hwinfo ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); } numcmp = PAPI_num_components( ); /* we need a little exception processing if it's a Core i7 */ /* Unfortunately, this test never succeeds... */ Intel_i7 = strstr( hwinfo->model_string, "Intel Core i7" ); /* Loop through all components */ for( cid = 0; cid < numcmp; cid++ ) { cmpinfo = PAPI_get_component_info( cid ); if (cmpinfo == NULL) { test_fail( __FILE__, __LINE__, "PAPI_get_component_info", 2 ); } if (cmpinfo->disabled) { printf( "Name: %-23s %s\n", cmpinfo->name ,cmpinfo->description); printf(" \\-> Disabled: %s\n",cmpinfo->disabled_reason); continue; } /* For platform independence, always ASK FOR the first event */ /* Don't just assume it'll be the first numeric value */ i = 0 | PAPI_NATIVE_MASK; retval = PAPI_enum_cmp_event( &i, PAPI_ENUM_FIRST, cid ); do { retval = PAPI_get_event_info( i, &info ); /* Skip OFFCORE and UNCORE events */ /* Adding them will fail currently */ if ( Intel_i7 || ( hwinfo->vendor == PAPI_VENDOR_INTEL ) ) { if ( !strncmp( info.symbol, "UNC_", 4 ) ) { unc_count++; continue; } if ( !strncmp( info.symbol, "OFFCORE_RESPONSE_0", 18 ) ) { offcore_count++; continue; } } /* Enumerate all umasks */ k = i; if ( PAPI_enum_cmp_event(&k, PAPI_NTV_ENUM_UMASKS, cid )==PAPI_OK ) { do { retval = PAPI_get_event_info( k, &info1 ); event_code = ( int ) info1.event_code; if ( check_event( event_code, info1.symbol ) ) { add_count++; } else { err_count++; } } 
while ( PAPI_enum_cmp_event( &k, PAPI_NTV_ENUM_UMASKS, cid ) == PAPI_OK ); } else { /* Event didn't have any umasks */ event_code = ( int ) info.event_code; if ( check_event( event_code, info.symbol ) ) { add_count++; } else { err_count++; } } } while ( PAPI_enum_cmp_event( &i, PAPI_ENUM_EVENTS, cid ) == PAPI_OK ); } printf( "\n\nSuccessfully found and added %d events " "(in %d eventsets).\n", add_count , add_count); if ( err_count ) { printf( "Failed to add %d events.\n", err_count ); } if (( unc_count ) || (offcore_count)) { char warning[BUFSIZ]; sprintf(warning,"%d Uncore and %d Offcore events were ignored", unc_count,offcore_count); test_warn( __FILE__, __LINE__, warning, 1 ); } if ( add_count > 0 ) { test_pass( __FILE__, NULL, 0 ); } else { test_fail( __FILE__, __LINE__, "No events added", 1 ); } exit( 1 ); } papi-5.3.0/src/ctests/version.c0000600003276200002170000000364712247131121016073 0ustar ralphundrgrad/* This file performs the following test: compare and report versions from papi.h and the papi library */ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int retval, init_version, lib_version; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ init_version = PAPI_library_init( PAPI_VER_CURRENT ); if ( init_version != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", init_version ); if ( ( lib_version = PAPI_get_opt( PAPI_LIB_VERSION, NULL ) ) == PAPI_EINVAL ) test_fail( __FILE__, __LINE__, "PAPI_get_opt", PAPI_EINVAL ); if ( !TESTS_QUIET ) { printf ( "Version.c: Compare and report versions from papi.h and the papi library.\n" ); printf ( "-------------------------------------------------------------------------\n" ); printf( " MAJOR MINOR REVISION\n" ); printf ( "-------------------------------------------------------------------------\n" ); printf( "PAPI_VER_CURRENT : %4d %6d %7d\n", PAPI_VERSION_MAJOR( PAPI_VER_CURRENT ), PAPI_VERSION_MINOR( PAPI_VER_CURRENT ), 
PAPI_VERSION_REVISION( PAPI_VER_CURRENT ) ); printf( "PAPI_library_init: %4d %6d %7d\n", PAPI_VERSION_MAJOR( init_version ), PAPI_VERSION_MINOR( init_version ), PAPI_VERSION_REVISION( init_version ) ); printf( "PAPI_VERSION : %4d %6d %7d\n", PAPI_VERSION_MAJOR( PAPI_VERSION ), PAPI_VERSION_MINOR( PAPI_VERSION ), PAPI_VERSION_REVISION( PAPI_VERSION ) ); printf( "PAPI_get_opt : %4d %6d %7d\n", PAPI_VERSION_MAJOR( lib_version ), PAPI_VERSION_MINOR( lib_version ), PAPI_VERSION_REVISION( lib_version ) ); printf ( "-------------------------------------------------------------------------\n" ); } if ( lib_version != PAPI_VERSION ) test_fail( __FILE__, __LINE__, "Version Mismatch", PAPI_EINVAL ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/hl_rates.c0000600003276200002170000001624712247131121016207 0ustar ralphundrgrad/* file hl_rates.c * This test exercises the four PAPI High Level rate calls: * PAPI_flops, PAPI_flips, PAPI_ipc, and PAPI_epc * flops and flips report cumulative real and process time since the first call, * and either floating point operations or instructions since the first call. * Also reported is incremental flop or flip rate since the last call. * * PAPI_ipc reports the same cumulative information, substituting total instructions * for flops or flips, and also reports instructions per (process) cycle as * a measure of execution efficiency. * * PAPI_epc is new in PAPI 5.2. It reports the same information as PAPI_ipc, but * for an arbitrary event instead of total instructions. It also reports incremental * core and (where available) reference cycles to allow the computation of * effective clock rates in the presence of clock scaling like speed step or turbo-boost. * * This test computes a 1000 x 1000 matrix multiply for two orders of indexing * (classic and swapped) for each of the four rate calls. It also accepts a command * line parameter for the event to be measured for PAPI_epc. If not provided, PAPI_TOT_INS is measured. 
*/ #include "papi_test.h" #define ROWS 1000 // Number of rows in each matrix #define COLUMNS 1000 // Number of columns in each matrix static float matrix_a[ROWS][COLUMNS], matrix_b[ROWS][COLUMNS],matrix_c[ROWS][COLUMNS]; static void init_mat() { // Initialize the two matrices int i, j; for (i = 0; i < ROWS; i++) { for (j = 0; j < COLUMNS; j++) { matrix_a[i][j] = (float) rand() / RAND_MAX; matrix_b[i][j] = (float) rand() / RAND_MAX; } } } static void classic_matmul() { // Multiply the two matrices int i, j, k; for (i = 0; i < ROWS; i++) { for (j = 0; j < COLUMNS; j++) { float sum = 0.0; for (k = 0; k < COLUMNS; k++) { sum += matrix_a[i][k] * matrix_b[k][j]; } matrix_c[i][j] = sum; } } } static void swapped_matmul() { // Multiply the two matrices int i, j, k; for (i = 0; i < ROWS; i++) { for (k = 0; k < COLUMNS; k++) { for (j = 0; j < COLUMNS; j++) { matrix_c[i][j] += matrix_a[i][k] * matrix_b[k][j]; } } } } int main( int argc, char **argv ) { int retval, event = 0; float rtime, ptime, mflips, mflops, ipc, epc; long long flpins, flpops, ins, ref, core, evt; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ init_mat(); printf( "\n----------------------------------\n" ); printf( "PAPI_flips\n"); if ( PAPI_flips(&rtime, &ptime, &flpins, &mflips) != PAPI_OK ) PAPI_perror( "PAPI_flips" ); printf( "\nStart\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "FP Instructions: %lld\n", flpins); printf( "MFLIPS %f\n", mflips); classic_matmul(); if ( PAPI_flips(&rtime, &ptime, &flpins, &mflips) != PAPI_OK ) PAPI_perror( "PAPI_flips" ); printf( "\nClassic\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "FP Instructions: %lld\n", flpins); printf( "MFLIPS %f\n", mflips); swapped_matmul(); if ( PAPI_flips(&rtime, &ptime, &flpins, &mflips) != PAPI_OK ) PAPI_perror( "PAPI_flips" ); printf( "\nSwapped\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "FP Instructions: 
%lld\n", flpins); printf( "MFLIPS %f\n", mflips); // turn off flips if ( PAPI_stop_counters(NULL, 0) != PAPI_OK ) PAPI_perror( "PAPI_stop_counters" ); printf( "\n----------------------------------\n" ); printf( "PAPI_flops\n"); if ( PAPI_flops(&rtime, &ptime, &flpops, &mflops) != PAPI_OK ) PAPI_perror( "PAPI_flops" ); printf( "\nStart\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "FP Operations: %lld\n", flpops); printf( "MFLOPS %f\n", mflops); classic_matmul(); if ( PAPI_flops(&rtime, &ptime, &flpops, &mflops) != PAPI_OK ) PAPI_perror( "PAPI_flops" ); printf( "\nClassic\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "FP Operations: %lld\n", flpops); printf( "MFLOPS %f\n", mflops); swapped_matmul(); if ( PAPI_flops(&rtime, &ptime, &flpops, &mflops) != PAPI_OK ) PAPI_perror( "PAPI_flops" ); printf( "\nSwapped\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "FP Operations: %lld\n", flpops); printf( "MFLOPS %f\n", mflops); // turn off flops if ( PAPI_stop_counters(NULL, 0) != PAPI_OK ) PAPI_perror( "PAPI_stop_counters" ); printf( "\n----------------------------------\n" ); printf( "PAPI_ipc\n"); if ( PAPI_ipc(&rtime, &ptime, &ins, &ipc) != PAPI_OK ) PAPI_perror( "PAPI_ipc" ); printf( "\nStart\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "Instructions: %lld\n", ins); printf( "IPC %f\n", ipc); classic_matmul(); if ( PAPI_ipc(&rtime, &ptime, &ins, &ipc) != PAPI_OK ) PAPI_perror( "PAPI_ipc" ); printf( "\nClassic\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "Instructions: %lld\n", ins); printf( "IPC %f\n", ipc); swapped_matmul(); if ( PAPI_ipc(&rtime, &ptime, &ins, &ipc) != PAPI_OK ) PAPI_perror( "PAPI_ipc" ); printf( "\nSwapped\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "Instructions: %lld\n", ins); printf( "IPC %f\n", ipc); // turn 
off ipc if ( PAPI_stop_counters(NULL, 0) != PAPI_OK ) PAPI_perror( "PAPI_stop_counters" ); printf( "\n----------------------------------\n" ); printf( "PAPI_epc\n"); if ( argc >= 2) { retval = PAPI_event_name_to_code( argv[1], &event ); if (retval != PAPI_OK) { PAPI_perror("PAPI_event_name_to_code"); printf("Can't find %s; Using PAPI_TOT_INS\n", argv[1]); event = 0; } else { printf("Using event %s\n", argv[1]); } } if ( PAPI_epc(event, &rtime, &ptime, &ref, &core, &evt, &epc) != PAPI_OK ) PAPI_perror( "PAPI_epc" ); printf( "\nStart\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "Ref Cycles: %lld\n", ref); printf( "Core Cycles: %lld\n", core); printf( "Events: %lld\n", evt); printf( "EPC: %f\n", epc); classic_matmul(); if ( PAPI_epc(event, &rtime, &ptime, &ref, &core, &evt, &epc) != PAPI_OK ) PAPI_perror( "PAPI_epc" ); printf( "\nClassic\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "Ref Cycles: %lld\n", ref); printf( "Core Cycles: %lld\n", core); printf( "Events: %lld\n", evt); printf( "EPC: %f\n", epc); swapped_matmul(); if ( PAPI_epc(event, &rtime, &ptime, &ref, &core, &evt, &epc) != PAPI_OK ) PAPI_perror( "PAPI_epc" ); printf( "\nSwapped\n"); printf( "real time: %f\n", rtime); printf( "process time: %f\n", ptime); printf( "Ref Cycles: %lld\n", ref); printf( "Core Cycles: %lld\n", core); printf( "Events: %lld\n", evt); printf( "EPC: %f\n", epc); // turn off epc if ( PAPI_stop_counters(NULL, 0) != PAPI_OK ) PAPI_perror( "PAPI_stop_counters" ); printf( "\n----------------------------------\n" ); exit( 1 ); } papi-5.3.0/src/ctests/ipc.c0000600003276200002170000000367412247131121015161 0ustar ralphundrgrad/* * A simple example for the use of PAPI, using PAPI_ipc * -Kevin London */ #include "papi_test.h" #define INDEX 500 extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { extern void dummy( void * ); float matrixa[INDEX][INDEX], matrixb[INDEX][INDEX], 
mresult[INDEX][INDEX]; float real_time, proc_time, ipc; long long ins; int retval; int i, j, k; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ /* Initialize the Matrix arrays */ for( i = 0; i < INDEX; i++ ) { for( j= 0; j < INDEX; j++ ) { mresult[i][j] = 0.0; matrixa[i][j] = matrixb[i][j] = ( float ) rand( ) * ( float ) 1.1; } } /* Setup PAPI library and begin collecting data from the counters */ if ( ( retval = PAPI_ipc( &real_time, &proc_time, &ins, &ipc ) ) < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_ipc", retval ); /* Matrix-Matrix multiply */ for ( i = 0; i < INDEX; i++ ) for ( j = 0; j < INDEX; j++ ) for ( k = 0; k < INDEX; k++ ) mresult[i][j] = mresult[i][j] + matrixa[i][k] * matrixb[k][j]; /* Collect the data into the variables passed in */ if ( ( retval = PAPI_ipc( &real_time, &proc_time, &ins, &ipc ) ) < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_ipc", retval ); dummy( ( void * ) mresult ); if ( !TESTS_QUIET ) { printf( "Real_time: %f Proc_time: %f Total ins: ", real_time, proc_time ); printf( LLDFMT, ins ); printf( " IPC: %f\n", ipc ); } /* This should not happen unless the optimizer */ /* gets too good */ if (ins < INDEX*INDEX) { test_fail( __FILE__, __LINE__, "Instruction count too low.", 5 ); } /* Something is broken, or else you have a really */ /* slow processor */ if (ipc<0.01 ) { test_fail( __FILE__, __LINE__, "IPC equals zero.", 5 ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/multiattach2.c0000600003276200002170000001657112247131121017007 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for an attached process as well as itself. - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). 
+ PAPI_FP_INS + PAPI_TOT_CYC - Get us. - Start counters - Do flops - Stop and read counters - Get us. */ #include "papi_test.h" #include <inttypes.h> #include <sys/ptrace.h> #ifdef _AIX #define _LINUX_SOURCE_COMPAT #endif #if defined(__FreeBSD__) # define PTRACE_ATTACH PT_ATTACH # define PTRACE_CONT PT_CONTINUE #endif int wait_for_attach_and_loop( int num ) { kill( getpid( ), SIGSTOP ); do_flops( NUM_FLOPS * num ); kill( getpid( ), SIGSTOP ); return 0; } int main( int argc, char **argv ) { int status, retval, num_tests = 2, tmp; int EventSet1 = PAPI_NULL, EventSet2 = PAPI_NULL; int PAPI_event, PAPI_event2, mask1, mask2; int num_events1, num_events2; long long **values; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; const PAPI_component_info_t *cmpinfo; pid_t pid; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* init the library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail_exit( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* get component info */ if ( ( cmpinfo = PAPI_get_component_info( 0 ) ) == NULL ) { test_fail_exit( __FILE__, __LINE__, "PAPI_get_component_info", 0 ); } /* see if we support attach */ if ( cmpinfo->attach == 0 ) { test_skip( __FILE__, __LINE__, "Platform does not support attaching",0 ); } /* fork! 
*/ pid = fork( ); if ( pid < 0 ) { test_fail_exit( __FILE__, __LINE__, "fork()", PAPI_ESYS ); } /* if child, wait_for_attach_and_loop */ if ( pid == 0 ) { exit( wait_for_attach_and_loop( 2 ) ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); EventSet2 = add_two_events( &num_events2, &PAPI_event2, &mask2 ); if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_ATTACH, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_ATTACH)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } } retval = PAPI_attach( EventSet2, ( unsigned long ) pid ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_attach", retval ); } retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); /* num_events1 is greater than num_events2 so don't worry. */ values = allocate_test_space( num_tests, num_events1 ); /* get before values */ elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); /* Wait for the SIGSTOP. 
*/ if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } } retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } retval = PAPI_start( EventSet2 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* Wait for the SIGSTOP. */ if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( WIFSTOPPED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFSTOPPED", 0 ); } if ( WSTOPSIG( status ) != SIGSTOP ) { test_fail( __FILE__, __LINE__, "Child process didn't stop on SIGSTOP", 0 ); } } elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us; elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) { printf( "Warning: PAPI_stop returned error %d, probably ok.\n", retval ); } retval = PAPI_stop( EventSet2, values[1] ); if ( retval != PAPI_OK ) { printf( "Warning: PAPI_stop returned error %d, probably ok.\n", retval ); } remove_test_events( &EventSet1, mask1 ); remove_test_events( &EventSet2, mask2 ); if ( cmpinfo->attach_must_ptrace ) { if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } } if ( waitpid( pid, &status, 0 ) == -1 ) { perror( "waitpid()" ); exit( 1 ); } if ( 
WIFEXITED( status ) == 0 ) { test_fail( __FILE__, __LINE__, "Child process didn't return true to WIFEXITED", 0 ); } /* This code isn't necessary as we know the child has exited, it *may* return an error if the component so chooses. You should use read() instead. */ printf( "Test case: multiple 3rd party attach start, stop.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); sprintf( add_event_str, "(PID self) %-12s : \t", event_name ); printf( TAB1, add_event_str, values[0][1] ); sprintf( add_event_str, "(PID self) PAPI_TOT_CYC : \t" ); printf( TAB1, add_event_str, values[0][0] ); sprintf( add_event_str, "(PID %jd) %-12s : \t", ( intmax_t ) pid, event_name ); printf( TAB1, add_event_str, values[1][1] ); sprintf( add_event_str, "(PID %jd) PAPI_TOT_CYC : \t", ( intmax_t ) pid ); printf( TAB1, add_event_str, values[1][0] ); printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf( TAB1, "Virt usec : \t", elapsed_virt_us ); printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc ); printf( "-------------------------------------------------------------------------\n" ); printf( "Verification: none\n" ); test_pass( __FILE__, values, num_tests ); return 0; } papi-5.3.0/src/ctests/overflow_values.c0000600003276200002170000001171512247131121017623 0ustar ralphundrgrad/* * File: overflow_values.c * CVS: $Id$ * Author: Harald Servat * harald@cepba.upc.edu * Mods: * */ /* This file performs the following test: overflow values check The Eventset contains: + PAPI_TOT_INS (overflow monitor) + PAPI_TOT_CYC + PAPI_L1_DCM - Start eventset - Read 
and report event counts mod 1000 - report overflow event counts - visually inspect for consistency - Stop eventset */ #include "papi_test.h" #define OVRFLOW 5000000 #define LOWERFLOW (OVRFLOW - (OVRFLOW/100)) #define UPPERFLOW (OVRFLOW/100) #define ERRORFLOW (UPPERFLOW/5) static long long ovrflow = 0; void handler( int EventSet, void *address, long long overflow_vector, void *context ) { int ret; int i; long long vals[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }; printf( "\nOverflow at %p! bit=0x%llx \n", address, overflow_vector ); ret = PAPI_read( EventSet, vals ); printf( "Overflow read vals :" ); for ( i = 0; i < 3 /* 8 */ ; i++ ) printf( "%lld ", vals[i] ); printf( "\n\n" ); ovrflow = vals[0]; } int main( int argc, char *argv[] ) { int EventSet = PAPI_NULL; int retval, i, dash = 0, evt3 = PAPI_L1_DCM; PAPI_option_t options; PAPI_option_t options2; const PAPI_hw_info_t *hwinfo; long long lwrflow = 0, error, max_error = 0; long long vals[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT && retval > 0 ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); retval = PAPI_get_opt( PAPI_HWINFO, &options ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_opt", retval ); printf( "ovf_info = %d (%#x)\n", options.ovf_info.type, options.ovf_info.type ); retval = PAPI_get_opt( PAPI_SUBSTRATEINFO, &options2 ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_opt", retval ); printf( "sub_info->hardware_intr = %d\n\n", options2.sub_info->hardware_intr ); if ( ( hwinfo = PAPI_get_hardware_info( ) ) == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", PAPI_EMISC ); printf( "Architecture %s, %d\n", hwinfo->model_string, hwinfo->model ); /* processing exceptions is a pain */ #if ((defined(linux) && (defined(__i386__) || (defined __x86_64__))) ) if ( !strncmp( hwinfo->model_string, "Intel Pentium 4", 15 ) ) { evt3 = PAPI_L2_TCM; } 
else if ( !strncmp( hwinfo->model_string, "AMD K7", 6 ) ) { /* do nothing */ } else if ( !strncmp( hwinfo->model_string, "AMD K8", 6 ) ) { /* do nothing */ } else if ( !strncmp( hwinfo->model_string, "Intel Core", 10 ) ) { evt3 = 0; } else evt3 = 0; /* for default PIII */ #endif retval = PAPI_create_eventset( &EventSet ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); retval = PAPI_add_event( EventSet, PAPI_TOT_INS ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_add_event:PAPI_TOT_INS", retval ); retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_add_event:PAPI_TOT_CYC", retval ); if ( evt3 ) { retval = PAPI_add_event( EventSet, evt3 ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_add_event:evt3", retval ); } retval = PAPI_overflow( EventSet, PAPI_TOT_INS, OVRFLOW, 0, handler ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval < 0 ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); for ( i = 0; i < 1000000; i++ ) { if ( i % 1000 == 0 ) { int i; PAPI_read( EventSet, vals ); if ( vals[0] % OVRFLOW > LOWERFLOW || vals[0] % OVRFLOW < UPPERFLOW ) { dash = 0; printf( "Main loop read vals :" ); for ( i = 0; i < 3 /* 8 */ ; i++ ) printf( "%lld ", vals[i] ); printf( "\n" ); if ( ovrflow ) { error = ovrflow - ( lwrflow + vals[0] ) / 2; printf( "Difference: %lld\n", error ); ovrflow = 0; /* error is long long, so use llabs() rather than abs() */ if ( llabs( error ) > max_error ) max_error = llabs( error ); } lwrflow = vals[0]; } else if ( vals[0] % OVRFLOW > UPPERFLOW && !dash ) { dash = 1; printf( "---------------------\n" ); } } } retval = PAPI_stop( EventSet, vals ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_cleanup_eventset( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) 
test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); printf( "Verification:\n" ); printf ( "Maximum absolute difference between overflow value\nand adjacent measured values is: %lld\n", max_error ); if ( max_error >= ERRORFLOW ) { printf( "This exceeds the error limit: %d\n", ERRORFLOW ); test_fail( __FILE__, __LINE__, "Overflows", 1 ); } printf( "This is within the error limit: %d\n", ERRORFLOW ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/virttime.c0000600003276200002170000000374712247131121016252 0ustar ralphundrgrad#include "papi_test.h" int main( int argc, char **argv ) { int retval; long long elapsed_us, elapsed_cyc; const PAPI_hw_info_t *hw_info; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); elapsed_us = PAPI_get_virt_usec( ); elapsed_cyc = PAPI_get_virt_cyc( ); printf( "Testing virt time clock. (CPU Max %d MHz, CPU Min %d MHz)\n", hw_info->cpu_max_mhz, hw_info->cpu_min_mhz ); printf( "Sleeping for 10 seconds.\n" ); sleep( 10 ); elapsed_us = PAPI_get_virt_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_virt_cyc( ) - elapsed_cyc; printf( "%lld us. %lld cyc.\n", elapsed_us, elapsed_cyc ); /* Elapsed microseconds and elapsed cycles are not as unambiguous as they appear. On Pentium III and 4, for example, cycles is a measured value, while useconds is computed from cycles and mhz. MHz is read from /proc/cpuinfo (on linux). Thus, any error in MHz is propagated to useconds. Conversely, on ultrasparc useconds are extracted from a system call (gethrtime()) and cycles are computed from useconds. 
Also, MHz comes from a scan of system info. Thus any error in gethrtime() propagates to both cycles and useconds, and cycles can be further impacted by errors in reported MHz. Without knowing the error bars on these system values, we can't really specify error ranges for our reported values, but we *DO* know that errors for at least one instance of Pentium 4 (torc17@utk) are on the order of one part per thousand. */ /* We'll accept 1.5 part per thousand error here (to allow Pentium 4 and Alpha to pass) */ if ( elapsed_us > 100000 ) test_fail( __FILE__, __LINE__, "Virt time greater than .1 seconds!", PAPI_EMISC ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/overflow_allcounters.c0000600003276200002170000001675212247131121020665 0ustar ralphundrgrad/* * File: overflow_allcounters.c * Author: Haihang You * you@cs.utk.edu * Mods: Vince Weaver * vweaver1@eecs.utk.edu * Mods: * */ /* This file performs the following test: overflow all counters to test availability of overflow of all counters - Start eventset 1 - Do flops - Stop and measure eventset 1 - Set up overflow on eventset 1 - Start eventset 1 - Do flops - Stop eventset 1 */ #include "papi_test.h" #define OVER_FMT "handler(%d ) Overflow at %p! 
bit=0x%llx \n" #define OUT_FMT "%-12s : %16lld%16lld\n" static int total = 0; /* total overflows */ void handler( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) context; if ( !TESTS_QUIET ) { printf( OVER_FMT, EventSet, address, overflow_vector ); } total++; } int main( int argc, char **argv ) { int EventSet = PAPI_NULL; long long *values; int num_flops, retval, i, j; int *events, mythreshold; char **names; const PAPI_hw_info_t *hw_info = NULL; int num_events, *ovt; char name[PAPI_MAX_STR_LEN]; int using_perfmon = 0; int using_aix = 0; int cid; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) { test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", retval ); } cid = PAPI_get_component_index("perfmon"); if (cid>=0) using_perfmon = 1; cid = PAPI_get_component_index("aix"); if (cid>=0) using_aix = 1; /* add PAPI_TOT_CYC and one of the events in */ /* PAPI_FP_INS, PAPI_FP_OPS PAPI_TOT_INS, */ /* depending on the availability of the event*/ /* on the platform */ EventSet = enum_add_native_events( &num_events, &events, 1 , 1, 0); if (!TESTS_QUIET) printf("Trying %d events\n",num_events); names = ( char ** ) calloc( ( unsigned int ) num_events, sizeof ( char * ) ); for ( i = 0; i < num_events; i++ ) { retval = PAPI_event_code_to_name( events[i], name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__,"PAPI_event_code_to_name", retval); } else { names[i] = strdup( name ); if (!TESTS_QUIET) printf("%i: %s\n",i,names[i]); } } values = ( long long * ) calloc( ( unsigned int ) ( num_events * ( num_events + 1 ) ), sizeof ( long long ) ); ovt = ( int * ) calloc( ( unsigned int ) num_events, sizeof ( int ) ); #if defined(linux) { char *tmp = getenv( "THRESHOLD" ); if ( tmp ) { mythreshold = atoi( tmp ); } else if 
(hw_info->cpu_max_mhz!=0) { mythreshold = ( int ) hw_info->cpu_max_mhz * 20000; if (!TESTS_QUIET) printf("Using a threshold of %d (20,000 * MHz)\n",mythreshold); } else { if (!TESTS_QUIET) printf("Using default threshold of %d\n",THRESHOLD); mythreshold = THRESHOLD; } } #else mythreshold = THRESHOLD; #endif num_flops = NUM_FLOPS * 2; /* initial test to make sure they all work */ if (!TESTS_QUIET) printf("Testing that the events all work with no overflow\n"); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( num_flops ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /* done with initial test */ /* keep adding events? */ for ( i = 0; i < num_events; i++ ) { /* Enable overflow */ if (!TESTS_QUIET) printf("Testing with overflow set on %s\n", names[i]); retval = PAPI_overflow( EventSet, events[i], mythreshold, 0, handler ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( num_flops ); retval = PAPI_stop( EventSet, values + ( i + 1 ) * num_events ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /* Disable overflow */ retval = PAPI_overflow( EventSet, events[i], 0, 0, handler ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); } ovt[i] = total; total = 0; } if ( !TESTS_QUIET ) { printf("\nResults in Matrix-view:\n"); printf( "Test Overflow on %d counters with %d events.\n", num_events,num_events ); printf( "-----------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", mythreshold ); printf( "Using %d iterations of c += a*b\n", num_flops ); printf( "-----------------------------------------------\n" ); printf( "Test type : " ); for ( i = 0; i < 
num_events + 1; i++ ) { printf( "%16d", i ); } printf( "\n" ); for ( j = 0; j < num_events; j++ ) { printf( "%-27s : ", names[j] ); for ( i = 0; i < num_events + 1; i++ ) { printf( "%16lld", *( values + j + num_events * i ) ); } printf( "\n" ); } printf( "Overflows : %16s", "" ); for ( i = 0; i < num_events; i++ ) { printf( "%16d", ovt[i] ); } printf( "\n" ); printf( "-----------------------------------------------\n" ); } /* validation */ if ( !TESTS_QUIET ) { printf("\nResults broken out for validation\n"); } if (!TESTS_QUIET) { for ( j = 0; j < num_events+1; j++ ) { if (j==0) { printf("Test results, no overflow:\n\t"); } else { printf("Overflow of event %d, %s\n\t",j-1,names[j-1]); } for(i=0; i < num_events; i++) { if (i==j-1) { printf("*%lld* ",values[(num_events*j)+i]); } else { printf("%lld ",values[(num_events*j)+i]); } } printf("\n"); if (j!=0) { printf("\tOverflow should be %lld / %d = %lld\n", values[(num_events*j)+(j-1)], mythreshold, values[(num_events*j)+(j-1)]/mythreshold); printf("\tOverflow was %d\n",ovt[j-1]); } } } for ( j = 0; j < num_events; j++ ) { //printf("Validation: %lld / %d != %d (%lld)\n", // *( values + j + num_events * (j+1) ) , // mythreshold, // ovt[j], // *(values+j+num_events*(j+1))/mythreshold); if ( *( values + j + num_events * ( j + 1 ) ) / mythreshold != ovt[j] ) { char error_string[BUFSIZ]; if ( using_perfmon ) test_warn( __FILE__, __LINE__, "perfmon component handles overflow differently than perf_events", 1 ); else if ( using_aix ) test_warn( __FILE__, __LINE__, "AIX (pmapi) component handles overflow differently than various other components", 1 ); else { sprintf( error_string, "Overflow value differs from expected %lld / %d != %d (%lld)", *( values + j + num_events * ( j + 1 ) ), mythreshold, ovt[j], *( values + j + num_events * ( j + 1 ) ) / mythreshold ); test_fail( __FILE__, __LINE__, error_string, 1 ); } } } retval = PAPI_cleanup_eventset( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, 
"PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); free( ovt ); for ( i = 0; i < num_events; i++ ) free( names[i] ); free( names ); free( events ); free( values ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/high-level.c0000600003276200002170000000700412247131121016421 0ustar ralphundrgrad/* These examples show the essentials in using the PAPI high-level interface. The program consists of 4 work-loops. The programmer intends to count the total events for loop 1, 2 and 4, but not include the number of events in loop 3. To accomplish this PAPI_read_counters is used as a counter reset function, while PAPI_accum_counters is used to sum the contributions of loops 2 and 4 into the total count. */ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { #define NUM_EVENTS 2 int retval; long long values[NUM_EVENTS], dummyvalues[NUM_EVENTS]; long long myvalues[NUM_EVENTS]; int Events[NUM_EVENTS]; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* query and set up the right events to monitor */ if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) { Events[0] = PAPI_FP_INS; } else { Events[0] = PAPI_TOT_INS; } Events[1] = PAPI_TOT_CYC; retval = PAPI_start_counters( ( int * ) Events, NUM_EVENTS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start_counters", retval ); /* Loop 1 */ do_flops( NUM_FLOPS ); retval = PAPI_read_counters( values, NUM_EVENTS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read_counters", retval ); if ( !TESTS_QUIET ) printf( TWO12, values[0], values[1], "(Counters continuing...)\n" ); myvalues[0] = values[0]; myvalues[1] = values[1]; /* Loop 2 */ do_flops( NUM_FLOPS ); 
retval = PAPI_accum_counters( values, NUM_EVENTS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_accum_counters", retval ); if ( !TESTS_QUIET ) printf( TWO12, values[0], values[1], "(Counters being ''held'')\n" ); /* Loop 3 */ /* Simulated code that should not be counted */ do_flops( NUM_FLOPS ); retval = PAPI_read_counters( dummyvalues, NUM_EVENTS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read_counters", retval ); if ( !TESTS_QUIET ) printf( TWO12, dummyvalues[0], dummyvalues[1], "(Skipped counts)\n" ); if ( !TESTS_QUIET ) printf( "%12s %12s (''Continuing'' counting)\n", "xxx", "xxx" ); /* Loop 4 */ do_flops( NUM_FLOPS ); retval = PAPI_accum_counters( values, NUM_EVENTS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_accum_counters", retval ); if ( !TESTS_QUIET ) printf( TWO12, values[0], values[1], "" ); if ( !TESTS_QUIET ) { printf( "----------------------------------\n" ); printf( "Verification: The last line in each experiment should be\n" ); printf( "approximately three times the value of the first line.\n" ); } { long long min, max; min = ( long long ) ( ( double ) myvalues[0] * .9 ); max = ( long long ) ( ( double ) myvalues[0] * 1.1 ); if ( values[0] < ( 3 * min ) || values[0] > ( 3 * max ) ) { retval = 1; if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_FP_INS", 1 ); } else { test_fail( __FILE__, __LINE__, "PAPI_TOT_INS", 1 ); } } min = ( long long ) ( ( double ) myvalues[1] * .9 ); max = ( long long ) ( ( double ) myvalues[1] * 1.1 ); if ( values[1] < ( 3 * min ) || values[1] > ( 3 * max ) ) { retval = 1; test_fail( __FILE__, __LINE__, "PAPI_TOT_CYC", 1 ); } } /* The values array is not allocated through allocate_test_space * so we need to pass NULL here */ test_pass( __FILE__, NULL, NUM_EVENTS ); exit( 1 ); } papi-5.3.0/src/ctests/high-level2.c0000600003276200002170000000663012247131121016507 0ustar ralphundrgrad/* This test checks that mixing PAPI_flips and 
the other high * level calls does the right thing. * Kevin */ #include "papi_test.h" extern int TESTS_QUIET; /*Declared in test_utils.c */ int main( int argc, char **argv ) { int retval; int Events, fip = 0; long long values, flpins; float real_time, proc_time, mflops; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) { fip = 1; Events = PAPI_FP_INS; } else if ( PAPI_query_event( PAPI_FP_OPS ) == PAPI_OK ) { fip = 2; Events = PAPI_FP_OPS; } else { if ( !TESTS_QUIET ) printf ( "PAPI_FP_INS and PAPI_FP_OPS are not defined for this platform.\n" ); } if ( fip > 0 ) { if ( fip == 1 ) { if ( ( retval = PAPI_flips( &real_time, &proc_time, &flpins, &mflops ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flips", retval ); } else { if ( ( retval = PAPI_flops( &real_time, &proc_time, &flpins, &mflops ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flops", retval ); } if ( ( retval = PAPI_start_counters( &Events, 1 ) ) == PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start_counters", retval ); if ( fip == 1 ) { if ( ( retval = PAPI_flips( &real_time, &proc_time, &flpins, &mflops ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flips", retval ); } else { if ( ( retval = PAPI_flops( &real_time, &proc_time, &flpins, &mflops ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flops", retval ); } if ( ( retval = PAPI_read_counters( &values, 1 ) ) == PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read_counters", retval ); if ( ( retval = PAPI_stop_counters( &values, 1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop_counters", retval ); if ( fip == 1 ) { if ( ( retval = PAPI_flips( &real_time, &proc_time, &flpins, &mflops ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flips", retval ); } else { if ( ( retval = PAPI_flops( &real_time, &proc_time, 
&flpins, &mflops ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flops", retval ); } if ( ( retval = PAPI_read_counters( &values, 1 ) ) == PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read_counters", retval ); if ( ( retval = PAPI_stop_counters( &values, 1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop_counters", retval ); if ( ( retval = PAPI_start_counters( &Events, 1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start_counters", retval ); if ( ( retval = PAPI_read_counters( &values, 1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read_counters", retval ); if ( fip == 1 ) { if ( ( retval = PAPI_flips( &real_time, &proc_time, &flpins, &mflops ) ) == PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flips", retval ); } else { if ( ( retval = PAPI_flops( &real_time, &proc_time, &flpins, &mflops ) ) == PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_flops", retval ); } if ( ( retval = PAPI_stop_counters( &values, 1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop_counters", retval ); } test_pass( __FILE__, NULL, 0 ); exit( 0 ); /* just to make the compiler happy... */ } papi-5.3.0/src/ctests/cycle_ratio.c0000600003276200002170000001320012247131121016665 0ustar ralphundrgrad/* This test exercises the PAPI_TOT_CYC and PAPI_REF_CYC counters. PAPI_TOT_CYC should measure the number of cycles required to do a fixed amount of work. It should be roughly constant for constant work, regardless of the speed state a core is in. PAPI_REF_CYC should measure the number of cycles at a constant reference clock rate, independent of the actual clock rate of the core. Thus if the core is running at nominal clock rate, PAPI_TOT_CYC and PAPI_REF_CYC should match and the ratio should be approximately 1. If the core is in an idle state (such as at startup), the ratio of TOT / REF should be less than 1. If the core is accelerated above nominal, such as TurboBoost when only one core is active, the ratio of TOT / REF should be greater than 1. 
This test measures the ratio first from a roughly idle state. It then does floating point intensive work to push this core into a fully active or accelerated state, and then it measures the ratio again. Using this technique allows you to measure the effective clock rate of the processor over a specific region of code, allowing you to infer the state of acceleration. */ #include "papi_test.h" static void work (int EventSet, int mhz); int main( int argc, char **argv ) { int retval; int EventSet = PAPI_NULL; int numflops = NUM_FLOPS; const PAPI_hw_info_t *hwinfo = NULL; long long elapsed_cyc; long long values[2]; int mhz; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } retval = PAPI_query_named_event("PAPI_REF_CYC"); if (PAPI_OK!=retval) { test_skip( __FILE__, __LINE__, "PAPI_REF_CYC is not defined on this platform.", PAPI_OK ); } /* create an eventset */ retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); } /* add core cycle event */ retval = PAPI_add_named_event( EventSet, "PAPI_TOT_CYC"); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_add_named_event: PAPI_TOT_CYC", retval ); } /* add ref cycle event */ retval = PAPI_add_named_event( EventSet, "PAPI_REF_CYC"); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_add_events: PAPI_REF_CYC", retval ); } retval = papi_print_header ( "Test case CycleRatio.c: Compute the ratio of TOT and REF cycles.\n", &hwinfo ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); /* compute a nominal bus clock frequency */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } elapsed_cyc = PAPI_get_real_cyc( ); usleep(1000000); elapsed_cyc = 
PAPI_get_real_cyc( ) - elapsed_cyc; mhz = elapsed_cyc / 1000000; retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } if ( values[1] == 0 ) { test_warn( __FILE__, __LINE__, "PAPI_REF_CYC = 0\nTry upgrading your kernel.", 0 ); } printf( "CPU Computed Megahertz : %d\n", mhz ); printf( "Measure TOT and REF cycles from a cold start\n" ); work(EventSet, mhz); do_flops(10*numflops); printf( "\nMeasure again after working for a while\n" ); work(EventSet, mhz); test_pass( __FILE__, NULL, 0 ); return ( 0 ); } static void work (int EventSet, int mhz) { int retval; long long values[2]; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; double cycles_error; float ratio, ratio1; int numflops = NUM_FLOPS; ratio = ratio1 = 0; /* Gather before stats */ elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); /* Start PAPI */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* our test code */ do_flops( numflops ); /* Stop PAPI */ retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } ratio = (float)values[0]/(float)values[1]; /* Calculate total values */ elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us; elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; printf( "-------------------------------------------------------------------------\n" ); printf( "Using %d iterations of c += a*b\n", numflops ); printf( "-------------------------------------------------------------------------\n" ); printf( TAB1, "PAPI_TOT_CYC : \t", values[0] ); printf( TAB1, "PAPI_REF_CYC : \t", values[1] ); printf( "%-12s %12f\n", "Cycle Ratio : \t", ratio ); printf( "%-12s 
%12d\n", "Effective MHz : \t", (int)(ratio * mhz) ); printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf( TAB1, "Virt usec : \t", elapsed_virt_us ); printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc ); printf( "-------------------------------------------------------------------------\n" ); printf( "Verification: PAPI_REF_CYC should be roughly equal to real_cycles\n" ); cycles_error=100.0*((double)values[1] - (double)elapsed_cyc)/(double)elapsed_cyc; if ((cycles_error>10.0) || (cycles_error<-10.0)) { printf("Error of %.2f%%\n",cycles_error); test_warn( __FILE__, __LINE__, "validation", 0 ); } } papi-5.3.0/src/ctests/byte_profile.c0000600003276200002170000001355012247131121017063 0ustar ralphundrgrad/* * File: byte_profile.c * CVS: $Id$ * Author: Dan Terpstra * terpstra@cs.utk.edu * Mods: Maynard Johnson * maynardj@us.ibm.com * Mods: * */ /* This file profiles multiple events with byte level address resolution. It's patterned after code suggested by John Mellor-Crummey, Rob Fowler, and Nathan Tallent. It is intended to illustrate the use of Multiprofiling on a very tight block of code at byte level resolution of the instruction addresses. 
*/ #include "papi_test.h" #include "prof_utils.h" #define PROFILE_ALL static const PAPI_hw_info_t *hw_info; static int num_events = 0; #define N (1 << 23) #define T (10) double aa[N], bb[N]; double s = 0, s2 = 0; static void cleara( double a[N] ) { int i; for ( i = 0; i < N; i++ ) { a[i] = 0; } } static int my_dummy( int i ) { return ( i + 1 ); } static void my_main( ) { int i, j; for ( j = 0; j < T; j++ ) { for ( i = 0; i < N; i++ ) { bb[i] = 0; } cleara( aa ); memset( aa, 0, sizeof ( aa ) ); for ( i = 0; i < N; i++ ) { s += aa[i] * bb[i]; s2 += aa[i] * aa[i] + bb[i] * bb[i]; } } } static int do_profile( caddr_t start, unsigned long plength, unsigned scale, int thresh, int bucket, unsigned int mask ) { int i, retval; unsigned long blength; int num_buckets,j=0; int num_bufs = num_events; int event = num_events; int events[MAX_TEST_EVENTS]; char header[BUFSIZ]; strncpy(header,"address\t\t",BUFSIZ); //= "address\t\t\tcyc\tins\tfp_ins\n"; for(i=0;imodel_string, "POWER6" ) != 0 ) { printf( TAB1, "PAPI_TOT_INS:", ( values[0] )[--event] ); } #if defined(__powerpc__) printf( TAB1, "PAPI_FP_INS", ( values[0] )[--event] ); #else if ( strcmp( hw_info->model_string, "Intel Pentium III" ) != 0 ) { printf( TAB1, "PAPI_FP_OPS:", ( values[0] )[--event] ); printf( TAB1, "PAPI_L2_TCM:", ( values[0] )[--event] ); } #endif } for ( i = 0; i < num_events; i++ ) { if ( ( retval = PAPI_profil( profbuf[i], ( unsigned int ) blength, start, scale, EventSet, events[i], 0, PAPI_PROFIL_POSIX ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); } prof_head( blength, bucket, num_buckets, header ); prof_out( start, num_events, bucket, num_buckets, scale ); retval = prof_check( num_bufs, bucket, num_buckets ); for ( i = 0; i < num_bufs; i++ ) { free( profbuf[i] ); } return retval; } int main( int argc, char **argv ) { long length; int mask; int retval; const PAPI_exe_info_t *prginfo; caddr_t start, end; prof_init( argc, argv, &prginfo ); hw_info = PAPI_get_hardware_info( ); if 
( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); mask = MASK_TOT_CYC | MASK_TOT_INS | MASK_FP_OPS | MASK_L2_TCM; #if defined(__powerpc__) if ( strcmp( hw_info->model_string, "POWER6" ) == 0 ) mask = MASK_TOT_CYC | MASK_FP_INS; else mask = MASK_TOT_CYC | MASK_TOT_INS | MASK_FP_INS; #endif #if defined(ITANIUM2) mask = MASK_TOT_CYC | MASK_FP_OPS | MASK_L2_TCM | MASK_L1_DCM; #endif EventSet = add_test_events( &num_events, &mask, 0 ); values = allocate_test_space( 1, num_events ); /* profile the cleara and my_main address space */ start = ( caddr_t ) cleara; end = ( caddr_t ) my_dummy; /* Itanium and PowerPC64 processors return function descriptors instead * of function addresses. You must dereference the descriptor to get the address. */ #if defined(ITANIUM1) || defined(ITANIUM2) || defined(__powerpc64__) start = ( caddr_t ) ( ( ( struct fdesc * ) start )->ip ); end = ( caddr_t ) ( ( ( struct fdesc * ) end )->ip ); #endif /* call dummy so it doesn't get optimized away */ retval = my_dummy( 1 ); length = end - start; if ( length < 0 ) test_fail( __FILE__, __LINE__, "Profile length < 0!", ( int ) length ); prof_print_address ( "Test case byte_profile: Multi-event profiling at byte resolution.\n", prginfo ); prof_print_prof_info( start, end, THRESHOLD, event_name ); retval = do_profile( start, ( unsigned ) length, FULL_SCALE * 2, THRESHOLD, PAPI_PROFIL_BUCKET_32, mask ); remove_test_events( &EventSet, mask ); if ( retval ) test_pass( __FILE__, values, 1 ); else test_fail( __FILE__, __LINE__, "No information in buffers", 1 ); return 1; } papi-5.3.0/src/ctests/zero_fork.c0000600003276200002170000000721412247131121016400 0ustar ralphundrgrad/* * File: zero_fork.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: PAPI_library_init() Add two events PAPI_start() fork() / \ parent child | PAPI_library_init() | Add two events | PAPI_start() | PAPI_stop() | fork()-----\ | child parent 
PAPI_library_init() | Add two events | PAPI_start() | PAPI_stop() | wait() wait() | PAPI_stop() No validation is done */ #include "papi_test.h" #include int EventSet1 = PAPI_NULL; int PAPI_event, mask1; int num_events1 = 2; long long elapsed_us, elapsed_cyc; long long **values; char event_name[PAPI_MAX_STR_LEN]; int retval, num_tests = 1; void process_init( void ) { printf( "Process %d \n", ( int ) getpid( ) ); /* Initialize PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depends on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); values = allocate_test_space( num_tests, num_events1 ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } } void process_fini( void ) { retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; remove_test_events( &EventSet1, mask1 ); printf( "Process %d %-12s : \t%lld\n", ( int ) getpid( ), event_name, values[0][1] ); printf( "Process %d PAPI_TOT_CYC : \t%lld\n", ( int ) getpid( ), values[0][0] ); printf( "Process %d Real usec : \t%lld\n", ( int ) getpid( ), elapsed_us ); printf( "Process %d Real cycles : \t%lld\n", ( int ) getpid( ), elapsed_cyc ); free_test_space( values, num_tests ); } int main( int argc, char **argv ) { int flops1; int retval; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ # if 
(defined(__ALPHA) && defined(__osf__)) test_skip( __FILE__, __LINE__, "main: fork not supported.", 0 ); #endif printf( "This tests if PAPI_library_init(),2*fork(),PAPI_library_init() works.\n" ); /* Initialize PAPI for this process */ process_init( ); flops1 = 1000000; if ( fork( ) == 0 ) { /* Initialize PAPI for the child process */ process_init( ); /* Let the child process do work */ do_flops( flops1 ); /* Measure the child process */ process_fini( ); exit( 0 ); } flops1 = 2000000; if ( fork( ) == 0 ) { /* Initialize PAPI for the child process */ process_init( ); /* Let the child process do work */ do_flops( flops1 ); /* Measure the child process */ process_fini( ); exit( 0 ); } /* Let this process do work */ flops1 = 4000000; do_flops( flops1 ); /* Wait for child to finish */ wait( &retval ); /* Wait for child to finish */ wait( &retval ); /* Measure this process */ process_fini( ); test_pass( __FILE__, NULL, 0 ); return 0; } papi-5.3.0/src/ctests/thrspecific.c0000600003276200002170000000727512247131121016712 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for 2 slave pthreads */ #include <pthread.h> #include "papi_test.h" static int processing = 1; void * Thread( void *arg ) { int retval; void *arg2; retval = PAPI_register_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); printf( "Thread %#x started, specific data is at %p\n", ( int ) pthread_self( ), arg ); retval = PAPI_set_thr_specific( PAPI_USR1_TLS, arg ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_thr_specific", retval ); retval = PAPI_get_thr_specific( PAPI_USR1_TLS, &arg2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_thr_specific", retval ); if ( arg != arg2 ) test_fail( __FILE__, __LINE__, "set vs get specific", 0 ); while ( processing ) { if ( *( ( int * ) arg ) == 500000 ) { sleep( 1 ); int i; PAPI_all_thr_spec_t data; data.num = 10; data.id = ( unsigned long * ) 
malloc( ( size_t ) data.num * sizeof ( unsigned long ) ); data.data = ( void ** ) malloc( ( size_t ) data.num * sizeof ( void * ) ); retval = PAPI_get_thr_specific( PAPI_USR1_TLS | PAPI_TLS_ALL_THREADS, ( void ** ) &data ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_thr_specific", retval ); if ( data.num != 5 ) test_fail( __FILE__, __LINE__, "data.num != 5", 0 ); for ( i = 0; i < data.num; i++ ) printf( "Entry %d, Thread 0x%lx, Data Pointer %p, Value %d\n", i, data.id[i], data.data[i], *( int * ) data.data[i] ); processing = 0; } } retval = PAPI_unregister_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( NULL ); } int main( int argc, char **argv ) { pthread_t e_th, f_th, g_th, h_th; int flops1, flops2, flops3, flops4, flops5; int retval, rc; pthread_attr_t attr; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ); if ( retval != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); #endif flops1 = 1000000; rc = pthread_create( &e_th, &attr, Thread, ( void * ) &flops1 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } flops2 = 2000000; rc = pthread_create( &f_th, &attr, Thread, ( void * ) &flops2 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } flops3 = 
4000000; rc = pthread_create( &g_th, &attr, Thread, ( void * ) &flops3 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } flops4 = 8000000; rc = pthread_create( &h_th, &attr, Thread, ( void * ) &flops4 ); if ( rc ) { retval = PAPI_ESYS; test_fail( __FILE__, __LINE__, "pthread_create", retval ); } pthread_attr_destroy( &attr ); flops5 = 500000; Thread( &flops5 ); pthread_join( h_th, NULL ); pthread_join( g_th, NULL ); pthread_join( f_th, NULL ); pthread_join( e_th, NULL ); test_pass( __FILE__, NULL, 0 ); pthread_exit( NULL ); exit( 1 ); } papi-5.3.0/src/ctests/attach2.c0000600003276200002170000001520312247131121015723 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for attached processes. - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC - Get us. - Start counters - Do flops - Stop and read counters - Get us. 
*/ #include "papi_test.h" #include <limits.h> #include <sys/ptrace.h> #ifdef _AIX #define _LINUX_SOURCE_COMPAT #endif #if defined(__FreeBSD__) # define PTRACE_ATTACH PT_ATTACH # define PTRACE_CONT PT_CONTINUE #endif int wait_for_attach_and_loop( void ) { char *path; char newpath[PATH_MAX]; path = getenv("PATH"); /* Fall back to an empty string when PATH is unset; passing NULL to %s is undefined */ snprintf(newpath, sizeof(newpath), "PATH=./:%s", (path)?path:"" ); putenv(newpath); if (ptrace(PTRACE_TRACEME, 0, 0, 0) == 0) { execlp("attach_target","attach_target","100000000",NULL); perror("execlp(attach_target) failed"); } perror("PTRACE_TRACEME"); return ( 1 ); } int main( int argc, char **argv ) { int status, retval, num_tests = 1, tmp; int EventSet1 = PAPI_NULL; long long **values; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; char event_name[PAPI_MAX_STR_LEN]; const PAPI_hw_info_t *hw_info; const PAPI_component_info_t *cmpinfo; pid_t pid; /* Fork before doing anything with the PMU */ setbuf(stdout,NULL); pid = fork( ); if ( pid < 0 ) test_fail( __FILE__, __LINE__, "fork()", PAPI_ESYS ); if ( pid == 0 ) exit( wait_for_attach_and_loop( ) ); tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ /* Master only process below here */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail_exit( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( cmpinfo = PAPI_get_component_info( 0 ) ) == NULL ) test_fail_exit( __FILE__, __LINE__, "PAPI_get_component_info", 0 ); if ( cmpinfo->attach == 0 ) test_skip( __FILE__, __LINE__, "Platform does not support attaching", 0 ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail_exit( __FILE__, __LINE__, "PAPI_get_hardware_info", 0 ); /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ retval = PAPI_create_eventset(&EventSet1); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* Here we are testing that this does not cause a fail */ 
retval = PAPI_assign_eventset_component( EventSet1, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); retval = PAPI_attach( EventSet1, ( unsigned long ) pid ); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_attach", retval ); sprintf(event_name,"PAPI_TOT_CYC"); retval = PAPI_add_event(EventSet1, PAPI_TOT_CYC); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_add_event(EventSet1, PAPI_FP_INS); if ( retval == PAPI_ENOEVNT ) { test_warn( __FILE__, __LINE__, "PAPI_FP_INS", retval); } else if ( retval != PAPI_OK ) { test_fail_exit( __FILE__, __LINE__, "PAPI_add_event", retval ); } values = allocate_test_space( 1, 2); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); printf("must_ptrace is %d\n",cmpinfo->attach_must_ptrace); pid_t child = wait( &status ); printf( "Debugger exited wait() with %d\n",child ); if (WIFSTOPPED( status )) { printf( "Child has stopped due to signal %d (%s)\n", WSTOPSIG( status ), strsignal(WSTOPSIG( status )) ); } if (WIFSIGNALED( status )) { printf( "Child %ld received signal %d (%s)\n", (long)child, WTERMSIG(status) , strsignal(WTERMSIG( status )) ); } printf("After %d\n",retval); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_start", retval ); printf("Continuing\n"); if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } do { child = wait( &status ); printf( "Debugger exited wait() with %d\n", child); if (WIFSTOPPED( status )) { printf( "Child has stopped due to signal %d (%s)\n", WSTOPSIG( status ), strsignal(WSTOPSIG( status )) ); } if (WIFSIGNALED( status )) { printf( "Child %ld received signal %d (%s)\n", (long)child, WTERMSIG(status) , strsignal(WTERMSIG( status )) ); } } while (!WIFEXITED( status )); 
    printf("Child exited with value %d\n",WEXITSTATUS(status));
    if (WEXITSTATUS(status) != 0)
        test_fail_exit( __FILE__, __LINE__, "Exit status of child to attach to", PAPI_EMISC);

    retval = PAPI_stop( EventSet1, values[0] );
    if ( retval != PAPI_OK )
        test_fail_exit( __FILE__, __LINE__, "PAPI_stop", retval );

    elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us;
    elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc;
    elapsed_us = PAPI_get_real_usec( ) - elapsed_us;
    elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc;

    retval = PAPI_cleanup_eventset(EventSet1);
    if (retval != PAPI_OK)
        test_fail_exit( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval );

    retval = PAPI_destroy_eventset(&EventSet1);
    if (retval != PAPI_OK)
        test_fail_exit( __FILE__, __LINE__, "PAPI_destroy_eventset", retval );

    printf( "Test case: 3rd party attach start, stop.\n" );
    printf( "-----------------------------------------------\n" );
    tmp = PAPI_get_opt( PAPI_DEFDOM, NULL );
    printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) );
    tmp = PAPI_get_opt( PAPI_DEFGRN, NULL );
    printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) );
    printf( "Using %d iterations of c += a*b\n", NUM_FLOPS );
    printf
        ( "-------------------------------------------------------------------------\n" );

    printf( "Test type : \t 1\n" );
    printf( TAB1, "PAPI_TOT_CYC : \t", ( values[0] )[0] );
    printf( TAB1, "PAPI_FP_INS : \t", ( values[0] )[1] );
    printf( TAB1, "Real usec : \t", elapsed_us );
    printf( TAB1, "Real cycles : \t", elapsed_cyc );
    printf( TAB1, "Virt usec : \t", elapsed_virt_us );
    printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc );

    printf
        ( "-------------------------------------------------------------------------\n" );

    printf( "Verification: none\n" );

    test_pass( __FILE__, values, num_tests );
    exit( 1 );
}
papi-5.3.0/src/ctests/dmem_info.c0000600003276200002170000000467512247131121016345 0ustar ralphundrgrad
/*
 * This file performs the following test: dynamic memory info
 * The
 * pages used should increase steadily.
 *
 * Author: Kevin London
 *         london@cs.utk.edu
 */
#include "papi_test.h"

#define ALLOCMEM 200000

static void
dump_memory_info( FILE * output, PAPI_dmem_info_t * d )
{
    fprintf( output, "\n--------\n" );
    fprintf( output, "Mem Size:\t\t%lld\n", d->size );
    fprintf( output, "Mem Peak Size:\t\t%lld\n", d->peak );
    fprintf( output, "Mem Resident:\t\t%lld\n", d->resident );
    fprintf( output, "Mem High Water Mark:\t%lld\n", d->high_water_mark );
    fprintf( output, "Mem Shared:\t\t%lld\n", d->shared );
    fprintf( output, "Mem Text:\t\t%lld\n", d->text );
    fprintf( output, "Mem Library:\t\t%lld\n", d->library );
    fprintf( output, "Mem Heap:\t\t%lld\n", d->heap );
    fprintf( output, "Mem Locked:\t\t%lld\n", d->locked );
    fprintf( output, "Mem Stack:\t\t%lld\n", d->stack );
    fprintf( output, "Mem Pagesize:\t\t%lld\n", d->pagesize );
    fprintf( output, "Mem Page Table Entries:\t\t%lld\n", d->pte );
    fprintf( output, "--------\n\n" );
}

int
main( int argc, char **argv )
{
    PAPI_dmem_info_t dmem;
    long long value[7];
    int retval, i = 0;
    double *m[7];

    tests_quiet( argc, argv );  /* Set TESTS_QUIET variable */

    retval = PAPI_library_init( PAPI_VER_CURRENT );
    if ( retval != PAPI_VER_CURRENT )
        test_fail( __FILE__, __LINE__, "PAPI_library_init", retval );

    for ( i = 0; i < 7; i++ ) {
        retval = PAPI_get_dmem_info( &dmem );
        if ( retval != PAPI_OK )
            test_fail( __FILE__, __LINE__, "PAPI_get_dmem_info", retval );
        /* dump_memory_info(stdout,&dmem); */
        value[i] = dmem.size;
        m[i] = ( double * ) malloc( ALLOCMEM * sizeof ( double ) );
        /* Touch the block just allocated so its pages are actually mapped */
        touch_dummy( m[i], ALLOCMEM );
    }

    if ( !TESTS_QUIET ) {
        printf( "Test case: Dynamic Memory Information.\n" );
        dump_memory_info( stdout, &dmem );
        printf
            ( "------------------------------------------------------------------------\n" );
        for ( i = 0; i < 7; i++ )
            printf( "Malloc additional: %d KB Memory Size in KB: %d\n",
                    ( int ) ( ( sizeof ( double ) * ALLOCMEM ) / 1024 ),
                    ( int ) value[i] );
        printf (
"------------------------------------------------------------------------\n" ); } if ( value[6] >= value[5] && value[5] >= value[4] && value[4] >= value[3] && value[3] >= value[2] && value[2] >= value[1] && value[1] >= value[0] ) test_pass( __FILE__, NULL, 0 ); else test_fail( __FILE__, __LINE__, "Calculating Resident Memory", ( int ) value[6] ); exit( 1 ); } papi-5.3.0/src/ctests/johnmay2.c0000600003276200002170000000676112247131121016135 0ustar ralphundrgrad#include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int FPEventSet = PAPI_NULL; long long values; int PAPI_event, retval; char event_name[PAPI_MAX_STR_LEN]; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* init PAPI */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* Use PAPI_FP_INS if available, otherwise use PAPI_TOT_INS */ if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) PAPI_event = PAPI_FP_INS; else PAPI_event = PAPI_TOT_INS; if ( ( retval = PAPI_query_event( PAPI_event ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_query_event", retval ); /* Create the eventset */ if ( ( retval = PAPI_create_eventset( &FPEventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* Add event to the eventset */ if ( ( retval = PAPI_add_event( FPEventSet, PAPI_event ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); /* Start counting */ if ( ( retval = PAPI_start( FPEventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); /* Try to cleanup while running */ /* Fail test if this isn't refused */ if ( ( retval = PAPI_cleanup_eventset( FPEventSet ) ) != PAPI_EISRUN ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); /* Try to destroy eventset while running */ /* Fail test if this isn't refused */ if ( ( retval = PAPI_destroy_eventset( &FPEventSet ) ) != 
PAPI_EISRUN ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); /* do some work */ do_flops( 1000000 ); /* stop counting */ if ( ( retval = PAPI_stop( FPEventSet, &values ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* Try to destroy eventset without cleaning first */ /* Fail test if this isn't refused */ if ( ( retval = PAPI_destroy_eventset( &FPEventSet ) ) != PAPI_EINVAL ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); /* Try to cleanup eventset. */ /* This should pass. */ if ( ( retval = PAPI_cleanup_eventset( FPEventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); /* Try to destroy eventset. */ /* This should pass. */ if ( ( retval = PAPI_destroy_eventset( &FPEventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); /* Make sure eventset was set to PAPI_NULL */ if ( FPEventSet != PAPI_NULL ) test_fail( __FILE__, __LINE__, "FPEventSet != PAPI_NULL", retval ); if ( !TESTS_QUIET ) { if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf( "Test case John May 2: cleanup / destroy eventset.\n" ); printf( "-------------------------------------------------\n" ); printf( "Test run : \t1\n" ); printf( "%s : \t", event_name ); printf( LLDFMT, values ); printf( "\n" ); printf( "-------------------------------------------------\n" ); printf( "The following messages will appear if PAPI is compiled with debug enabled:\n" ); printf ( "\tPAPI Error Code -10: PAPI_EISRUN: EventSet is currently counting\n" ); printf ( "\tPAPI Error Code -10: PAPI_EISRUN: EventSet is currently counting\n" ); printf( "\tPAPI Error Code -1: PAPI_EINVAL: Invalid argument\n" ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/mpifirst.c0000600003276200002170000001236612247131121016241 0ustar ralphundrgrad/* This file performs the following test: 
   start, read, stop and again functionality

   - It attempts to use the following two counters. It may use fewer depending on
     hardware counter resource limitations. These are counted in the default counting
     domain and default granularity, depending on the platform. Usually this is
     the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR).
     + PAPI_FP_INS or PAPI_TOT_INS if PAPI_FP_INS doesn't exist
     + PAPI_TOT_CYC
   - Start counters
   - Do flops
   - Read counters
   - Reset counters
   - Do flops
   - Read counters
   - Do flops
   - Read counters
   - Do flops
   - Stop and read counters
   - Read counters
*/

#include "papi_test.h"

extern int TESTS_QUIET;  /* Declared in test_utils.c */

int
main( int argc, char **argv )
{
    int retval, num_tests = 5, num_events, tmp;
    long long **values;
    int EventSet = PAPI_NULL;
    int PAPI_event, mask;
    char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN];

    tests_quiet( argc, argv );  /* Set TESTS_QUIET variable */

    /* MPI_Init takes pointers to argc and argv */
    MPI_Init( &argc, &argv );

    retval = PAPI_library_init( PAPI_VER_CURRENT );
    if ( retval != PAPI_VER_CURRENT )
        test_fail( __FILE__, __LINE__, "PAPI_library_init", retval );

    /* query and set up the right instruction to monitor */
    if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) {
        PAPI_event = PAPI_FP_INS;
        mask = MASK_FP_INS | MASK_TOT_CYC;
    } else {
        PAPI_event = PAPI_TOT_INS;
        mask = MASK_TOT_INS | MASK_TOT_CYC;
    }

    retval = PAPI_event_code_to_name( PAPI_event, event_name );
    if ( retval != PAPI_OK )
        test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval );
    sprintf( add_event_str, "PAPI_add_event[%s]", event_name );

    EventSet = add_test_events( &num_events, &mask );

    values = allocate_test_space( num_tests, num_events );

    retval = PAPI_start( EventSet );
    if ( retval != PAPI_OK )
        test_fail( __FILE__, __LINE__, "PAPI_start", retval );

    do_flops( NUM_FLOPS );

    retval = PAPI_read( EventSet, values[0] );
    if ( retval != PAPI_OK )
        test_fail( __FILE__, __LINE__, "PAPI_read", retval );

    retval = PAPI_reset( EventSet );
    if ( retval != PAPI_OK )
        test_fail(
__FILE__, __LINE__, "PAPI_reset", retval ); do_flops( NUM_FLOPS ); retval = PAPI_read( EventSet, values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read", retval ); do_flops( NUM_FLOPS ); retval = PAPI_read( EventSet, values[2] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[3] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_read( EventSet, values[4] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read", retval ); remove_test_events( &EventSet, mask ); if ( !TESTS_QUIET ) { printf( "Test case 1: Non-overlapping start, stop, read.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Test type : \t1\t\t2\t\t3\t\t4\t\t5\n" ); sprintf( add_event_str, "%s : ", event_name ); printf( TAB5, add_event_str, ( values[0] )[0], ( values[1] )[0], ( values[2] )[0], ( values[3] )[0], ( values[4] )[0] ); printf( TAB5, "PAPI_TOT_CYC: ", ( values[0] )[1], ( values[1] )[1], ( values[2] )[1], ( values[3] )[1], ( values[4] )[1] ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Column 1 approximately equals column 2\n" ); printf( "Column 3 approximately equals 2 * column 2\n" ); printf( "Column 4 approximately equals 3 * column 2\n" ); printf( "Column 4 exactly equals column 5\n" ); } { long long min, max; min = ( long long ) ( values[1][0] * .9 ); max = ( long long ) ( values[1][0] * 1.1 ); if ( values[0][0] > max || 
values[0][0] < min || values[2][0] > ( 2 * max ) || values[2][0] < ( 2 * min ) || values[3][0] > ( 3 * max ) || values[3][0] < ( 3 * min ) || values[3][0] != values[4][0] ) { printf( "min: " ); printf( LLDFMT, min ); printf( "max: " ); printf( LLDFMT, max ); printf( "1st: " ); printf( LLDFMT, values[0][0] ); printf( "2nd: " ); printf( LLDFMT, values[1][0] ); printf( "3rd: " ); printf( LLDFMT, values[2][0] ); printf( "4th: " ); printf( LLDFMT, values[3][0] ); printf( "5th: " ); printf( LLDFMT, values[4][0] ); printf( "\n" ); test_fail( __FILE__, __LINE__, event_name, 1 ); } min = ( long long ) ( values[1][1] * .9 ); max = ( long long ) ( values[1][1] * 1.1 ); if ( values[0][1] > max || values[0][1] < min || values[2][1] > ( 2 * max ) || values[2][1] < ( 2 * min ) || values[3][1] > ( 3 * max ) || values[3][1] < ( 3 * min ) || values[3][1] != values[4][1] ) { test_fail( __FILE__, __LINE__, "PAPI_TOT_CYC", 1 ); } } test_pass( __FILE__, values, num_tests ); MPI_Finalize( ); exit( 1 ); } papi-5.3.0/src/ctests/p4_lst_ins.c0000600003276200002170000001705212247131121016457 0ustar ralphundrgrad/* This code demonstrates the behavior of PAPI_LD_INS, PAPI_SR_INS and PAPI_LST_INS on a Pentium 4 processor. Because of the way these events are implemented in hardware, LD and SR cannot be counted in the presence of either of the other two events. 
*/ #include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int retval, num_tests = 6, tmp; long long **values; int EventSet = PAPI_NULL; const PAPI_hw_info_t *hw_info; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); if ( hw_info->vendor == PAPI_VENDOR_INTEL ) { /* Check for Pentium4 */ if ( hw_info->cpuid_family != 15 ) { test_skip( __FILE__, __LINE__, "This test is intended only for Pentium 4.", 1 ); } } else { test_skip( __FILE__, __LINE__, "This test is intended only for Pentium 4.", 1 ); } retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); values = allocate_test_space( num_tests, 2 ); /* First test: just PAPI_LD_INS */ retval = PAPI_add_event( EventSet, PAPI_LD_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event: PAPI_LD_INS", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS / 10 ); retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_remove_event( EventSet, PAPI_LD_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event: PAPI_LD_INS", retval ); /* Second test: just PAPI_SR_INS */ retval = PAPI_add_event( EventSet, PAPI_SR_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event: PAPI_SR_INS", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS / 10 ); retval = PAPI_stop( EventSet, values[1] ); if ( retval != 
PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_remove_event( EventSet, PAPI_SR_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event: PAPI_SR_INS", retval ); /* Third test: just PAPI_LST_INS */ retval = PAPI_add_event( EventSet, PAPI_LST_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event: PAPI_LST_INS", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS / 10 ); retval = PAPI_stop( EventSet, values[2] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* Fourth test: PAPI_LST_INS and PAPI_LD_INS */ retval = PAPI_add_event( EventSet, PAPI_LD_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event: PAPI_LD_INS", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS / 10 ); retval = PAPI_stop( EventSet, values[3] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_remove_event( EventSet, PAPI_LD_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event: PAPI_LD_INS", retval ); /* Fifth test: PAPI_LST_INS and PAPI_SR_INS */ retval = PAPI_add_event( EventSet, PAPI_SR_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event: PAPI_SR_INS", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS / 10 ); retval = PAPI_stop( EventSet, values[4] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_remove_event( EventSet, PAPI_SR_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event: PAPI_SR_INS", retval ); retval = PAPI_remove_event( EventSet, PAPI_LST_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, 
"PAPI_remove_event: PAPI_LST_INS", retval ); /* Sixth test: PAPI_LD_INS and PAPI_SR_INS */ retval = PAPI_add_event( EventSet, PAPI_LD_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event: PAPI_LD_INS", retval ); retval = PAPI_add_event( EventSet, PAPI_SR_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event: PAPI_SR_INS", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS / 10 ); retval = PAPI_stop( EventSet, values[5] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_remove_event( EventSet, PAPI_LD_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event: PAPI_LD_INS", retval ); retval = PAPI_remove_event( EventSet, PAPI_SR_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event: PAPI_SR_INS", retval ); if ( !TESTS_QUIET ) { printf( "Pentium 4 Load / Store tests.\n" ); printf ( "These PAPI events are counted by setting a tag at the front of the pipeline,\n" ); printf ( "and counting tags at the back of the pipeline. All the tags are the same 'color'\n" ); printf ( "and can't be distinguished from each other. 
Therefore, PAPI_LD_INS and PAPI_SR_INS\n" ); printf ( "cannot be counted with the other two events, or the answer will always == PAPI_LST_INS.\n" ); printf ( "-------------------------------------------------------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS / 10 ); printf ( "-------------------------------------------------------------------------------------------\n" ); printf ( "Test: 1 2 3 4 5 6\n" ); printf( "%s %12lld %12s %12s %12lld %12s %12lld\n", "PAPI_LD_INS: ", ( values[0] )[0], "------", "------", ( values[3] )[1], "------", ( values[5] )[0] ); printf( "%s %12s %12lld %12s %12s %12lld %12lld\n", "PAPI_SR_INS: ", "------", ( values[1] )[0], "------", "------", ( values[4] )[1], ( values[5] )[1] ); printf( "%s %12s %12s %12lld %12lld %12lld %12s\n", "PAPI_LST_INS:", "------", "------", ( values[2] )[0], ( values[3] )[0], ( values[4] )[0], "------" ); printf ( "-------------------------------------------------------------------------------------------\n" ); printf( "Test 1: PAPI_LD_INS only.\n" ); printf( "Test 2: PAPI_SR_INS only.\n" ); printf( "Test 3: PAPI_LST_INS only.\n" ); printf( "Test 4: PAPI_LD_INS and PAPI_LST_INS.\n" ); printf( "Test 5: PAPI_SR_INS and PAPI_LST_INS.\n" ); printf( "Test 6: PAPI_LD_INS and PAPI_SR_INS.\n" ); printf ( "Verification: Values within each column should be the same.\n" ); printf( " R3C3 ~= (R1C1 + R2C2) ~= all other entries.\n" ); } test_pass( __FILE__, values, num_tests ); exit( 1 ); } papi-5.3.0/src/ctests/nineth.c0000600003276200002170000000765512247131121015676 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for derived events NOTE: This test becomes useless when rate events 
   like PAPI_FLOPS are removed.

   - It tests the derived metric FLOPS using the following two counters.
     They are counted in the default counting domain and default
     granularity, depending on the platform. Usually this is the user domain
     (PAPI_DOM_USER) and thread context (PAPI_GRN_THR).
     + PAPI_FP_OPS
     + PAPI_TOT_CYC
   - Get the time in microseconds.
   - Start counters
   - Do flops
   - Stop and read counters
   - Get the time in microseconds.
*/

#include "papi_test.h"

extern int TESTS_QUIET;  /* Declared in test_utils.c */

int
main( int argc, char **argv )
{
    int retval, num_tests = 2, tmp;
    int EventSet1 = PAPI_NULL;
    int EventSet2 = PAPI_NULL;
    int mask1 = 0x80001;  /* FP_OPS and TOT_CYC */
    int mask2 = 0x8;      /* FLOPS */
    int num_events1;
    int num_events2;
    long long **values;
    int clockrate;
    double test_flops;

    tests_quiet( argc, argv );  /* Set TESTS_QUIET variable */

    retval = PAPI_library_init( PAPI_VER_CURRENT );
    if ( retval != PAPI_VER_CURRENT )
        test_fail( __FILE__, __LINE__, "PAPI_library_init", retval );

    /* gotta count flops to run this test */
    if ( ( retval = PAPI_query_event( PAPI_FP_OPS ) ) != PAPI_OK )
        test_skip( __FILE__, __LINE__, "PAPI_query_event", retval );

    EventSet1 = add_test_events( &num_events1, &mask1 );
    /* EventSet2 = add_test_events(&num_events2, &mask2); */

    /* num_events2 is never assigned while EventSet2 stays disabled above,
       so only num_events1 is checked here */
    if ( num_events1 == 0 )
        test_skip( __FILE__, __LINE__, "add_test_events", PAPI_ENOEVNT );

    /* num_events1 is greater than num_events2 so don't worry.
*/ values = allocate_test_space( num_tests, num_events1 ); clockrate = PAPI_get_opt( PAPI_CLOCKRATE, NULL ); if ( clockrate < 1 ) test_fail( __FILE__, __LINE__, "PAPI_get_opt", retval ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* retval = PAPI_start(EventSet2); if (retval != PAPI_OK) test_fail(__FILE__, __LINE__, "PAPI_start", retval); do_flops(NUM_FLOPS); retval = PAPI_stop(EventSet2, values[1]); if (retval != PAPI_OK) test_fail(__FILE__, __LINE__, "PAPI_stop", retval); */ remove_test_events( &EventSet1, mask1 ); /* remove_test_events(&EventSet2, mask2); */ test_flops = ( double ) ( values[0] )[0] * ( double ) clockrate *( double ) 1000000.0; test_flops = test_flops / ( double ) ( values[0] )[1]; if ( !TESTS_QUIET ) { printf( "Test case 9: start, stop for derived event PAPI_FLOPS.\n" ); printf( "------------------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Test type : %12s%12s\n", "1", "2" ); printf( TAB2, "PAPI_FP_OPS : ", ( values[0] )[0], ( long long ) 0 ); printf( TAB2, "PAPI_TOT_CYC: ", ( values[0] )[1], ( long long ) 0 ); printf( TAB2, "PAPI_FLOPS : ", ( long long ) 0, ( values[1] )[0] ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Last number in row 3 approximately equals %f\n", test_flops ); printf( "This test is no longer valid: PAPI_FLOPS is deprecated.\n" ); } /* { 
double min, max; min = values[1][0] * .9; max = values[1][0] * 1.1; if (test_flops > max || test_flops < min) test_fail(__FILE__, __LINE__, "PAPI_FLOPS", 1); } */ test_pass( __FILE__, values, num_tests ); exit( 1 ); } papi-5.3.0/src/ctests/second.c0000600003276200002170000004424112247131121015654 0ustar ralphundrgrad/* This file performs the following test: counter domain testing - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. + PAPI_TOT_INS + PAPI_TOT_CYC - Start system domain counters - Do flops - Stop and read system domain counters - Start kernel domain counters - Do flops - Stop and read kernel domain counters - Start user domain counters - Do flops - Stop and read user domain counters */ #include "papi_test.h" #define TAB_DOM "%s%12lld%15lld%17lld\n" #define CASE2 0 #define CREATE 1 #define ADD 2 #define MIDDLE 3 #define CHANGE 4 #define SUPERVISOR 5 void dump_and_verify( int test_case, long long **values ) { long long min, max, min2, max2; printf ( "-----------------------------------------------------------------\n" ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------\n" ); if ( test_case == CASE2 ) { printf ( "Test type : Before Create Before Add Between Adds\n" ); printf( TAB_DOM, "PAPI_TOT_INS: ", ( values[0] )[0], ( values[1] )[0], ( values[2] )[0] ); printf( TAB_DOM, "PAPI_TOT_CYC: ", ( values[0] )[1], ( values[1] )[1], ( values[2] )[1] ); printf ( "-------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Both rows equal 'n N N' where n << N\n" ); return; } else if ( test_case == CHANGE ) { min = ( long long ) ( ( double ) values[0][0] * ( 1 - TOLERANCE ) ); max = ( long long ) ( ( double ) values[0][0] * ( 1 + TOLERANCE ) ); if ( values[1][0] > max || values[1][0] < min ) test_fail( __FILE__, __LINE__, "PAPI_TOT_INS", 1 ); min = ( long long ) ( ( double ) 
values[1][1] * ( 1 - TOLERANCE ) ); max = ( long long ) ( ( double ) values[1][1] * ( 1 + TOLERANCE ) ); if ( ( values[2][1] + values[0][1] ) > max || ( values[2][1] + values[0][1] ) < min ) test_fail( __FILE__, __LINE__, "PAPI_TOT_CYC", 1 ); printf ( "Test type : PAPI_DOM_ALL PAPI_DOM_KERNEL PAPI_DOM_USER\n" ); printf( TAB_DOM, "PAPI_TOT_INS: ", ( values[1] )[0], ( values[2] )[0], ( values[0] )[0] ); printf( TAB_DOM, "PAPI_TOT_CYC: ", ( values[1] )[1], ( values[2] )[1], ( values[0] )[1] ); printf ( "-------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Both rows approximately equal '(N+n) n N', where n << N\n" ); printf( "Column 1 approximately equals column 2 plus column 3\n" ); } else if ( test_case == SUPERVISOR ) { printf ( "Test type : PAPI_DOM_ALL All-minus-supervisor Supervisor-only\n" ); printf( TAB_DOM, "PAPI_TOT_INS: ", ( values[0] )[0], ( values[1] )[0], ( values[2] )[0] ); printf( TAB_DOM, "PAPI_TOT_CYC: ", ( values[0] )[1], ( values[1] )[1], ( values[2] )[1] ); printf ( "-------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Both rows approximately equal '(N+n) n N', where n << N\n" ); printf( "Column 1 approximately equals column 2 plus column 3\n" ); } else { min = ( long long ) ( ( double ) values[2][0] * ( 1 - TOLERANCE ) ); max = ( long long ) ( ( double ) values[2][0] * ( 1 + TOLERANCE ) ); min2 = ( long long ) ( ( double ) values[0][1] * ( 1 - TOLERANCE ) ); max2 = ( long long ) ( ( double ) ( double ) values[0][1] * ( 1 + TOLERANCE ) ); printf ( "Test type : PAPI_DOM_ALL PAPI_DOM_KERNEL PAPI_DOM_USER\n" ); printf( TAB_DOM, "PAPI_TOT_INS: ", ( values[0] )[0], ( values[1] )[0], ( values[2] )[0] ); printf( TAB_DOM, "PAPI_TOT_CYC: ", ( values[0] )[1], ( values[1] )[1], ( values[2] )[1] ); printf ( "-------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Both rows approximately equal '(N+n) n N', 
where n << N\n" ); printf( "Column 1 approximately equals column 2 plus column 3\n" ); if ( values[0][0] > max || values[0][0] < min ) test_fail( __FILE__, __LINE__, "PAPI_TOT_INS", 1 ); if ( ( values[1][1] + values[2][1] ) > max2 || ( values[1][1] + values[2][1] ) < min2 ) test_fail( __FILE__, __LINE__, "PAPI_TOT_CYC", 1 ); } if ( values[0][0] == 0 || values[0][1] == 0 || values[1][0] == 0 || values[1][1] == 0 ) test_fail( __FILE__, __LINE__, "Verify non-zero count for all domain types", 1 ); if ( values[2][0] == 0 || values[2][1] == 0 ) { if ( test_case == SUPERVISOR ) { printf ( "WARNING: No events counted in supervisor context. This is expected in a non-virtualized environment.\n" ); } else { test_fail( __FILE__, __LINE__, "Verify non-zero count for all domain types", 1 ); } } } /* Do the set_domain on the eventset before adding events */ void case1( int num ) { int retval, num_tests = 3; long long **values; int EventSet1 = PAPI_NULL, EventSet2 = PAPI_NULL, EventSet3 = PAPI_NULL; PAPI_option_t options; const PAPI_component_info_t *cmpinfo; memset( &options, 0x0, sizeof ( options ) ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* get info from cpu component */ cmpinfo = PAPI_get_component_info( 0 ); if ( cmpinfo == NULL ) { test_fail( __FILE__, __LINE__,"PAPI_get_component_info", PAPI_ECMP); } if ( ( retval = PAPI_query_event( PAPI_TOT_INS ) ) != PAPI_OK ) test_skip( __FILE__, __LINE__, "PAPI_query_event", retval ); if ( ( retval = PAPI_query_event( PAPI_TOT_CYC ) ) != PAPI_OK ) test_skip( __FILE__, __LINE__, "PAPI_query_event", retval ); retval = PAPI_create_eventset( &EventSet1 ); if ( retval == PAPI_OK ) retval = PAPI_create_eventset( &EventSet2 ); if ( retval == PAPI_OK ) retval = PAPI_create_eventset( &EventSet3 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* In Component PAPI, EventSets must be assigned a 
component index before you can fiddle with their internals. 0 is always the cpu component */ retval = PAPI_assign_eventset_component( EventSet1, 0 ); if ( retval == PAPI_OK ) retval = PAPI_assign_eventset_component( EventSet2, 0 ); if ( retval == PAPI_OK ) retval = PAPI_assign_eventset_component( EventSet3, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); if ( num == CREATE ) { printf ( "\nTest case CREATE: Call PAPI_set_opt(PAPI_DOMAIN) on EventSet before add\n" ); options.domain.eventset = EventSet1; options.domain.domain = PAPI_DOM_ALL; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); options.domain.eventset = EventSet2; options.domain.domain = PAPI_DOM_KERNEL; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); options.domain.eventset = EventSet3; options.domain.domain = PAPI_DOM_USER; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); } retval = PAPI_add_event( EventSet1, PAPI_TOT_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_INS)", retval ); retval = PAPI_add_event( EventSet1, PAPI_TOT_CYC ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_CYC)", retval ); retval = PAPI_add_event( EventSet2, PAPI_TOT_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_INS)", retval ); retval = PAPI_add_event( EventSet2, PAPI_TOT_CYC ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_CYC)", retval ); retval = PAPI_add_event( EventSet3, PAPI_TOT_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_INS)", retval ); if ( num == MIDDLE ) { printf ( "\nTest case MIDDLE: Call PAPI_set_opt(PAPI_DOMAIN) on EventSet 
between adds\n" ); options.domain.eventset = EventSet1; options.domain.domain = PAPI_DOM_ALL; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK && retval != PAPI_ECMP ) { test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); } options.domain.eventset = EventSet2; options.domain.domain = PAPI_DOM_KERNEL; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); options.domain.eventset = EventSet3; options.domain.domain = PAPI_DOM_USER; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); } retval = PAPI_add_event( EventSet3, PAPI_TOT_CYC ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_CYC)", retval ); if ( num == ADD ) { printf ( "\nTest case ADD: Call PAPI_set_opt(PAPI_DOMAIN) on EventSet after add\n" ); options.domain.eventset = EventSet1; options.domain.domain = PAPI_DOM_ALL; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK && retval != PAPI_ECMP ) { test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); } options.domain.eventset = EventSet2; options.domain.domain = PAPI_DOM_KERNEL; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); options.domain.eventset = EventSet3; options.domain.domain = PAPI_DOM_USER; retval = PAPI_set_opt( PAPI_DOMAIN, &options ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); } /* 2 events */ values = allocate_test_space( num_tests, 2 ); if ( num == CHANGE ) { /* This testcase is dependent on the CREATE testcase running immediately before it, using * domain settings of "All", "Kernel" and "User", on event sets 1, 2, and 3, respectively. 
*/ PAPI_option_t option; printf ( "\nTest case CHANGE 1: Change domain on EventSet between runs, using generic domain options:\n" ); PAPI_start( EventSet1 ); PAPI_stop( EventSet1, values[0] ); // change EventSet1 domain from All to User option.domain.domain = PAPI_DOM_USER; option.domain.eventset = EventSet1; retval = PAPI_set_opt( PAPI_DOMAIN, &option ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); PAPI_start( EventSet2 ); PAPI_stop( EventSet2, values[1] ); // change EventSet2 domain from Kernel to All option.domain.domain = PAPI_DOM_ALL; option.domain.eventset = EventSet2; retval = PAPI_set_opt( PAPI_DOMAIN, &option ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); PAPI_start( EventSet3 ); PAPI_stop( EventSet3, values[2] ); // change EventSet3 domain from User to Kernel option.domain.domain = PAPI_DOM_KERNEL; option.domain.eventset = EventSet3; retval = PAPI_set_opt( PAPI_DOMAIN, &option ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); free_test_space( values, num_tests ); values = allocate_test_space( num_tests, 2 ); } if ( num == SUPERVISOR && ( cmpinfo->available_domains & PAPI_DOM_SUPERVISOR ) ) { PAPI_option_t option; printf ( "\nTest case CHANGE 2: Change domain on EventSets to include/exclude supervisor events:\n" ); option.domain.domain = PAPI_DOM_ALL; option.domain.eventset = EventSet1; retval = PAPI_set_opt( PAPI_DOMAIN, &option ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain ALL ", retval ); option.domain.domain = PAPI_DOM_ALL ^ PAPI_DOM_SUPERVISOR; option.domain.eventset = EventSet2; retval = PAPI_set_opt( PAPI_DOMAIN, &option ); if ( retval != PAPI_OK ) { /* DOM_ALL is special-cased as domains_available */ /* in papi.c . Some machines don't like DOM_OTHER */ /* so try that if the above case fails. 
*/ option.domain.domain ^= PAPI_DOM_OTHER; option.domain.eventset = EventSet2; retval = PAPI_set_opt( PAPI_DOMAIN, &option ); if (retval != PAPI_OK) { test_fail( __FILE__, __LINE__, "PAPI_set_domain ALL^SUPERVISOR ", retval ); } } option.domain.domain = PAPI_DOM_SUPERVISOR; option.domain.eventset = EventSet3; retval = PAPI_set_opt( PAPI_DOMAIN, &option ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain SUPERVISOR ", retval ); free_test_space( values, num_tests ); values = allocate_test_space( num_tests, 2 ); } /* Warm it up dude */ PAPI_start( EventSet1 ); do_flops( NUM_FLOPS ); PAPI_stop( EventSet1, NULL ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_start( EventSet2 ); do_flops( NUM_FLOPS ); if ( retval == PAPI_OK ) { retval = PAPI_stop( EventSet2, values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } else { values[1][0] = retval; values[1][1] = retval; } retval = PAPI_start( EventSet3 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet3, values[2] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_cleanup_eventset( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup", retval ); retval = PAPI_destroy_eventset( &EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy", retval ); retval = PAPI_cleanup_eventset( EventSet2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup", retval ); retval = PAPI_destroy_eventset( &EventSet2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy", retval ); retval = PAPI_cleanup_eventset( EventSet3 ); if ( retval != 
PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup", retval ); retval = PAPI_destroy_eventset( &EventSet3 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy", retval ); dump_and_verify( num, values ); free(values); PAPI_shutdown( ); } void case2( int num, int domain, long long *values ) { int retval; int EventSet1 = PAPI_NULL; PAPI_option_t options; memset( &options, 0x0, sizeof ( options ) ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_query_event( PAPI_TOT_INS ) ) != PAPI_OK ) test_skip( __FILE__, __LINE__, "PAPI_query_event", retval ); if ( ( retval = PAPI_query_event( PAPI_TOT_CYC ) ) != PAPI_OK ) test_skip( __FILE__, __LINE__, "PAPI_query_event", retval ); if ( num == CREATE ) { printf ( "\nTest case 2, CREATE: Call PAPI_set_domain(%s) before create\n", stringify_domain( domain ) ); printf ( "This should override the domain setting for this EventSet.\n" ); retval = PAPI_set_domain( domain ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); } retval = PAPI_create_eventset( &EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( num == ADD ) { printf( "\nTest case 2, ADD: Call PAPI_set_domain(%s) before add\n", stringify_domain( domain ) ); printf ( "This should have no effect on the domain setting for this EventSet.\n" ); retval = PAPI_set_domain( domain ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); } retval = PAPI_add_event( EventSet1, PAPI_TOT_INS ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_INS)", retval ); if ( num == MIDDLE ) { printf ( "\nTest case 2, MIDDLE: Call PAPI_set_domain(%s) between adds\n", stringify_domain( domain ) ); printf ( "This should have no effect on the domain setting for this EventSet.\n" ); retval = PAPI_set_domain( domain 
); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_domain", retval ); } retval = PAPI_add_event( EventSet1, PAPI_TOT_CYC ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event(PAPI_TOT_CYC)", retval ); /* Warm it up dude */ PAPI_start( EventSet1 ); do_flops( NUM_FLOPS ); PAPI_stop( EventSet1, NULL ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet1, values ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_cleanup_eventset( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup", retval ); retval = PAPI_destroy_eventset( &EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy", retval ); PAPI_shutdown( ); } void case2_driver( void ) { long long **values; /* 3 tests, 2 events */ values = allocate_test_space( 3, 2 ); case2( CREATE, PAPI_DOM_KERNEL, values[0] ); case2( ADD, PAPI_DOM_KERNEL, values[1] ); case2( MIDDLE, PAPI_DOM_KERNEL, values[2] ); dump_and_verify( CASE2, values ); free(values); } void case1_driver( void ) { case1( ADD ); case1( MIDDLE ); case1( CREATE ); case1( CHANGE ); case1( SUPERVISOR ); } int main( int argc, char **argv ) { tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ #if defined(sgi) && defined(host_mips) uid_t id; id = getuid( ); if ( id != 0 ) { printf( "IRIX requires root for PAPI_DOM_KERNEL and PAPI_DOM_ALL.\n" ); test_skip( __FILE__, __LINE__, "", 1 ); } #endif printf ( "Test second.c: set domain of eventset via PAPI_set_domain and PAPI_set_opt.\n\n" ); printf ( "* PAPI_set_domain(DOMAIN) sets the default domain \napplied to subsequently created EventSets.\n" ); printf( "It should have no effect on existing EventSets.\n\n" ); printf ( "* PAPI_set_opt(DOMAIN,xxx) sets the domain for a specific EventSet.\n" ); printf ( "It should always override the default setting for that 
EventSet.\n" ); case2_driver( ); case1_driver( ); test_pass( __FILE__, NULL, 0 ); exit( 0 ); }
papi-5.3.0/src/ctests/disable_component.c
/* * File: disable_component.c * Author: Vince Weaver * vweaver1@eecs.utk.edu */ /* This tests the functionality of PAPI_disable_component() */ #include "papi_test.h" int main( int argc, char **argv ) { int retval; const PAPI_component_info_t* cmpinfo; int numcmp, cid, active_components=0; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Disable All Compiled-in Components */ numcmp = PAPI_num_components( ); if (!TESTS_QUIET) printf("Compiled-in components:\n"); for( cid = 0; cid < numcmp; cid++ ) { cmpinfo = PAPI_get_component_info( cid ); if (!TESTS_QUIET) { printf( "Name: %-23s %s\n", cmpinfo->name, cmpinfo->description); } retval=PAPI_disable_component( cid ); if (retval!=PAPI_OK) { test_fail(__FILE__,__LINE__,"Error disabling component",retval); } } /* Initialize the library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* Try to disable after init, should fail */ retval=PAPI_disable_component( 0 ); if (retval==PAPI_OK) { test_fail( __FILE__, __LINE__, "PAPI_disable_component should fail", retval ); } if (!TESTS_QUIET) printf("\nAfter init components:\n"); for( cid = 0; cid < numcmp; cid++ ) { cmpinfo = PAPI_get_component_info( cid ); if (!TESTS_QUIET) { printf( "%d %d Name: %-23s %s\n", cid, PAPI_get_component_index((char *)cmpinfo->name), cmpinfo->name ,cmpinfo->description); } if (cid!=PAPI_get_component_index((char *)cmpinfo->name)) { test_fail( __FILE__, __LINE__, "PAPI_get_component_index mismatch", 2 ); } if (cmpinfo->disabled) { if (!TESTS_QUIET) { printf(" \\-> Disabled: %s\n",cmpinfo->disabled_reason); } } else { active_components++; } } if (active_components>0) { test_fail( __FILE__, __LINE__, "too many active components", retval ); 
} test_pass( __FILE__, NULL, 0 ); return PAPI_OK; }
papi-5.3.0/src/ctests/reset_multiplex.c
/* This file performs the same tests as the reset test but does it with the events multiplexed. This is mostly to test perf_event, where resetting multiplexed events is handled differently than grouped events. */ #include "papi_test.h" int main( int argc, char **argv ) { int retval, num_tests = 9, num_events, tmp, i; long long **values; int EventSet = PAPI_NULL; int PAPI_event, mask; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } retval = PAPI_multiplex_init( ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_multiplex_init", retval ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet = add_two_events( &num_events, &PAPI_event, &mask ); /* Set multiplexing on the eventset */ retval = PAPI_set_multiplex( EventSet ); if ( retval != PAPI_OK) { test_fail(__FILE__, __LINE__, "Setting multiplex", retval); } retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); values = allocate_test_space( num_tests, num_events ); /*===== Test 1: Start/Stop =======================*/ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) { test_fail( 
__FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 2 Start/Stop =======================*/ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[1] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 3: Reset/Start/Stop =======================*/ retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[2] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 4: Start/Read =======================*/ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[3] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /*===== Test 5: Read =======================*/ do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[4] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /*===== Test 6: Read/Accum =======================*/ do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[5] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } retval = PAPI_accum( EventSet, values[5] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_accum", retval ); } /*===== Test 7: Read =======================*/ do_flops( NUM_FLOPS / 2 ); retval = PAPI_read( EventSet, values[6] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } /*===== Test 8 Reset/Stop =======================*/ retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, 
__LINE__, "PAPI_reset", retval ); } do_flops( NUM_FLOPS / 2 ); retval = PAPI_stop( EventSet, values[7] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /*===== Test 9: Reset/Read =======================*/ retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); } retval = PAPI_read( EventSet, values[8] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_read", retval ); } remove_test_events( &EventSet, mask ); printf( "Test case: Start/Stop/Read/Accum/Reset.\n" ); printf( "----------------------------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); sprintf( add_event_str, "%s:", event_name ); printf( " PAPI_TOT_CYC %s\n", event_name ); printf( "1. start,ops,stop %10lld %10lld\n", values[0][0], values[0][1] ); printf( "2. start,ops,stop %10lld %10lld\n", values[1][0], values[1][1] ); printf( "3. reset,start,ops,stop %10lld %10lld\n", values[2][0], values[2][1] ); printf( "4. start,ops/2,read %10lld %10lld\n", values[3][0], values[3][1] ); printf( "5. ops/2,read %10lld %10lld\n", values[4][0], values[4][1] ); printf( "6. ops/2,accum %10lld %10lld\n", values[5][0], values[5][1] ); printf( "7. ops/2,read %10lld %10lld\n", values[6][0], values[6][1] ); printf( "8. reset,ops/2,stop %10lld %10lld\n", values[7][0], values[7][1] ); printf( "9. 
reset,read %10lld %10lld\n", values[8][0], values[8][1] ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf( "Row 1 approximately equals rows 2 and 3 \n" ); printf( "Row 4 approximately equals 1/2 of row 3\n" ); printf( "Row 5 approximately equals twice row 4\n" ); printf( "Row 6 approximately equals 6 times row 4\n" ); printf( "Rows 7 and 8 approximately equal row 4\n" ); printf( "Row 9 equals 0\n" ); printf( "%% difference between %s 1 & 2: %.2f\n", "PAPI_TOT_CYC", 100.0 * ( float ) values[0][0] / ( float ) values[1][0] ); printf( "%% difference between %s 1 & 2: %.2f\n", add_event_str, 100.0 * ( float ) values[0][1] / ( float ) values[1][1] ); for ( i = 0; i <= 1; i++ ) { if ( !approx_equals ( ( double ) values[0][i], ( double ) values[1][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[1][i], ( double ) values[2][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[2][i], ( double ) values[3][i] * 2.0 ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[2][i], ( double ) values[4][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[5][i], ( double ) values[3][i] * 6.0 ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[6][i], ( double ) values[3][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( !approx_equals ( ( double ) values[7][i], ( double ) values[3][i] ) ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? "PAPI_TOT_CYC" : add_event_str ), 1 ); if ( values[8][i] != 0LL ) test_fail( __FILE__, __LINE__, ( ( i == 0 ) ? 
"PAPI_TOT_CYC" : add_event_str ), 1 ); } test_pass( __FILE__, values, num_tests ); return 0; }
papi-5.3.0/src/ctests/exec.c
/* * File: exec.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: start, stop and timer functionality for a parent and a forked child. */ #include "papi_test.h" #include <unistd.h> int main( int argc, char **argv ) { int retval; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( argc > 1 ) && ( strcmp( argv[1], "xxx" ) == 0 ) ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "execed PAPI_library_init", retval ); } else { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); PAPI_shutdown( ); if ( execlp( argv[0], argv[0], "xxx", NULL ) == -1 ) test_fail( __FILE__, __LINE__, "execlp", PAPI_ESYS ); } test_pass( __FILE__, NULL, 0 ); exit( 0 ); }
papi-5.3.0/src/ctests/vector.c
#include <stdio.h> #include <stdlib.h> #include <strings.h> #define NUMBER 100 static inline void inline_packed_sse_add( float *aa, float *bb, float *cc ) { __asm__ __volatile__( "movaps (%0), %%xmm0;" "movaps (%1), %%xmm1;" "addps %%xmm0, %%xmm1;" "movaps %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } static inline void inline_packed_sse_mul( float *aa, float *bb, float *cc ) { __asm__ __volatile__( "movaps (%0), %%xmm0;" "movaps (%1), %%xmm1;" "mulps %%xmm0, %%xmm1;" "movaps %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } static inline void inline_packed_sse2_add( double *aa, double *bb, double *cc ) { __asm__ __volatile__( "movapd (%0), %%xmm0;" "movapd (%1), %%xmm1;" "addpd %%xmm0, %%xmm1;" "movapd %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } static inline void inline_packed_sse2_mul( double *aa, 
double *bb, double *cc ) { __asm__ __volatile__( "movapd (%0), %%xmm0;" "movapd (%1), %%xmm1;" "mulpd %%xmm0, %%xmm1;" "movapd %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } static inline void inline_unpacked_sse_add( float *aa, float *bb, float *cc ) { __asm__ __volatile__( "movss (%0), %%xmm0;" "movss (%1), %%xmm1;" "addss %%xmm0, %%xmm1;" "movss %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } static inline void inline_unpacked_sse_mul( float *aa, float *bb, float *cc ) { __asm__ __volatile__( "movss (%0), %%xmm0;" "movss (%1), %%xmm1;" "mulss %%xmm0, %%xmm1;" "movss %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } static inline void inline_unpacked_sse2_add( double *aa, double *bb, double *cc ) { __asm__ __volatile__( "movsd (%0), %%xmm0;" "movsd (%1), %%xmm1;" "addsd %%xmm0, %%xmm1;" "movsd %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } static inline void inline_unpacked_sse2_mul( double *aa, double *bb, double *cc ) { __asm__ __volatile__( "movsd (%0), %%xmm0;" "movsd (%1), %%xmm1;" "mulsd %%xmm0, %%xmm1;" "movsd %%xmm1, (%2);"::"r"( aa ), "r"( bb ), "r"( cc ) :"%xmm0", "%xmm1" ); } int main( int argc, char **argv ) { int i, packed = 0, sse = 0; float a[4] = { 1.0, 2.0, 3.0, 4.0 }; float b[4] = { 2.0, 3.0, 4.0, 5.0 }; float c[4] = { 0.0, 0.0, 0.0, 0.0 }; double d[4] = { 1.0, 2.0, 3.0, 4.0 }; double e[4] = { 2.0, 3.0, 4.0, 5.0 }; double f[4] = { 0.0, 0.0, 0.0, 0.0 }; if ( argc != 3 ) { bail: printf( "Usage: %s <packed|unpacked> <sse|sse2>\n", argv[0] ); exit( 1 ); } if ( strcasecmp( argv[1], "packed" ) == 0 ) packed = 1; else if ( strcasecmp( argv[1], "unpacked" ) == 0 ) packed = 0; else goto bail; if ( strcasecmp( argv[2], "sse" ) == 0 ) sse = 1; else if ( strcasecmp( argv[2], "sse2" ) == 0 ) sse = 0; else goto bail; #if 0 if ( ( sse ) && ( system( "cat /proc/cpuinfo | grep sse > /dev/null" ) != 0 ) ) { printf( "This processor does not have SSE.\n" ); exit( 1 ); } if ( ( sse == 0 ) && ( system( "cat /proc/cpuinfo | grep 
sse2 > /dev/null" ) != 0 ) ) { printf( "This processor does not have SSE2.\n" ); exit( 1 ); } #endif printf( "Vector 1: %f %f %f %f\n", a[0], a[1], a[2], a[3] ); printf( "Vector 2: %f %f %f %f\n\n", b[0], b[1], b[2], b[3] ); if ( ( packed == 0 ) && ( sse == 1 ) ) { for ( i = 0; i < NUMBER; i++ ) { inline_unpacked_sse_add( &a[0], &b[0], &c[0] ); } printf( "%d SSE Unpacked Adds: Result %f\n", NUMBER, c[0] ); for ( i = 0; i < NUMBER; i++ ) { inline_unpacked_sse_mul( &a[0], &b[0], &c[0] ); } printf( "%d SSE Unpacked Muls: Result %f\n", NUMBER, c[0] ); } if ( ( packed == 1 ) && ( sse == 1 ) ) { for ( i = 0; i < NUMBER; i++ ) { inline_packed_sse_add( a, b, c ); } printf( "%d SSE Packed Adds: Result %f %f %f %f\n", NUMBER, c[0], c[1], c[2], c[3] ); for ( i = 0; i < NUMBER; i++ ) { inline_packed_sse_mul( a, b, c ); } printf( "%d SSE Packed Muls: Result %f %f %f %f\n", NUMBER, c[0], c[1], c[2], c[3] ); } if ( ( packed == 0 ) && ( sse == 0 ) ) { for ( i = 0; i < NUMBER; i++ ) { inline_unpacked_sse2_add( &d[0], &e[0], &f[0] ); } printf( "%d SSE2 Unpacked Adds: Result %f\n", NUMBER, f[0] ); for ( i = 0; i < NUMBER; i++ ) { inline_unpacked_sse2_mul( &d[0], &e[0], &f[0] ); } printf( "%d SSE2 Unpacked Muls: Result %f\n", NUMBER, f[0] ); } if ( ( packed == 1 ) && ( sse == 0 ) ) { for ( i = 0; i < NUMBER; i++ ) { inline_packed_sse2_add( &d[0], &e[0], &f[0] ); } printf( "%d SSE2 Packed Adds: Result %f\n", NUMBER, f[0] ); for ( i = 0; i < NUMBER; i++ ) { inline_packed_sse2_mul( &d[0], &e[0], &f[0] ); } printf( "%d SSE2 Packed Muls: Result %f\n", NUMBER, f[0] ); } exit( 0 ); }
papi-5.3.0/src/ctests/multiplex3_pthreads.c
/* * File: multiplex3_pthreads.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: John May * johnmay@llnl.gov */ /* This file tests the multiplex functionality when there are * threads in which the application isn't calling PAPI (and only * one thread that is calling PAPI.) 
*/ #include <pthread.h> #include "papi_test.h" #define MAX_TO_ADD 5 /* A thread function that does nothing forever, while the other * tests are running. */ void * thread_fn( void *dummy ) { ( void ) dummy; while ( 1 ) { do_stuff( ); } return ( NULL ); } /* Runs a bunch of multiplexed events */ void mainloop( int arg ) { int allvalid; long long *values; int EventSet = PAPI_NULL; int retval, i, j = 2, skipped_counters=0; PAPI_event_info_t pset; ( void ) arg; /* Initialize the library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); init_multiplex( ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 0 is always the cpu component */ retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); retval = PAPI_set_multiplex( EventSet ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); if ( ( retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ) ) != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } retval = PAPI_add_event( EventSet, PAPI_TOT_INS ); if ( ( retval != PAPI_OK ) && ( retval != PAPI_ECNFLCT ) ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( !TESTS_QUIET ) { printf( "Added %s\n", "PAPI_TOT_INS" ); } retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ); if ( ( retval != PAPI_OK ) && ( retval != PAPI_ECNFLCT ) ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( !TESTS_QUIET ) { printf( "Added 
%s\n", "PAPI_TOT_CYC" ); } values = ( long long * ) malloc( MAX_TO_ADD * sizeof ( long long ) ); if ( values == NULL ) test_fail( __FILE__, __LINE__, "malloc", 0 ); for ( i = 0; i < PAPI_MAX_PRESET_EVENTS; i++ ) { retval = PAPI_get_event_info( i | PAPI_PRESET_MASK, &pset ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info", retval ); if ( pset.count ) { printf( "Adding %s\n", pset.symbol ); retval = PAPI_add_event( EventSet, ( int ) pset.event_code ); if ( ( retval != PAPI_OK ) && ( retval != PAPI_ECNFLCT ) ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); if ( retval == PAPI_OK ) { printf( "Added %s\n", pset.symbol ); } else { printf( "Could not add %s\n", pset.symbol ); } do_stuff( ); if ( retval == PAPI_OK ) { retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( values[j] ) { if ( ++j >= MAX_TO_ADD ) break; } else { retval = PAPI_remove_event( EventSet, ( int ) pset.event_code ); if ( retval == PAPI_OK ) printf( "Removed %s\n", pset.symbol ); /* This added because the test */ /* can take a long time if mplexing */ /* is broken and all values are 0 */ skipped_counters++; if (skipped_counters>MAX_TO_ADD) break; } } } } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, values ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); test_print_event_header( "multiplex3_pthreads:\n", EventSet ); allvalid = 0; for ( i = 0; i < MAX_TO_ADD; i++ ) { printf( ONENUM, values[i] ); if ( values[i] != 0 ) allvalid++; } printf( "\n" ); if ( !allvalid ) test_fail( __FILE__, __LINE__, "all counter registered no counts", 1 ); retval = PAPI_cleanup_eventset( EventSet ); /* JT */ if ( retval != PAPI_OK ) test_fail( 
__FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); free( values ); PAPI_shutdown( ); } int main( int argc, char **argv ) { int i, rc, retval; pthread_t id[NUM_THREADS]; pthread_attr_t attr; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ printf( "%s: Using %d threads\n\n", argv[0], NUM_THREADS ); printf ( "Does non-threaded multiplexing work with extraneous threads present?\n" ); /* Create a bunch of unused pthreads, to simulate threads created * by the system that the user doesn't know about. */ pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); #endif #ifdef PPC64 sigset_t sigprof; sigemptyset( &sigprof ); sigaddset( &sigprof, SIGPROF ); retval = sigprocmask( SIG_BLOCK, &sigprof, NULL ); if ( retval != 0 ) test_fail( __FILE__, __LINE__, "sigprocmask SIG_BLOCK", retval ); #endif for ( i = 0; i < NUM_THREADS; i++ ) { rc = pthread_create( &id[i], &attr, thread_fn, NULL ); if ( rc ) test_fail( __FILE__, __LINE__, "pthread_create", rc ); } pthread_attr_destroy( &attr ); #ifdef PPC64 retval = sigprocmask( SIG_UNBLOCK, &sigprof, NULL ); if ( retval != 0 ) test_fail( __FILE__, __LINE__, "sigprocmask SIG_UNBLOCK", retval ); #endif mainloop( NUM_ITERS ); test_pass( __FILE__, NULL, 0 ); exit( 0 ); }
papi-5.3.0/src/ctests/sdsc4.c
/* * $Id$ * * Test example for multiplex functionality, originally * provided by Timothy Kaiser, SDSC. It was modified to fit the * PAPI test suite by Nils Smeds. * * This example verifies the adding and removal of multiplexed * events in an event set. 
*/ #include "papi_test.h" #include <stdio.h> #include <math.h> #include <assert.h> #define MAXEVENTS 9 #define REPEATS (MAXEVENTS * 4) #define SLEEPTIME 100 #define MINCOUNTS 100000 static double dummy3( double x, int iters ); int main( int argc, char **argv ) { PAPI_event_info_t info; char name2[PAPI_MAX_STR_LEN]; int i, j, retval, idx, repeats; int iters = NUM_FLOPS; double x = 1.1, y, dtmp; long long t1, t2; long long values[MAXEVENTS], refvals[MAXEVENTS]; int nsamples[MAXEVENTS], truelist[MAXEVENTS], ntrue; #ifdef STARTSTOP long long dummies[MAXEVENTS]; #endif int sleep_time = SLEEPTIME; double valsample[MAXEVENTS][REPEATS]; double valsum[MAXEVENTS]; double avg[MAXEVENTS]; double spread[MAXEVENTS]; int nevents = MAXEVENTS, nev1; int eventset = PAPI_NULL; int events[MAXEVENTS]; int eventidx[MAXEVENTS]; int eventmap[MAXEVENTS]; int fails; events[0] = PAPI_FP_INS; events[1] = PAPI_TOT_CYC; events[2] = PAPI_TOT_INS; events[3] = PAPI_TOT_IIS; events[4] = PAPI_INT_INS; events[5] = PAPI_STL_CCY; events[6] = PAPI_BR_INS; events[7] = PAPI_SR_INS; events[8] = PAPI_LD_INS; for ( i = 0; i < MAXEVENTS; i++ ) { values[i] = 0; valsum[i] = 0; nsamples[i] = 0; } if ( argc > 1 ) { if ( !strcmp( argv[1], "TESTS_QUIET" ) ) tests_quiet( argc, argv ); else { sleep_time = atoi( argv[1] ); if ( sleep_time <= 0 ) sleep_time = SLEEPTIME; } } if ( !TESTS_QUIET ) { printf( "\nFunctional check of multiplexing routines.\n" ); printf( "Adding and removing events from an event set.\n\n" ); } if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); #ifdef MPX init_multiplex( ); #endif if ( ( retval = PAPI_create_eventset( &eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); #ifdef MPX /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 
0 is always the cpu component */ retval = PAPI_assign_eventset_component( eventset, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); if ( ( retval = PAPI_set_multiplex( eventset ) ) ) { if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); } #endif /* What does this code even do */ nevents = MAXEVENTS; for ( i = 0; i < nevents; i++ ) { if ( ( retval = PAPI_add_event( eventset, events[i] ) ) ) { for ( j = i; j < MAXEVENTS-1; j++ ) events[j] = events[j + 1]; nevents--; i--; } } if ( nevents < 3 ) test_skip( __FILE__, __LINE__, "Not enough events left...", 0 ); /* Find a reasonable number of iterations (each * event active 20 times) during the measurement */ t2 = 10000 * 20 * nevents; /* Target: 10000 usec/multiplex, 20 repeats */ if ( t2 > 30e6 ) test_skip( __FILE__, __LINE__, "This test takes too much time", retval ); /* Measure one run */ t1 = PAPI_get_real_usec( ); y = dummy3( x, iters ); t1 = PAPI_get_real_usec( ) - t1; if ( t2 > t1 ) /* Scale up execution time to match t2 */ iters = iters * ( int ) ( t2 / t1 ); else if ( t1 > 30e6 ) /* Make sure execution time is < 30s per repeated test */ test_skip( __FILE__, __LINE__, "This test takes too much time", retval ); j = nevents; for ( i = 1; i < nevents; i = i + 2 ) eventidx[--j] = i; for ( i = 0; i < nevents; i = i + 2 ) eventidx[--j] = i; assert( j == 0 ); for ( i = 0; i < nevents; i++ ) eventmap[i] = i; x = 1.0; if ( !TESTS_QUIET ) printf( "\nReference run:\n" ); t1 = PAPI_get_real_usec( ); if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); y = dummy3( x, iters ); PAPI_read( eventset, refvals ); t2 = PAPI_get_real_usec( ); ntrue = nevents; PAPI_list_events( eventset, truelist, &ntrue ); if ( !TESTS_QUIET ) { printf( "\tOperations= %.1f Mflop", y * 1e-6 ); printf( "\t(%g Mflop/s)\n\n", ( y / ( double ) ( t2 - t1 
) ) ); printf( "%20s %16s %-15s %-15s\n", "PAPI measurement:", "Acquired count", "Expected event", "PAPI_list_events" ); } if ( !TESTS_QUIET ) { for ( j = 0; j < nevents; j++ ) { PAPI_get_event_info( events[j], &info ); PAPI_event_code_to_name( truelist[j], name2 ); if ( !TESTS_QUIET ) printf( "%20s = %16lld %-15s %-15s %s\n", info.short_descr, refvals[j], info.symbol, name2, strcmp( info.symbol, name2 ) ? "*** MISMATCH ***" : "" ); } printf( "\n" ); } nev1 = nevents; repeats = nevents * 4; for ( i = 0; i < repeats; i++ ) { if ( ( i % nevents ) + 1 == nevents ) continue; if ( !TESTS_QUIET ) printf( "\nTest %d (of %d):\n", i + 1 - i / nevents, repeats - 4 ); if ( ( retval = PAPI_stop( eventset, values ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); j = eventidx[i % nevents]; if ( ( i / nevents ) % 2 == 0 ) { PAPI_get_event_info( events[j], &info ); if ( !TESTS_QUIET ) printf( "Removing event[%d]: %s\n", j, info.short_descr ); if ( ( retval = PAPI_remove_event( eventset, events[j] ) ) ) test_fail( __FILE__, __LINE__, "PAPI_remove_event", retval ); nev1--; for ( idx = 0; eventmap[idx] != j; idx++ ); for ( j = idx; j < nev1; j++ ) eventmap[j] = eventmap[j + 1]; } else { PAPI_get_event_info( events[j], &info ); if ( !TESTS_QUIET ) printf( "Adding event[%d]: %s\n", j, info.short_descr ); if ( ( retval = PAPI_add_event( eventset, events[j] ) ) ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); eventmap[nev1] = j; nev1++; } if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); x = 1.0; #ifndef STARTSTOP if ( ( retval = PAPI_reset( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); #else if ( ( retval = PAPI_stop( eventset, dummies ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); #endif t1 = PAPI_get_real_usec( ); y = dummy3( x, iters ); PAPI_read( eventset, values ); t2 = 
PAPI_get_real_usec( ); if ( !TESTS_QUIET ) { printf( "\n(calculated independent of PAPI)\n" ); printf( "\tOperations= %.1f Mflop", y * 1e-6 ); printf( "\t(%g Mflop/s)\n\n", ( y / ( double ) ( t2 - t1 ) ) ); printf( "%20s %16s %-15s %-15s\n", "PAPI measurement:", "Acquired count", "Expected event", "PAPI_list_events" ); } ntrue = nev1; PAPI_list_events( eventset, truelist, &ntrue ); for ( j = 0; j < nev1; j++ ) { idx = eventmap[j]; /* printf("Mapping: Counter %d -> slot %d.\n",j,idx); */ PAPI_get_event_info( events[idx], &info ); PAPI_event_code_to_name( truelist[j], name2 ); if ( !TESTS_QUIET ) printf( "%20s = %16lld %-15s %-15s %s\n", info.short_descr, values[j], info.symbol, name2, strcmp( info.symbol, name2 ) ? "*** MISMATCH ***" : "" ); dtmp = ( double ) values[j]; valsum[idx] += dtmp; valsample[idx][nsamples[idx]] = dtmp; nsamples[idx]++; } if ( !TESTS_QUIET ) printf( "\n" ); } if ( ( retval = PAPI_stop( eventset, values ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( "\n\nEstimated variance relative to average counts:\n" ); for ( j = 0; j < nev1; j++ ) printf( " Event %.2d", j ); printf( "\n" ); } fails = nevents; /* Due to the limited precision of floating point, the typical standard deviation computation cannot really be used for large numbers with very small variations. Instead compute the standard deviation from the saved samples to avoid problems with precision. 
*/ for ( j = 0; j < nev1; j++ ) { avg[j] = valsum[j] / nsamples[j]; spread[j] = 0; for ( i = 0; i < nsamples[j]; ++i ) { double diff = ( valsample[j][i] - avg[j] ); spread[j] += diff * diff; } spread[j] = sqrt( spread[j] / nsamples[j] ) / avg[j]; if ( !TESTS_QUIET ) printf( "%9.2g ", spread[j] ); /* Make sure that NaNs get counted as errors */ if ( spread[j] < MPX_TOLERANCE ) fails--; else if ( values[j] < MINCOUNTS ) /* Neglect imprecise results with low counts */ fails--; } if ( !TESTS_QUIET ) { printf( "\n\n" ); for ( j = 0; j < nev1; j++ ) { PAPI_get_event_info( events[j], &info ); printf( "Event %.2d: mean=%10.0f, sdev/mean=%7.2g nrpt=%2d -- %s\n", j, avg[j], spread[j], nsamples[j], info.short_descr ); } printf( "\n\n" ); } if ( fails ) test_fail( __FILE__, __LINE__, "Values differ from reference", fails ); else test_pass( __FILE__, NULL, 0 ); return 0; } static double dummy3( double x, int iters ) { int i; double w, y, z, a, b, c, d, e, f, g, h; double one; one = 1.0; w = x; y = x; z = x; a = x; b = x; c = x; d = x; e = x; f = x; g = x; h = x; for ( i = 1; i <= iters; i++ ) { w = w * 1.000000000001 + one; y = y * 1.000000000002 + one; z = z * 1.000000000003 + one; a = a * 1.000000000004 + one; b = b * 1.000000000005 + one; c = c * 0.999999999999 + one; d = d * 0.999999999998 + one; e = e * 0.999999999997 + one; f = f * 0.999999999996 + one; g = g * 0.999999999995 + one; h = h * 1.000000000006 + one; } return 2.0 * ( a + b + c + d + e + f + w + x + y + z + g + h ); } papi-5.3.0/src/ctests/remove_events.c0000600003276200002170000000636212247131121017264 0ustar ralphundrgrad/* This test checks if removing events works properly at the low level by Vince Weaver (vweaver1@eecs.utk.edu) */ #include "papi_test.h" int main( int argc, char **argv ) { int retval; int EventSet = PAPI_NULL; long long values1[2],values2[2]; char *event_names[] = {"PAPI_TOT_CYC","PAPI_TOT_INS"}; char add_event_str[PAPI_MAX_STR_LEN]; double instructions_error; long long old_instructions; /* 
Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* Create an empty event set */ retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); } /* add the events named above */ retval = PAPI_add_named_event( EventSet, event_names[0] ); if ( retval != PAPI_OK ) { sprintf( add_event_str, "PAPI_add_named_event[%s]", event_names[0] ); test_fail( __FILE__, __LINE__, add_event_str, retval ); } retval = PAPI_add_named_event( EventSet, event_names[1] ); if ( retval != PAPI_OK ) { sprintf( add_event_str, "PAPI_add_named_event[%s]", event_names[1] ); test_fail( __FILE__, __LINE__, add_event_str, retval ); } /* Start PAPI */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* our test code */ do_flops( NUM_FLOPS ); /* Stop PAPI */ retval = PAPI_stop( EventSet, values1 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } old_instructions=values1[1]; if ( !TESTS_QUIET ) { printf( "========================\n" ); /* cycles is first, other event second */ sprintf( add_event_str, "%-12s : \t", event_names[0] ); printf( TAB1, add_event_str, values1[0] ); sprintf( add_event_str, "%-12s : \t", event_names[1] ); printf( TAB1, add_event_str, values1[1] ); } /* remove PAPI_TOT_CYC */ retval = PAPI_remove_named_event( EventSet, event_names[0] ); if ( retval != PAPI_OK ) { sprintf( add_event_str, "PAPI_remove_named_event[%s]", event_names[0] ); test_fail( __FILE__, __LINE__, add_event_str, retval ); } /* Start PAPI */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* our test code */ do_flops( NUM_FLOPS ); /* Stop PAPI */ retval = PAPI_stop( EventSet, values2 ); if ( 
retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /* test if after removing the event, the second event */ /* still points to the proper native event */ /* this only works if IPC != 1 */ if ( !TESTS_QUIET ) { printf( "==========================\n" ); printf( "After removing PAPI_TOT_CYC\n"); sprintf( add_event_str, "%-12s : \t", event_names[1] ); printf( TAB1, add_event_str, values2[0] ); /* relative difference in percent, so it matches the 10%% bound and the %% printed below */ instructions_error=100.0*((double)old_instructions - (double)values2[0])/ (double)old_instructions; if ((instructions_error>10.0) || (instructions_error<-10.0)) { printf("Error of %.2f%%\n",instructions_error); test_fail( __FILE__, __LINE__, "validation", 0 ); } } test_pass( __FILE__, NULL, 0 ); return 0; } papi-5.3.0/src/ctests/exec2.c0000600003276200002170000000171512247131121015406 0ustar ralphundrgrad/* * File: exec2.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: PAPI library initialization in a process and again in the image it execs. */ #include "papi_test.h" #include <unistd.h> int main( int argc, char **argv ) { int retval; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( argc > 1 ) && ( strcmp( argv[1], "xxx" ) == 0 ) ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "execed PAPI_library_init", retval ); } else { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); if ( execlp( argv[0], argv[0], "xxx", NULL ) == -1 ) test_fail( __FILE__, __LINE__, "execlp", PAPI_ESYS ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/overflow_index.c0000600003276200002170000001072612247131121017434 0ustar ralphundrgrad/* * File: overflow_index.c * CVS: $Id$ * Author: min@cs.utk.edu * Min Zhou */ /* This file performs the following test: overflow dispatch on 2 counters. */ #include "papi_test.h" #define OVER_FMT "handler(%d) Overflow at %p! 
vector=0x%llx\n" #define OUT_FMT "%-12s : %16lld%16lld\n" #define INDEX_FMT "Overflows vector 0x%llx: \n" typedef struct { long long mask; int count; } ocount_t; /* there are three possible vectors, one counter overflows, the other counter overflows, both overflow */ ocount_t overflow_counts[3] = { {0, 0}, {0, 0}, {0, 0} }; int total_unknown = 0; void handler( int EventSet, void *address, long long overflow_vector, void *context ) { int i; ( void ) context; if ( !TESTS_QUIET ) { fprintf( stderr, OVER_FMT, EventSet, address, overflow_vector ); } /* Look for the overflow_vector entry */ for ( i = 0; i < 3; i++ ) { if ( overflow_counts[i].mask == overflow_vector ) { overflow_counts[i].count++; return; } } /* Didn't find it so add it. */ for ( i = 0; i < 3; i++ ) { if ( overflow_counts[i].mask == ( long long ) 0 ) { overflow_counts[i].mask = overflow_vector; overflow_counts[i].count = 1; return; } } /* Unknown entry!?! */ total_unknown++; } int main( int argc, char **argv ) { int EventSet = PAPI_NULL; long long ( values[3] )[2]; int retval; int PAPI_event, k, i; char event_name[PAPI_MAX_STR_LEN]; int index_array[2], number; int num_events1, mask1; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depends on the availability of the event on the platform */ EventSet = add_two_nonderived_events( &num_events1, &PAPI_event, &mask1 ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_overflow( EventSet, PAPI_event, THRESHOLD, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); 
retval = PAPI_overflow( EventSet, PAPI_TOT_CYC, THRESHOLD, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf ( "Test case: Overflow dispatch of 2nd event in set with 2 events.\n" ); printf ( "---------------------------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", THRESHOLD ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-----------------------------------------------\n" ); printf( "Test type : %16d%16d\n", 1, 2 ); printf( OUT_FMT, "PAPI_TOT_CYC", ( values[0] )[0], ( values[1] )[0] ); printf( OUT_FMT, event_name, ( values[0] )[1], ( values[1] )[1] ); if ( overflow_counts[0].count == 0 && overflow_counts[1].count == 0 ) test_fail( __FILE__, __LINE__, "one counter had no overflows", 1 ); for ( k = 0; k < 3; k++ ) { if ( overflow_counts[k].mask ) { number = 2; retval = PAPI_get_overflow_event_index( EventSet, overflow_counts[k].mask, index_array, &number ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_overflow_event_index", retval ); printf( INDEX_FMT, ( long long ) overflow_counts[k].mask ); printf( " counts: %d ", overflow_counts[k].count ); for ( i = 0; i < number; i++ ) printf( " Event Index %d ", index_array[i] ); printf( "\n" ); } } printf( "Case 2 %s Overflows: %d\n", "Unknown", total_unknown ); printf( "-----------------------------------------------\n" ); if ( total_unknown > 0 ) test_fail( __FILE__, __LINE__, "Unknown counter had overflows", 1 ); retval = PAPI_cleanup_eventset( EventSet ); if ( retval != PAPI_OK ) test_fail( 
__FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/exeinfo.c0000600003276200002170000000427212247131121016036 0ustar ralphundrgrad/* * File: exeinfo.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ #include <string.h> #include <unistd.h> #include "papi_test.h" int main( int argc, char **argv ) { int retval; const PAPI_exe_info_t *exeinfo; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( exeinfo = PAPI_get_executable_info( ) ) == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", retval ); printf( "Path+Program: %s\n", exeinfo->fullname ); printf( "Program: %s\n", exeinfo->address_info.name ); printf( "Text start: %p, Text end: %p\n", exeinfo->address_info.text_start, exeinfo->address_info.text_end ); printf( "Data start: %p, Data end: %p\n", exeinfo->address_info.data_start, exeinfo->address_info.data_end ); printf( "Bss start: %p, Bss end: %p\n", exeinfo->address_info.bss_start, exeinfo->address_info.bss_end ); if ( ( strlen( &(exeinfo->fullname[0]) ) == 0 ) ) test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", 1 ); if ( ( strlen( &(exeinfo->address_info.name[0]) ) == 0 ) ) test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", 1 ); if ( ( exeinfo->address_info.text_start == 0x0 ) || ( exeinfo->address_info.text_end == 0x0 ) || ( exeinfo->address_info.text_start >= exeinfo->address_info.text_end ) ) test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", 1 ); if ( ( exeinfo->address_info.data_start == 0x0 ) || ( exeinfo->address_info.data_end == 0x0 ) || ( exeinfo->address_info.data_start >= exeinfo->address_info.data_end ) ) test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", 1 ); /* if ((exeinfo->address_info.bss_start == 0x0) || (exeinfo->address_info.bss_end == 0x0) || (exeinfo->address_info.bss_start >= 
exeinfo->address_info.bss_end)) test_fail(__FILE__, __LINE__, "PAPI_get_executable_info",1); */ sleep( 1 ); /* Needed for debugging, so you can ^Z and stop the process, inspect /proc to see if it's right */ test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/overflow.c0000600003276200002170000001242512247131121016243 0ustar ralphundrgrad/* * File: overflow.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: overflow dispatch The Eventset contains: + PAPI_TOT_CYC + PAPI_FP_INS (overflow monitor) - Start eventset 1 - Do flops - Stop and measure eventset 1 - Set up overflow on eventset 1 - Start eventset 1 - Do flops - Stop eventset 1 */ #include "papi_test.h" #define OVER_FMT "handler(%d ) Overflow at %p! bit=0x%llx \n" #define OUT_FMT "%-12s : %16lld%16lld\n" static int total = 0; /* total overflows */ void handler( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) context; if ( !TESTS_QUIET ) { fprintf( stderr, OVER_FMT, EventSet, address, overflow_vector ); } total++; } int main( int argc, char **argv ) { int EventSet = PAPI_NULL; long long ( values[2] )[2]; long long min, max; int num_flops = NUM_FLOPS, retval; int PAPI_event, mythreshold = THRESHOLD; char event_name1[PAPI_MAX_STR_LEN]; const PAPI_hw_info_t *hw_info = NULL; int num_events, mask; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init PAPI */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* Get hardware info */ hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); /* add PAPI_TOT_CYC and one of the events in */ /* PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, */ /* depending on the availability of the event on */ /* the platform */ EventSet = add_two_nonderived_events( &num_events, &PAPI_event, &mask ); printf("Using %#x 
 for the overflow event\n",PAPI_event); if ( PAPI_event == PAPI_FP_INS ) { mythreshold = THRESHOLD; } else { #if defined(linux) mythreshold = ( int ) hw_info->cpu_max_mhz * 20000; #else mythreshold = THRESHOLD * 2; #endif } /* Start the calibration run */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); /* stop the calibration run */ retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* set up overflow handler */ retval = PAPI_overflow( EventSet, PAPI_event, mythreshold, 0, handler ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); } /* Start overflow run */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } do_flops( num_flops ); /* stop overflow run */ retval = PAPI_stop( EventSet, values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_overflow( EventSet, PAPI_event, 0, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); if ( !TESTS_QUIET ) { if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf ( "Test case: Overflow dispatch of 2nd event in set with 2 events.\n" ); printf ( "---------------------------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", mythreshold ); printf( "Using %d iterations of c += a*b\n", num_flops ); printf( "-----------------------------------------------\n" ); printf( "Test type : %16d%16d\n", 1, 2 ); printf( OUT_FMT, event_name1, ( values[0] )[1], ( values[1] )[1] ); printf( OUT_FMT, "PAPI_TOT_CYC", ( values[0] )[0], ( values[1] )[0] ); printf( "Overflows : %16s%16d\n", "", total ); printf( "-----------------------------------------------\n" ); } 
retval = PAPI_cleanup_eventset( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); if ( !TESTS_QUIET ) { printf( "Verification:\n" ); #if defined(linux) || defined(__ia64__) || defined(_POWER4) num_flops *= 2; #endif if ( PAPI_event == PAPI_FP_INS || PAPI_event == PAPI_FP_OPS ) { printf( "Row 1 approximately equals %d %d\n", num_flops, num_flops ); } printf( "Column 1 approximately equals column 2\n" ); printf( "Row 3 approximately equals %u +- %u %%\n", ( unsigned ) ( ( values[0] )[1] / ( long long ) mythreshold ), ( unsigned ) ( OVR_TOLERANCE * 100.0 ) ); } /* min = (long long)((values[0])[1]*(1.0-TOLERANCE)); max = (long long)((values[0])[1]*(1.0+TOLERANCE)); if ( (values[0])[1] > max || (values[0])[1] < min ) test_fail(__FILE__, __LINE__, event_name, 1); */ min = ( long long ) ( ( ( double ) values[0][1] * ( 1.0 - OVR_TOLERANCE ) ) / ( double ) mythreshold ); max = ( long long ) ( ( ( double ) values[0][1] * ( 1.0 + OVR_TOLERANCE ) ) / ( double ) mythreshold ); printf( "Overflows: total(%d) > max(%lld) || total(%d) < min(%lld) \n", total, max, total, min ); if ( total > max || total < min ) test_fail( __FILE__, __LINE__, "Overflows", 1 ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/zero.c0000600003276200002170000001161012247131121015352 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC - Get us. - Start counters - Do flops - Stop and read counters - Get us. 
*/ #include "papi_test.h" #define MAX_CYCLE_ERROR 30 int main( int argc, char **argv ) { int retval, num_tests = 1, tmp; int EventSet1 = PAPI_NULL; int PAPI_event, mask1; int num_events; long long **values; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; char event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; double cycles_error; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events, &PAPI_event, &mask1 ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } sprintf( add_event_str, "PAPI_add_event[%s]", event_name ); values = allocate_test_space( num_tests, num_events ); /* warm up the processor to pull it out of idle state */ do_flops( NUM_FLOPS*10 ); /* Gather before stats */ elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); /* Start PAPI */ retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* our work code */ do_flops( NUM_FLOPS ); /* Stop PAPI */ retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); } /* Calculate total values */ elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us; elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; remove_test_events( &EventSet1, mask1 ); if ( !TESTS_QUIET ) { printf( "Test case 
0: start, stop.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); printf( "Test type : \t 1\n" ); sprintf( add_event_str, "%-12s : \t", event_name ); /* cycles is first, other event second */ printf( TAB1, add_event_str, values[0][1] ); /* If cycles is there, it's always the first event */ if ( mask1 & MASK_TOT_CYC ) { printf( TAB1, "PAPI_TOT_CYC : \t", values[0][0] ); } printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf( TAB1, "Virt usec : \t", elapsed_virt_us ); printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc ); printf( "-------------------------------------------------------------------------\n" ); printf( "Verification: PAPI_TOT_CYC should be roughly real_cycles\n" ); printf( "NOTE: Not true if dynamic frequency scaling is enabled.\n" ); printf( "Verification: PAPI_FP_INS should be roughly %d\n", 2*NUM_FLOPS ); } /* Check that TOT_CYC and real_cycles roughly match */ cycles_error=100.0*((double)values[0][0] - (double)elapsed_cyc)/((double)elapsed_cyc); if ((cycles_error > MAX_CYCLE_ERROR) || (cycles_error < -MAX_CYCLE_ERROR)) { printf("PAPI_TOT_CYC Error of %.2f%%\n",cycles_error); test_fail( __FILE__, __LINE__, "Cycles validation", 0 ); } /* Check that FP_INS is reasonable; the counts are long long, so use llabs() rather than abs() */ if (llabs(values[0][1] - (2*NUM_FLOPS)) > (2*NUM_FLOPS)) { printf("%s Error of %.2f%%\n", event_name, (100.0 * (double)(values[0][1] - (2*NUM_FLOPS)))/(2*NUM_FLOPS)); test_fail( __FILE__, __LINE__, "FLOPS validation", 0 ); } if (llabs(values[0][1] - (2*NUM_FLOPS)) > (NUM_FLOPS/2)) { printf("%s Error of %.2f%%\n", event_name, (100.0 * 
(double)(values[0][1] - (2*NUM_FLOPS)))/(2*NUM_FLOPS)); test_warn( __FILE__, __LINE__, "FLOPS validation", 0 ); } test_pass( __FILE__, values, num_tests ); return 0; } papi-5.3.0/src/ctests/overflow3_pthreads.c0000600003276200002170000000712612247131121020222 0ustar ralphundrgrad/* * File: overflow3_pthreads.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file tests the overflow functionality when there are * threads in which the application isn't calling PAPI (and only * one thread that is calling PAPI.) */ #include <pthread.h> #include "papi_test.h" int total = 0; void * thread_fn( void *dummy ) { ( void ) dummy; while ( 1 ) { do_stuff( ); } return ( NULL ); } void handler( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) overflow_vector; ( void ) context; if ( !TESTS_QUIET ) { fprintf( stderr, "handler(%d ) Overflow at %p, thread 0x%lx!\n", EventSet, address, PAPI_thread_id( ) ); } total++; } void mainloop( int arg ) { int retval, num_tests = 1; int EventSet1 = PAPI_NULL; int mask1 = 0x0; int num_events1; long long **values; int PAPI_event; char event_name[PAPI_MAX_STR_LEN]; ( void ) arg; if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_nonderived_events( &num_events1, &PAPI_event, &mask1 ); values = allocate_test_space( num_tests, num_events1 ); if ( ( retval = PAPI_overflow( EventSet1, PAPI_event, THRESHOLD, 0, handler ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); do_stuff( ); if ( ( retval = PAPI_start( EventSet1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet1, values[0] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* clear 
the papi_overflow event */ if ( ( retval = PAPI_overflow( EventSet1, PAPI_event, 0, 0, NULL ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); if ( !TESTS_QUIET ) { printf( "Thread %#x %s : \t%lld\n", ( int ) pthread_self( ), event_name, ( values[0] )[0] ); printf( "Thread %#x PAPI_TOT_CYC: \t%lld\n", ( int ) pthread_self( ), ( values[0] )[1] ); } retval = PAPI_cleanup_eventset( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset( &EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); free_test_space( values, num_tests ); PAPI_shutdown( ); } int main( int argc, char **argv ) { int i, rc, retval; pthread_t id[NUM_THREADS]; pthread_attr_t attr; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ printf( "%s: Using %d threads\n\n", argv[0], NUM_THREADS ); printf ( "Does non-threaded overflow work with extraneous threads present?\n" ); pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); #endif for ( i = 0; i < NUM_THREADS; i++ ) { rc = pthread_create( &id[i], &attr, thread_fn, NULL ); if ( rc ) test_fail( __FILE__, __LINE__, "pthread_create", rc ); } pthread_attr_destroy( &attr ); mainloop( NUM_ITERS ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/zero_named.c0000600003276200002170000001100712247131121016516 0ustar ralphundrgrad/* This test exercises the PAPI_{query, add, remove}_event APIs for PRESET events. It more or less duplicates the functionality of the classic "zero" test. 
*/ #include "papi_test.h" int main( int argc, char **argv ) { int retval, num_tests = 1, tmp; int EventSet = PAPI_NULL; int num_events = 2; long long **values; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; char *event_names[] = {"PAPI_TOT_CYC","PAPI_TOT_INS"}; char add_event_str[PAPI_MAX_STR_LEN]; double cycles_error; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } /* Verify that the named events exist */ retval = PAPI_query_named_event(event_names[0]); if ( retval == PAPI_OK) retval = PAPI_query_named_event(event_names[1]); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_query_named_event", retval ); /* Create an empty event set */ retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* add the events named above */ retval = PAPI_add_named_event( EventSet, event_names[0] ); if ( retval != PAPI_OK ) { sprintf( add_event_str, "PAPI_add_named_event[%s]", event_names[0] ); test_fail( __FILE__, __LINE__, add_event_str, retval ); } retval = PAPI_add_named_event( EventSet, event_names[1] ); if ( retval != PAPI_OK ) { sprintf( add_event_str, "PAPI_add_named_event[%s]", event_names[1] ); test_fail( __FILE__, __LINE__, add_event_str, retval ); } values = allocate_test_space( num_tests, num_events ); /* Gather before stats */ elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); /* Start PAPI */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_start", retval ); } /* our test code */ do_flops( NUM_FLOPS ); /* Stop PAPI */ retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, 
"PAPI_stop", retval ); } /* Calculate total values */ elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us; elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; /* remove PAPI_TOT_CYC and PAPI_TOT_INS */ retval = PAPI_remove_named_event( EventSet, event_names[0] ); if ( retval != PAPI_OK ) { sprintf( add_event_str, "PAPI_remove_named_event[%s]", event_names[0] ); test_fail( __FILE__, __LINE__, add_event_str, retval ); } retval = PAPI_remove_named_event( EventSet, event_names[1] ); if ( retval != PAPI_OK ) { sprintf( add_event_str, "PAPI_remove_named_event[%s]", event_names[1] ); test_fail( __FILE__, __LINE__, add_event_str, retval ); } if ( !TESTS_QUIET ) { printf( "PAPI_{query, add, remove}_named_event API test.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-------------------------------------------------------------------------\n" ); printf( "Test type : \t 1\n" ); /* cycles is first, other event second */ sprintf( add_event_str, "%-12s : \t", event_names[0] ); printf( TAB1, add_event_str, values[0][0] ); sprintf( add_event_str, "%-12s : \t", event_names[1] ); printf( TAB1, add_event_str, values[0][1] ); printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf( TAB1, "Virt usec : \t", elapsed_virt_us ); printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc ); printf( "-------------------------------------------------------------------------\n" ); printf( "Verification: PAPI_TOT_CYC should be roughly real_cycles\n" ); cycles_error=100.0*((double)values[0][0] - (double)elapsed_cyc)/ 
(double)values[0][0]; if (cycles_error>10.0) { printf("Error of %.2f%%\n",cycles_error); test_fail( __FILE__, __LINE__, "validation", 0 ); } } test_pass( __FILE__, values, num_tests ); return 0; } papi-5.3.0/src/ctests/prof_utils.c0000600003276200002170000002450312247131121016566 0ustar ralphundrgrad/* * File: prof_utils.c * CVS: $Id$ * Author: Dan Terpstra * terpstra@cs.utk.edu * Mods: * */ /* This file contains utility functions useful for all profiling tests It can be used by: - profile.c, - sprofile.c, - profile_pthreads.c, - profile_twoevents.c, - earprofile.c, - future profiling tests. */ #include "papi_test.h" #include "prof_utils.h" /* variables global to profiling tests */ long long **values; char event_name[PAPI_MAX_STR_LEN]; int PAPI_event; int EventSet = PAPI_NULL; void *profbuf[5]; /* This function does the generic initialization stuff found at the top of most profile tests (most tests in general). This includes: - setting the QUIET flag; - initing the PAPI library; - setting the debug level; - getting hardware and executable info. It assumes that prginfo is global to the parent routine. */ void prof_init( int argc, char **argv, const PAPI_exe_info_t ** prginfo ) { int retval; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( *prginfo = PAPI_get_executable_info( ) ) == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_executable_info", 1 ); } /* Many profiling tests count one of {FP_INS, FP_OPS, TOT_INS} and TOT_CYC. This function creates an event set containing the appropriate pair of events. It also initializes the global event_name string to the event selected. Assumed globals: EventSet, PAPI_event, event_name. 
*/ int prof_events( int num_tests) { int retval; int num_events, mask; /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet = add_two_nonderived_events( &num_events, &PAPI_event, &mask ); values = allocate_test_space( num_tests, num_events ); if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); return ( mask ); } /* This function displays info from the prginfo structure in a standardized format. */ void prof_print_address( char *title, const PAPI_exe_info_t * prginfo ) { printf( "%s\n", title ); printf ( "----------------------------------------------------------------\n" ); printf( "Text start: %p, Text end: %p, Text length: %#x\n", prginfo->address_info.text_start, prginfo->address_info.text_end, ( unsigned int ) ( prginfo->address_info.text_end - prginfo->address_info.text_start ) ); printf( "Data start: %p, Data end: %p\n", prginfo->address_info.data_start, prginfo->address_info.data_end ); printf( "BSS start : %p, BSS end : %p\n", prginfo->address_info.bss_start, prginfo->address_info.bss_end ); printf ( "----------------------------------------------------------------\n" ); } /* This function displays profiling information useful for several profile tests. It (probably inappropriately) assumes use of a common THRESHOLD. This should probably be a passed parameter. Assumed globals: event_name, start, stop. */ void prof_print_prof_info( caddr_t start, caddr_t end, int threshold, char *event_name ) { printf( "Profiling event : %s\n", event_name ); printf( "Profile Threshold: %d\n", threshold ); printf( "Profile Iters : %d\n", ( getenv( "NUM_ITERS" ) ? 
atoi( getenv( "NUM_ITERS" ) ) : NUM_ITERS ) ); printf( "Profile Range : %p to %p\n", start, end ); printf ( "----------------------------------------------------------------\n" ); printf( "\n" ); } /* Most profile tests begin by counting the eventset with no profiling enabled. This function does that work. It assumes that the 'work' routine is do_flops(). A better implementation would pass a pointer to the work function. Assumed globals: EventSet, values, event_name. */ void do_no_profile( void ) { int retval; if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( getenv( "NUM_ITERS" ) ? atoi( getenv( "NUM_ITERS" ) ) : NUM_ITERS ); if ( ( retval = PAPI_stop( EventSet, values[0] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); printf( "Test type : \t%s\n", "No profiling" ); printf( TAB1, event_name, ( values[0] )[0] ); printf( TAB1, "PAPI_TOT_CYC", ( values[0] )[1] ); } /* This routine allocates and initializes up to 5 equal sized profiling buffers. They need to be freed when profiling is completed. The number and size are passed parameters. The profbuf[] array of void * pointers is an assumed global. It should be cast to the required type by the parent routine. */ void prof_alloc( int num, unsigned long blength ) { int i; for ( i = 0; i < num; i++ ) { profbuf[i] = malloc( blength ); if ( profbuf[i] == NULL ) { test_fail( __FILE__, __LINE__, "malloc", PAPI_ESYS ); } memset( profbuf[i], 0x00, blength ); } } /* Given the profiling type (16, 32, or 64) this function returns the bucket size in bytes. NOTE: the bucket size does not ALWAYS correspond to the expected value, esp on architectures like Cray with weird data types. This is necessary because the posix_profile routine in extras.c relies on the data types and sizes produced by the compiler. 
*/ int prof_buckets( int bucket ) { int bucket_size; switch ( bucket ) { case PAPI_PROFIL_BUCKET_16: bucket_size = sizeof ( short ); break; case PAPI_PROFIL_BUCKET_32: bucket_size = sizeof ( int ); break; case PAPI_PROFIL_BUCKET_64: bucket_size = sizeof ( unsigned long long ); break; default: bucket_size = 0; break; } return ( bucket_size ); } /* A standardized header printing routine. No assumed globals. */ void prof_head( unsigned long blength, int bucket, int num_buckets, char *header ) { int bucket_size = prof_buckets( bucket ); printf ( "\n------------------------------------------------------------\n" ); printf( "PAPI_profil() hash table, Bucket size: %d bits.\n", bucket_size * 8 ); printf( "Number of buckets: %d.\nLength of buffer: %ld bytes.\n", num_buckets, blength ); printf( "------------------------------------------------------------\n" ); printf( "%s\n", header ); } /* This function prints a standardized profile output based on the bucket size. A row consisting of an address and 'n' data elements is displayed for each address with at least one non-zero bucket. Assumes global profbuf[] array pointers. 
*/ void prof_out( caddr_t start, int n, int bucket, int num_buckets, unsigned int scale ) { int i, j; unsigned short buf_16; unsigned int buf_32; unsigned long long buf_64; unsigned short **buf16 = ( unsigned short ** ) profbuf; unsigned int **buf32 = ( unsigned int ** ) profbuf; unsigned long long **buf64 = ( unsigned long long ** ) profbuf; if ( !TESTS_QUIET ) { /* printf("0x%lx\n",(unsigned long) start + (unsigned long) (2 * i)); */ /* printf("start: %p; i: %#x; scale: %#x; i*scale: %#x; i*scale >>15: %#x\n", start, i, scale, i*scale, (i*scale)>>15); */ switch ( bucket ) { case PAPI_PROFIL_BUCKET_16: for ( i = 0; i < num_buckets; i++ ) { for ( j = 0, buf_16 = 0; j < n; j++ ) buf_16 |= ( buf16[j] )[i]; if ( buf_16 ) { /* On 32bit builds with gcc 4.3 gcc complained about casting caddr_t => long long * Thus the unsigned long to long long cast */ printf( "0x%-16llx", (long long) (unsigned long)start + ( ( ( long long ) i * scale ) >> 15 ) ); for ( j = 0, buf_16 = 0; j < n; j++ ) printf( "\t%d", ( buf16[j] )[i] ); printf( "\n" ); } } break; case PAPI_PROFIL_BUCKET_32: for ( i = 0; i < num_buckets; i++ ) { for ( j = 0, buf_32 = 0; j < n; j++ ) buf_32 |= ( buf32[j] )[i]; if ( buf_32 ) { printf( "0x%-16llx", (long long) (unsigned long)start + ( ( ( long long ) i * scale ) >> 15 ) ); for ( j = 0, buf_32 = 0; j < n; j++ ) printf( "\t%d", ( buf32[j] )[i] ); printf( "\n" ); } } break; case PAPI_PROFIL_BUCKET_64: for ( i = 0; i < num_buckets; i++ ) { for ( j = 0, buf_64 = 0; j < n; j++ ) buf_64 |= ( buf64[j] )[i]; if ( buf_64 ) { printf( "0x%-16llx", (long long) (unsigned long)start + ( ( ( long long ) i * scale ) >> 15 ) ); for ( j = 0, buf_64 = 0; j < n; j++ ) printf( "\t%lld", ( buf64[j] )[i] ); printf( "\n" ); } } break; } printf ( "------------------------------------------------------------\n\n" ); } } /* This function checks to make sure that some buffer value somewhere is nonzero. If all buffers are empty, zero is returned. This usually indicates a profiling failure. 
Assumes global profbuf[]. */ int prof_check( int n, int bucket, int num_buckets ) { int i, j; int retval = 0; unsigned short **buf16 = ( unsigned short ** ) profbuf; unsigned int **buf32 = ( unsigned int ** ) profbuf; unsigned long long **buf64 = ( unsigned long long ** ) profbuf; switch ( bucket ) { case PAPI_PROFIL_BUCKET_16: for ( i = 0; i < num_buckets; i++ ) for ( j = 0; j < n; j++ ) retval = retval || buf16[j][i]; break; case PAPI_PROFIL_BUCKET_32: for ( i = 0; i < num_buckets; i++ ) for ( j = 0; j < n; j++ ) retval = retval || buf32[j][i]; break; case PAPI_PROFIL_BUCKET_64: for ( i = 0; i < num_buckets; i++ ) for ( j = 0; j < n; j++ ) retval = retval || buf64[j][i]; break; } return ( retval ); } /* Computes the length (in bytes) of the buffer required for profiling. 'plength' is the profile length, or address range to be profiled. By convention, it is assumed that there are half as many buckets as addresses. The scale factor is a fixed point fraction in which 0xffff = ~1 0x8000 = 1/2 0x4000 = 1/4, etc. Thus, the number of profile buckets is (plength/2) * (scale/65536), and the length (in bytes) of the profile buffer is buckets * bucket size. 
*/ unsigned long prof_size( unsigned long plength, unsigned scale, int bucket, int *num_buckets ) { unsigned long blength; long long llength = ( ( long long ) plength * scale ); int bucket_size = prof_buckets( bucket ); *num_buckets = ( int ) ( llength / 65536 / 2 ); blength = ( unsigned long ) ( *num_buckets * bucket_size ); return ( blength ); } papi-5.3.0/src/ctests/fork2.c0000600003276200002170000000205412247131121015420 0ustar ralphundrgrad/* * File: fork2.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: PAPI_library_init() fork(); / \ parent child wait() PAPI_shutdown() PAPI_library_init() */ #include "papi_test.h" #include <sys/wait.h> int main( int argc, char **argv ) { int retval; int status; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); if ( fork( ) == 0 ) { PAPI_shutdown(); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "forked PAPI_library_init", retval ); exit( 0 ); } else { wait( &status ); if ( WEXITSTATUS( status ) != 0 ) test_fail( __FILE__, __LINE__, "fork", WEXITSTATUS( status ) ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/overflow_single_event.c0000600003276200002170000001200712247131121021001 0ustar ralphundrgrad/* * File: overflow_single_event.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: overflow dispatch of an eventset with just a single event. 
The Eventset contains: + PAPI_FP_INS (overflow monitor) - Start eventset 1 - Do flops - Stop and measure eventset 1 - Set up overflow on eventset 1 - Start eventset 1 - Do flops - Stop eventset 1 */ #include "papi_test.h" #define OVER_FMT "handler(%d ) Overflow at %p overflow_vector=0x%llx!\n" #define OUT_FMT "%-12s : %16lld%16lld\n" static int total = 0; /* total overflows */ void handler( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) context; if ( !TESTS_QUIET ) { fprintf( stderr, OVER_FMT, EventSet, address, overflow_vector ); } total++; } int main( int argc, char **argv ) { int EventSet = PAPI_NULL; long long values[2] = { 0, 0 }; long long min, max; int num_flops = NUM_FLOPS, retval; int PAPI_event = 0, mythreshold; char event_name[PAPI_MAX_STR_LEN]; const PAPI_hw_info_t *hw_info = NULL; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); if ( ( !strncmp( hw_info->model_string, "UltraSPARC", 10 ) && !( strncmp( hw_info->vendor_string, "SUN", 3 ) ) ) || ( !strncmp( hw_info->model_string, "AMD K7", 6 ) ) || ( !strncmp( hw_info->vendor_string, "Cray", 4 ) ) || ( strstr( hw_info->model_string, "POWER3" ) ) ) { /* query and set up the right instruction to monitor */ if ( PAPI_query_event( PAPI_TOT_INS ) == PAPI_OK ) { PAPI_event = PAPI_TOT_INS; } else { test_fail( __FILE__, __LINE__, "PAPI_TOT_INS not available on this Sun platform!", 0 ); } } else { /* query and set up the right instruction to monitor */ PAPI_event = find_nonderived_event( ); } if (( PAPI_event == PAPI_FP_OPS ) || ( PAPI_event == PAPI_FP_INS )) mythreshold = THRESHOLD; else #if defined(linux) mythreshold = ( int ) hw_info->cpu_max_mhz * 20000; #else mythreshold = THRESHOLD * 2; #endif retval = 
PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); retval = PAPI_add_event( EventSet, PAPI_event ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, &values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); retval = PAPI_overflow( EventSet, PAPI_event, mythreshold, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, &values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); #if defined(linux) || defined(__ia64__) || defined(_POWER4) num_flops *= 2; #endif if ( !TESTS_QUIET ) { if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf ( "Test case: Overflow dispatch of 1st event in set with 1 event.\n" ); printf ( "--------------------------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", mythreshold ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-----------------------------------------------\n" ); printf( "Test type : %16d%16d\n", 1, 2 ); printf( OUT_FMT, event_name, values[0], values[1] ); printf( "Overflows : %16s%16d\n", "", total ); printf( "-----------------------------------------------\n" ); printf( "Verification:\n" ); /* if (PAPI_event == PAPI_FP_INS) printf("Row 1 approximately equals %d %d\n", num_flops, num_flops); printf("Column 1 approximately equals column 2\n"); */ printf( "Row 3 approximately equals %u +- %u %%\n", ( unsigned ) ( ( values[0] ) / ( 
long long ) mythreshold ), ( unsigned ) ( OVR_TOLERANCE * 100.0 ) ); } /* min = (long long)(values[0]*(1.0-TOLERANCE)); max = (long long)(values[0]*(1.0+TOLERANCE)); if ( values[1] > max || values[1] < min ) test_fail(__FILE__, __LINE__, event_name, 1); */ min = ( long long ) ( ( ( double ) values[0] * ( 1.0 - OVR_TOLERANCE ) ) / ( double ) mythreshold ); max = ( long long ) ( ( ( double ) values[0] * ( 1.0 + OVR_TOLERANCE ) ) / ( double ) mythreshold ); if ( total > max || total < min ) test_fail( __FILE__, __LINE__, "Overflows", 1 ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/multiplex1_pthreads.c0000600003276200002170000003152212247131121020375 0ustar ralphundrgrad/* * File: multiplex1_pthreads.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file tests the multiplex pthread functionality */ #include <pthread.h> #include "papi_test.h" #define TOTAL_EVENTS 10 int solaris_preset_PAPI_events[TOTAL_EVENTS] = { PAPI_BR_MSP, PAPI_TOT_CYC, PAPI_L2_TCM, PAPI_L1_ICM, 0 }; int power6_preset_PAPI_events[TOTAL_EVENTS] = { PAPI_FP_INS, PAPI_TOT_CYC, PAPI_L1_DCM, PAPI_L1_ICM, 0 }; int preset_PAPI_events[TOTAL_EVENTS] = { PAPI_FP_INS, PAPI_TOT_INS, PAPI_L1_DCM, PAPI_L1_ICM, 0 }; static int PAPI_events[TOTAL_EVENTS] = { 0, }; static int PAPI_events_len = 0; #define CPP_TEST_FAIL(string, retval) test_fail(__FILE__, __LINE__, string, retval) void init_papi_pthreads( int *out_events, int *len ) { int retval; int i, real_len = 0; int *in_events = preset_PAPI_events; const PAPI_hw_info_t *hw_info; /* Initialize the library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) CPP_TEST_FAIL( "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); if ( strstr( hw_info->model_string, "UltraSPARC" ) ) { in_events = solaris_preset_PAPI_events; } if ( strcmp( hw_info->model_string, "POWER6" ) == 0 ) { in_events = 
power6_preset_PAPI_events; retval = PAPI_set_domain( PAPI_DOM_ALL ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_set_domain", retval ); } retval = PAPI_multiplex_init( ); if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } else if ( retval != PAPI_OK ) { CPP_TEST_FAIL( "PAPI_multiplex_init", retval ); } if ( ( retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ) ) != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } for ( i = 0; in_events[i] != 0; i++ ) { char out[PAPI_MAX_STR_LEN]; /* query and set up the right instruction to monitor */ retval = PAPI_query_event( in_events[i] ); if ( retval == PAPI_OK ) { out_events[real_len++] = in_events[i]; PAPI_event_code_to_name( in_events[i], out ); if ( real_len == *len ) break; } else { PAPI_event_code_to_name( in_events[i], out ); if ( !TESTS_QUIET ) printf( "%s does not exist\n", out ); } } if ( real_len < 1 ) CPP_TEST_FAIL( "No counters available", 0 ); *len = real_len; } int do_pthreads( void *( *fn ) ( void * ) ) { int i, rc, retval; pthread_attr_t attr; pthread_t id[NUM_THREADS]; pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); #endif for ( i = 0; i < NUM_THREADS; i++ ) { rc = pthread_create( &id[i], &attr, fn, NULL ); if ( rc ) return ( FAILURE ); } for ( i = 0; i < NUM_THREADS; i++ ) pthread_join( id[i], NULL ); pthread_attr_destroy( &attr ); return ( SUCCESS ); } /* Tests that PAPI_multiplex_init does not mess with normal operation. 
*/ void * case1_pthreads( void *arg ) { ( void ) arg; /*unused */ int retval, i, EventSet = PAPI_NULL; long long values[2]; if ( ( retval = PAPI_register_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); for ( i = 0; i < PAPI_events_len; i++ ) { char out[PAPI_MAX_STR_LEN]; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); if ( !TESTS_QUIET ) printf( "Added %s\n", out ); } do_stuff( ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet, values ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( "case1 thread %4x:", ( unsigned ) pthread_self( ) ); test_print_event_header( "", EventSet ); printf( "case1 thread %4x:", ( unsigned ) pthread_self( ) ); printf( TAB2, "", values[0], values[1] ); } if ( ( retval = PAPI_cleanup_eventset( EventSet ) ) != PAPI_OK ) /* JT */ test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); if ( ( retval = PAPI_destroy_eventset( &EventSet) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); if ( ( retval = PAPI_unregister_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( ( void * ) SUCCESS ); } /* Tests that PAPI_set_multiplex() works before adding events */ void * case2_pthreads( void *arg ) { ( void ) arg; /*unused */ int retval, i, EventSet = PAPI_NULL; long long values[2]; if ( ( retval = PAPI_register_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", 
retval ); /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 0 is always the cpu component */ retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_assign_eventset_component", retval ); if ( ( retval = PAPI_set_multiplex( EventSet ) ) != PAPI_OK ) { if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); } printf( "++case2 thread %4x:", ( unsigned ) pthread_self( ) ); for ( i = 0; i < PAPI_events_len; i++ ) { char out[PAPI_MAX_STR_LEN]; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); if ( !TESTS_QUIET ) printf( "Added %s\n", out ); } do_stuff( ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet, values ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( "case2 thread %4x:", ( unsigned ) pthread_self( ) ); test_print_event_header( "", EventSet ); printf( "case2 thread %4x:", ( unsigned ) pthread_self( ) ); printf( TAB2, "", values[0], values[1] ); } if ( ( retval = PAPI_cleanup_eventset( EventSet ) ) != PAPI_OK ) /* JT */ test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); if ( ( retval = PAPI_destroy_eventset( &EventSet) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); if ( ( retval = PAPI_unregister_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( ( void * ) SUCCESS ); } /* Tests that PAPI_set_multiplex() works after adding events */ void * case3_pthreads( void *arg ) { ( void ) arg; /*unused */ int retval, i, EventSet = PAPI_NULL; long long values[2]; if ( ( retval = PAPI_register_thread( 
) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); for ( i = 0; i < PAPI_events_len; i++ ) { char out[PAPI_MAX_STR_LEN]; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); if ( !TESTS_QUIET ) printf( "Added %s\n", out ); } if ( ( retval = PAPI_set_multiplex( EventSet ) ) != PAPI_OK ) { if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); } do_stuff( ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet, values ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( "case3 thread %4x:", ( unsigned ) pthread_self( ) ); test_print_event_header( "", EventSet ); printf( "case3 thread %4x:", ( unsigned ) pthread_self( ) ); printf( TAB2, "", values[0], values[1] ); } if ( ( retval = PAPI_cleanup_eventset( EventSet ) ) != PAPI_OK ) /* JT */ test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); if ( ( retval = PAPI_destroy_eventset( &EventSet) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); if ( ( retval = PAPI_unregister_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( ( void * ) SUCCESS ); } /* Tests that PAPI_set_multiplex() works before/after adding events */ void * case4_pthreads( void *arg ) { ( void ) arg; /*unused */ int retval, i, EventSet = PAPI_NULL; long long values[4]; char out[PAPI_MAX_STR_LEN]; if ( ( retval = PAPI_register_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); if ( ( retval = 
PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); i = 0; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); printf( "Added %s\n", out ); if ( ( retval = PAPI_set_multiplex( EventSet ) ) != PAPI_OK ) { if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); } i = 1; retval = PAPI_add_event( EventSet, PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); PAPI_event_code_to_name( PAPI_events[i], out ); printf( "Added %s\n", out ); do_stuff( ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet, values ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( "case4 thread %4x:", ( unsigned ) pthread_self( ) ); test_print_event_header( "", EventSet ); printf( "case4 thread %4x:", ( unsigned ) pthread_self( ) ); printf( TAB2, "", values[0], values[1] ); } if ( ( retval = PAPI_cleanup_eventset( EventSet ) ) != PAPI_OK ) /* JT */ test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); if ( ( retval = PAPI_destroy_eventset( &EventSet) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); if ( ( retval = PAPI_unregister_thread( ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( ( void * ) SUCCESS ); } int case1( void ) { int retval; PAPI_events_len = 2; init_papi_pthreads( PAPI_events, &PAPI_events_len ); retval = do_pthreads( case1_pthreads ); PAPI_shutdown( ); return ( retval ); } int case2( void ) { int retval; PAPI_events_len = 2; init_papi_pthreads( PAPI_events, &PAPI_events_len ); retval = do_pthreads( case2_pthreads ); PAPI_shutdown( 
); return ( retval ); } int case3( void ) { int retval; PAPI_events_len = 2; init_papi_pthreads( PAPI_events, &PAPI_events_len ); retval = do_pthreads( case3_pthreads ); PAPI_shutdown( ); return ( retval ); } int case4( void ) { int retval; PAPI_events_len = 2; init_papi_pthreads( PAPI_events, &PAPI_events_len ); retval = do_pthreads( case4_pthreads ); PAPI_shutdown( ); return ( retval ); } int main( int argc, char **argv ) { int retval; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ printf( "%s: Using %d threads\n\n", argv[0], NUM_THREADS ); printf ( "case1: Does PAPI_multiplex_init() not break regular operation?\n" ); if ( case1( ) != SUCCESS ) test_fail( __FILE__, __LINE__, "case1", PAPI_ESYS ); printf( "case2: Does setmpx/add work?\n" ); if ( case2( ) != SUCCESS ) test_fail( __FILE__, __LINE__, "case2", PAPI_ESYS ); printf( "case3: Does add/setmpx work?\n" ); if ( case3( ) != SUCCESS ) test_fail( __FILE__, __LINE__, "case3", PAPI_ESYS ); printf( "case4: Does add/setmpx/add work?\n" ); if ( case4( ) != SUCCESS ) test_fail( __FILE__, __LINE__, "case4", PAPI_ESYS ); retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) CPP_TEST_FAIL( "PAPI_library_init", retval ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/nmi_watchdog.c0000600003276200002170000000440012247131121017035 0ustar ralphundrgrad/* If the NMI watchdog is enabled it will steal a performance counter. */ /* There is a bug that if you try to use the maximum number of counters */ /* (not counting the stolen one) with a group leader, sys_perf_open() */ /* will indicate success, as will starting the count, but you will fail */ /* at read time. */ /* This bug still exists in 3.x */ /* The perf NMI watchdog was not introduced until 2.6.34 */ /* This also triggers in the case of the schedulability bug */ /* but since that was fixed in 2.6.34 then in theory there is */ /* no overlap in the tests. 
*/ #include "papi_test.h" int detect_nmi_watchdog(void) { int watchdog_detected=0,watchdog_value=0; FILE *fff; fff=fopen("/proc/sys/kernel/nmi_watchdog","r"); if (fff!=NULL) { if (fscanf(fff,"%d",&watchdog_value)==1) { if (watchdog_value>0) watchdog_detected=1; } fclose(fff); } else { watchdog_detected=-1; } return watchdog_detected; } int main( int argc, char **argv ) { int retval,watchdog_active=0; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init the PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } watchdog_active=detect_nmi_watchdog(); if (watchdog_active<0) { test_skip( __FILE__, __LINE__, "nmi_watchdog file does not exist\n", 0); } if (watchdog_active) { if (!TESTS_QUIET) { printf("\nOn perf_event kernels with the nmi_watchdog enabled\n"); printf("the watchdog steals an event, but the schedulability code\n"); printf("is not notified. Thus adding a full complement of events\n"); printf("seems to pass, but then fails at read time.\n"); printf("Because of this, PAPI has to do some slow workarounds.\n"); printf("For best PAPI performance, you may wish to disable\n"); printf("the watchdog by running (as root)\n"); printf("\techo \"0\" > /proc/sys/kernel/nmi_watchdog\n\n"); } test_warn( __FILE__, __LINE__, "NMI Watchdog Active, enabling slow workarounds", 0 ); } test_pass( __FILE__, NULL, 0 ); return 0; } papi-5.3.0/src/ctests/profile_twoevents.c0000600003276200002170000000616412247131121020161 0ustar ralphundrgrad/* * File: profile_twoevents.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: profiling two events */ #include "papi_test.h" #include "prof_utils.h" int main( int argc, char **argv ) { int i, num_tests = 6; unsigned long length, blength; int num_buckets, mask; char title[80]; int retval; const PAPI_exe_info_t *prginfo; caddr_t start, end; prof_init( argc, argv, 
&prginfo ); mask = prof_events( num_tests ); start = prginfo->address_info.text_start; end = prginfo->address_info.text_end; /* Must have at least FP instr or Tot ins */ if ( ( ( mask & MASK_FP_INS ) == 0 ) && ( ( mask & MASK_TOT_INS ) == 0 ) ) { test_skip( __FILE__, __LINE__, "No FP or Total Ins. event", 1 ); } if ( start > end ) test_fail( __FILE__, __LINE__, "Profile length < 0!", 0 ); length = ( unsigned long ) ( end - start ); prof_print_address ( "Test case profile: POSIX compatible profiling with two events.\n", prginfo ); prof_print_prof_info( start, end, THRESHOLD, event_name ); prof_alloc( 2, length ); blength = prof_size( length, FULL_SCALE, PAPI_PROFIL_BUCKET_16, &num_buckets ); do_no_profile( ); if ( !TESTS_QUIET ) { printf( "Test type : \tPAPI_PROFIL_POSIX\n" ); } if ( ( retval = PAPI_profil( profbuf[0], ( unsigned int ) blength, start, FULL_SCALE, EventSet, PAPI_event, THRESHOLD, PAPI_PROFIL_POSIX ) ) != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); } if ( ( retval = PAPI_profil( profbuf[1], ( unsigned int ) blength, start, FULL_SCALE, EventSet, PAPI_TOT_CYC, THRESHOLD, PAPI_PROFIL_POSIX ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); do_stuff( ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet, values[1] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( TAB1, event_name, ( values[1] )[0] ); printf( TAB1, "PAPI_TOT_CYC:", ( values[1] )[1] ); } if ( ( retval = PAPI_profil( profbuf[0], ( unsigned int ) blength, start, FULL_SCALE, EventSet, PAPI_event, 0, PAPI_PROFIL_POSIX ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); if ( ( retval = PAPI_profil( profbuf[1], ( unsigned int ) blength, start, FULL_SCALE, EventSet, PAPI_TOT_CYC, 0, PAPI_PROFIL_POSIX ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); 
sprintf( title, " \t\t %s\tPAPI_TOT_CYC\naddress\t\t\tcounts\tcounts\n", event_name ); prof_head( blength, PAPI_PROFIL_BUCKET_16, num_buckets, title ); prof_out( start, 2, PAPI_PROFIL_BUCKET_16, num_buckets, FULL_SCALE ); remove_test_events( &EventSet, mask ); retval = prof_check( 2, PAPI_PROFIL_BUCKET_16, num_buckets ); for ( i = 0; i < 2; i++ ) { free( profbuf[i] ); } if ( retval == 0 ) test_fail( __FILE__, __LINE__, "No information in buffers", 1 ); test_pass( __FILE__, values, num_tests ); exit( 1 ); } papi-5.3.0/src/ctests/shlib.c0000600003276200002170000001030612247131121015475 0ustar ralphundrgrad/* * File: profile.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ #include "papi_test.h" void print_shlib_info_map(const PAPI_shlib_info_t *shinfo) { PAPI_address_map_t *map = shinfo->map; int i; if (NULL == map) { test_fail(__FILE__, __LINE__, "PAPI_get_shared_lib_info", 1); } for ( i = 0; i < shinfo->count; i++ ) { printf( "Library: %s\n", map->name ); printf( "Text start: %p, Text end: %p\n", map->text_start, map->text_end ); printf( "Data start: %p, Data end: %p\n", map->data_start, map->data_end ); printf( "Bss start: %p, Bss end: %p\n", map->bss_start, map->bss_end ); if ( strlen( &(map->name[0]) ) == 0 ) test_fail( __FILE__, __LINE__, "PAPI_get_shared_lib_info", 1 ); if ( ( map->text_start == 0x0 ) || ( map->text_end == 0x0 ) || ( map->text_start >= map->text_end ) ) test_fail( __FILE__, __LINE__, "PAPI_get_shared_lib_info", 1 ); /* if ((map->data_start == 0x0) || (map->data_end == 0x0) || (map->data_start >= map->data_end)) test_fail(__FILE__, __LINE__, "PAPI_get_shared_lib_info",1); if (((map->bss_start) && (!map->bss_end)) || ((!map->bss_start) && (map->bss_end)) || (map->bss_start > map->bss_end)) test_fail(__FILE__, __LINE__, "PAPI_get_shared_lib_info",1); */ map++; } } void display( char *msg ) { int i; for (i=0; i<64; i++) { printf( "%1d", (msg[i] ? 
1 : 0) ); } printf("\n"); } int main( int argc, char **argv ) { int retval; const PAPI_shlib_info_t *shinfo; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( shinfo = PAPI_get_shared_lib_info( ) ) == NULL ) { test_skip( __FILE__, __LINE__, "PAPI_get_shared_lib_info", 1 ); } if ( ( shinfo->count == 0 ) && ( shinfo->map ) ) { test_fail( __FILE__, __LINE__, "PAPI_get_shared_lib_info", 1 ); } print_shlib_info_map(shinfo); sleep( 1 ); /* Needed for debugging, so you can ^Z and stop the process, inspect /proc to see if it's right */ #ifndef NO_DLFCN { char *_libname = "libcrypt.so"; void *handle; void ( *setkey) (const char *key); void ( *encrypt) (char block[64], int edflag); char key[64]={ 1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0, 1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0, 1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0, 1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0, }; /* bit pattern for key */ char orig[64]; /* bit pattern for messages */ char txt[64]; /* bit pattern for messages */ int oldcount; handle = dlopen( _libname, RTLD_NOW ); if ( !handle ) { printf( "dlopen: %s\n", dlerror( ) ); printf ( "Did you forget to set the environmental variable LIBPATH (in AIX) or LD_LIBRARY_PATH (in linux) ?\n" ); test_fail( __FILE__, __LINE__, "dlopen", 1 ); } setkey = dlsym( handle, "setkey" ); encrypt = dlsym( handle, "encrypt" ); if ( setkey == NULL || encrypt == NULL) { printf( "dlsym: %s\n", dlerror( ) ); test_fail( __FILE__, __LINE__, "dlsym", 1 ); } memset(orig,0,64); memcpy(txt,orig,64); setkey(key); printf("original "); display(txt); encrypt(txt, 0); /* encode */ printf("encrypted "); display(txt); if (!memcmp(txt,orig,64)) test_fail( __FILE__, __LINE__, "encode", 1 ); encrypt(txt, 1); /* decode */ printf("decrypted "); display(txt); if (memcmp(txt,orig,64)) test_fail( __FILE__, __LINE__, "decode", 1 ); oldcount = shinfo->count; if ( ( shinfo = 
PAPI_get_shared_lib_info( ) ) == NULL ) { test_fail( __FILE__, __LINE__, "PAPI_get_shared_lib_info", 1 ); } sleep( 1 ); /* Needed for debugging, so you can ^Z and stop the process, inspect /proc to see if it's right */ if ( ( shinfo->count == 0 ) && ( shinfo->map ) ) { test_fail( __FILE__, __LINE__, "PAPI_get_shared_lib_info", 1 ); } if ( shinfo->count <= oldcount ) { test_fail( __FILE__, __LINE__, "PAPI_get_shared_lib_info", 1 ); } print_shlib_info_map(shinfo); sleep( 1 ); /* Needed for debugging, so you can ^Z and stop the process, inspect /proc to see if it's right */ dlclose( handle ); } #endif test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/profile.c0000600003276200002170000001233612247131121016041 0ustar ralphundrgrad/* * File: profile.c * CVS: $Id$ * Author: Philip Mucci * mucci@cs.utk.edu * Mods: Dan Terpstra * terpstra@cs.utk.edu * Mods: Maynard Johnson * maynardj@us.ibm.com * Mods: * */ /* This file performs the following test: profiling and program info option call - This tests the SVR4 profiling interface of PAPI. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). 
The Eventset contains: + PAPI_FP_INS (to profile) + PAPI_TOT_CYC - Set up profile - Start eventset 1 - Do both (flops and reads) - Stop eventset 1 */ #include "papi_test.h" #include "prof_utils.h" #define PROFILE_ALL static int do_profile( caddr_t start, unsigned long plength, unsigned scale, int thresh, int bucket ); int main( int argc, char **argv ) { int num_tests = 6; long length; int mask; int retval; int mythreshold = THRESHOLD; const PAPI_exe_info_t *prginfo; caddr_t start, end; prof_init( argc, argv, &prginfo ); mask = prof_events( num_tests ); #ifdef PROFILE_ALL /* use these lines to profile entire code address space */ start = prginfo->address_info.text_start; end = prginfo->address_info.text_end; #else /* use these lines to profile only do_flops address space */ start = ( caddr_t ) do_flops; end = ( caddr_t ) fdo_flops; /* Itanium and ppc64 processors return function descriptors instead of function addresses. You must dereference the descriptor to get the address. */ #if defined(ITANIUM1) || defined(ITANIUM2) || defined(__powerpc64__) start = ( caddr_t ) ( ( ( struct fdesc * ) start )->ip ); end = ( caddr_t ) ( ( ( struct fdesc * ) end )->ip ); #endif #endif #if defined(linux) { char *tmp = getenv( "THRESHOLD" ); if ( tmp ) mythreshold = atoi( tmp ); } #endif length = end - start; if ( length < 0 ) test_fail( __FILE__, __LINE__, "Profile length < 0!", ( int ) length ); prof_print_address ( "Test case profile: POSIX compatible profiling with hardware counters.\n", prginfo ); prof_print_prof_info( start, end, mythreshold, event_name ); retval = do_profile( start, ( unsigned long ) length, FULL_SCALE, mythreshold, PAPI_PROFIL_BUCKET_16 ); if ( retval == PAPI_OK ) retval = do_profile( start, ( unsigned long ) length, FULL_SCALE, mythreshold, PAPI_PROFIL_BUCKET_32 ); if ( retval == PAPI_OK ) retval = do_profile( start, ( unsigned long ) length, FULL_SCALE, mythreshold, PAPI_PROFIL_BUCKET_64 ); remove_test_events( &EventSet, mask ); test_pass( __FILE__, 
values, num_tests ); exit( 1 ); } static int do_profile( caddr_t start, unsigned long plength, unsigned scale, int thresh, int bucket ) { int i, retval; unsigned long blength; int num_buckets; char *profstr[5] = { "PAPI_PROFIL_POSIX", "PAPI_PROFIL_RANDOM", "PAPI_PROFIL_WEIGHTED", "PAPI_PROFIL_COMPRESS", "PAPI_PROFIL_" }; int profflags[5] = { PAPI_PROFIL_POSIX, PAPI_PROFIL_POSIX | PAPI_PROFIL_RANDOM, PAPI_PROFIL_POSIX | PAPI_PROFIL_WEIGHTED, PAPI_PROFIL_POSIX | PAPI_PROFIL_COMPRESS, PAPI_PROFIL_POSIX | PAPI_PROFIL_WEIGHTED | PAPI_PROFIL_RANDOM | PAPI_PROFIL_COMPRESS }; do_no_profile( ); blength = prof_size( plength, scale, bucket, &num_buckets ); prof_alloc( 5, blength ); for ( i = 0; i < 5; i++ ) { if ( !TESTS_QUIET ) printf( "Test type : \t%s\n", profstr[i] ); #ifndef SWPROFILE if ( ( retval = PAPI_profil( profbuf[i], ( unsigned int ) blength, start, scale, EventSet, PAPI_event, thresh, profflags[i] | bucket ) ) != PAPI_OK ) { if (retval==PAPI_ENOSUPP) { char warning[BUFSIZ]; sprintf(warning,"PAPI_profil %s not supported", profstr[i]); test_warn( __FILE__, __LINE__, warning, 1 ); } else { test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); } } #else if ( ( retval = PAPI_profil( profbuf[i], ( unsigned int ) blength, start, scale, EventSet, PAPI_event, thresh, profflags[i] | bucket | PAPI_PROFIL_FORCE_SW ) ) != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); } #endif if ( retval != PAPI_OK ) break; if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( getenv( "NUM_FLOPS" ) ? 
atoi( getenv( "NUM_FLOPS" ) ) : NUM_FLOPS ); if ( ( retval = PAPI_stop( EventSet, values[1] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( TAB1, event_name, ( values[1] )[0] ); printf( TAB1, "PAPI_TOT_CYC", ( values[1] )[1] ); } if ( ( retval = PAPI_profil( profbuf[i], ( unsigned int ) blength, start, scale, EventSet, PAPI_event, 0, profflags[i] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_profil", retval ); } if ( retval == PAPI_OK ) { prof_head( blength, bucket, num_buckets, "address\t\t\tflat\trandom\tweight\tcomprs\tall\n" ); prof_out( start, 5, bucket, num_buckets, scale ); retval = prof_check( 5, bucket, num_buckets ); } for ( i = 0; i < 5; i++ ) { free( profbuf[i] ); } return ( retval ); } papi-5.3.0/src/ctests/net-mpi-test/0000700003276200002170000000000012247131121016554 5ustar ralphundrgradpapi-5.3.0/src/ctests/net-mpi-test/Makefile0000600003276200002170000000264212247131121020222 0ustar ralphundrgradCC = gcc CC_R = gcc -pthread CC_SHR = gcc -shared #MXMPIPATH = /usr/local/mpich/mpich-gcc #MXMPIPATH = /usr/local/mpich-mx #MPICC = $(MXMPIPATH)/bin/mpicc #MPICC = /usr/bin/mpicc MPICC = mpicc MPICC_SHR = $(MPICC) -shared MPICCLD_SHR = $(MPICC_SHR) F77 = g77 FLAGS = -g -Wall CFLAGS = $(FLAGS) -O3 # -DPROFILE_TIMER -DDEBUG -DVERBOSE BLASLIBS = -lblas #BLASLIBS = -L/usr/local/lib -lf77blas -latlas LAPACKLIBS = -llapack UTILOBJS= ../do_loops.o ../test_utils.o ../dummy.o INCLUDE = -I.. -I../.. -I/usr/include PAPILIB = -L../.. 
-lpapi MPILIBS = MPIINC = XTRALIBS = PTHRLIBS = MPILIBS = LIBS =$(PAPILIB) -lm TESTS = cpi tests: $(TESTS) # Applications # Test programs ../test_utils.o: ../test_utils.c ../papi_test.h ../test_utils.h $(CC) $(CFLAGS) $(INCLUDE) -c ../test_utils.c -o ../test_utils.o ../do_loops.o: ../do_loops.c ../papi_test.h ../test_utils.h $(CC) $(CFLAGS) $(INCLUDE) -c ../do_loops.c -o ../do_loops.o ../dummy.o: ../dummy.c $(CC) $(CFLAGS) $(INCLUDE) -c ../dummy.c -o ../dummy.o cpi: cpi.c $(UTILOBJS) $(MPICC) $(MPFLAGS) $(CFLAGS) $(INCLUDE) $(MPIINC) $(TOPTFLAGS) cpi.c $(UTILOBJS) $(PAPILIB) $(MPILIBS) -o cpi #cpi: cpi.c # $(MPICC) $(FLAGS) cpi.c -o $@ $(MPIPERFLIBS) $(XTRALIBS) $(MPILIBS) -lm clean: rm -f core $(TESTS) *~ *.o papi-5.3.0/src/ctests/net-mpi-test/cpi.c0000600003276200002170000001165412247131121017504 0ustar ralphundrgrad/* From Dave McNamara at PSRV. Thanks! */ /* If an event is countable but you've exhausted the counter resources and you try to add an event, it seems subsequent PAPI_start and/or PAPI_stop will cause a Seg. Violation. I got around this by calling PAPI to get the # of countable events, then making sure that I didn't try to add more than this number of events. I still have a problem if someone adds Level 2 cache misses and then adds FLOPS 'cause I didn't count FLOPS as actually requiring 2 counters. 
*/ #include "papi_test.h" #include #include #include extern int TESTS_QUIET; /* Declared in test_utils.c */ char *netevents[] = { "LO_RX_PACKETS", "LO_TX_PACKETS", "ETH0_RX_PACKETS", "ETH0_TX_PACKETS" }; double f( double a ) { return ( 4.0 / ( 1.0 + a * a ) ); } int main( int argc, char **argv ) { int EventSet = PAPI_NULL, EventSet1 = PAPI_NULL; int evtcode; int retval, i, ins = 0; long long g1[2], g2[2]; int done = 0, n, myid, numprocs; double PI25DT = 3.141592653589793238462643; double mypi, pi, h, sum, x; double startwtime = 0.0, endwtime; int namelen; char processor_name[MPI_MAX_PROCESSOR_NAME]; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); if ( ( retval = PAPI_create_eventset( &EventSet1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); PAPI_event_name_to_code( netevents[2], &evtcode ); if ( ( retval = PAPI_query_event( evtcode ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_query_event", retval ); } if ( ( retval = PAPI_add_event( EventSet, evtcode ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } PAPI_event_name_to_code( netevents[3], &evtcode ); if ( ( retval = PAPI_query_event( evtcode ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_query_event", retval ); } if ( ( retval = PAPI_add_event( EventSet, evtcode ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } if ( ( retval = PAPI_query_event( PAPI_FP_INS ) ) != PAPI_OK ) { if ( ( retval = PAPI_query_event( PAPI_FP_OPS ) ) == PAPI_OK ) { ins = 2; if ( ( retval = PAPI_add_event( EventSet1, PAPI_FP_OPS ) ) != 
PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } } } else { ins = 1; if ( ( retval = PAPI_add_event( EventSet1, PAPI_FP_INS ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } } if ( ( retval = PAPI_add_event( EventSet1, PAPI_TOT_CYC ) ) != PAPI_OK ) { if ( retval != PAPI_ECNFLCT ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); } MPI_Init( &argc, &argv ); MPI_Comm_size( MPI_COMM_WORLD, &numprocs ); MPI_Comm_rank( MPI_COMM_WORLD, &myid ); MPI_Get_processor_name( processor_name, &namelen ); fprintf( stdout, "Process %d of %d on %s\n", myid, numprocs, processor_name ); fflush( stdout ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); if ( ( retval = PAPI_start( EventSet1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); n = 0; while ( !done ) { if ( myid == 0 ) { if ( n == 0 ) n = 1000000; else n = 0; startwtime = MPI_Wtime( ); } MPI_Bcast( &n, 1, MPI_INT, 0, MPI_COMM_WORLD ); if ( n == 0 ) done = 1; else { h = 1.0 / ( double ) n; sum = 0.0; /* A slightly better approach starts from large i and works back */ for ( i = myid + 1; i <= n; i += numprocs ) { x = h * ( ( double ) i - 0.5 ); sum += f( x ); } mypi = h * sum; MPI_Reduce( &mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD ); if ( myid == 0 ) { printf( "pi is approximately %.16f, Error is %.16f\n", pi, fabs( pi - PI25DT ) ); endwtime = MPI_Wtime( ); printf( "wall clock time = %f\n", endwtime - startwtime ); fflush( stdout ); } } } if ( ( retval = PAPI_stop( EventSet1, g1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( ( retval = PAPI_stop( EventSet, g2 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); MPI_Finalize( ); printf( "ETH0_RX_PACKETS: %lld ETH0_TX_PACKETS: %lld\n", g2[0], g2[1] ); if ( ins == 0 ) { printf( "PAPI_TOT_CYC : %lld\n", g1[0] ); } else if ( 
ins == 1 ) { printf( "PAPI_FP_INS : %lld PAPI_TOT_CYC : %lld\n", g1[0], g1[1] ); } else if ( ins == 2 ) { printf( "PAPI_FP_OPS : %lld PAPI_TOT_CYC : %lld\n", g1[0], g1[1] ); } test_pass( __FILE__, NULL, 0 ); return 0; } papi-5.3.0/src/ctests/net-mpi-test/cpi.pbs0000600003276200002170000000325712247131121020046 0ustar ralphundrgrad#!/bin/bash ############################################################ ## Template PBS Job Script for Parallel Job on Myrinet Nodes ## ## Lines beginning with '#PBS' are PBS directives, see ## 'man qsub' for additional information. ############################################################ ### Set the job name #PBS -N cpi ### Set the queue to submit this job: ALWAYS use the default queue ##PBS -q workq ### Set the number of nodes that will be used, 4 in this case, ### use a single processor per node (ppn=1), and use Myrinet #PBS -l nodes=4 ### The following command computes the number of processors requested ### from the file containing the list of nodes assigned to the job export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'` ### The following statements dump some diagnostic information to ### the batch job's standard output. echo The master node of this job is `hostname` echo The working directory is `echo $PBS_O_WORKDIR` echo The node file is $PBS_NODEFILE echo "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" echo This job runs on the following nodes: echo `cat $PBS_NODEFILE` echo "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" echo This job has allocated $NPROCS nodes echo "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" ### Change to the working directory of the qsub command. cd $PBS_O_WORKDIR ### Execute the MPI job --- NOTE: It is *crucial* that the proper ### 'mpirun' command (there are several versions of the command ### on the cluster) be used to launch the job---it is safest to use ### the full pathname as is done here. 
#/usr/local/mpich-mx/bin/mpirun -np $NPROCS -machinefile $PBS_NODEFILE cpi /usr/local/mpich/bin/mpirun -np $NPROCS -machinefile $PBS_NODEFILE cpi papi-5.3.0/src/ctests/branches.c0000600003276200002170000001462512247131121016171 0ustar ralphundrgrad/* * $Id$ * * Test example for branch accuracy and functionality, originally * provided by Timothy Kaiser, SDSC. It was modified to fit the * PAPI test suite by Nils Smeds, . * and Phil Mucci * This example verifies the accuracy of branch events */ #include "papi_test.h" #include #include #define MAXEVENTS 4 #define SLEEPTIME 100 #define MINCOUNTS 100000 static double dummy3( double x, int iters ); int main( int argc, char **argv ) { PAPI_event_info_t info; int i, j, retval; int iters = 10000000; double x = 1.1, y; long long t1, t2; long long values[MAXEVENTS], refvalues[MAXEVENTS]; int sleep_time = SLEEPTIME; double spread[MAXEVENTS]; int nevents = MAXEVENTS; int eventset = PAPI_NULL; int events[MAXEVENTS]; events[0] = PAPI_BR_NTK; events[1] = PAPI_BR_PRC; events[2] = PAPI_BR_INS; events[3] = PAPI_BR_MSP; /* events[3]=PAPI_BR_CN; events[4]=PAPI_BR_UCN;*/ /*events[5]=PAPI_BR_TKN; */ for ( i = 0; i < MAXEVENTS; i++ ) { values[i] = 0; } if ( argc > 1 ) { if ( !strcmp( argv[1], "TESTS_QUIET" ) ) tests_quiet( argc, argv ); else { sleep_time = atoi( argv[1] ); if ( sleep_time <= 0 ) sleep_time = SLEEPTIME; } } if ( !TESTS_QUIET ) { printf( "\nAccuracy check of branch presets.\n" ); printf( "Comparing a measurement with separate measurements.\n\n" ); } if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( retval = PAPI_create_eventset( &eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); #ifdef MPX if ( ( retval = PAPI_multiplex_init( ) ) ) test_fail( __FILE__, __LINE__, "PAPI_multiplex_init", retval ); if ( ( retval = PAPI_set_multiplex( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", 
retval ); #endif nevents = 0; for ( i = 0; i < MAXEVENTS; i++ ) { if ( PAPI_query_event( events[i] ) != PAPI_OK ) continue; if ( PAPI_add_event( eventset, events[i] ) == PAPI_OK ) { events[nevents] = events[i]; nevents++; } } if ( nevents < 1 ) test_skip( __FILE__, __LINE__, "Not enough events left...", 0 ); /* Find a reasonable number of iterations (each * event active 20 times) during the measurement */ t2 = 10000 * 20 * nevents; /* Target: 10000 usec/multiplex, 20 repeats */ if ( t2 > 30e6 ) test_skip( __FILE__, __LINE__, "This test takes too much time", retval ); /* Measure one run */ t1 = PAPI_get_real_usec( ); y = dummy3( x, iters ); t1 = PAPI_get_real_usec( ) - t1; if ( t2 > t1 ) /* Scale up execution time to match t2 */ iters = iters * ( int ) ( t2 / t1 ); else if ( t1 > 30e6 ) /* Make sure execution time is < 30s per repeated test */ test_skip( __FILE__, __LINE__, "This test takes too much time", retval ); x = 1.0; if ( !TESTS_QUIET ) printf( "\nFirst run: Together.\n" ); t1 = PAPI_get_real_usec( ); if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); y = dummy3( x, iters ); if ( ( retval = PAPI_stop( eventset, values ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); t2 = PAPI_get_real_usec( ); if ( !TESTS_QUIET ) { printf( "\tOperations= %.1f Mflop", y * 1e-6 ); printf( "\t(%g Mflop/s)\n\n", ( y / ( double ) ( t2 - t1 ) ) ); printf( "PAPI grouped measurement:\n" ); } for ( j = 0; j < nevents; j++ ) { PAPI_get_event_info( events[j], &info ); if ( !TESTS_QUIET ) { printf( "%20s = ", info.short_descr ); printf( LLDFMT, values[j] ); printf( "\n" ); } } if ( !TESTS_QUIET ) printf( "\n" ); if ( ( retval = PAPI_remove_events( eventset, events, nevents ) ) ) test_fail( __FILE__, __LINE__, "PAPI_remove_events", retval ); if ( ( retval = PAPI_destroy_eventset( &eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); eventset = PAPI_NULL; if ( ( retval = PAPI_create_eventset( &eventset ) 
) ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); for ( i = 0; i < nevents; i++ ) { if ( ( retval = PAPI_cleanup_eventset( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); if ( ( retval = PAPI_add_event( eventset, events[i] ) ) ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); x = 1.0; if ( !TESTS_QUIET ) printf( "\nReference measurement %d (of %d):\n", i + 1, nevents ); t1 = PAPI_get_real_usec( ); if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); y = dummy3( x, iters ); if ( ( retval = PAPI_stop( eventset, &refvalues[i] ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); t2 = PAPI_get_real_usec( ); if ( !TESTS_QUIET ) { printf( "\tOperations= %.1f Mflop", y * 1e-6 ); printf( "\t(%g Mflop/s)\n\n", ( y / ( double ) ( t2 - t1 ) ) ); } PAPI_get_event_info( events[i], &info ); if ( !TESTS_QUIET ) { printf( "PAPI results:\n%20s = ", info.short_descr ); printf( LLDFMT, refvalues[i] ); printf( "\n" ); } } if ( !TESTS_QUIET ) printf( "\n" ); if ( !TESTS_QUIET ) { printf( "\n\nRelative accuracy:\n" ); for ( j = 0; j < nevents; j++ ) printf( " Event %.2d", j ); printf( "\n" ); } for ( j = 0; j < nevents; j++ ) { spread[j] = abs( ( int ) ( refvalues[j] - values[j] ) ); if ( values[j] ) spread[j] /= ( double ) values[j]; if ( !TESTS_QUIET ) printf( "%10.3g ", spread[j] ); /* Make sure that NaN get counted as errors */ if ( spread[j] < MPX_TOLERANCE ) i--; else if ( refvalues[j] < MINCOUNTS ) /* Neglect imprecise results with low counts */ i--; } if ( !TESTS_QUIET ) printf( "\n\n" ); if ( i ) test_fail( __FILE__, __LINE__, "Values outside threshold", i ); else test_pass( __FILE__, NULL, 0 ); return 0; } static double dummy3( double x, int iters ) { int i; double w, y, z, a, b, c, d, e, f, g, h; double one; one = 1.0; w = x; y = x; z = x; a = x; b = x; c = x; d = x; e = x; f = x; g = x; h = x; for ( i = 1; i <= iters; i++ ) { w = w * 1.000000000001 + one; y = y * 
1.000000000002 + one; z = z * 1.000000000003 + one; a = a * 1.000000000004 + one; b = b * 1.000000000005 + one; c = c * 0.999999999999 + one; d = d * 0.999999999998 + one; e = e * 0.999999999997 + one; f = f * 0.999999999996 + one; g = h * 0.999999999995 + one; h = h * 1.000000000006 + one; } return 2.0 * ( a + b + c + d + e + f + w + x + y + z + g + h ); } papi-5.3.0/src/ctests/realtime.c0000600003276200002170000000507312247131121016203 0ustar ralphundrgrad#include "papi_test.h" int main( int argc, char **argv ) { int retval; long long elapsed_us, elapsed_cyc; const PAPI_hw_info_t *hw_info; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); printf( "Testing real time clock. (CPU Max %d MHz, CPU Min %d MHz)\n", hw_info->cpu_max_mhz, hw_info->cpu_min_mhz ); printf( "Sleeping for 10 seconds.\n" ); sleep( 10 ); elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; printf( "%lld us. %lld cyc.\n", elapsed_us, elapsed_cyc ); printf( "%f Computed MHz.\n", ( float ) elapsed_cyc / ( float ) elapsed_us ); /* Elapsed microseconds and elapsed cycles are not as unambiguous as they appear. On Pentium III and 4, for example, cycles is a measured value, while useconds is computed from cycles and mhz. MHz is read from /proc/cpuinfo (on linux). Thus, any error in MHz is propagated to useconds. Conversely, on ultrasparc useconds are extracted from a system call (gethrtime()) and cycles are computed from useconds. Also, MHz comes from a scan of system info, Thus any error in gethrtime() propagates to both cycles and useconds, and cycles can be further impacted by errors in reported MHz. 
Without knowing the error bars on these system values, we can't really specify error ranges for our reported values, but we *DO* know that errors for at least one instance of Pentium 4 (torc17@utk) are on the order of one part per thousand. Newer multicore Intel processors seem to have broken the relationship between the clock rate reported in /proc/cpuinfo and the actual computed clock. To accommodate this artifact, the test no longer fails, but merely reports results out of range. */ if ( elapsed_us < 9000000 ) printf( "NOTE: Elapsed real time less than 9 seconds!\n" ); if ( elapsed_us > 11000000 ) printf( "NOTE: Elapsed real time greater than 11 seconds!\n" ); if ( ( float ) elapsed_cyc < 9.0 * hw_info->cpu_max_mhz * 1000000.0 ) printf( "NOTE: Elapsed real cycles less than 9*MHz*1000000.0!\n" ); if ( ( float ) elapsed_cyc > 11.0 * hw_info->cpu_max_mhz * 1000000.0 ) printf( "NOTE: Elapsed real cycles greater than 11*MHz*1000000.0!\n" ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/omptough.c0000600003276200002170000000517312247131121016244 0ustar ralphundrgrad#include #include #include #include #include #include #include "papi_test.h" #define NITER (100000) int main( int argc, char *argv[] ) { int i; int ret; int nthreads; int *evtset; int *ctrcode; nthreads = omp_get_max_threads( ); evtset = ( int * ) malloc( sizeof ( int ) * nthreads ); ctrcode = ( int * ) malloc( sizeof ( int ) * nthreads ); tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ ret = PAPI_library_init( PAPI_VER_CURRENT ); if ( ret != PAPI_VER_CURRENT && ret > 0 ) { fprintf( stderr, "PAPI library version mismatch '%s'\n", PAPI_strerror( ret ) ); exit( 1 ); } if ( ret < 0 ) { fprintf( stderr, "PAPI initialization error '%s'\n", PAPI_strerror( ret ) ); exit( 1 ); } if ( ( ret = PAPI_thread_init( ( unsigned long ( * )( void ) ) pthread_self ) ) != PAPI_OK ) { fprintf( stderr, "PAPI thread initialization error '%s'\n", PAPI_strerror( ret ) ); exit( 1 ); } for ( i = 0; i < 
nthreads; i++ ) { evtset[i] = PAPI_NULL; if ( ( ret = PAPI_event_name_to_code( "PAPI_TOT_INS", &ctrcode[i] ) ) != PAPI_OK ) { fprintf( stderr, "PAPI evt-name-to-code error '%s'\n", PAPI_strerror( ret ) ); } } for ( i = 0; i < NITER; i++ ) { #pragma omp parallel { int tid; int pid; tid = omp_get_thread_num( ); pid = pthread_self( ); if ( ( ret = PAPI_register_thread( ) ) != PAPI_OK ) { if ( !TESTS_QUIET ) { fprintf( stderr, "[%5d] Error in register thread (tid=%d pid=%d) '%s'\n", i, tid, pid, PAPI_strerror( ret ) ); test_fail( __FILE__, __LINE__, "omptough", 1 ); } } evtset[tid] = PAPI_NULL; if ( ( ret = PAPI_create_eventset( &( evtset[tid] ) ) ) != PAPI_OK ) { if ( !TESTS_QUIET ) { fprintf( stderr, "[%5d] Error creating eventset (tid=%d pid=%d) '%s'\n", i, tid, pid, PAPI_strerror( ret ) ); test_fail( __FILE__, __LINE__, "omptough", 1 ); } } if ( ( ret = PAPI_destroy_eventset( &( evtset[tid] ) ) ) != PAPI_OK ) { if ( !TESTS_QUIET ) { fprintf( stderr, "[%5d] Error destroying eventset (tid=%d pid=%d) '%s'\n", i, tid, pid, PAPI_strerror( ret ) ); evtset[tid] = PAPI_NULL; test_fail( __FILE__, __LINE__, "omptough", 1 ); } } if ( ( ret = PAPI_unregister_thread( ) ) != PAPI_OK ) { if ( !TESTS_QUIET ) { fprintf( stderr, "[%5d] Error in unregister thread (tid=%d pid=%d) ret='%s'\n", i, tid, pid, PAPI_strerror( ret ) ); test_fail( __FILE__, __LINE__, "omptough", 1 ); } } } } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/sdsc.c0000600003276200002170000002107112247131121015331 0ustar ralphundrgrad/* * $Id$ * * Test example for multiplex functionality, originally * provided by Timothy Kaiser, SDSC. It was modified to fit the * PAPI test suite by Nils Smeds, . 
* * This example verifies the accuracy of multiplexed events */ #include "papi_test.h" #include #include #define REPEATS 5 #define MAXEVENTS 14 #define SLEEPTIME 100 #define MINCOUNTS 100000 static double dummy3( double x, int iters ); void check_values( int eventset, int *events, int nevents, long long *values, long long *refvalues ) { double spread[MAXEVENTS]; int i = nevents, j = 0; if ( !TESTS_QUIET ) { printf( "\nRelative accuracy:\n" ); for ( j = 0; j < nevents; j++ ) printf( " Event %.2d", j + 1 ); printf( "\n" ); } for ( j = 0; j < nevents; j++ ) { spread[j] = abs( (int) ( refvalues[j] - values[j] ) ); if ( values[j] ) spread[j] /= ( double ) values[j]; if ( !TESTS_QUIET ) printf( "%10.3g ", spread[j] ); /* Make sure that NaN get counted as errors */ if ( spread[j] < MPX_TOLERANCE ) { i--; } else if ( refvalues[j] < MINCOUNTS ) { /* Neglect imprecise results with low counts */ i--; } else { char buff[BUFSIZ]; printf("reference = %lld, value = %lld, diff = %lld\n", refvalues[j],values[j],refvalues[j] - values[j] ); sprintf(buff,"Error on %d, spread %lf > threshold %lf AND count %lld > minimum size threshold %d\n",j,spread[j],MPX_TOLERANCE, refvalues[j],MINCOUNTS); test_fail( __FILE__, __LINE__, buff, 1 ); } } printf( "\n\n" ); #if 0 if ( !TESTS_QUIET ) { for ( j = 0; j < nevents; j++ ) { PAPI_get_event_info( events[j], &info ); printf( "Event %.2d: ref=", j ); printf( LLDFMT10, refvalues[j] ); printf( ", diff/ref=%7.2g -- %s\n", spread[j], info.short_descr ); printf( "\n" ); } printf( "\n" ); } #else ( void ) eventset; ( void ) events; #endif } void ref_measurements( int iters, int *eventset, int *events, int nevents, long long *refvalues ) { PAPI_event_info_t info; int i, retval; double x = 1.1, y; long long t1, t2; printf( "PAPI reference measurements:\n" ); if ( ( retval = PAPI_create_eventset( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); for ( i = 0; i < nevents; i++ ) { if ( ( retval = PAPI_add_event( *eventset, 
events[i] ) ) ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); x = 1.0; t1 = PAPI_get_real_usec( ); if ( ( retval = PAPI_start( *eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); y = dummy3( x, iters ); if ( ( retval = PAPI_stop( *eventset, &refvalues[i] ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); t2 = PAPI_get_real_usec( ); if (!TESTS_QUIET) { printf( "\tOperations= %.1f Mflop", y * 1e-6 ); printf( "\t(%g Mflop/s)\n\n", ( ( float ) y / ( t2 - t1 ) ) ); } PAPI_get_event_info( events[i], &info ); printf( "%20s = ", info.short_descr ); printf( LLDFMT, refvalues[i] ); printf( "\n" ); if ( ( retval = PAPI_cleanup_eventset( *eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); } if ( ( retval = PAPI_destroy_eventset( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); *eventset = PAPI_NULL; } void decide_which_events( int *events, int *nevents ) { int i, j = 0; PAPI_event_info_t info; int newevents[MAXEVENTS]; for ( i = 0; i < MAXEVENTS; i++ ) { if ( PAPI_get_event_info( events[i], &info ) == PAPI_OK ) { if ( info.count && ( strcmp( info.derived, "NOT_DERIVED" ) == 0 ) ) { printf( "Added %s\n", info.symbol ); newevents[j++] = events[i]; } } } if ( j < 2 ) test_skip( __FILE__, __LINE__, "Not enough events to multiplex...", 0 ); *nevents = j; memcpy( events, newevents, sizeof ( newevents ) ); printf( "Using %d events\n\n", *nevents ); } int main( int argc, char **argv ) { PAPI_event_info_t info; int i, j, retval; int iters = NUM_FLOPS; double x = 1.1, y; long long t1, t2; long long values[MAXEVENTS], refvalues[MAXEVENTS]; int sleep_time = SLEEPTIME; int nevents = MAXEVENTS; int eventset = PAPI_NULL; int events[MAXEVENTS]; events[0] = PAPI_FP_INS; events[1] = PAPI_TOT_INS; events[2] = PAPI_INT_INS; events[3] = PAPI_TOT_CYC; events[4] = PAPI_STL_CCY; events[5] = PAPI_BR_INS; events[6] = PAPI_SR_INS; events[7] = PAPI_LD_INS; events[8] = PAPI_TOT_IIS; events[9] = 
PAPI_FAD_INS; events[10] = PAPI_BR_TKN; events[11] = PAPI_BR_MSP; events[12] = PAPI_L1_ICA; events[13] = PAPI_L1_DCA; for ( i = 0; i < MAXEVENTS; i++ ) { values[i] = 0; } if ( argc > 1 ) { if ( !strcmp( argv[1], "TESTS_QUIET" ) ) tests_quiet( argc, argv ); else { sleep_time = atoi( argv[1] ); if ( sleep_time <= 0 ) sleep_time = SLEEPTIME; } } if ( !TESTS_QUIET ) { printf( "\nAccuracy check of multiplexing routines.\n" ); printf ( "Comparing a multiplex measurement with separate measurements.\n\n" ); } if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); decide_which_events( events, &nevents ); init_multiplex( ); /* Find a reasonable number of iterations (each * event active 20 times) during the measurement */ t2 = 10000 * 20 * nevents; /* Target: 10000 usec/multiplex, 20 repeats */ if ( t2 > 30e6 ) test_skip( __FILE__, __LINE__, "This test takes too much time", retval ); y = dummy3( x, iters ); /* Measure one run */ t1 = PAPI_get_real_usec( ); y = dummy3( x, iters ); t1 = PAPI_get_real_usec( ) - t1; if ( t1 < 1000000 ) { /* Scale up execution time to match t2 */ iters = iters * ( int ) ( 1000000 / t1 ); printf( "Modified iteration count to %d\n\n", iters ); } if (!TESTS_QUIET) fprintf(stdout,"y=%lf\n",y); /* Now loop through the items one at a time */ ref_measurements( iters, &eventset, events, nevents, refvalues ); /* Now check multiplexed */ if ( ( retval = PAPI_create_eventset( &eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 
0 is always the cpu component */ retval = PAPI_assign_eventset_component( eventset, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); if ( ( retval = PAPI_set_multiplex( eventset ) ) ) { if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); } if ( ( retval = PAPI_add_events( eventset, events, nevents ) ) ) test_fail( __FILE__, __LINE__, "PAPI_add_events", retval ); printf( "\nPAPI multiplexed measurements:\n" ); x = 1.0; t1 = PAPI_get_real_usec( ); if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); y = dummy3( x, iters ); if ( ( retval = PAPI_stop( eventset, values ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); t2 = PAPI_get_real_usec( ); for ( j = 0; j < nevents; j++ ) { PAPI_get_event_info( events[j], &info ); if ( !TESTS_QUIET ) { printf( "%20s = ", info.short_descr ); printf( LLDFMT, values[j] ); printf( "\n" ); } } check_values( eventset, events, nevents, values, refvalues ); if ( ( retval = PAPI_remove_events( eventset, events, nevents ) ) ) test_fail( __FILE__, __LINE__, "PAPI_remove_events", retval ); if ( ( retval = PAPI_cleanup_eventset( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); if ( ( retval = PAPI_destroy_eventset( &eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); eventset = PAPI_NULL; /* Now loop through the items one at a time */ ref_measurements( iters, &eventset, events, nevents, refvalues ); check_values( eventset, events, nevents, values, refvalues ); test_pass( __FILE__, NULL, 0 ); return 0; } static double dummy3( double x, int iters ) { int i; double w, y, z, a, b, c, d, e, f, g, h; double one; one = 1.0; w = x; y = x; z = x; a = x; b = x; c = x; d = x; e = x; f = x; g = x; h = x; for ( i = 1; i <= iters; i++ ) { w = w * 1.000000000001 + one; y = y * 
1.000000000002 + one; z = z * 1.000000000003 + one; a = a * 1.000000000004 + one; b = b * 1.000000000005 + one; c = c * 0.999999999999 + one; d = d * 0.999999999998 + one; e = e * 0.999999999997 + one; f = f * 0.999999999996 + one; g = h * 0.999999999995 + one; h = h * 1.000000000006 + one; } return 2.0 * ( a + b + c + d + e + f + w + x + y + z + g + h ); } papi-5.3.0/src/ctests/get_event_component.c0000600003276200002170000000341512247131121020441 0ustar ralphundrgrad/* * File: get_event_component.c * Author: Vince Weaver * vweaver1@eecs.utk.edu */ /* This test makes sure PAPI_get_event_component() works */ #include "papi_test.h" int main( int argc, char **argv ) { int i; int retval; PAPI_event_info_t info; int numcmp, cid, our_cid; const PAPI_component_info_t* cmpinfo; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Init PAPI library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } numcmp = PAPI_num_components( ); /* Loop through all components */ for( cid = 0; cid < numcmp; cid++ ) { cmpinfo = PAPI_get_component_info( cid ); if (cmpinfo == NULL) { test_fail( __FILE__, __LINE__, "PAPI_get_component_info", 2 ); } if (cmpinfo->disabled) { printf( "Name: %-23s %s\n", cmpinfo->name ,cmpinfo->description); printf(" \\-> Disabled: %s\n",cmpinfo->disabled_reason); continue; } i = 0 | PAPI_NATIVE_MASK; retval = PAPI_enum_cmp_event( &i, PAPI_ENUM_FIRST, cid ); if (retval!=PAPI_OK) continue; do { retval = PAPI_get_event_info( i, &info ); our_cid=PAPI_get_event_component(i); if (our_cid!=cid) { if (!TESTS_QUIET) { printf("%d %d %s\n",cid,our_cid,info.symbol); } test_fail( __FILE__, __LINE__, "component mismatch", 1 ); } if (!TESTS_QUIET) { printf("%d %d %s\n",cid,our_cid,info.symbol); } } while ( PAPI_enum_cmp_event( &i, PAPI_ENUM_EVENTS, cid ) == PAPI_OK ); } test_pass( __FILE__, NULL, 0 ); return 0; } 
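The acceptance rule applied by check_values() in the multiplex accuracy test above can be sketched in isolation, without linking against PAPI. This is only an illustration under stated assumptions: the function names relative_spread and counts_agree are hypothetical, and the tolerance and min_counts parameters stand in for MPX_TOLERANCE and MINCOUNTS, whose actual values come from the test headers. Unlike the test, which casts the difference to int before calling abs(), the sketch keeps the difference in long long arithmetic.

```c
#include <assert.h>

/* Relative difference between a reference count and a multiplexed count:
 * |ref - val| / val, with the division skipped when val is zero
 * (the same guard check_values() uses). */
double relative_spread(long long ref, long long val)
{
    long long diff = ( ref > val ) ? ( ref - val ) : ( val - ref );
    double spread = ( double ) diff;

    if ( val != 0 )
        spread /= ( double ) val;
    return spread;
}

/* Returns 1 if the pair would be accepted, 0 if it would be reported
 * as an error. tolerance and min_counts are hypothetical stand-ins for
 * MPX_TOLERANCE and MINCOUNTS. */
int counts_agree(long long ref, long long val,
                 double tolerance, long long min_counts)
{
    assert( tolerance > 0.0 );

    if ( relative_spread( ref, val ) < tolerance )
        return 1;               /* within tolerance */
    if ( ref < min_counts )
        return 1;               /* low counts are too noisy to judge */
    return 0;
}
```

A caller would loop over the events and treat any pair for which counts_agree() returns 0 as a failure, which is what the test does through test_fail().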
papi-5.3.0/src/ctests/eventname.c0000600003276200002170000000166312247131121016364 0ustar ralphundrgrad#include "papi_test.h" extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( int argc, char **argv ) { int retval; int preset; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); retval = PAPI_event_name_to_code( "PAPI_FP_INS", &preset ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_name_to_code", retval ); if ( preset != PAPI_FP_INS ) test_fail( __FILE__, __LINE__, "Wrong preset returned", retval ); retval = PAPI_event_name_to_code( "PAPI_TOT_CYC", &preset ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_name_to_code", retval ); if ( preset != PAPI_TOT_CYC ) test_fail( __FILE__, __LINE__, "*preset returned did not equal PAPI_TOT_CYC", retval ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/sprofile.c0000600003276200002170000000752012247131121016223 0ustar ralphundrgrad/* * File: sprofile.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: Maynard Johnson * maynardj@us.ibm.com * Mods: * */ /* These architectures use Function Descriptors as Function Pointers */ #if (defined(linux) && defined(__ia64__)) || (defined(_AIX)) ||(defined(__powerpc64__)) #define DO_READS (unsigned long)(*(void **)do_reads) #define DO_FLOPS (unsigned long)(*(void **)do_flops) #else #define DO_READS (unsigned long)(do_reads) #define DO_FLOPS (unsigned long)(do_flops) #endif /* This file performs the following test: sprofile */ #include "papi_test.h" #include "prof_utils.h" int main( int argc, char **argv ) { int i, num_events, num_tests = 6, mask = 0x1; int EventSet = PAPI_NULL; unsigned short **buf = ( unsigned short ** ) profbuf; unsigned long length, blength; int num_buckets; PAPI_sprofil_t sprof[3]; int retval; const PAPI_exe_info_t *prginfo; caddr_t start, end; 
prof_init( argc, argv, &prginfo ); start = prginfo->address_info.text_start; end = prginfo->address_info.text_end; if ( start > end ) { test_fail( __FILE__, __LINE__, "Profile length < 0!", PAPI_ESYS ); } length = ( unsigned long ) ( end - start ); prof_print_address ( "Test case sprofile: POSIX compatible profiling over multiple regions.\n", prginfo ); blength = prof_size( length, FULL_SCALE, PAPI_PROFIL_BUCKET_16, &num_buckets ); prof_alloc( 3, blength ); /* First half */ sprof[0].pr_base = buf[0]; sprof[0].pr_size = ( unsigned int ) blength; sprof[0].pr_off = ( caddr_t ) DO_FLOPS; #if defined(linux) && defined(__ia64__) if ( !TESTS_QUIET ) fprintf( stderr, "do_flops is at %p %p\n", &do_flops, sprof[0].pr_off ); #endif sprof[0].pr_scale = FULL_SCALE; /* Second half */ sprof[1].pr_base = buf[1]; sprof[1].pr_size = ( unsigned int ) blength; sprof[1].pr_off = ( caddr_t ) DO_READS; #if defined(linux) && defined(__ia64__) if ( !TESTS_QUIET ) fprintf( stderr, "do_reads is at %p %p\n", &do_reads, sprof[1].pr_off ); #endif sprof[1].pr_scale = FULL_SCALE; /* Overflow bin */ sprof[2].pr_base = buf[2]; sprof[2].pr_size = 1; sprof[2].pr_off = 0; sprof[2].pr_scale = 0x2; EventSet = add_test_events( &num_events, &mask, 1 ); values = allocate_test_space( num_tests, num_events ); if ( ( retval = PAPI_sprofil( sprof, 3, EventSet, PAPI_TOT_CYC, THRESHOLD, PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_sprofil", retval ); do_stuff( ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet, values[1] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); /* clear the profile flag before removing the event */ if ( ( retval = PAPI_sprofil( sprof, 3, EventSet, PAPI_TOT_CYC, 0, PAPI_PROFIL_POSIX | PAPI_PROFIL_BUCKET_16 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_sprofil", retval ); remove_test_events( &EventSet, 
mask ); if ( !TESTS_QUIET ) { printf( "Test case: PAPI_sprofil()\n" ); printf( "---------Buffer 1--------\n" ); for ( i = 0; i < ( int ) length / 2; i++ ) { if ( buf[0][i] ) printf( "0x%lx\t%d\n", DO_FLOPS + 2 * ( unsigned long ) i, buf[0][i] ); } printf( "---------Buffer 2--------\n" ); for ( i = 0; i < ( int ) length / 2; i++ ) { if ( buf[1][i] ) printf( "0x%lx\t%d\n", DO_READS + 2 * ( unsigned long ) i, buf[1][i] ); } printf( "-------------------------\n" ); printf( "%u samples fell outside the regions.\n", *buf[2] ); } retval = prof_check( 2, PAPI_PROFIL_BUCKET_16, num_buckets ); for ( i = 0; i < 3; i++ ) { free( profbuf[i] ); } if ( retval == 0 ) test_fail( __FILE__, __LINE__, "No information in buffers", 1 ); test_pass( __FILE__, values, num_tests ); exit( 1 ); } papi-5.3.0/src/ctests/Makefile.recipies0000600003276200002170000004661412247131121017505 0ustar ralphundrgradOMP = zero_omp omptough SMP = zero_smp SHMEM = zero_shmem PTHREADS= pthrtough pthrtough2 thrspecific profile_pthreads overflow_pthreads \ zero_pthreads clockres_pthreads overflow3_pthreads locks_pthreads \ krentel_pthreads MPX = max_multiplex multiplex1 multiplex2 mendes-alt sdsc-mpx sdsc2-mpx \ sdsc4-mpx reset_multiplex MPXPTHR = multiplex1_pthreads multiplex3_pthreads kufrin MPI = mpifirst SHARED = shlib SERIAL = inherit hwinfo code2name reset cmpinfo realtime virttime first \ exeinfo zero zero_named branches dmem_info all_native_events \ all_events derived high-level2 hl_rates describe memory \ zero_flip low-level high-level eventname case1 case2 \ calibrate flops second johnmay2 matrix-hl tenth ipc nmi_watchdog \ get_event_component disable_component remove_events cycle_ratio FORKEXEC = fork fork2 exec exec2 forkexec forkexec2 forkexec3 forkexec4 \ fork_overflow exec_overflow child_overflow system_child_overflow \ system_overflow burn zero_fork OVERFLOW = fork_overflow exec_overflow child_overflow system_child_overflow \ system_overflow burn overflow overflow_force_software \ 
overflow_single_event overflow_twoevents timer_overflow overflow2 \ overflow_index overflow_one_and_read overflow_allcounters PROFILE = profile profile_force_software sprofile profile_twoevents \ byte_profile ATTACH = multiattach multiattach2 zero_attach attach3 attach2 attach_target P4_TEST = p4_lst_ins EAR = earprofile RANGE = data_range BROKEN = pernode API = api ifneq ($(MPICC),) ALL = $(PTHREADS) $(SERIAL) $(FORKEXEC) $(OVERFLOW) $(PROFILE) $(MPI) $(MPX) $(MPXPTHR) $(OMP) $(SMP) $(SHMEM)\ $(SHARED) $(EAR) $(RANGE) $(P4_TEST) $(ATTACH) $(API) else ALL = $(PTHREADS) $(SERIAL) $(FORKEXEC) $(OVERFLOW) $(PROFILE) $(MPX) $(MPXPTHR) $(OMP) $(SMP) $(SHMEM)\ $(SHARED) $(EAR) $(RANGE) $(P4_TEST) $(ATTACH) $(API) endif DEFAULT = papi_api serial forkexec_tests overflow_tests profile_tests attach multiplex_and_pthreads shared all: $(ALL) default ctests ctest: $(DEFAULT) attach: $(ATTACH) p4: $(P4_TEST) ear: $(EAR) range: $(RANGE) mpi: $(MPI) shared: $(SHARED) multiplex_and_pthreads: $(MPXPTHR) $(MPX) $(PTHREADS) multiplex: $(MPX) omp: $(OMP) smp: $(SMP) pthreads: $(PTHREADS) shmem: $(SHMEM) serial: $(SERIAL) forkexec_tests: $(FORKEXEC) overflow_tests: $(OVERFLOW) profile_tests: $(PROFILE) papi_api: $(API) api: api.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) api.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ sdsc2: sdsc2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) sdsc.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -lm -o $@ sdsc2-mpx: sdsc2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) -DMPX $(TOPTFLAGS) sdsc2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -lm -o $@ branches: branches.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) branches.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -lm -o $@ sdsc2-mpx-noreset: sdsc2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) -DMPX -DSTARTSTOP $(TOPTFLAGS) sdsc.c $(TESTLIB) $(PAPILIB) -lm $(LDFLAGS) -o $@ sdsc-mpx: sdsc.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) -DMPX $(TOPTFLAGS) sdsc.c $(TESTLIB) 
$(PAPILIB) $(LDFLAGS) -o $@ sdsc4-mpx: sdsc4.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) -DMPX $(TOPTFLAGS) sdsc4.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -lm -o $@ calibrate: calibrate.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) calibrate.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o calibrate data_range: data_range.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) data_range.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o data_range p4_lst_ins: p4_lst_ins.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) p4_lst_ins.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o p4_lst_ins acpi: acpi.c dummy.o $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) acpi.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o acpi timer_overflow: timer_overflow.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) timer_overflow.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ mendes-alt: mendes-alt.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) -DMULTIPLEX mendes-alt.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ max_multiplex: max_multiplex.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) max_multiplex.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ multiplex1: multiplex1.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) multiplex1.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ multiplex2: multiplex2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) multiplex2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ multiplex1_pthreads: multiplex1_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) multiplex1_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ -lpthread kufrin: kufrin.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) kufrin.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ -lpthread multiplex3_pthreads: multiplex3_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) multiplex3_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ -lpthread overflow3_pthreads: overflow3_pthreads.c 
$(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow3_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o $@ -lpthread thrspecific: thrspecific.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) thrspecific.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o thrspecific -lpthread pthrtough: pthrtough.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) pthrtough.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o pthrtough -lpthread pthrtough2: pthrtough2.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) pthrtough2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o pthrtough2 -lpthread profile_pthreads: profile_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) profile_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o profile_pthreads -lpthread locks_pthreads: locks_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) locks_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o locks_pthreads -lpthread krentel_pthreads: krentel_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) krentel_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o krentel_pthreads -lpthread overflow_pthreads: overflow_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow_pthreads -lpthread zero_pthreads: zero_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) zero_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_pthreads -lpthread zero_smp: zero_smp.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(SMPCFLGS) $(CFLAGS) $(TOPTFLAGS) zero_smp.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_smp $(SMPLIBS) zero_shmem: zero_shmem.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(SMPCFLGS) $(CFLAGS) $(TOPTFLAGS) zero_shmem.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_shmem $(SMPLIBS) zero_omp: zero_omp.c $(TESTLIB) $(PAPILIB) -$(CC_R) $(INCLUDE) $(OMPCFLGS) $(CFLAGS) $(TOPTFLAGS) zero_omp.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_omp 
$(OMPLIBS) omptough: omptough.c $(TESTLIB) $(PAPILIB) -$(CC_R) $(INCLUDE) $(OMPCFLGS) $(CFLAGS) $(TOPTFLAGS) omptough.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o omptough $(OMPLIBS) val_omp: val_omp.c $(TESTLIB) $(PAPILIB) -$(CC_R) $(INCLUDE) $(OMPCFLGS) $(CFLAGS) $(TOPTFLAGS) val_omp.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o val_omp $(OMPLIBS) clockres_pthreads: clockres_pthreads.c $(TESTLIB) $(PAPILIB) $(CC_R) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) clockres_pthreads.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o clockres_pthreads -lpthread -lm inherit: inherit.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) inherit.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o inherit johnmay2: johnmay2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) johnmay2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o johnmay2 describe: describe.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) describe.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o describe derived: derived.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) derived.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o derived zero: zero.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) zero.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero zero_named: zero_named.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) zero_named.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_named remove_events: remove_events.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) remove_events.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o remove_events nmi_watchdog: nmi_watchdog.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) nmi_watchdog.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o nmi_watchdog cycle_ratio: cycle_ratio.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) cycle_ratio.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o cycle_ratio zero_fork: zero_fork.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) zero_fork.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_fork try: try.c $(TESTLIB) $(PAPILIB) $(CC) 
$(INCLUDE) $(CFLAGS) $(TOPTFLAGS) try.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o try zero_flip: zero_flip.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) zero_flip.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_flip realtime: realtime.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) realtime.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o realtime virttime: virttime.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) virttime.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o virttime first: first.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) first.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o first mpifirst: mpifirst.c $(TESTLIB) $(PAPILIB) $(MPICC) $(INCLUDE) $(MPFLAGS) $(CFLAGS) $(TOPTFLAGS) first.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o mpifirst first-twice: first-twice.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) first-twice.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o first-twice second: second.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) second.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o second flops: flops.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) flops.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o flops ipc: ipc.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) ipc.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o ipc overflow: overflow.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow overflow_allcounters: overflow_allcounters.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_allcounters.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow_allcounters overflow_twoevents: overflow_twoevents.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_twoevents.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow_twoevents overflow_one_and_read: overflow_one_and_read.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_one_and_read.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o 
overflow_one_and_read overflow_index: overflow_index.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_index.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow_index overflow_values: overflow_values.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_values.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow_values overflow2: overflow2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow2 overflow_single_event: overflow_single_event.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_single_event.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow_single_event overflow_force_software: overflow_force_software.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) overflow_force_software.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o overflow_force_software sprofile: sprofile.c $(TESTLIB) prof_utils.o $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) sprofile.c prof_utils.o $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o sprofile profile: profile.c $(TESTLIB) prof_utils.o $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) profile.c prof_utils.o $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o profile profile_force_software: profile.c $(TESTLIB) prof_utils.o $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) -DSWPROFILE profile.c prof_utils.o $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o profile_force_software profile_twoevents: profile_twoevents.c $(TESTLIB) prof_utils.o $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) profile_twoevents.c prof_utils.o $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o profile_twoevents earprofile: earprofile.c $(TESTLIB) prof_utils.o $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) earprofile.c $(TESTLIB) prof_utils.o $(PAPILIB) $(LDFLAGS) -o earprofile byte_profile: byte_profile.c $(TESTLIB) prof_utils.o $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) byte_profile.c prof_utils.o $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o byte_profile pernode: pernode.c $(TESTLIB) 
$(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) pernode.c $(LDFLAGS) -o pernode dmem_info: dmem_info.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) dmem_info.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o dmem_info all_events: all_events.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) all_events.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o all_events all_native_events: all_native_events.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) all_native_events.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o all_native_events get_event_component: get_event_component.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) get_event_component.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o get_event_component disable_component: disable_component.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) disable_component.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o disable_component memory: memory.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) memory.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o memory tenth: tenth.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) tenth.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o tenth eventname: eventname.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) eventname.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o eventname case1: case1.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) case1.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o case1 case2: case2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) case2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o case2 low-level: low-level.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) low-level.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o low-level matrix-hl: matrix-hl.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) matrix-hl.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o matrix-hl hl_rates: hl_rates.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) hl_rates.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o hl_rates high-level: 
high-level.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) high-level.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o high-level high-level2: high-level2.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) high-level2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o high-level2 shlib: shlib.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) shlib.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o shlib $(LDL) exeinfo: exeinfo.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) exeinfo.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o exeinfo cmpinfo: cmpinfo.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) cmpinfo.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o cmpinfo hwinfo: hwinfo.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) hwinfo.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o hwinfo code2name: code2name.c $(TESTLIB) $(PAPILIB) $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) code2name.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o code2name attach_target: attach_target.c -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) attach_target.c -o attach_target $(TESTLIB) zero_attach: zero_attach.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) zero_attach.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o zero_attach multiattach: multiattach.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) multiattach.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o multiattach multiattach2: multiattach2.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) multiattach2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o multiattach2 attach3: attach3.c attach_target $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) attach3.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o attach3 attach2: attach2.c attach_target $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) attach2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o attach2 reset: reset.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) reset.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o reset reset_multiplex: 
reset_multiplex.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) reset_multiplex.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o reset_multiplex fork_overflow: fork_exec_overflow.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) fork_exec_overflow.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o fork_overflow exec_overflow: fork_exec_overflow.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) -DPEXEC fork_exec_overflow.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o exec_overflow child_overflow: fork_exec_overflow.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) -DPCHILD fork_exec_overflow.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o child_overflow system_child_overflow: fork_exec_overflow.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) -DSYSTEM fork_exec_overflow.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o system_child_overflow system_overflow: fork_exec_overflow.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) -DSYSTEM2 fork_exec_overflow.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o system_overflow burn: burn.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) burn.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o burn fork: fork.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) fork.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o fork exec: exec.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) exec.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o exec exec2: exec2.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) exec2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o exec2 fork2: fork2.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) fork2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o fork2 forkexec: forkexec.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) forkexec.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o forkexec forkexec2: forkexec2.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) forkexec2.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o forkexec2 forkexec3: forkexec3.c 
$(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) forkexec3.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o forkexec3 forkexec4: forkexec4.c $(TESTLIB) $(PAPILIB) -$(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) forkexec4.c $(TESTLIB) $(PAPILIB) $(LDFLAGS) -o forkexec4 prof_utils.o: prof_utils.c $(testlibdir)/papi_test.h prof_utils.h $(CC) $(INCLUDE) $(CFLAGS) $(TOPTFLAGS) -c prof_utils.c .PHONY : all default ctests ctest clean clean: rm -f *.o *.stderr *.stdout core *~ $(ALL) unregister_pthreads papi-5.3.0/src/ctests/Makefile.target.in0000600003276200002170000000066212247131121017566 0ustar ralphundrgradprefix = @prefix@ exec_prefix = @exec_prefix@ PACKAGE_TARNAME = @PACKAGE_TARNAME@ datadir = @datadir@/${PACKAGE_TARNAME} testlibdir = $(datadir)/testlib INCLUDE = -I. -I@includedir@ -I$(testlibdir) LIBDIR = @libdir@ LIBRARY=@LIBRARY@ SHLIB=@SHLIB@ PAPILIB = $(LIBDIR)/@LINKLIB@ TESTLIB = $(testlibdir)/libtestlib.a LDFLAGS = @LDL@ @STATIC@ CC = @CC@ MPICC = @MPICC@ F77 = @F77@ CC_R = @CC_R@ CFLAGS = @CFLAGS@ OMPCFLGS = @OMPCFLGS@ papi-5.3.0/src/ctests/overflow_pthreads.c0000600003276200002170000001404112247131121020131 0ustar ralphundrgrad/* This file performs the following test: overflow dispatch with pthreads - This tests the dispatch of overflow calls from PAPI. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). 
The Eventset contains: + PAPI_FP_INS (overflow monitor) + PAPI_TOT_CYC - Set up overflow - Start eventset 1 - Do flops - Stop eventset 1 */ #include <pthread.h> #include "papi_test.h" static const PAPI_hw_info_t *hw_info = NULL; static int total[NUM_THREADS]; static int expected[NUM_THREADS]; static pthread_t myid[NUM_THREADS]; void handler( int EventSet, void *address, long long overflow_vector, void *context ) { #if 0 printf( "handler(%d,0x%lx,%llx) Overflow %d in thread %lx\n", EventSet, ( unsigned long ) address, overflow_vector, total[EventSet], PAPI_thread_id( ) ); printf( "%lx vs %lx\n", myid[EventSet], PAPI_thread_id( ) ); #else /* eliminate unused parameter warning message */ ( void ) address; ( void ) overflow_vector; ( void ) context; #endif total[EventSet]++; } long long mythreshold=0; void * Thread( void *arg ) { int retval, num_tests = 1; int EventSet1 = PAPI_NULL; int mask1, papi_event; int num_events1; long long **values; long long elapsed_us, elapsed_cyc; char event_name[PAPI_MAX_STR_LEN]; retval = PAPI_register_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_register_thread", retval ); /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_nonderived_events( &num_events1, &papi_event, &mask1 ); if (EventSet1 < 0) return NULL; /* Wait, we're indexing a per-thread array with the EventSet number? */ /* does that make any sense at all???? 
-- vmw */ expected[EventSet1] = *( int * ) arg / mythreshold; myid[EventSet1] = PAPI_thread_id( ); values = allocate_test_space( num_tests, num_events1 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); if ((retval = PAPI_overflow( EventSet1, papi_event, mythreshold, 0, handler ) ) != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); } /* start_timer(1); */ if ( ( retval = PAPI_start( EventSet1 ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); if ( ( retval = PAPI_stop( EventSet1, values[0] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; if ( ( retval = PAPI_overflow( EventSet1, papi_event, 0, 0, NULL ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); remove_test_events( &EventSet1, mask1 ); if ( ( retval = PAPI_event_code_to_name( papi_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); if ( !TESTS_QUIET ) { printf( "Thread %#x %s : \t%lld\n", ( int ) pthread_self( ), event_name, ( values[0] )[0] ); printf( "Thread %#x PAPI_TOT_CYC: \t%lld\n", ( int ) pthread_self( ), ( values[0] )[1] ); printf( "Thread %#x Real usec : \t%lld\n", ( int ) pthread_self( ), elapsed_us ); printf( "Thread %#x Real cycles : \t%lld\n", ( int ) pthread_self( ), elapsed_cyc ); } free_test_space( values, num_tests ); retval = PAPI_unregister_thread( ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_unregister_thread", retval ); return ( NULL ); } int main( int argc, char **argv ) { pthread_t id[NUM_THREADS]; int flops[NUM_THREADS]; int i, rc, retval; pthread_attr_t attr; float ratio; memset( total, 0x0, NUM_THREADS * sizeof ( *total ) ); memset( expected, 0x0, NUM_THREADS * sizeof ( *expected ) ); memset( myid, 0x0, NUM_THREADS * sizeof ( *myid ) ); tests_quiet( argc, argv ); /* Set TESTS_QUIET variable 
*/ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); if ( ( retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ) ) != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } #if defined(linux) mythreshold = hw_info->cpu_max_mhz * 10000 * 2; #else mythreshold = THRESHOLD * 2; #endif pthread_attr_init( &attr ); #ifdef PTHREAD_CREATE_UNDETACHED pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_UNDETACHED ); #endif #ifdef PTHREAD_SCOPE_SYSTEM retval = pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM ); if ( retval != 0 ) test_skip( __FILE__, __LINE__, "pthread_attr_setscope", retval ); #endif for ( i = 0; i < NUM_THREADS; i++ ) { flops[i] = NUM_FLOPS * ( i + 1 ); rc = pthread_create( &id[i], &attr, Thread, ( void * ) &flops[i] ); if ( rc ) test_fail( __FILE__, __LINE__, "pthread_create", PAPI_ESYS ); } for ( i = 0; i < NUM_THREADS; i++ ) pthread_join( id[i], NULL ); pthread_attr_destroy( &attr ); { long long t = 0, r = 0; for ( i = 0; i < NUM_THREADS; i++ ) { t += ( NUM_FLOPS * ( i + 1 ) ) / mythreshold; r += total[i]; } printf( "Expected total overflows: %lld\n", t ); printf( "Received total overflows: %lld\n", r ); } /* ratio = (float)total[0] / (float)expected[0]; */ /* printf("Ratio of total to expected: %f\n",ratio); */ ratio = 1.0; for ( i = 0; i < NUM_THREADS; i++ ) { printf( "Overflows thread %d: %d, expected %d\n", i, total[i], ( int ) ( ratio * ( float ) expected[i] ) ); } for ( i = 0; i < NUM_THREADS; i++ ) { if ( total[i] < ( int ) ( ( ratio * ( float ) expected[i] ) / 2.0 ) ) test_fail( __FILE__, __LINE__, "not enough overflows", PAPI_EMISC ); } test_pass( __FILE__, NULL, 0 ); pthread_exit( NULL ); exit( 1 ); } 
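/* Sketch, not part of the PAPI test suite: the pass criterion used by the overflow test above, pulled out as two plain functions. The names expected_overflows and enough_overflows are hypothetical; the arithmetic follows expected[] = work / mythreshold and the "fewer than half" failure check in main(), with ratio fixed at 1.0. */

```c
#include <assert.h>

/* Expected number of overflow callbacks for a given amount of work.
 * Integer division truncates, as it does when expected[] is filled in. */
long long expected_overflows( long long work, long long threshold )
{
    assert( threshold > 0 );
    return work / threshold;
}

/* The test tolerates undercounting: it only fails when fewer than
 * half of the expected overflows were delivered. */
int enough_overflows( long long received, long long expected )
{
    return received >= expected / 2;
}
```

/* Usage: a thread doing 1000000 units of work with a threshold of 300000 expects 3 overflows, and receiving 1 or more of them counts as a pass because 3 / 2 truncates to 1. */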
papi-5.3.0/src/ctests/zero_omp.c0000600003276200002170000001044512247131121016232 0ustar ralphundrgrad/* * File: zero_omp.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: Nils Smeds * smeds@pdc.kth.se * Anders Nilsson * anni@pdc.kth.se */ /* This file performs the following test: start, stop and timer functionality for 2 slave OMP threads - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC Each thread inside the Thread routine: - Get cyc. - Get us. - Start counters - Do flops - Stop and read counters - Get us. - Get cyc. Master serial thread: - Get us. - Get cyc. - Run parallel for loop - Get us. - Get cyc. */ #include "papi_test.h" #ifdef _OPENMP #include #else #error "This compiler does not understand OPENMP" #endif const PAPI_hw_info_t *hw_info = NULL; void Thread( int n ) { int retval, num_tests = 1; int EventSet1 = PAPI_NULL; int PAPI_event, mask1; int num_events1; long long **values; long long elapsed_us, elapsed_cyc; char event_name[PAPI_MAX_STR_LEN]; printf( "Thread %#x started\n", omp_get_thread_num( ) ); num_events1 = 2; /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); values = allocate_test_space( num_tests, num_events1 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( n ); retval = PAPI_stop( EventSet1, values[0] ); if 
( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; remove_test_events( &EventSet1, mask1 ); if ( !TESTS_QUIET ) { printf( "Thread %#x %-12s : \t%lld\n", omp_get_thread_num( ), event_name, values[0][1] ); printf( "Thread %#x PAPI_TOT_CYC: \t%lld\n", omp_get_thread_num( ), values[0][0] ); printf( "Thread %#x Real usec : \t%lld\n", omp_get_thread_num( ), elapsed_us ); printf( "Thread %#x Real cycles : \t%lld\n", omp_get_thread_num( ), elapsed_cyc ); } /* It is illegal for the threads to exit in OpenMP */ /* test_pass(__FILE__,0,0); */ free_test_space( values, num_tests ); PAPI_unregister_thread( ); printf( "Thread %#x finished\n", omp_get_thread_num( ) ); } int main( int argc, char **argv ) { int maxthr, retval; long long elapsed_us, elapsed_cyc; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( omp_get_thread_num ) ); if ( retval != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } #pragma omp parallel private(maxthr) { maxthr = omp_get_num_threads( ); Thread( 1000000 * ( omp_get_thread_num( ) + 1 ) ); } omp_set_num_threads( 1 ); Thread( 1000000 * ( omp_get_thread_num( ) + 1 ) ); omp_set_num_threads( omp_get_max_threads( ) ); #pragma omp parallel private(maxthr) { maxthr = omp_get_num_threads( ); Thread( 1000000 * ( omp_get_thread_num( ) + 1 ) ); } elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; elapsed_us = PAPI_get_real_usec( 
) - elapsed_us; if ( !TESTS_QUIET ) { printf( "Master real usec : \t%lld\n", elapsed_us ); printf( "Master real cycles : \t%lld\n", elapsed_cyc ); } test_pass( __FILE__, NULL, 0 ); exit( 0 ); } papi-5.3.0/src/ctests/code2name.c0000600003276200002170000000731312247131121016235 0ustar ralphundrgrad/* This file performs the following test: event_code_to_name */ #include "papi_test.h" static void test_continue( char *call, int retval ) { printf( "Expected error in %s: %s\n", call, PAPI_strerror(retval) ); } int main( int argc, char **argv ) { int retval; int code = PAPI_TOT_CYC, last; char event_name[PAPI_MAX_STR_LEN]; const PAPI_hw_info_t *hwinfo = NULL; const PAPI_component_info_t *cmp_info; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); retval = papi_print_header ( "Test case code2name.c: Check limits and indexing of event tables.\n", &hwinfo ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); printf( "Looking for PAPI_TOT_CYC...\n" ); retval = PAPI_event_code_to_name( code, event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf( "Found |%s|\n", event_name ); code = PAPI_FP_OPS; printf( "Looking for highest defined preset event (PAPI_FP_OPS): %#x...\n", code ); retval = PAPI_event_code_to_name( code, event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf( "Found |%s|\n", event_name ); code = PAPI_PRESET_MASK | ( PAPI_MAX_PRESET_EVENTS - 1 ); printf( "Looking for highest allocated preset event: %#x...\n", code ); retval = PAPI_event_code_to_name( code, event_name ); if ( retval != PAPI_OK ) test_continue( "PAPI_event_code_to_name", retval ); else printf( "Found |%s|\n", event_name ); code = PAPI_PRESET_MASK | ( int ) PAPI_NATIVE_AND_MASK; printf( "Looking for 
highest possible preset event: %#x...\n", code ); retval = PAPI_event_code_to_name( code, event_name ); if ( retval != PAPI_OK ) test_continue( "PAPI_event_code_to_name", retval ); else printf( "Found |%s|\n", event_name ); /* Find the first defined native event */ /* For platform independence, always ASK FOR the first event */ /* Don't just assume it'll be the first numeric value */ code = PAPI_NATIVE_MASK; PAPI_enum_event( &code, PAPI_ENUM_FIRST ); printf( "Looking for first native event: %#x...\n", code ); retval = PAPI_event_code_to_name( code, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } else { printf( "Found |%s|\n", event_name ); } /* Find the last defined native event */ /* FIXME: hardcoded cmp 0 */ cmp_info = PAPI_get_component_info( 0 ); if ( cmp_info == NULL ) { test_fail( __FILE__, __LINE__, "PAPI_get_component_info", PAPI_ECMP ); } code = PAPI_NATIVE_MASK; PAPI_enum_event( &code, PAPI_ENUM_FIRST ); last = code; /* with a single native event the loop below never runs */ while ( PAPI_enum_event( &code, PAPI_ENUM_EVENTS ) == PAPI_OK ) { last=code; } code = last; printf( "Looking for last native event: %#x...\n", code ); retval = PAPI_event_code_to_name( code, event_name ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); } else { printf( "Found |%s|\n", event_name ); } /* Highly doubtful we have this many natives */ /* Turn on all bits *except* PRESET bit and COMPONENT bits */ code = PAPI_PRESET_AND_MASK; printf( "Looking for highest definable native event: %#x...\n", code ); retval = PAPI_event_code_to_name( code, event_name ); if ( retval != PAPI_OK ) test_continue( "PAPI_event_code_to_name", retval ); else printf( "Found |%s|\n", event_name ); if ( ( retval == PAPI_ENOCMP) || ( retval == PAPI_ENOEVNT ) || ( retval == PAPI_OK ) ) test_pass( __FILE__, 0, 0 ); test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", PAPI_EBUG ); exit( 1 ); } 
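/* Sketch, not part of the PAPI test suite: how the event codes probed above can be told apart by their top bits. EX_PRESET_MASK and EX_NATIVE_MASK are stand-ins assumed to mirror PAPI_PRESET_MASK and PAPI_NATIVE_MASK; check papi.h for the real values. The helper names are hypothetical, and the preset table size is passed in rather than assumed. */

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values, assumed to mirror PAPI_PRESET_MASK / PAPI_NATIVE_MASK */
#define EX_PRESET_MASK 0x80000000u
#define EX_NATIVE_MASK 0x40000000u

/* A code is a preset when the top bit is set. */
int ex_is_preset( uint32_t code )
{
    return ( code & EX_PRESET_MASK ) != 0;
}

/* A code is native when the preset bit is clear and the native bit is set. */
int ex_is_native( uint32_t code )
{
    return ( code & EX_PRESET_MASK ) == 0 && ( code & EX_NATIVE_MASK ) != 0;
}

/* Highest allocated preset code for a table of max_presets entries,
 * following the test's PAPI_PRESET_MASK | ( PAPI_MAX_PRESET_EVENTS - 1 ). */
uint32_t ex_highest_preset( uint32_t max_presets )
{
    assert( max_presets > 0 );
    return EX_PRESET_MASK | ( max_presets - 1u );
}
```

/* Usage: with a hypothetical table of 128 presets, ex_highest_preset( 128 ) sets the preset bit and the low seven bits. Whether that slot is actually defined is what the test probes with PAPI_event_code_to_name. */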
papi-5.3.0/src/ctests/memory.c0000600003276200002170000001267712247131121015721 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for L1 related events - They are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). - Start counters - Do iterations - Stop and read counters */ #include "papi_test.h" #define OUT_FMT "%12d\t%12lld\t%12lld\t%.2f\n" int main( int argc, char **argv ) { int retval, i, j; int EventSet = PAPI_NULL; long long values[2]; const PAPI_hw_info_t *hwinfo = NULL; char descr[PAPI_MAX_STR_LEN]; PAPI_event_info_t evinfo; PAPI_mh_level_t *L; const int eventlist[] = { PAPI_L1_DCA, PAPI_L1_DCM, PAPI_L1_DCH, PAPI_L2_DCA, PAPI_L2_DCM, PAPI_L2_DCH, #if 0 PAPI_L1_LDM, PAPI_L1_STM, PAPI_L1_DCR, PAPI_L1_DCW, PAPI_L1_ICM, PAPI_L1_TCM, PAPI_LD_INS, PAPI_SR_INS, PAPI_LST_INS, PAPI_L2_DCR, PAPI_L2_DCW, PAPI_CSR_TOT, PAPI_MEM_SCY, PAPI_MEM_RCY, PAPI_MEM_WCY, PAPI_L1_ICH, PAPI_L1_ICA, PAPI_L1_ICR, PAPI_L1_ICW, PAPI_L1_TCH, PAPI_L1_TCA, PAPI_L1_TCR, PAPI_L1_TCW, PAPI_L2_DCM, PAPI_L2_ICM, PAPI_L2_TCM, PAPI_L2_LDM, PAPI_L2_STM, PAPI_L2_DCH, PAPI_L2_DCA, PAPI_L2_DCR, PAPI_L2_DCW, PAPI_L2_ICH, PAPI_L2_ICA, PAPI_L2_ICR, PAPI_L2_ICW, PAPI_L2_TCH, PAPI_L2_TCA, PAPI_L2_TCR, PAPI_L2_TCW, #endif 0 }; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( hwinfo = PAPI_get_hardware_info( ) ) == NULL ) { test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); } if ( ( retval = PAPI_create_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* Extract and report the cache information */ L = ( PAPI_mh_level_t * ) ( hwinfo->mem_hierarchy.level ); for ( i = 0; i < hwinfo->mem_hierarchy.levels; i++ ) { for ( j = 0; 
j < 2; j++ ) { int tmp; tmp = PAPI_MH_CACHE_TYPE( L[i].cache[j].type ); if ( tmp == PAPI_MH_TYPE_UNIFIED ) { printf( "L%d Unified ", i + 1 ); } else if ( tmp == PAPI_MH_TYPE_DATA ) { printf( "L%d Data ", i + 1 ); } else if ( tmp == PAPI_MH_TYPE_INST ) { printf( "L%d Instruction ", i + 1 ); } else if ( tmp == PAPI_MH_TYPE_VECTOR ) { printf( "L%d Vector ", i + 1 ); } else if ( tmp == PAPI_MH_TYPE_TRACE ) { printf( "L%d Trace ", i + 1 ); } else if ( tmp == PAPI_MH_TYPE_EMPTY ) { break; } else { test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", PAPI_EBUG ); } tmp = PAPI_MH_CACHE_WRITE_POLICY( L[i].cache[j].type ); if ( tmp == PAPI_MH_TYPE_WB ) { printf( "Write back " ); } else if ( tmp == PAPI_MH_TYPE_WT ) { printf( "Write through " ); } else { test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", PAPI_EBUG ); } tmp = PAPI_MH_CACHE_REPLACEMENT_POLICY( L[i].cache[j].type ); if ( tmp == PAPI_MH_TYPE_PSEUDO_LRU ) { printf( "Pseudo LRU policy " ); } else if ( tmp == PAPI_MH_TYPE_LRU ) { printf( "LRU policy " ); } else if ( tmp == PAPI_MH_TYPE_UNKNOWN ) { printf( "Unknown policy " ); } else { test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", PAPI_EBUG ); } printf( "Cache:\n" ); if ( L[i].cache[j].type ) { printf ( " Total size: %dKB\n Line size: %dB\n Number of Lines: %d\n Associativity: %d\n\n", ( L[i].cache[j].size ) >> 10, L[i].cache[j].line_size, L[i].cache[j].num_lines, L[i].cache[j].associativity ); } } } for ( i = 0; eventlist[i] != 0; i++ ) { if (PAPI_event_code_to_name( eventlist[i], descr ) != PAPI_OK) continue; if ( PAPI_add_event( EventSet, eventlist[i] ) != PAPI_OK ) continue; if ( PAPI_get_event_info( eventlist[i], &evinfo ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_get_event_info", retval ); printf( "\nEvent: %s\nShort: %s\nLong: %s\n\n", evinfo.symbol, evinfo.short_descr, evinfo.long_descr ); printf( " Bytes\t\tCold\t\tWarm\tPercent\n" ); if ( ( retval = PAPI_start( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, 
"PAPI_start", retval ); for ( j = 512; j <= 16 * ( 1024 * 1024 ); j = j * 2 ) { do_misses( 1, j ); do_flush( ); if ( ( retval = PAPI_reset( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); do_misses( 1, j ); if ( ( retval = PAPI_read( EventSet, &values[0] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read", retval ); if ( ( retval = PAPI_reset( EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); do_misses( 1, j ); if ( ( retval = PAPI_read( EventSet, &values[1] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read", retval ); printf( OUT_FMT, j, values[0], values[1], ( ( float ) values[1] / ( float ) ( ( values[0] != 0 ) ? values[0] : 1 ) * 100.0 ) ); } if ( ( retval = PAPI_stop( EventSet, NULL ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( ( retval = PAPI_remove_event( EventSet, eventlist[i] ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_remove_event", retval ); } if ( ( retval = PAPI_destroy_eventset( &EventSet ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/overflow_one_and_read.c0000600003276200002170000000672712247131121020731 0ustar ralphundrgrad/* * File: overflow_one_and_read.c : based on overflow_twoevents.c * Mods: Philip Mucci * mucci@cs.utk.edu * Kevin London * london@cs.utk.edu */ /* This file performs the following test: overflow dispatch on 1 counter. * In the handler read events. */ #include "papi_test.h" #define OVER_FMT "handler(%d) Overflow at %p! 
vector=0x%llx\n" #define OUT_FMT "%-12s : %16lld%16lld\n" typedef struct { long long mask; int count; } ocount_t; /* there are three possible vectors, one counter overflows, the other counter overflows, both overflow */ /*not used*/ ocount_t overflow_counts[3] = { {0, 0}, {0, 0}, {0, 0} }; /*not used*/ int total_unknown = 0; /*added*/ long long dummyvalues[2]; void handler( int EventSet, void *address, long long overflow_vector, void *context ) { int retval; ( void ) context; if ( !TESTS_QUIET ) { fprintf( stderr, OVER_FMT, EventSet, address, overflow_vector ); } if ( ( retval = PAPI_read( EventSet, dummyvalues ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read", retval ); if ( !TESTS_QUIET ) { fprintf( stderr, TWO12, dummyvalues[0], dummyvalues[1], "(Reading counters)\n" ); } if ( dummyvalues[1] == 0 ) test_fail( __FILE__, __LINE__, "Total Cycles == 0", 1 ); } int main( int argc, char **argv ) { int EventSet; long long **values = NULL; int retval; int PAPI_event; char event_name[PAPI_MAX_STR_LEN]; int num_events1, mask1; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depends on the availability of the event on the platform */ /* NOTE: Only adding one overflow on PAPI_event -- no overflow for PAPI_TOT_CYC*/ EventSet = add_two_nonderived_events( &num_events1, &PAPI_event, &mask1 ); values = allocate_test_space( 2, num_events1 ); if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", 
retval ); retval = PAPI_overflow( EventSet, PAPI_event, THRESHOLD, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_flops( NUM_FLOPS ); retval = PAPI_stop( EventSet, values[1] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); remove_test_events( &EventSet, mask1 ); if ( !TESTS_QUIET ) { printf ( "Test case: Overflow dispatch of 1st event in set with 2 events.\n" ); printf ( "---------------------------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", THRESHOLD ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf( "-----------------------------------------------\n" ); printf( "Test type : %16d%16d\n", 1, 2 ); printf( OUT_FMT, event_name, ( values[0] )[0], ( values[1] )[0] ); printf( OUT_FMT, "PAPI_TOT_CYC", ( values[0] )[1], ( values[1] )[1] ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/sdsc2.c0000600003276200002170000001537212247131121015422 0ustar ralphundrgrad/* * $Id$ * * Test example for multiplex functionality, originally * provided by Timothy Kaiser, SDSC. It was modified to fit the * PAPI test suite by Nils Smeds, . 
* * This example verifies the PAPI_reset function for * multiplexed events */ #include "papi_test.h" #include #include #define REPEATS 5 #define MAXEVENTS 9 #define SLEEPTIME 100 #define MINCOUNTS 100000 static double dummy3( double x, int iters ); int main( int argc, char **argv ) { PAPI_event_info_t info; int i, j, retval; int iters = NUM_FLOPS; double x = 1.1, y, dtmp; long long t1, t2; long long values[MAXEVENTS]; int sleep_time = SLEEPTIME; #ifdef STARTSTOP long long dummies[MAXEVENTS]; #endif double valsample[MAXEVENTS][REPEATS]; double valsum[MAXEVENTS]; double avg[MAXEVENTS]; double spread[MAXEVENTS]; int nevents = MAXEVENTS; int eventset = PAPI_NULL; int events[MAXEVENTS]; int fails; events[0] = PAPI_FP_INS; events[1] = PAPI_TOT_INS; events[2] = PAPI_INT_INS; events[3] = PAPI_TOT_CYC; events[4] = PAPI_STL_CCY; events[5] = PAPI_BR_INS; events[6] = PAPI_SR_INS; events[7] = PAPI_LD_INS; events[8] = PAPI_TOT_IIS; for ( i = 0; i < MAXEVENTS; i++ ) { values[i] = 0; valsum[i] = 0; } if ( argc > 1 ) { if ( !strcmp( argv[1], "TESTS_QUIET" ) ) tests_quiet( argc, argv ); else { sleep_time = atoi( argv[1] ); if ( sleep_time <= 0 ) sleep_time = SLEEPTIME; } } if ( !TESTS_QUIET ) { printf( "\nAccuracy check of multiplexing routines.\n" ); printf( "Investigating the variance of multiplexed measurements.\n\n" ); } if ( ( retval = PAPI_library_init( PAPI_VER_CURRENT ) ) != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); #ifdef MPX init_multiplex( ); #endif if ( ( retval = PAPI_create_eventset( &eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); #ifdef MPX /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 
0 is always the cpu component */ retval = PAPI_assign_eventset_component( eventset, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); if ( ( retval = PAPI_set_multiplex( eventset ) ) ) { if ( retval == PAPI_ENOSUPP) { test_skip(__FILE__, __LINE__, "Multiplex not supported", 1); } test_fail( __FILE__, __LINE__, "PAPI_set_multiplex", retval ); } #endif /* What does this code even do? */ nevents = MAXEVENTS; for ( i = 0; i < nevents; i++ ) { if ( ( retval = PAPI_add_event( eventset, events[i] ) ) ) { for ( j = i; j < MAXEVENTS-1; j++ ) { events[j] = events[j + 1]; } nevents--; i--; } } if ( nevents < 2 ) test_skip( __FILE__, __LINE__, "Not enough events left...", 0 ); /* Find a reasonable number of iterations (each * event active 20 times) during the measurement */ t2 = 10000 * 20 * nevents; /* Target: 10000 usec/multiplex, 20 repeats */ if ( t2 > 30e6 ) test_skip( __FILE__, __LINE__, "This test takes too much time", retval ); /* Measure one run */ t1 = PAPI_get_real_usec( ); y = dummy3( x, iters ); t1 = PAPI_get_real_usec( ) - t1; if ( t2 > t1 ) /* Scale up execution time to match t2 */ iters = iters * ( int ) ( t2 / t1 ); else if ( t1 > 30e6 ) /* Make sure execution time is < 30s per repeated test */ test_skip( __FILE__, __LINE__, "This test takes too much time", retval ); if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); for ( i = 1; i <= REPEATS; i++ ) { x = 1.0; #ifndef STARTSTOP if ( ( retval = PAPI_reset( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_reset", retval ); #else if ( ( retval = PAPI_stop( eventset, dummies ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( ( retval = PAPI_start( eventset ) ) ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); #endif if ( !TESTS_QUIET ) printf( "\nTest %d (of %d):\n", i, REPEATS ); t1 = PAPI_get_real_usec( ); y = dummy3( x, iters ); PAPI_read( eventset, values ); t2 = PAPI_get_real_usec( ); 
if ( !TESTS_QUIET ) { printf( "\n(calculated independent of PAPI)\n" ); printf( "\tOperations= %.1f Mflop", y * 1e-6 ); printf( "\t(%g Mflop/s)\n\n", ( y / ( double ) ( t2 - t1 ) ) ); printf( "PAPI measurements:\n" ); } for ( j = 0; j < nevents; j++ ) { PAPI_get_event_info( events[j], &info ); if ( !TESTS_QUIET ) { printf( "%20s = ", info.short_descr ); printf( LLDFMT, values[j] ); printf( "\n" ); } dtmp = ( double ) values[j]; valsum[j] += dtmp; valsample[j][i - 1] = dtmp; } if ( !TESTS_QUIET ) printf( "\n" ); } if ( ( retval = PAPI_stop( eventset, values ) ) ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); if ( !TESTS_QUIET ) { printf( "\n\nEstimated variance relative to average counts:\n" ); for ( j = 0; j < nevents; j++ ) printf( " Event %.2d", j ); printf( "\n" ); } fails = nevents; /* Due to the limited precision of floating point, the typical one-pass standard deviation computation cannot be used for large numbers with very small variations. Instead, compute the standard deviation from the differences to the mean, which avoids these precision problems. 
*/ for ( j = 0; j < nevents; j++ ) { avg[j] = valsum[j] / REPEATS; spread[j] = 0; for ( i = 0; i < REPEATS; ++i ) { double diff = ( valsample[j][i] - avg[j] ); spread[j] += diff * diff; } spread[j] = sqrt( spread[j] / REPEATS ) / avg[j]; if ( !TESTS_QUIET ) printf( "%9.2g ", spread[j] ); /* Make sure that NaNs get counted as errors */ if ( spread[j] < MPX_TOLERANCE ) --fails; else if ( valsum[j] < MINCOUNTS ) /* Neglect imprecise results with low counts */ --fails; } if ( !TESTS_QUIET ) { printf( "\n\n" ); for ( j = 0; j < nevents; j++ ) { PAPI_get_event_info( events[j], &info ); printf( "Event %.2d: mean=%10.0f, sdev/mean=%7.2g nrpt=%2d -- %s\n", j, avg[j], spread[j], REPEATS, info.short_descr ); } printf( "\n\n" ); } if ( fails ) test_fail( __FILE__, __LINE__, "Values outside threshold", fails ); else test_pass( __FILE__, NULL, 0 ); return 0; } static double dummy3( double x, int iters ) { int i; double w, y, z, a, b, c, d, e, f, g, h; double one; one = 1.0; w = x; y = x; z = x; a = x; b = x; c = x; d = x; e = x; f = x; g = x; h = x; for ( i = 1; i <= iters; i++ ) { w = w * 1.000000000001 + one; y = y * 1.000000000002 + one; z = z * 1.000000000003 + one; a = a * 1.000000000004 + one; b = b * 1.000000000005 + one; c = c * 0.999999999999 + one; d = d * 0.999999999998 + one; e = e * 0.999999999997 + one; f = f * 0.999999999996 + one; g = h * 0.999999999995 + one; h = h * 1.000000000006 + one; } return 2.0 * ( a + b + c + d + e + f + w + x + y + z + g + h ); } papi-5.3.0/src/ctests/zero_shmem.c0000600003276200002170000000507212247131121016550 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for 2 slave OMP threads - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). 
+ PAPI_FP_INS + PAPI_TOT_CYC Each of 2 slave pthreads: - Get cyc. - Get us. - Start counters - Do flops - Stop and read counters - Get us. - Get cyc. Master pthread: - Get us. - Get cyc. - Fork threads - Wait for threads to exit - Get us. - Get cyc. */ #include #include #include #include #include #include #include #include "papi_test.h" void Thread( int n ) { int retval, num_tests = 1; int EventSet1 = PAPI_NULL; int mask1 = 0x5; int num_events1; long long **values; long long elapsed_us, elapsed_cyc; EventSet1 = add_test_events( &num_events1, &mask1, 1 ); /* num_events1 is greater than num_events2 so don't worry. */ values = allocate_test_space( num_tests, num_events1 ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) exit( 1 ); do_flops( n ); retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) exit( 1 ); elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; remove_test_events( &EventSet1, mask1 ); printf( "Thread %#x PAPI_FP_INS : \t%lld\n", n / 1000000, ( values[0] )[0] ); printf( "Thread %#x PAPI_TOT_CYC: \t%lld\n", n / 1000000, ( values[0] )[1] ); printf( "Thread %#x Real usec : \t%lld\n", n / 1000000, elapsed_us ); printf( "Thread %#x Real cycles : \t%lld\n", n / 1000000, elapsed_cyc ); free_test_space( values, num_tests ); } int main( int argc, char **argv ) { /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); long long elapsed_us, elapsed_cyc; elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); #ifdef HAVE_OPENSHMEM start_pes( 2 ); Thread( 1000000 * ( _my_pe( ) + 1 ) ); #else test_skip( __FILE__, __LINE__, "OpenSHMEM support not found, skipping.", 0); #endif elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; printf( "Master real usec : \t%lld\n", elapsed_us ); printf( "Master real cycles : \t%lld\n", elapsed_cyc ); exit( 0 ); } 
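/* Sketch, not part of the PAPI test suite: the return-code convention that calls such as PAPI_start and PAPI_stop are checked against throughout these tests. EX_OK is a stand-in assumed to equal PAPI_OK (0), with failures reported as other values, typically negative. The helper name ex_call_failed is hypothetical. */

```c
#include <assert.h>
#include <stdio.h>

#define EX_OK 0   /* stand-in for PAPI_OK (assumed to be 0) */

/* Returns 1 and logs to stderr when retval signals failure, else 0. */
int ex_call_failed( const char *call, int retval )
{
    assert( call != NULL );
    if ( retval != EX_OK ) {
        fprintf( stderr, "%s failed with code %d\n", call, retval );
        return 1;
    }
    return 0;
}
```

/* Usage: if ( ex_call_failed( "PAPI_start", retval ) ) exit( 1 ); A comparison such as retval >= 0 would instead treat success as failure for these calls. */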
papi-5.3.0/src/ctests/fork.c0000600003276200002170000000175012247131121015340 0ustar ralphundrgrad/* * File: fork.c * Author: Philip Mucci * mucci@cs.utk.edu * Mods: * */ /* This file performs the following test: PAPI_library_init() fork(); / \ parent child wait() PAPI_library_init() */ #include "papi_test.h" #include int main( int argc, char **argv ) { int retval; int status; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); if ( fork( ) == 0 ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "forked PAPI_library_init", retval ); exit( 0 ); } else { wait( &status ); if ( WEXITSTATUS( status ) != 0 ) test_fail( __FILE__, __LINE__, "fork", WEXITSTATUS( status ) ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/mendes-alt.c0000600003276200002170000001104612247131121016427 0ustar ralphundrgrad#include #include #include #include "papi_test.h" #ifdef SETMAX #define MAX SETMAX #else #define MAX 10000 #endif #define TIMES 1000 #define PAPI_MAX_EVENTS 2 long long PAPI_values1[PAPI_MAX_EVENTS]; long long PAPI_values2[PAPI_MAX_EVENTS]; long long PAPI_values3[PAPI_MAX_EVENTS]; static int EventSet = PAPI_NULL; extern int TESTS_QUIET; /* Declared in test_utils.c */ int main( argc, argv ) int argc; char *argv[]; { int i, retval; double a[MAX], b[MAX]; void funcX( ), funcA( ); tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ for ( i = 0; i < MAX; i++ ) { a[i] = 0.0; b[i] = 0.; } for ( i = 0; i < PAPI_MAX_EVENTS; i++ ) PAPI_values1[i] = PAPI_values2[i] = 0; retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); #ifdef MULTIPLEX if ( !TESTS_QUIET ) { printf( "Activating PAPI Multiplex\n" ); } init_multiplex( ); #endif retval = 
PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI create eventset fail\n", retval ); #ifdef MULTIPLEX /* In Component PAPI, EventSets must be assigned a component index before you can fiddle with their internals. 0 is always the cpu component */ retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); retval = PAPI_set_multiplex( EventSet ); if (retval == PAPI_ENOSUPP) { test_skip( __FILE__, __LINE__, "Multiplex not supported", 1 ); } else if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_multiplex fails \n", retval ); #endif retval = PAPI_add_event( EventSet, PAPI_FP_INS ); if ( retval < PAPI_OK ) { retval = PAPI_add_event( EventSet, PAPI_TOT_INS ); if ( retval < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI add PAPI_FP_INS or PAPI_TOT_INS fail\n", retval ); else if ( !TESTS_QUIET ) { printf( "PAPI_TOT_INS\n" ); } } else if ( !TESTS_QUIET ) { printf( "PAPI_FP_INS\n" ); } retval = PAPI_add_event( EventSet, PAPI_TOT_CYC ); if ( retval < PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI add PAPI_TOT_CYC fail\n", retval ); if ( !TESTS_QUIET ) { printf( "PAPI_TOT_CYC\n" ); } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI start fail\n", retval ); funcX( a, b, MAX ); retval = PAPI_read( EventSet, PAPI_values1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI read fail \n", retval ); funcX( a, b, MAX ); retval = PAPI_read( EventSet, PAPI_values2 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI read fail \n", retval ); #ifdef RESET retval = PAPI_reset( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI reset fail \n", retval ); #endif funcA( a, b, MAX ); retval = PAPI_stop( EventSet, PAPI_values3 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI stop fail \n", retval ); if ( !TESTS_QUIET ) { printf( "values1 
is:\n" ); for ( i = 0; i < PAPI_MAX_EVENTS; i++ ) printf( LLDFMT15, PAPI_values1[i] ); printf( "\nvalues2 is:\n" ); for ( i = 0; i < PAPI_MAX_EVENTS; i++ ) printf( LLDFMT15, PAPI_values2[i] ); printf( "\nvalues3 is:\n" ); for ( i = 0; i < PAPI_MAX_EVENTS; i++ ) printf( LLDFMT15, PAPI_values3[i] ); #ifndef RESET printf( "\nPAPI value (2-1) is : \n" ); for ( i = 0; i < PAPI_MAX_EVENTS; i++ ) printf( LLDFMT15, PAPI_values2[i] - PAPI_values1[i] ); printf( "\nPAPI value (3-2) is : \n" ); for ( i = 0; i < PAPI_MAX_EVENTS; i++ ) { long long diff; diff = PAPI_values3[i] - PAPI_values2[i]; printf( LLDFMT15, diff); if (diff<0) { test_fail( __FILE__, __LINE__, "Multiplexed counter decreased", 1 ); } } #endif printf( "\n\nVerification:\n" ); printf( "From start to first PAPI_read %d fp operations are made.\n", 2 * MAX * TIMES ); printf( "Between 1st and 2nd PAPI_read %d fp operations are made.\n", 2 * MAX * TIMES ); printf( "Between 2nd and 3rd PAPI_read %d fp operations are made.\n", 0 ); printf( "\n" ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } void funcX( a, b, n ) double a[MAX], b[MAX]; int n; { int i, k; for ( k = 0; k < TIMES; k++ ) for ( i = 0; i < n; i++ ) a[i] = a[i] * b[i] + 1.; } void funcA( a, b, n ) double a[MAX], b[MAX]; int n; { int i, k; double t[MAX]; for ( k = 0; k < TIMES; k++ ) for ( i = 0; i < n; i++ ) { t[i] = b[n - i]; b[i] = a[n - i]; a[i] = t[i]; } } papi-5.3.0/src/ctests/fork_exec_overflow.c0000600003276200002170000001245512247131121020273 0ustar ralphundrgrad/* * Test PAPI with fork() and exec(). 
*/ #include #include #include #include #include #include #include #include "papi.h" #include "papi_test.h" #define MAX_EVENTS 3 int Event[MAX_EVENTS] = { PAPI_TOT_CYC, PAPI_FP_INS, PAPI_FAD_INS, }; int Threshold[MAX_EVENTS] = { 8000000, 4000000, 4000000, }; int num_events = 1; int EventSet = PAPI_NULL; char *name = "unknown"; struct timeval start, last; long count, total; void my_handler( int EventSet, void *pc, long long ovec, void *context ) { ( void ) EventSet; ( void ) pc; ( void ) ovec; ( void ) context; count++; total++; } void zero_count( void ) { gettimeofday( &start, NULL ); last = start; count = 0; total = 0; } #define HERE(str) printf("[%d] %s, %s\n", getpid(), name, str); void print_rate( char *str ) { static int last_count = -1; struct timeval now; double st_secs, last_secs; gettimeofday( &now, NULL ); st_secs = ( double ) ( now.tv_sec - start.tv_sec ) + ( ( double ) ( now.tv_usec - start.tv_usec ) ) / 1000000.0; last_secs = ( double ) ( now.tv_sec - last.tv_sec ) + ( ( double ) ( now.tv_usec - last.tv_usec ) ) / 1000000.0; if ( last_secs <= 0.001 ) last_secs = 0.001; printf( "[%d] %s, time = %.3f, total = %ld, last = %ld, rate = %.1f/sec\n", getpid( ), str, st_secs, total, count, ( ( double ) count ) / last_secs ); if ( last_count != -1 ) { if ( count < .1 * last_count ) { test_fail( name, __LINE__, "Interrupt rate changed!", 1 ); exit( 1 ); } } last_count = ( int ) count; count = 0; last = now; } void do_cycles( int program_time ) { struct timeval start, now; double x, sum; gettimeofday( &start, NULL ); for ( ;; ) { sum = 1.0; for ( x = 1.0; x < 250000.0; x += 1.0 ) sum += x; if ( sum < 0.0 ) printf( "==>> SUM IS NEGATIVE !! 
<<==\n" ); gettimeofday( &now, NULL ); if ( now.tv_sec >= start.tv_sec + program_time ) break; } } void my_papi_init( void ) { if ( PAPI_library_init( PAPI_VER_CURRENT ) != PAPI_VER_CURRENT ) test_fail( name, __LINE__, "PAPI_library_init failed", 1 ); } void my_papi_start( void ) { int ev; EventSet = PAPI_NULL; if ( PAPI_create_eventset( &EventSet ) != PAPI_OK ) test_fail( name, __LINE__, "PAPI_create_eventset failed", 1 ); for ( ev = 0; ev < num_events; ev++ ) { if ( PAPI_add_event( EventSet, Event[ev] ) != PAPI_OK ) test_fail( name, __LINE__, "PAPI_add_event failed", 1 ); } for ( ev = 0; ev < num_events; ev++ ) { if ( PAPI_overflow( EventSet, Event[ev], Threshold[ev], 0, my_handler ) != PAPI_OK ) { test_fail( name, __LINE__, "PAPI_overflow failed", 1 ); } } if ( PAPI_start( EventSet ) != PAPI_OK ) test_fail( name, __LINE__, "PAPI_start failed", 1 ); } void my_papi_stop( void ) { if ( PAPI_stop( EventSet, NULL ) != PAPI_OK ) test_fail( name, __LINE__, "PAPI_stop failed", 1 ); } void run( char *str, int len ) { int n; for ( n = 1; n <= len; n++ ) { do_cycles( 1 ); print_rate( str ); } } int main( int argc, char **argv ) { char buf[100]; if ( argc < 2 || sscanf( argv[1], "%d", &num_events ) < 1 ) num_events = 1; if ( num_events < 0 || num_events > MAX_EVENTS ) num_events = 1; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ do_cycles( 1 ); zero_count( ); my_papi_init( ); name = argv[0]; printf( "[%d] %s, num_events = %d\n", getpid( ), name, num_events ); sprintf( buf, "%d", num_events ); my_papi_start( ); run( name, 3 ); #if defined(PCHILD) HERE( "stop" ); my_papi_stop( ); HERE( "end" ); test_pass( name, NULL, 0 ); #elif defined(PEXEC) HERE( "stop" ); my_papi_stop( ); HERE( "exec(./child_overflow)" ); if ( access( "./child_overflow", X_OK ) == 0 ) execl( "./child_overflow", "./child_overflow", ( TESTS_QUIET ? 
"TESTS_QUIET" : NULL ), NULL ); else if ( access( "./ctests/child_overflow", X_OK ) == 0 ) execl( "./ctests/child_overflow", "./ctests/child_overflow", ( TESTS_QUIET ? "TESTS_QUIET" : NULL ), NULL ); test_fail( name, __LINE__, "exec failed", 1 ); #elif defined(SYSTEM) HERE( "system(./child_overflow)" ); if ( access( "./child_overflow", X_OK ) == 0 ) ( TESTS_QUIET ? system( "./child_overflow TESTS_QUIET" ) : system( "./child_overflow" ) ); else if ( access( "./ctests/child_overflow", X_OK ) == 0 ) ( TESTS_QUIET ? system( "./ctests/child_overflow TESTS_QUIET" ) : system( "./ctests/child_overflow" ) ); test_pass( name, NULL, 0 ); #elif defined(SYSTEM2) HERE( "system(./burn)" ); if ( access( "./burn", X_OK ) == 0 ) ( TESTS_QUIET ? system( "./burn TESTS_QUIET" ) : system( "./burn" ) ); else if ( access( "./ctests/burn", X_OK ) == 0 ) ( TESTS_QUIET ? system( "./ctests/burn TESTS_QUIET" ) : system( "./ctests/burn" ) ); test_pass( name, NULL, 0 ); #else HERE( "fork" ); { int ret = fork( ); if ( ret < 0 ) test_fail( name, __LINE__, "fork failed", 1 ); if ( ret == 0 ) { /* * Child process. */ zero_count( ); my_papi_init( ); my_papi_start( ); run( "child", 5 ); HERE( "stop" ); my_papi_stop( ); sleep( 3 ); HERE( "end" ); exit( 0 ); } run( "main", 14 ); my_papi_stop( ); { int status; wait( &status ); HERE( "end" ); if ( WEXITSTATUS( status ) != 0 ) test_fail( name, __LINE__, "child failed", 1 ); else test_pass( name, NULL, 0 ); } } #endif exit( 0 ); } papi-5.3.0/src/ctests/zero_smp.c0000600003276200002170000001005112247131121016227 0ustar ralphundrgrad/* $Id$ */ /* This file performs the following test: start, stop and timer functionality for 2 slave native SMP threads - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). 
+ PAPI_FP_INS + PAPI_TOT_CYC Each of 2 slave pthreads: - Get cyc. - Get us. - Start counters - Do flops - Stop and read counters - Get us. - Get cyc. Master pthread: - Get us. - Get cyc. - Fork threads - Wait for threads to exit - Get us. - Get cyc. */ #include "papi_test.h" #if defined(sun) && defined(sparc) #include #elif defined(mips) && defined(sgi) && defined(unix) #include #elif defined(_AIX) #include #endif void Thread( int t, int n ) { int retval, num_tests = 1; int EventSet1 = PAPI_NULL; int PAPI_event, mask1; int num_events1; long long **values; long long elapsed_us, elapsed_cyc; char event_name[PAPI_MAX_STR_LEN]; /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ EventSet1 = add_two_events( &num_events1, &PAPI_event, &mask1 ); retval = PAPI_event_code_to_name( PAPI_event, event_name ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); values = allocate_test_space( num_tests, num_events1 ); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); do_flops( n ); elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); remove_test_events( &EventSet1, mask1 ); if ( !TESTS_QUIET ) { printf( "Thread %#x %-12s : \t%lld\n", t, event_name, values[0][1] ); printf( "Thread %#x PAPI_TOT_CYC : \t%lld\n", t, values[0][0] ); } free_test_space( values, num_tests ); if ( !TESTS_QUIET ) { printf( "Thread %#x Real usec : \t%lld\n", t, elapsed_us ); printf( "Thread %#x Real cycles : \t%lld\n", t, elapsed_cyc ); } PAPI_unregister_thread( ); } int main( int argc, char **argv ) { int i, retval; long long elapsed_us, elapsed_cyc; tests_quiet( argc, 
argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); #if defined(_AIX) retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( pthread_self ) ); if ( retval != PAPI_OK ) { if ( retval == PAPI_ECMP ) test_skip( __FILE__, __LINE__, "PAPI_thread_init", retval ); else test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } #pragma ibm parallel_loop #elif defined(sgi) && defined(mips) retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( mp_my_threadnum ) ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } #pragma parallel #pragma local(i) #pragma pfor #elif defined(sun) && defined(sparc) retval = PAPI_thread_init( ( unsigned long ( * )( void ) ) ( thr_self ) ); if ( retval != PAPI_OK ) { test_fail( __FILE__, __LINE__, "PAPI_thread_init", retval ); } #pragma MP taskloop private(i) #else test_skip(__FILE__, __LINE__, "Architecture not included in this test file yet.", 0); #endif for ( i = 1; i < 3; i++ ) Thread( i, 10000000 * i ); elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; if ( !TESTS_QUIET ) { printf( "Master real usec : \t%lld\n", elapsed_us ); printf( "Master real cycles : \t%lld\n", elapsed_cyc ); } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/max_multiplex.c0000600003276200002170000000464112247131121017271 0ustar ralphundrgrad/* this test attempts to add the maximum number of pre-defined events */ /* to a multiplexed event set. This tests that we properly set the */ /* maximum events value. 
*/ #include "papi.h" #include "papi_test.h" int main(int argc, char **argv) { int retval,max_multiplex,i,EventSet=PAPI_NULL; PAPI_event_info_t info; int added=0; int events_tried=0; /* Set TESTS_QUIET variable */ tests_quiet( argc, argv ); /* Initialize the library */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) { test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); } retval = PAPI_multiplex_init( ); if ( retval != PAPI_OK) { test_fail(__FILE__, __LINE__, "Multiplex not supported", 1); } max_multiplex=PAPI_get_opt( PAPI_MAX_MPX_CTRS, NULL ); if (!TESTS_QUIET) { printf("Maximum multiplexed counters=%d\n",max_multiplex); } if (!TESTS_QUIET) { printf("Trying to multiplex as many as possible:\n"); } retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) { test_fail(__FILE__, __LINE__, "PAPI_create_eventset", retval ); } retval = PAPI_assign_eventset_component( EventSet, 0 ); if ( retval != PAPI_OK ) { test_fail(__FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); } retval = PAPI_set_multiplex( EventSet ); if ( retval != PAPI_OK ) { test_fail(__FILE__, __LINE__, "PAPI_set_multiplex", retval ); } i = 0 | PAPI_PRESET_MASK; PAPI_enum_event( &i, PAPI_ENUM_FIRST ); do { retval = PAPI_get_event_info( i, &info ); if (retval==PAPI_OK) { if (!TESTS_QUIET) printf("Adding %s: ",info.symbol); } retval = PAPI_add_event( EventSet, info.event_code ); if (retval!=PAPI_OK) { if (!TESTS_QUIET) printf("Fail!\n"); } else { if (!TESTS_QUIET) printf("Success!\n"); added++; } events_tried++; } while (PAPI_enum_event( &i, PAPI_PRESET_ENUM_AVAIL ) == PAPI_OK ); PAPI_shutdown( ); if (!TESTS_QUIET) { printf("Added %d of theoretical max %d\n",added,max_multiplex); } if (events_tried * */ /* This file performs the following test: PAPI_library_init() PAPI_shutdown() fork() / \ parent child wait() PAPI_library_init() PAPI_shutdown() execlp() PAPI_library_init() */ #include "papi_test.h" #include int main( int argc, char **argv 
) { int retval; int status; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ if ( ( argc > 1 ) && ( strcmp( argv[1], "xxx" ) == 0 ) ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "execed PAPI_library_init", retval ); } else { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "main PAPI_library_init", retval ); PAPI_shutdown( ); if ( fork( ) == 0 ) { retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "forked PAPI_library_init", retval ); PAPI_shutdown( ); if ( execlp( argv[0], argv[0], "xxx", NULL ) == -1 ) test_fail( __FILE__, __LINE__, "execlp", PAPI_ESYS ); } else { wait( &status ); if ( WEXITSTATUS( status ) != 0 ) test_fail( __FILE__, __LINE__, "fork", WEXITSTATUS( status ) ); } } test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/attach3.c0000600003276200002170000001525112247131121015727 0ustar ralphundrgrad/* This file performs the following test: start, stop and timer functionality for attached processes. - It attempts to use the following two counters. It may use less depending on hardware counter resource limitations. These are counted in the default counting domain and default granularity, depending on the platform. Usually this is the user domain (PAPI_DOM_USER) and thread context (PAPI_GRN_THR). + PAPI_FP_INS + PAPI_TOT_CYC - Get us. - Start counters - Do flops - Stop and read counters - Get us. 
*/ #include "papi_test.h" #include #include #ifdef _AIX #define _LINUX_SOURCE_COMPAT #endif #if defined(__FreeBSD__) # define PTRACE_ATTACH PT_ATTACH # define PTRACE_CONT PT_CONTINUE #endif int wait_for_attach_and_loop( void ) { char *path; char newpath[PATH_MAX]; path = getenv("PATH"); sprintf(newpath, "PATH=./:%s", (path)?path:"" ); putenv(newpath); if (ptrace(PTRACE_TRACEME, 0, 0, 0) == 0) { execlp("attach_target","attach_target","100000000",NULL); perror("execlp(attach_target) failed"); } perror("PTRACE_TRACEME"); return ( 1 ); } int main( int argc, char **argv ) { int status, retval, num_tests = 1, tmp; int EventSet1 = PAPI_NULL; long long **values; long long elapsed_us, elapsed_cyc, elapsed_virt_us, elapsed_virt_cyc; char event_name[PAPI_MAX_STR_LEN]; const PAPI_hw_info_t *hw_info; const PAPI_component_info_t *cmpinfo; pid_t pid; /* Fork before doing anything with the PMU */ setbuf(stdout,NULL); pid = fork( ); if ( pid < 0 ) test_fail( __FILE__, __LINE__, "fork()", PAPI_ESYS ); if ( pid == 0 ) exit( wait_for_attach_and_loop( ) ); tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ /* Master only process below here */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail_exit( __FILE__, __LINE__, "PAPI_library_init", retval ); if ( ( cmpinfo = PAPI_get_component_info( 0 ) ) == NULL ) test_fail_exit( __FILE__, __LINE__, "PAPI_get_component_info", 0 ); if ( cmpinfo->attach == 0 ) test_skip( __FILE__, __LINE__, "Platform does not support attaching", 0 ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail_exit( __FILE__, __LINE__, "PAPI_get_hardware_info", 0 ); /* add PAPI_TOT_CYC and one of the events in PAPI_FP_INS, PAPI_FP_OPS or PAPI_TOT_INS, depending on the availability of the event on the platform */ retval = PAPI_create_eventset(&EventSet1); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_create_eventset", retval ); /* Force addition of component */ retval = 
PAPI_assign_eventset_component( EventSet1, 0 ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_assign_eventset_component", retval ); /* The following call causes this test to fail for perf_events */ retval = PAPI_attach( EventSet1, ( unsigned long ) pid ); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_attach", retval ); sprintf(event_name,"PAPI_TOT_CYC"); retval = PAPI_add_event(EventSet1, PAPI_TOT_CYC); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_add_event(EventSet1, PAPI_FP_INS); if ( retval == PAPI_ENOEVNT ) { test_warn( __FILE__, __LINE__, "PAPI_FP_INS", retval); } else if ( retval != PAPI_OK ) { test_fail_exit( __FILE__, __LINE__, "PAPI_add_event", retval ); } values = allocate_test_space( 1, 2); elapsed_us = PAPI_get_real_usec( ); elapsed_cyc = PAPI_get_real_cyc( ); elapsed_virt_us = PAPI_get_virt_usec( ); elapsed_virt_cyc = PAPI_get_virt_cyc( ); printf("must_ptrace is %d\n",cmpinfo->attach_must_ptrace); pid_t child = wait( &status ); printf( "Debugger exited wait() with %d\n",child ); if (WIFSTOPPED( status )) { printf( "Child has stopped due to signal %d (%s)\n", WSTOPSIG( status ), strsignal(WSTOPSIG( status )) ); } if (WIFSIGNALED( status )) { printf( "Child %ld received signal %d (%s)\n", (long)child, WTERMSIG(status) , strsignal(WTERMSIG( status )) ); } printf("After %d\n",retval); retval = PAPI_start( EventSet1 ); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_start", retval ); printf("Continuing\n"); if ( ptrace( PTRACE_CONT, pid, NULL, NULL ) == -1 ) { perror( "ptrace(PTRACE_CONT)" ); return 1; } do { child = wait( &status ); printf( "Debugger exited wait() with %d\n", child); if (WIFSTOPPED( status )) { printf( "Child has stopped due to signal %d (%s)\n", WSTOPSIG( status ), strsignal(WSTOPSIG( status )) ); } if (WIFSIGNALED( status )) { printf( "Child %ld received signal %d (%s)\n", (long)child, WTERMSIG(status) , strsignal(WTERMSIG( 
status )) ); } } while (!WIFEXITED( status )); printf("Child exited with value %d\n",WEXITSTATUS(status)); if (WEXITSTATUS(status) != 0) test_fail_exit( __FILE__, __LINE__, "Exit status of child to attach to", PAPI_EMISC); retval = PAPI_stop( EventSet1, values[0] ); if ( retval != PAPI_OK ) test_fail_exit( __FILE__, __LINE__, "PAPI_stop", retval ); elapsed_virt_us = PAPI_get_virt_usec( ) - elapsed_virt_us; elapsed_virt_cyc = PAPI_get_virt_cyc( ) - elapsed_virt_cyc; elapsed_us = PAPI_get_real_usec( ) - elapsed_us; elapsed_cyc = PAPI_get_real_cyc( ) - elapsed_cyc; retval = PAPI_cleanup_eventset(EventSet1); if (retval != PAPI_OK) test_fail_exit( __FILE__, __LINE__, "PAPI_cleanup_eventset", retval ); retval = PAPI_destroy_eventset(&EventSet1); if (retval != PAPI_OK) test_fail_exit( __FILE__, __LINE__, "PAPI_destroy_eventset", retval ); printf( "Test case: 3rd party attach start, stop.\n" ); printf( "-----------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); printf( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); printf( "Default granularity is: %d (%s)\n", tmp, stringify_granularity( tmp ) ); printf( "Using %d iterations of c += a*b\n", NUM_FLOPS ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Test type : \t 1\n" ); printf( TAB1, "PAPI_TOT_CYC : \t", ( values[0] )[0] ); printf( TAB1, "PAPI_FP_INS : \t", ( values[0] )[1] ); printf( TAB1, "Real usec : \t", elapsed_us ); printf( TAB1, "Real cycles : \t", elapsed_cyc ); printf( TAB1, "Virt usec : \t", elapsed_virt_us ); printf( TAB1, "Virt cycles : \t", elapsed_virt_cyc ); printf ( "-------------------------------------------------------------------------\n" ); printf( "Verification: none\n" ); test_pass( __FILE__, values, num_tests ); exit( 1 ); } papi-5.3.0/src/ctests/overflow_force_software.c0000600003276200002170000002227712247131121021341 0ustar ralphundrgrad/* * File: 
overflow_force_software.c * CVS: $Id$ * Author: Kevin London * london@cs.utk.edu * Mods: Maynard Johnson * maynardj@us.ibm.com * Philip Mucci * mucci@cs.utk.edu * Haihang You * you@cs.utk.edu * * */ /* This file performs the following test: overflow dispatch of an eventset with just a single event. Using both Hardware and software overflows The Eventset contains: + PAPI_FP_INS (overflow monitor) - Start eventset 1 - Do flops - Stop and measure eventset 1 - Set up overflow on eventset 1 - Start eventset 1 - Do flops - Stop eventset 1 - Set up forced software overflow on eventset 1 - Start eventset 1 - Do flops - Stop eventset 1 */ #include "papi_test.h" #define OVER_FMT "handler(%d) Overflow at %p overflow_vector=0x%llx!\n" #define OUT_FMT "%-12s : %16lld%16d%16lld\n" #define SOFT_TOLERANCE 0.90 #define MY_NUM_TESTS 5 static int total[MY_NUM_TESTS] = { 0, }; /* total overflows */ static int use_total = 0; /* which total field to bump */ static long long values[MY_NUM_TESTS] = { 0, }; void handler( int EventSet, void *address, long long overflow_vector, void *context ) { ( void ) context; if ( !TESTS_QUIET ) { fprintf( stderr, OVER_FMT, EventSet, address, overflow_vector ); } total[use_total]++; } int main( int argc, char **argv ) { int EventSet = PAPI_NULL; long long hard_min, hard_max, soft_min, soft_max; int retval; int PAPI_event = 0, mythreshold; char event_name[PAPI_MAX_STR_LEN]; PAPI_option_t opt; PAPI_event_info_t info; PAPI_option_t itimer; const PAPI_hw_info_t *hw_info = NULL; tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); /* query and set up the right instruction to monitor */ if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) { if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) { PAPI_get_event_info( PAPI_FP_INS, &info ); if ( info.count == 1 || !strcmp( info.derived, "DERIVED_CMPD" ) ) PAPI_event = 
PAPI_FP_INS; } } if ( PAPI_event == 0 ) { if ( PAPI_query_event( PAPI_FP_OPS ) == PAPI_OK ) { PAPI_get_event_info( PAPI_FP_OPS, &info ); if ( info.count == 1 || !strcmp( info.derived, "DERIVED_CMPD" ) ) PAPI_event = PAPI_FP_OPS; } } if ( PAPI_event == 0 ) { if ( PAPI_query_event( PAPI_TOT_INS ) == PAPI_OK ) { PAPI_get_event_info( PAPI_TOT_INS, &info ); if ( info.count == 1 || !strcmp( info.derived, "DERIVED_CMPD" ) ) PAPI_event = PAPI_TOT_INS; } } if ( PAPI_event == 0 ) test_skip( __FILE__, __LINE__, "No suitable event for this test found!", 0 ); hw_info = PAPI_get_hardware_info( ); if ( hw_info == NULL ) test_fail( __FILE__, __LINE__, "PAPI_get_hardware_info", 2 ); if ( PAPI_event == PAPI_FP_INS ) mythreshold = THRESHOLD; else #if defined(linux) mythreshold = ( int ) hw_info->cpu_max_mhz * 20000; #else mythreshold = THRESHOLD * 2; #endif retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_create_eventset", retval ); retval = PAPI_add_event( EventSet, PAPI_event ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_add_event", retval ); retval = PAPI_get_opt( PAPI_COMPONENTINFO, &opt ); if ( retval != PAPI_OK ) test_skip( __FILE__, __LINE__, "Platform does not support Hardware overflow", 0 ); do_stuff( ); /* Do reference count */ retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, &values[use_total] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); use_total++; /* Now do hardware overflow reference count */ retval = PAPI_overflow( EventSet, PAPI_event, mythreshold, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, &values[use_total] ); if ( retval != PAPI_OK ) 
test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); use_total++; retval = PAPI_overflow( EventSet, PAPI_event, 0, 0, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); /* Now do software overflow reference count, uses SIGPROF */ retval = PAPI_overflow( EventSet, PAPI_event, mythreshold, PAPI_OVERFLOW_FORCE_SW, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, &values[use_total] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); use_total++; retval = PAPI_overflow( EventSet, PAPI_event, 0, PAPI_OVERFLOW_FORCE_SW, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); /* Now do software overflow with SIGVTALRM */ memset( &itimer, 0, sizeof ( itimer ) ); itimer.itimer.itimer_num = ITIMER_VIRTUAL; itimer.itimer.itimer_sig = SIGVTALRM; if ( PAPI_set_opt( PAPI_DEF_ITIMER, &itimer ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); retval = PAPI_overflow( EventSet, PAPI_event, mythreshold, PAPI_OVERFLOW_FORCE_SW, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, &values[use_total] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); use_total++; retval = PAPI_overflow( EventSet, PAPI_event, 0, PAPI_OVERFLOW_FORCE_SW, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); /* Now do software overflow with SIGALRM */ memset( &itimer, 0, sizeof ( itimer ) ); itimer.itimer.itimer_num = ITIMER_REAL; itimer.itimer.itimer_sig = SIGALRM; if ( PAPI_set_opt( PAPI_DEF_ITIMER, &itimer ) != 
PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_set_opt", retval ); retval = PAPI_overflow( EventSet, PAPI_event, mythreshold, PAPI_OVERFLOW_FORCE_SW, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); do_stuff( ); retval = PAPI_stop( EventSet, &values[use_total] ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); use_total++; retval = PAPI_overflow( EventSet, PAPI_event, 0, PAPI_OVERFLOW_FORCE_SW, handler ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_overflow", retval ); if ( !TESTS_QUIET ) { if ( ( retval = PAPI_event_code_to_name( PAPI_event, event_name ) ) != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_event_code_to_name", retval ); printf ( "Test case: Software overflow of various types with 1 event in set.\n" ); printf ( "------------------------------------------------------------------------------\n" ); printf( "Threshold for overflow is: %d\n", mythreshold ); printf ( "------------------------------------------------------------------------------\n" ); printf( "Test type : %11s%13s%13s%13s%13s\n", "Reference", "Hardware", "ITIMER_PROF", "ITIMER_VIRT", "ITIMER_REAL" ); printf( "%-12s: %11lld%13lld%13lld%13lld%13lld\n", info.symbol, values[0], values[1], values[2], values[3], values[4] ); printf( "Overflows : %11d%13d%13d%13d%13d\n", total[0], total[1], total[2], total[3], total[4] ); printf ( "------------------------------------------------------------------------------\n" ); printf( "Verification:\n" ); printf ( "Overflow in Column 2 greater than or equal to overflows in Columns 3, 4, 5\n" ); printf( "Overflow in Columns 3, 4, 5 greater than 0\n" ); } hard_min = ( long long ) ( ( ( double ) values[0] * ( 1.0 - OVR_TOLERANCE ) ) / ( double ) mythreshold ); hard_max = ( long long ) ( ( ( double ) values[0] * ( 1.0 + OVR_TOLERANCE ) ) / ( double ) 
mythreshold ); soft_min = ( long long ) ( ( ( double ) values[0] * ( 1.0 - SOFT_TOLERANCE ) ) / ( double ) mythreshold ); soft_max = ( long long ) ( ( ( double ) values[0] * ( 1.0 + SOFT_TOLERANCE ) ) / ( double ) mythreshold ); if ( total[1] > hard_max || total[1] < hard_min ) test_fail( __FILE__, __LINE__, "Hardware Overflows outside limits", 1 ); if ( total[2] > soft_max || total[3] > soft_max || total[4] > soft_max ) test_fail( __FILE__, __LINE__, "Software Overflows exceed theoretical maximum", 1 ); if ( total[2] < soft_min || total[3] < soft_min || total[4] < soft_min ) printf( "WARNING: Software Overflow occurring but suspiciously low\n" ); if ( ( total[2] == 0 ) || ( total[3] == 0 ) || ( total[4] == 0 ) ) test_fail( __FILE__, __LINE__, "Software Overflows", 1 ); test_pass( __FILE__, NULL, 0 ); exit( 1 ); } papi-5.3.0/src/ctests/derived.c0000600003276200002170000000673212247131121016026 0ustar ralphundrgrad/* This file performs the following test: start, stop with a derived event */ #include "papi_test.h" #define OLD_TEST_DRIVER #ifdef OLD_TEST_DRIVER #define CPP_TEST_FAIL(string, retval) test_fail(__FILE__, __LINE__, string, retval); #define CPP_TEST_PASS() { test_pass(__FILE__, NULL, 0); exit(0); } #define CPP_TEST_SKIP() { test_skip(__FILE__,__LINE__,NULL,0); exit(0); } #else #define CPP_TEST_FAIL(function, retval) { fprintf(stderr,"%s:%d:%s:%d:%s:%s\n",__FILE__,__LINE__,function,retval,PAPI_strerror(retval),"$Id$\n"); test_fail(__FILE__, __LINE__, function, retval); } #define CPP_TEST_PASS() { fprintf(stderr,"$Id$\n%s:\tPASSED\n",__FILE__); exit(0); } #define CPP_TEST_SKIP() { fprintf(stderr,"$Id$\n%s:\tSKIPPED\n",__FILE__); exit(0); } #endif #define EVENTSLEN 2 #define QUIETPRINTF if (!TESTS_QUIET) printf unsigned int PAPI_events[EVENTSLEN] = { 0, 0 }; static const int PAPI_events_len = 1; extern int TESTS_QUIET; int main( int argc, char **argv ) { int retval, tmp; int EventSet = PAPI_NULL; int i; PAPI_event_info_t info; long long values; char 
event_name[PAPI_MAX_STR_LEN], add_event_str[PAPI_MAX_STR_LEN]; /*#if !defined(i386) || !defined(linux) CPP_TEST_SKIP(); #endif */ tests_quiet( argc, argv ); /* Set TESTS_QUIET variable */ retval = PAPI_library_init( PAPI_VER_CURRENT ); if ( retval != PAPI_VER_CURRENT ) test_fail( __FILE__, __LINE__, "PAPI_library_init", retval ); QUIETPRINTF( "Test case %s: start, stop with a derived counter.\n", __FILE__ ); QUIETPRINTF( "------------------------------------------------\n" ); tmp = PAPI_get_opt( PAPI_DEFDOM, NULL ); QUIETPRINTF( "Default domain is: %d (%s)\n", tmp, stringify_all_domains( tmp ) ); tmp = PAPI_get_opt( PAPI_DEFGRN, NULL ); QUIETPRINTF( "Default granularity is: %d (%s)\n\n", tmp, stringify_granularity( tmp ) ); i = PAPI_PRESET_MASK; do { if ( PAPI_get_event_info( i, &info ) == PAPI_OK ) { if ( info.count > 1 ) { PAPI_events[0] = ( unsigned int ) info.event_code; break; } } } while ( PAPI_enum_event( &i, 0 ) == PAPI_OK ); if ( PAPI_events[0] == 0 ) CPP_TEST_SKIP( ); retval = PAPI_create_eventset( &EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_create_eventset", retval ); for ( i = 0; i < PAPI_events_len; i++ ) { PAPI_event_code_to_name( ( int ) PAPI_events[i], event_name ); if ( !TESTS_QUIET ) QUIETPRINTF( "Adding %s\n", event_name ); retval = PAPI_add_event( EventSet, ( int ) PAPI_events[i] ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_add_event", retval ); } retval = PAPI_start( EventSet ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start", retval ); QUIETPRINTF( "Running do_stuff().\n" ); do_stuff( ); retval = PAPI_stop( EventSet, &values ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop", retval ); sprintf( add_event_str, "%-12s : \t", event_name ); QUIETPRINTF( TAB1, add_event_str, values ); QUIETPRINTF( "------------------------------------------------\n" ); retval = PAPI_cleanup_eventset( EventSet ); /* JT */ if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_cleanup_eventset", retval ); retval = 
PAPI_destroy_eventset( &EventSet ); if ( retval != PAPI_OK ) CPP_TEST_FAIL( "PAPI_destroy_eventset", retval ); #ifndef OLD_TEST_DRIVER PAPI_shutdown( ); #endif QUIETPRINTF( "Verification: Does it produce a non-zero value?\n" ); if ( values != 0 ) { QUIETPRINTF( "Yes: " ); QUIETPRINTF( LLDFMT, values ); QUIETPRINTF( "\n" ); } CPP_TEST_PASS( ); } papi-5.3.0/src/ctests/prof_utils.h0000600003276200002170000000352212247131121016571 0ustar ralphundrgrad/* * File: prof_utils.h * CVS: $Id$ * Author: Dan Terpstra * terpstra@cs.utk.edu * Mods: Maynard Johnson * maynardj@us.ibm.com * Mods: * */ /* This file contains utility definitions useful for all profiling tests It should be #included in: - profile.c, - sprofile.c, - profile_pthreads.c, - profile_twoevents.c, - earprofile.c, - future profiling tests. */ /* value for scale parameter that sets scale to 1 */ #define FULL_SCALE 65536 /* Internal prototype */ void prof_init(int argc, char **argv, const PAPI_exe_info_t **prginfo); int prof_events(int num_tests); void prof_print_address(char *title, const PAPI_exe_info_t *prginfo); void prof_print_prof_info(caddr_t start, caddr_t end, int threshold, char *event_name); void prof_alloc(int num, unsigned long plength); void prof_head(unsigned long blength, int bucket_size, int num_buckets, char *header); void prof_out(caddr_t start, int n, int bucket, int num_buckets, unsigned int scale); unsigned long prof_size(unsigned long plength, unsigned scale, int bucket, int *num_buckets); int prof_check(int n, int bucket, int num_buckets); int prof_buckets(int bucket); void do_no_profile(void); /* variables global to profiling tests */ extern long long **values; extern char event_name[PAPI_MAX_STR_LEN]; extern int PAPI_event; extern int EventSet; extern void *profbuf[5]; /* Itanium returns function descriptors instead of function addresses. I couldn't find the following structure in a header file, so I duplicated it below. 
*/ #if (defined(ITANIUM1) || defined(ITANIUM2)) struct fdesc { void *ip; /* entry point (code address) */ void *gp; /* global-pointer */ }; #elif defined(__powerpc64__) struct fdesc { void * ip; // function entry point void * toc; void * env; }; #endif papi-5.3.0/src/ctests/matrix-hl.c0000600003276200002170000000746712247131121016317 0ustar ralphundrgrad/**************************************************************************** *C *C matrix-hl.f *C An example of matrix-matrix multiplication and using PAPI high level *C to look at the performance. written by Kevin London *C March 2000 *C Added to c tests to check stop *C**************************************************************************** */ #include "papi_test.h" #include int main( int argc, char **argv ) { #define NROWS1 175 #define NCOLS1 225 #define NROWS2 NCOLS1 #define NCOLS2 150 double p[NROWS1][NCOLS1], q[NROWS2][NCOLS2], r[NROWS1][NCOLS2]; int i, j, k, num_events, retval; /* PAPI standardized event to be monitored */ int event[2]; /* PAPI values of the counters */ long long values[2], tmp; extern int TESTS_QUIET; tests_quiet( argc, argv ); /* Setup default values */ num_events = 0; /* See how many hardware events at one time are supported * This also initializes the PAPI library */ num_events = PAPI_num_counters( ); if ( num_events < 2 ) { printf( "This example program requires the architecture to " "support 2 simultaneous hardware events...shutting down.\n" ); test_skip( __FILE__, __LINE__, "PAPI_num_counters", 1 ); } if ( !TESTS_QUIET ) printf( "Number of hardware counters supported: %d\n", num_events ); if ( PAPI_query_event( PAPI_FP_OPS ) == PAPI_OK ) event[0] = PAPI_FP_OPS; else if ( PAPI_query_event( PAPI_FP_INS ) == PAPI_OK ) event[0] = PAPI_FP_INS; else event[0] = PAPI_TOT_INS; /* Time used */ event[1] = PAPI_TOT_CYC; /* matrix 1: read in the matrix values */ for ( i = 0; i < NROWS1; i++ ) for ( j = 0; j < NCOLS1; j++ ) p[i][j] = i * j * 1.0; for ( i = 0; i < NROWS2; i++ ) for ( j = 0; j < 
NCOLS2; j++ ) q[i][j] = i * j * 1.0; for ( i = 0; i < NROWS1; i++ ) for ( j = 0; j < NCOLS2; j++ ) r[i][j] = i * j * 1.0; /* Set up the counters */ num_events = 2; retval = PAPI_start_counters( event, num_events ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_start_counters", retval ); /* Clear the counter values */ retval = PAPI_read_counters( values, num_events ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_read_counters", retval ); /* Compute the matrix-matrix multiplication */ for ( i = 0; i < NROWS1; i++ ) for ( j = 0; j < NCOLS2; j++ ) for ( k = 0; k < NCOLS1; k++ ) r[i][j] = r[i][j] + p[i][k] * q[k][j]; /* Stop the counters and put the results in the array values */ retval = PAPI_stop_counters( values, num_events ); if ( retval != PAPI_OK ) test_fail( __FILE__, __LINE__, "PAPI_stop_counters", retval ); /* Make sure the compiler does not optimize away the multiplication * with dummy(r); */ dummy( r ); if ( !TESTS_QUIET ) { if ( event[0] == PAPI_TOT_INS ) { printf( TAB1, "TOT Instructions:", values[0] ); } else { printf( TAB1, "FP Instructions:", values[0] ); } printf( TAB1, "Cycles:", values[1] ); } /* * Intel Core overreports flops by 50% when using -O * Use -O2 or -O3 to produce the expected # of flops */ if ( event[0] == PAPI_FP_INS ) { /* Compare measured FLOPS to expected value */ tmp = 2 * ( long long ) ( NROWS1 ) * ( long long ) ( NCOLS2 ) * ( long long ) ( NCOLS1 ); if ( abs( ( int ) values[0] - ( int ) tmp ) > ( double ) tmp * 0.05 ) { /* Maybe we are counting FMAs? 
*/ tmp = tmp / 2; if ( abs( ( int ) values[0] - ( int ) tmp ) > ( double ) tmp * 0.05 ) { printf( "\n" TAB1, "Expected operation count: ", 2 * tmp ); printf( TAB1, "Or possibly (using FMA): ", tmp ); printf( TAB1, "Instead I got: ", values[0] ); test_fail( __FILE__, __LINE__, "Unexpected FLOP count (check vector operations)", 1 ); } } } test_pass( __FILE__, 0, 0 ); return ( PAPI_EMISC ); } papi-5.3.0/src/papi_user_events.h0000600003276200002170000000134112247131124016451 0ustar ralphundrgrad#ifndef _PAPI_USER_EVENTS_H #define _PAPI_USER_EVENTS_H #include "papi_internal.h" #define PAPI_UE_AND_MASK 0x3FFFFFFF #define PAPI_UE_MASK ((int)0xC0000000) #define USER_EVENT_OPERATION_LEN 512 extern int _papi_user_defined_events_setup(char *); extern void _papi_cleanup_user_events(); typedef struct { unsigned int count; int events[PAPI_EVENTS_IN_DERIVED_EVENT]; char operation[USER_EVENT_OPERATION_LEN]; /**< operation string: +,-,*,/,@(number of metrics), $(constant Mhz), %(1000000.0) */ char symbol[PAPI_MIN_STR_LEN]; char *short_desc; char *long_desc; } user_defined_event_t; extern user_defined_event_t * _papi_user_events; extern unsigned int _papi_user_events_count; #endif // _PAPI_USER_EVENTS_H papi-5.3.0/src/papi_libpfm4_events.c0000600003276200002170000000733612247131124017035 0ustar ralphundrgrad/* * File: papi_libpfm4_events.c * Author: Vince Weaver vincent.weaver @ maine.edu * based heavily on existing papi_libpfm3_events.c */ #include #include "papi.h" #include "papi_internal.h" #include "papi_vector.h" #include "papi_libpfm4_events.h" #include "perfmon/pfmlib.h" #include "perfmon/pfmlib_perf_event.h" /***********************************************************/ /* Exported functions */ /***********************************************************/ /** @class _papi_libpfm4_error * @brief convert libpfm error codes to PAPI error codes * * @param[in] pfm_error * -- a libpfm4 error code * * @returns returns a PAPI error code * */ int _papi_libpfm4_error( int pfm_error ) { 
switch ( pfm_error ) { case PFM_SUCCESS: return PAPI_OK; /* success */ case PFM_ERR_NOTSUPP: return PAPI_ENOSUPP; /* function not supported */ case PFM_ERR_INVAL: return PAPI_EINVAL; /* invalid parameters */ case PFM_ERR_NOINIT: return PAPI_ENOINIT; /* library not initialized */ case PFM_ERR_NOTFOUND: return PAPI_ENOEVNT; /* event not found */ case PFM_ERR_FEATCOMB: return PAPI_ECOMBO; /* invalid combination of features */ case PFM_ERR_UMASK: return PAPI_EATTR; /* invalid or missing unit mask */ case PFM_ERR_NOMEM: return PAPI_ENOMEM; /* out of memory */ case PFM_ERR_ATTR: return PAPI_EATTR; /* invalid event attribute */ case PFM_ERR_ATTR_VAL: return PAPI_EATTR; /* invalid event attribute value */ case PFM_ERR_ATTR_SET: return PAPI_EATTR; /* attribute value already set */ case PFM_ERR_TOOMANY: return PAPI_ECOUNT; /* too many parameters */ case PFM_ERR_TOOSMALL: return PAPI_ECOUNT; /* parameter is too small */ default: return PAPI_EINVAL; } } static int libpfm4_users=0; /** @class _papi_libpfm4_shutdown * @brief Shutdown any initialization done by the libpfm4 code * * @retval PAPI_OK We always return PAPI_OK * */ int _papi_libpfm4_shutdown(void) { APIDBG("Entry\n"); /* clean out and free the native events structure */ _papi_hwi_lock( NAMELIB_LOCK ); libpfm4_users--; /* Only free if we're the last user */ if (!libpfm4_users) { pfm_terminate(); } _papi_hwi_unlock( NAMELIB_LOCK ); return PAPI_OK; } /** @class _papi_libpfm4_init * @brief Initialize the libpfm4 code * * @param[in] my_vector * -- vector of the component doing the initialization * * @retval PAPI_OK We initialized correctly * @retval PAPI_ECMP There was an error initializing the component * */ int _papi_libpfm4_init(papi_vector_t *my_vector) { int version; pfm_err_t retval = PFM_SUCCESS; _papi_hwi_lock( NAMELIB_LOCK ); if (!libpfm4_users) { retval = pfm_initialize(); if ( retval != PFM_SUCCESS ) libpfm4_users--; } libpfm4_users++; _papi_hwi_unlock( NAMELIB_LOCK ); if ( retval != PFM_SUCCESS ) { PAPIERROR( 
"pfm_initialize(): %s", pfm_strerror( retval ) ); return PAPI_ESYS; } /* get the libpfm4 version */ SUBDBG( "pfm_get_version()\n"); if ( (version=pfm_get_version( )) < 0 ) { PAPIERROR( "pfm_get_version(): %s", pfm_strerror( retval ) ); return PAPI_ESYS; } /* Set the version */ sprintf( my_vector->cmp_info.support_version, "%d.%d", PFM_MAJ_VERSION( version ), PFM_MIN_VERSION( version ) ); /* Complain if the compiled-against version doesn't match current version */ if ( PFM_MAJ_VERSION( version ) != PFM_MAJ_VERSION( LIBPFM_VERSION ) ) { PAPIERROR( "Version mismatch of libpfm: compiled %#x vs. installed %#x\n", PFM_MAJ_VERSION( LIBPFM_VERSION ), PFM_MAJ_VERSION( version ) ); return PAPI_ESYS; } return PAPI_OK; } papi-5.3.0/src/freebsd_events.csv0000600003276200002170000003217412247131122016446 0ustar ralphundrgrad# # FreeBSD presets # these are needed as event names are different than those in libpfm4 # CPU,UNKNOWN PRESET,PAPI_TOT_CYC,NOT_DERIVED,CYCLES PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTRUCTIONS PRESET,PAPI_BR_INS,NOT_DERIVED,BRANCHES PRESET,PAPI_BR_INS,NOT_DERIVED,INTERRUPTS PRESET,PAPI_BR_MSP,NOT_DERIVED,BRANCH_MISPREDICTS PRESET,PAPI_L2_DCM,NOT_DERIVED,DC_MISSES PRESET,PAPI_L2_ICM,NOT_DERIVED,IC_MISSES PRESET,PAPI_L2_TCM,DERIVED_ADD, IC_MISSES,DC_MISSES CPU,INTEL_P6 CPU,INTEL_PII CPU,INTEL_PIII CPU,INTEL_CL CPU,INTEL_PM PRESET,PAPI_L1_DCM,NOT_DERIVED,DCU_LINES_IN # L2_IFETCH defaults to MESI PRESET,PAPI_L1_ICM,NOT_DERIVED,L2_IFETCH # BUS_TRAN_IFETCH defaults to SELF PRESET,PAPI_L2_DCM,DERIVED_SUB,L2_LINES_IN,BUS_TRAN_IFETCH # BUS_TRAN_IFETCH defaults to SELF PRESET,PAPI_L2_ICM,NOT_DERIVED,BUS_TRAN_IFETCH PRESET,PAPI_L1_TCM,NOT_DERIVED,L2_RQSTS PRESET,PAPI_L2_TCM,NOT_DERIVED,L2_LINES_IN PRESET,PAPI_CA_CLN,NOT_DERIVED,BUS_TRAN_RFO PRESET,PAPI_CA_ITV,NOT_DERIVED,BUS_TRAN_INVAL PRESET,PAPI_TLB_IM,NOT_DERIVED,ITLB.MISS PRESET,PAPI_L1_LDM,NOT_DERIVED,L2_LD PRESET,PAPI_L1_STM,NOT_DERIVED,L2_ST PRESET,PAPI_L2_LDM,DERIVED_SUB,L2_LINES_IN,L2M_LINES_INM 
PRESET,PAPI_L2_STM,NOT_DERIVED,L2M_LINES_INM PRESET,PAPI_BTAC_M,NOT_DERIVED,BTB_MISSES PRESET,PAPI_HW_INT,NOT_DERIVED,HW_INT_RX PRESET,PAPI_BR_CN,NOT_DERIVED,BR_INST_RETIRED PRESET,PAPI_BR_TKN,NOT_DERIVED,BR_TAKEN_RETIRED PRESET,PAPI_BR_NTK,DERIVED_SUB,BR_INST_RETIRED,BR_TAKEN_RETIRED PRESET,PAPI_BR_MSP,NOT_DERIVED,BR_MISS_PRED_RETIRED PRESET,PAPI_BR_PRC,DERIVED_SUB,BR_INST_RETIRED,BR_MISS_PRED_RETIRED PRESET,PAPI_TOT_IIS,NOT_DERIVED,INST_DECODED PRESET,PAPI_TOT_INS,NOT_DERIVED,INST_RETIRED PRESET,PAPI_FP_INS,NOT_DERIVED,FLOPS PRESET,PAPI_BR_INS,NOT_DERIVED,BR_INST_RETIRED PRESET,PAPI_RES_STL,NOT_DERIVED,RESOURCE_STALL PRESET,PAPI_TOT_CYC,NOT_DERIVED,CPU_CLK_UNHALTED PRESET,PAPI_LST_INS,DERIVED_ADD,L2_LD,L2_ST PRESET,PAPI_L1_DCH,DERIVED_SUB,DATA_MEM_REFS, DCU_LINES_IN PRESET,PAPI_L1_DCA,NOT_DERIVED,DATA_MEM_REFS PRESET,PAPI_L2_DCA,DERIVED_ADD,L2_LD, L2_ST PRESET,PAPI_L2_DCR,NOT_DERIVED,L2_LD PRESET,PAPI_L2_DCW,NOT_DERIVED,L2_ST PRESET,PAPI_L1_ICH,DERIVED_SUB,IFU_FETCH, L2_IFETCH PRESET,PAPI_L2_ICH,DERIVED_SUB,L2_IFETCH, BUS_TRAN_IFETCH PRESET,PAPI_L1_ICA,NOT_DERIVED,IFU_FETCH PRESET,PAPI_L2_ICA,NOT_DERIVED,L2_IFETCH PRESET,PAPI_L1_ICR,NOT_DERIVED,IFU_FETCH PRESET,PAPI_L2_ICR,NOT_DERIVED,L2_IFETCH PRESET,PAPI_L2_TCH,DERIVED_SUB,L2_RQSTS, L2_LINES_IN PRESET,PAPI_L1_TCA,DERIVED_ADD,DATA_MEM_REFS, IFU_FETCH PRESET,PAPI_L2_TCA,NOT_DERIVED,L2_RQSTS PRESET,PAPI_L2_TCR,DERIVED_ADD,L2_LD, L2_IFETCH PRESET,PAPI_L2_TCW,NOT_DERIVED,L2_ST PRESET,PAPI_FML_INS,NOT_DERIVED,MUL PRESET,PAPI_FDV_INS,NOT_DERIVED,DIV PRESET,PAPI_FP_OPS,NOT_DERIVED,FLOPS CPU,INTEL_PM PRESET,PAPI_VEC_INS,DERIVED_ADD,MMX_INSTR_RET, EMON_SSE_SSE2_INST_RETIRED CPU,INTEL_PIII PRESET,PAPI_VEC_INS,DERIVED_ADD,MMX_INSTR_RET, EMON_KNI_INST_RETIRED CPU,INTEL_CL PRESET,PAPI_VEC_INS,NOT_DERIVED,MMX_INSTR_EXEC CPU,AMD_K7 PRESET,PAPI_L1_DCM,DERIVED_ADD,DC_REFILLS_FROM_SYSTEM, DC_REFILLS_FROM_L2 PRESET,PAPI_L1_ICM,NOT_DERIVED,IC_MISSES PRESET,PAPI_L2_DCM,NOT_DERIVED,DC_REFILLS_FROM_SYSTEM 
PRESET,PAPI_L1_TCM,DERIVED_ADD,DC_REFILLS_FROM_SYSTEM, DC_REFILLS_FROM_L2, IC_MISSES PRESET,PAPI_TLB_DM,NOT_DERIVED,L1_AND_L2_DTLB_MISSES PRESET,PAPI_TLB_IM,NOT_DERIVED,L1_AND_L2_ITLB_MISSES PRESET,PAPI_TLB_TL,DERIVED_ADD,L1_AND_L2_DTLB_MISSES, L1_AND_L2_ITLB_MISSES PRESET,PAPI_L1_LDM,NOT_DERIVED,DC_REFILLS_FROM_L2_OES PRESET,PAPI_L1_STM,NOT_DERIVED,DC_REFILLS_FROM_L2_M PRESET,PAPI_L2_LDM,NOT_DERIVED,DC_REFILLS_FROM_SYSTEM_OES PRESET,PAPI_L2_STM,NOT_DERIVED,DC_REFILLS_FROM_SYSTEM_M PRESET,PAPI_HW_INT,NOT_DERIVED,HARDWARE_INTERRUPTS PRESET,PAPI_BR_UCN,NOT_DERIVED,RETIRED_FAR_CONTROL_TRANSFERS PRESET,PAPI_BR_CN,NOT_DERIVED,RETIRED_BRANCHES PRESET,PAPI_BR_TKN,NOT_DERIVED,RETIRED_TAKEN_BRANCHES PRESET,PAPI_BR_NTK,DERIVED_SUB,RETIRED_BRANCHES, RETIRED_TAKEN_BRANCHES PRESET,PAPI_BR_MSP,NOT_DERIVED,RETIRED_BRANCHES_MISPREDICTED PRESET,PAPI_BR_PRC,DERIVED_SUB,RETIRED_BRANCHES, RETIRED_BRANCHES_MISPREDICTED PRESET,PAPI_TOT_INS,NOT_DERIVED,RETIRED_INSTRUCTIONS PRESET,PAPI_BR_INS,NOT_DERIVED,RETIRED_TAKEN_BRANCHES PRESET,PAPI_L1_DCA,NOT_DERIVED,DC_ACCESSES PRESET,PAPI_L2_DCA,DERIVED_ADD,DC_REFILLS_FROM_SYSTEM, DC_REFILLS_FROM_L2 PRESET,PAPI_L1_ICA,NOT_DERIVED,IC_FETCHES PRESET,PAPI_L2_ICA,NOT_DERIVED,IC_MISSES PRESET,PAPI_L1_ICR,NOT_DERIVED,IC_FETCHES PRESET,PAPI_L1_TCA,DERIVED_ADD,DC_ACCESSES, IC_FETCHES CPU,AMD_K8 PRESET,PAPI_BR_INS,NOT_DERIVED,FR_RETIRED_BRANCHES PRESET,PAPI_RES_STL,NOT_DERIVED,FR_DISPATCH_STALLS PRESET,PAPI_TOT_CYC,NOT_DERIVED,BU_CPU_CLK_UNHALTED PRESET,PAPI_TOT_INS,NOT_DERIVED,FR_RETIRED_X86_INSTRUCTIONS PRESET,PAPI_STL_ICY,FR_DECODER_EMPTY PRESET,PAPI_HW_INT,NOT_DERIVED,FR_RETIRED_TAKEN_HARDWARE_INTERRUPTS PRESET,PAPI_BR_TKN,NOT_DERIVED,FR_RETIRED_TAKEN_BRANCHES PRESET,PAPI_BR_MSP,NOT_DERIVED,FR_RETIRED_TAKEN_BRANCHES_MISPREDICTED PRESET,PAPI_TLB_DM,NOT_DERIVED,DC_L1_DTLB_MISS_AND_L2_DTLB_MISS PRESET,PAPI_TLB_IM,NOT_DERIVED,IC_L1_ITLB_MISS_AND_L2_ITLB_MISS PRESET,PAPI_TLB_TL,DERIVED_ADD,DC_L1_DTLB_MISS_AND_L2_DTLB_MISS,IC_L1_ITLB_MISS_AND_L2_ITLB_MISS 
PRESET,PAPI_L1_DCA,NOT_DERIVED,DC_ACCESS PRESET,PAPI_L1_ICA,NOT_DERIVED,IC_FETCH PRESET,PAPI_L1_TCA,DERIVED_ADD,DC_ACCESS, IC_FETCH PRESET,PAPI_L1_ICR,NOT_DERIVED,IC_FETCH PRESET,PAPI_L2_ICH,NOT_DERIVED,IC_REFILL_FROM_L2 PRESET,PAPI_L2_DCH,NOT_DERIVED,DC_REFILL_FROM_L2 PRESET,PAPI_L2_DCM,NOT_DERIVED,DC_REFILL_FROM_SYSTEM_MOES PRESET,PAPI_L2_DCA,DERIVED_ADD,DC_REFILL_FROM_SYSTEM_MOES, DC_REFILL_FROM_L2_MOES PRESET,PAPI_L2_ICM,NOT_DERIVED,IC_REFILL_FROM_SYSTEM PRESET,PAPI_L2_DCR,NOT_DERIVED,DC_REFILL_FROM_L2_OES PRESET,PAPI_L2_DCW,NOT_DERIVED,DC_REFILL_FROM_L2_M PRESET,PAPI_L2_DCH,NOT_DERIVED,DC_REFILL_FROM_L2_MOES PRESET,PAPI_L1_LDM,NOT_DERIVED,DC_REFILL_FROM_L2_OES PRESET,PAPI_L1_STM,NOT_DERIVED,DC_REFILL_FROM_L2_M PRESET,PAPI_L2_LDM,NOT_DERIVED,DC_REFILL_FROM_SYSTEM_OES PRESET,PAPI_L2_STM,NOT_DERIVED,DC_REFILL_FROM_SYSTEM_M PRESET,PAPI_L1_DCM,DERIVED_ADD,DC_REFILL_FROM_SYSTEM_MOES, DC_REFILL_FROM_L2_MOES PRESET,PAPI_L1_ICM,DERIVED_ADD,IC_REFILL_FROM_L2, IC_REFILL_FROM_SYSTEM PRESET,PAPI_L1_TCM,DERIVED_ADD,DC_REFILL_FROM_SYSTEM_MOES,DC_REFILL_FROM_L2_MOES,IC_REFILL_FROM_SYSTEM,IC_REFILL_FROM_L2 PRESET,PAPI_L2_TCM,DERIVED_ADD,DC_REFILL_FROM_SYSTEM_MOES,IC_REFILL_FROM_SYSTEM PRESET,PAPI_L2_ICA,DERIVED_ADD,IC_REFILL_FROM_SYSTEM,IC_REFILL_FROM_L2 PRESET,PAPI_L2_TCH,DERIVED_ADD,IC_REFILL_FROM_L2,DC_REFILL_FROM_L2_MOES PRESET,PAPI_L2_TCA,DERIVED_ADD,IC_REFILL_FROM_L2,IC_REFILL_FROM_SYSTEM,DC_REFILL_FROM_L2_MOES,DC_REFILL_FROM_SYSTEM_MOES PRESET,PAPI_FML_INS,NOT_DERIVED,FP_DISPATCHED_FPU_MULS PRESET,PAPI_FAD_INS,NOT_DERIVED,FP_DISPATCHED_FPU_ADDS PRESET,PAPI_FP_OPS,NOT_DERIVED,FP_DISPATCHED_FPU_ADDS_AND_MULS PRESET,PAPI_FP_INS,NOT_DERIVED,FR_RETIRED_FPU_INSTRUCTIONS PRESET,PAPI_FPU_IDL,NOT_DERIVED,FP_CYCLES_WITH_NO_FPU_OPS_RETIRED CPU,INTEL_PIV PRESET,PAPI_RES_STL,NOT_DERIVED,RESOURCE_STALL PRESET,PAPI_TOT_CYC,NOT_DERIVED,GLOBAL_POWER_EVENTS PRESET,PAPI_L1_ICM,NOT_DERIVED,BPU_FETCH_REQUEST PRESET,PAPI_L1_ICA,NOT_DERIVED,UOP_QUEUE_WRITES_TC_BUILD_DELIVER 
PRESET,PAPI_TLB_DM,NOT_DERIVED,PAGE_WALK_TYPE_D PRESET,PAPI_TLB_IM,NOT_DERIVED,PAGE_WALK_TYPE_I PRESET,PAPI_TLB_TL,NOT_DERIVED,PAGE_WALK_TYPE PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTR_RETIRED_NON_BOGUS PRESET,PAPI_BR_INS,NOT_DERIVED,RETIRED_BRANCH_TYPE PRESET,PAPI_BR_TKN,NOT_DERIVED,BRANCH_RETIRED_TAKEN PRESET,PAPI_BR_NTK,NOT_DERIVED,BRANCH_RETIRED_NOT_TAKEN PRESET,PAPI_BR_MSP,NOT_DERIVED,BRANCH_RETIRED_MISPREDICTED PRESET,PAPI_BR_PRC,NOT_DERIVED,BRANCH_RETIRED_PREDICTED PRESET,PAPI_L2_TCH,NOT_DERIVED,BSQ_CACHE_REFERENCE_2L_HITS PRESET,PAPI_L2_TCM,NOT_DERIVED,BSQ_CACHE_REFERENCE_2L_MISSES PRESET,PAPI_L2_TCA,NOT_DERIVED,BSQ_CACHE_REFERENCE_2L_ACCESSES PRESET,PAPI_L3_TCH,NOT_DERIVED,BSQ_CACHE_REFERENCE_3L_HITS PRESET,PAPI_L3_TCM,NOT_DERIVED,BSQ_CACHE_REFERENCE_3L_MISSES PRESET,PAPI_L3_TCA,NOT_DERIVED,BSQ_CACHE_REFERENCE_3L_ACCESSES PRESET,PAPI_FP_INS,NOT_DERIVED,X87_FP_UOP CPU,INTEL_ATOM PRESET,PAPI_L1_ICM,NOT_DERIVED,L1I_MISSES PRESET,PAPI_L1_DCA,NOT_DERIVED,L1D_ALL_REF PRESET,PAPI_L1_ICA,NOT_DERIVED,L1I_READS PRESET,PAPI_L1_TCA,DERIVED_ADD,L1D_ALL_REF, L1I_READS PRESET,PAPI_L2_DCA,NOT_DERIVED,L2_RQSTS PRESET,PAPI_TLB_IM,NOT_DERIVED,ITLB.MISSES PRESET,PAPI_TLB_DM,NOT_DERIVED,DATA_TLB_MISSES.DTLB_MISS PRESET,PAPI_BR_INS,NOT_DERIVED,BR_INST_EXEC PRESET,PAPI_BR_TKN,NOT_DERIVED,BR_INST_RETIRED.TAKEN PRESET,PAPI_BR_MSP,NOT_DERIVED,BR_MISSP_EXEC PRESET,PAPI_RES_STL,RESOURCE_STALLS.ANY PRESET,PAPI_TOT_CYC,CPU_CLK_UNHALTED.BUS PRESET,PAPI_TOT_INS,INST_RETIRED.ANY_P PRESET,PAPI_HW_INT,NOT_DERIVED,HW_INT_RCV PRESET,PAPI_L2_DCH,NOT_DERIVED,L2_LD PRESET,PAPI_FP_INS,NOT_DERIVED,X87_OPS_RETIRED.ANY PRESET,PAPI_L1_ICM,NOT_DERIVED,L1I_MISSES PRESET,PAPI_L2_DCM,NOT_DERIVED,MEM_LOAD_RETIRED_L2_MISS PRESET,PAPI_TLB_TL,DERIVED_ADD,DTLB_MISSES.ANY, ITLB.MISSES CPU,INTEL_CORE PRESET,PAPI_BR_INS,NOT_DERIVED,BR_INSTR_RET PRESET,PAPI_RES_STL,NOT_DERIVED,RESOURCE_STALL PRESET,PAPI_TOT_CYC,NOT_DERIVED,UNHALTED_CORE_CYCLES PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTR_RET 
PRESET,PAPI_HW_INT,NOT_DERIVED,HW_INT_RX PRESET,PAPI_BR_TKN,NOT_DERIVED,BR_INSTR_RET PRESET,PAPI_BR_MSP,NOT_DERIVED,BR_MISSP_EXEC PRESET,PAPI_TLB_DM,NOT_DERIVED,DTLB_MISS PRESET,PAPI_TLB_IM,NOT_DERIVED,ITLB.MISSES PRESET,PAPI_TLB_TL,DERIVED_ADD,DTLB_MISS, ITLB.MISSES PRESET,PAPI_L2_DCH,NOT_DERIVED,L2_LD CPU,INTEL_CORE2 PRESET,PAPI_BR_INS,NOT_DERIVED,BR_INST_RETIRED.ANY PRESET,PAPI_RES_STL,NOT_DERIVED,RESOURCE_STALLS.ANY PRESET,PAPI_TOT_CYC,NOT_DERIVED,CPU_CLK_UNHALTED.BUS PRESET,PAPI_TOT_INS,NOT_DERIVED,INST_RETIRED.ANY_P PRESET,PAPI_HW_INT,NOT_DERIVED,HW_INT_RCV PRESET,PAPI_BR_TKN,NOT_DERIVED,BR_INST_RETIRED.TAKEN PRESET,PAPI_BR_MSP,NOT_DERIVED,BR_MISSP_EXEC PRESET,PAPI_TLB_DM,NOT_DERIVED,DTLB_MISSES.ANY PRESET,PAPI_TLB_IM,NOT_DERIVED,ITLB.MISSES PRESET,PAPI_TLB_TL,DERIVED_ADD,DTLB_MISSES.ANY, ITLB.MISSES PRESET,PAPI_L1_DCA,NOT_DERIVED,L1D_ALL_REF PRESET,PAPI_L1_ICA,NOT_DERIVED,L1I_READS PRESET,PAPI_L1_TCA,DERIVED_ADD,L1D_ALL_REF, L1I_READS # PAPI_L2_ICH seems not to work PRESET,PAPI_L2_ICH,NOT_DERIVED,L2_IFETCH PRESET,PAPI_L2_DCH,NOT_DERIVED,L2_LD PRESET,PAPI_FP_INS,NOT_DERIVED,X87_OPS_RETIRED.ANY PRESET,PAPI_L1_DCM,NOT_DERIVED,MEM_LOAD_RETIRED_L1D_MISS PRESET,PAPI_L1_ICM,NOT_DERIVED,L1I_MISSES PRESET,PAPI_L1_TCM,DERIVED_ADD,MEM_LOAD_RETIRED_L1D_MISS, L1I_MISSES PRESET,PAPI_L2_DCM,NOT_DERIVED,MEM_LOAD_RETIRED_L2_MISS CPU,INTEL_CORE2EXTREME PRESET,PAPI_BR_INS,NOT_DERIVED,BR_INST_RETIRED.ANY PRESET,PAPI_RES_STL,NOT_DERIVED,RESOURCE_STALLS.ANY PRESET,PAPI_TOT_CYC,NOT_DERIVED,CPU_CLK_UNHALTED.BUS PRESET,PAPI_TOT_INS,NOT_DERIVED,INST_RETIRED.ANY_P PRESET,PAPI_HW_INT,NOT_DERIVED,HW_INT_RCV PRESET,PAPI_BR_TKN,NOT_DERIVED,BR_INST_RETIRED.TAKEN PRESET,PAPI_BR_MSP,NOT_DERIVED,BR_MISSP_EXEC PRESET,PAPI_TLB_DM,NOT_DERIVED,DTLB_MISSES.ANY PRESET,PAPI_TLB_IM,NOT_DERIVED,ITLB.MISSES PRESET,PAPI_TLB_TL,DERIVED_ADD,DTLB_MISSES.ANY, ITLB.MISSES PRESET,PAPI_L1_DCA,NOT_DERIVED,L1D_ALL_REF PRESET,PAPI_L1_ICA,NOT_DERIVED,L1I_READS PRESET,PAPI_L1_TCA, DERIVED_ADD, L1D_ALL_REF, 
L1I_READS # PAPI_L2_ICH seems not to work PRESET,PAPI_L2_ICH,NOT_DERIVED,L2_IFETCH PRESET,PAPI_L2_DCH,NOT_DERIVED,L2_LD PRESET,PAPI_FP_INS,NOT_DERIVED,X87_OPS_RETIRED.ANY PRESET,PAPI_L1_DCM,NOT_DERIVED,MEM_LOAD_RETIRED.L1D_MISS PRESET,PAPI_L1_ICM,NOT_DERIVED,L1I_MISSES PRESET,PAPI_L1_TCM,DERIVED_ADD,MEM_LOAD_RETIRED.L1D_MISS, L1I_MISSES PRESET,PAPI_L2_DCM,NOT_DERIVED,MEM_LOAD_RETIRED.L2_MISS CPU,INTELCOREI7 PRESET,PAPI_BR_INS,NOT_DERIVED,BR_INST_RETIRED.ALL_BRANCHES PRESET,PAPI_RES_STL,NOT_DERIVED,RESOURCE_STALLS.ANY PRESET,PAPI_TOT_CYC,NOT_DERIVED,CPU_CLK_UNHALTED.CORE PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTR.RETIRED_ANY PRESET,PAPI_HW_INT,NOT_DERIVED,HW_INT_RCV PRESET,PAPI_BR_TKN,NOT_DERIVED,BR_MISP_EXEC_TAKEN PRESET,PAPI_BR_MSP,NOT_DERIVED,BR_MISP_EXEC_ANY PRESET,PAPI_TLB_DM,NOT_DERIVED,DTLB_MISSES.ANY PRESET,PAPI_TLB_IM,NOT_DERIVED,ITLB_MISSES_ANY PRESET,PAPI_TLB_TL,DERIVED_ADD,DTLB_MISSES.ANY, ITLB_MISSES_ANY PRESET,PAPI_L1_DCA,NOT_DERIVED,L1D_ALL_REF_ANY PRESET,PAPI_L1_ICA,NOT_DERIVED,L1I_READS PRESET,PAPI_L1_TCA, DERIVED_ADD, L1D_ALL_REF_ANY, L1I_READS # PAPI_L2_ICH seems not to work PRESET,PAPI_L2_ICH,NOT_DERIVED,L2_IFETCH PRESET,PAPI_L2_DCH,NOT_DERIVED,MEM_LOAD_RETIRED.L2_HIT PRESET,PAPI_FP_INS,NOT_DERIVED,INST_RETIRED.X87 PRESET,PAPI_L1_DCM,NOT_DERIVED,L1D_PREFETCH_MISS PRESET,PAPI_L1_ICM,NOT_DERIVED,L1I_MISSES PRESET,PAPI_L1_TCM,DERIVED_ADD,L1D_PREFETCH_MISS, L1I_MISSES PRESET,PAPI_L2_DCM,NOT_DERIVED,L2_RQSTS_MISS CPU,INTEL_WESTMERE PRESET,PAPI_BR_INS,NOT_DERIVED,BR_INST_RETIRED.ALL_BRANCHES PRESET,PAPI_RES_STL,NOT_DERIVED,RESOURCE_STALLS.ANY PRESET,PAPI_TOT_CYC,NOT_DERIVED,CPU_CLK_UNHALTED.CORE PRESET,PAPI_TOT_INS,NOT_DERIVED,INSTR.RETIRED_ANY PRESET,PAPI_BR_TKN,NOT_DERIVED,BR_MISP_EXEC_TAKEN PRESET,PAPI_BR_MSP,NOT_DERIVED,BR_MISP_EXEC_ANY PRESET,PAPI_TLB_DM,NOT_DERIVED,DTLB_MISSES.ANY PRESET,PAPI_TLB_IM,NOT_DERIVED,ITLB_MISSES_ANY PRESET,PAPI_TLB_TL,DERIVED_ADD,DTLB_MISSES.ANY, ITLB_MISSES_ANY PRESET,PAPI_L1_ICA,NOT_DERIVED,L1I_READS # PAPI_L2_ICH seems 
not to work PRESET,PAPI_L2_ICH,NOT_DERIVED,L2_IFETCH PRESET,PAPI_L2_DCH,NOT_DERIVED,MEM_LOAD_RETIRED.L2_HIT PRESET,PAPI_FP_INS,NOT_DERIVED,INST_RETIRED.X87 PRESET,PAPI_L1_DCM,NOT_DERIVED,L1D_PREFETCH_MISS PRESET,PAPI_L1_ICM,NOT_DERIVED,L1I_MISSES PRESET,PAPI_L1_TCM, DERIVED_ADD, L1D_PREFETCH_MISS, L1I_MISSES PRESET,PAPI_L2_DCM,NOT_DERIVED,L2_RQSTS_MISS papi-5.3.0/src/libpfm4/0000700003276200002170000000000012247131324014263 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/examples/0000700003276200002170000000000012247131123016076 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/examples/check_events.c0000600003276200002170000001064312247131123020711 0ustar ralphundrgrad/* * check_events.c - show event encoding * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
*/ #include #include #include #include #include #include #include #include int pmu_is_present(pfm_pmu_t p) { pfm_pmu_info_t pinfo; int ret; memset(&pinfo, 0, sizeof(pinfo)); ret = pfm_get_pmu_info(p, &pinfo); return ret == PFM_SUCCESS ? pinfo.is_present : 0; } int main(int argc, const char **argv) { pfm_pmu_info_t pinfo; pfm_pmu_encode_arg_t e; const char *arg[3]; const char **p; char *fqstr; pfm_event_info_t info; int i, j, ret; int total_supported_events = 0; int total_available_events = 0; /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "cannot initialize library: %s\n", pfm_strerror(ret)); memset(&pinfo, 0, sizeof(pinfo)); memset(&info, 0, sizeof(info)); printf("Supported PMU models:\n"); for(i=0; i < PFM_PMU_MAX; i++) { ret = pfm_get_pmu_info(i, &pinfo); if (ret != PFM_SUCCESS) continue; printf("\t[%d, %s, \"%s\"]\n", i, pinfo.name, pinfo.desc); } printf("Detected PMU models:\n"); for(i=0; i < PFM_PMU_MAX; i++) { ret = pfm_get_pmu_info(i, &pinfo); if (ret != PFM_SUCCESS) continue; if (pinfo.is_present) { printf("\t[%d, %s, \"%s\"]\n", i, pinfo.name, pinfo.desc); total_supported_events += pinfo.nevents; } total_available_events += pinfo.nevents; } printf("Total events: %d available, %d supported\n", total_available_events, total_supported_events); /* * be nice to user! 
*/ if (argc < 2 && pmu_is_present(PFM_PMU_PERF_EVENT)) { arg[0] = "PERF_COUNT_HW_CPU_CYCLES"; arg[1] = "PERF_COUNT_HW_INSTRUCTIONS"; arg[2] = NULL; p = arg; } else { p = argv+1; } if (!*p) errx(1, "you must pass at least one event"); memset(&e, 0, sizeof(e)); while(*p) { /* * extract raw event encoding * * For perf_event encoding, use * #include * and the function: * pfm_get_perf_event_encoding() */ fqstr = NULL; e.fstr = &fqstr; ret = pfm_get_os_event_encoding(*p, PFM_PLM0|PFM_PLM3, PFM_OS_NONE, &e); if (ret != PFM_SUCCESS) { /* * codes is too small for this event * free and let the library resize */ if (ret == PFM_ERR_TOOSMALL) { free(e.codes); e.codes = NULL; e.count = 0; free(fqstr); continue; } if (ret == PFM_ERR_NOTFOUND && strstr(*p, "::")) errx(1, "%s: try setting LIBPFM_ENCODE_INACTIVE=1", pfm_strerror(ret)); errx(1, "cannot encode event %s: %s", *p, pfm_strerror(ret)); } ret = pfm_get_event_info(e.idx, PFM_OS_NONE, &info); if (ret != PFM_SUCCESS) errx(1, "cannot get event info: %s", pfm_strerror(ret)); ret = pfm_get_pmu_info(info.pmu, &pinfo); if (ret != PFM_SUCCESS) errx(1, "cannot get PMU info: %s", pfm_strerror(ret)); printf("Requested Event: %s\n", *p); printf("Actual Event: %s\n", fqstr); printf("PMU : %s\n", pinfo.desc); printf("IDX : %d\n", e.idx); printf("Codes :"); for(j=0; j < e.count; j++) printf(" 0x%"PRIx64, e.codes[j]); putchar('\n'); free(fqstr); p++; } if (e.codes) free(e.codes); return 0; } papi-5.3.0/src/libpfm4/examples/showevtinfo.c0000600003276200002170000004635512247131123020634 0ustar ralphundrgrad/* * showevtinfo.c - show event information * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of 
the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include #include #include #include #include #include #include #include #include #include #include #define MAXBUF 1024 #define COMBO_MAX 18 static struct { int compact; int sort; int encode; int combo; int combo_lim; int desc; char *csv_sep; pfm_event_info_t efilter; pfm_event_attr_info_t ufilter; pfm_os_t os; uint64_t mask; } options; typedef struct { uint64_t code; int idx; } code_info_t; static void show_event_info_compact(pfm_event_info_t *info); static const char *srcs[PFM_ATTR_CTRL_MAX]={ [PFM_ATTR_CTRL_UNKNOWN] = "???", [PFM_ATTR_CTRL_PMU] = "PMU", [PFM_ATTR_CTRL_PERF_EVENT] = "perf_event", }; static int event_has_pname(char *s) { char *p; return (p = strchr(s, ':')) && *(p+1) == ':'; } static int print_codes(char *buf, int plm, int max_encoding) { uint64_t *codes = NULL; int j, ret, count = 0; ret = pfm_get_event_encoding(buf, PFM_PLM0|PFM_PLM3, NULL, NULL, &codes, &count); if (ret != PFM_SUCCESS) { if (ret == PFM_ERR_NOTFOUND) errx(1, "encoding failed, try setting env variable LIBPFM_ENCODE_INACTIVE=1"); return -1; } for(j = 0; j < max_encoding; j++) { if (j < count) printf("0x%"PRIx64, codes[j]); printf("%s", options.csv_sep); } 
free(codes); return 0; } static int check_valid(char *buf, int plm) { uint64_t *codes = NULL; int ret, count = 0; ret = pfm_get_event_encoding(buf, PFM_PLM0|PFM_PLM3, NULL, NULL, &codes, &count); if (ret != PFM_SUCCESS) return -1; free(codes); return 0; } static int match_ufilters(pfm_event_attr_info_t *info) { uint32_t ufilter1 = 0; uint32_t ufilter2 = 0; if (options.ufilter.is_dfl) ufilter1 |= 0x1; if (info->is_dfl) ufilter2 |= 0x1; if (options.ufilter.is_precise) ufilter1 |= 0x2; if (info->is_precise) ufilter2 |= 0x2; if (!ufilter1) return 1; /* at least one filter matches */ return ufilter1 & ufilter2; } static int match_efilters(pfm_event_info_t *info) { pfm_event_attr_info_t ainfo; int n = 0; int i, ret; if (options.efilter.is_precise && !info->is_precise) return 0; memset(&ainfo, 0, sizeof(ainfo)); ainfo.size = sizeof(ainfo); pfm_for_each_event_attr(i, info) { ret = pfm_get_event_attr_info(info->idx, i, options.os, &ainfo); if (ret != PFM_SUCCESS) continue; if (match_ufilters(&ainfo)) return 1; if (ainfo.type == PFM_ATTR_UMASK) n++; } return n ? 
0 : 1; } static void show_event_info_combo(pfm_event_info_t *info) { pfm_event_attr_info_t *ainfo; pfm_pmu_info_t pinfo; char buf[MAXBUF]; size_t len; int numasks = 0; int i, j, ret; uint64_t total, m, u; memset(&pinfo, 0, sizeof(pinfo)); pinfo.size = sizeof(pinfo); ret = pfm_get_pmu_info(info->pmu, &pinfo); if (ret != PFM_SUCCESS) errx(1, "cannot get PMU info"); ainfo = calloc(info->nattrs, sizeof(*ainfo)); if (!ainfo) err(1, "event %s : ", info->name); /* * extract attribute information and count number * of umasks * * we cannot just drop non umasks because we need * to keep attributes in order for the enumeration * of 2^n */ pfm_for_each_event_attr(i, info) { ainfo[i].size = sizeof(*ainfo); ret = pfm_get_event_attr_info(info->idx, i, options.os, &ainfo[i]); if (ret != PFM_SUCCESS) errx(1, "cannot get attribute info: %s", pfm_strerror(ret)); if (ainfo[i].type == PFM_ATTR_UMASK) numasks++; } if (numasks > options.combo_lim) { warnx("event %s has too many umasks to print all combinations, dropping to simple enumeration", info->name); free(ainfo); show_event_info_compact(info); return; } if (numasks) { if (info->nattrs > (int)((sizeof(total)<<3))) { warnx("too many umasks, cannot show all combinations for event %s", info->name); goto end; } total = 1ULL << info->nattrs; for (u = 1; u < total; u++) { len = sizeof(buf); len -= snprintf(buf, len, "%s::%s", pinfo.name, info->name); if (len <= 0) { warnx("event name too long%s", info->name); goto end; } for(m = u, j = 0; m; m >>=1, j++) { if (m & 0x1ULL) { /* we have hit a non umasks attribute, skip */ if (ainfo[j].type != PFM_ATTR_UMASK) break; if (len < (1 + strlen(ainfo[j].name))) { warnx("umasks combination too long for event %s", buf); break; } strncat(buf, ":", len-1);buf[len-1] = '\0'; len--; strncat(buf, ainfo[j].name, len-1);buf[len-1] = '\0'; len -= strlen(ainfo[j].name); } } /* if found a valid umask combination, check encoding */ if (m == 0) { if (options.encode) ret = print_codes(buf, PFM_PLM0|PFM_PLM3, 
pinfo.max_encoding); else ret = check_valid(buf, PFM_PLM0|PFM_PLM3); if (!ret) printf("%s\n", buf); } } } else { snprintf(buf, sizeof(buf)-1, "%s::%s", pinfo.name, info->name); buf[sizeof(buf)-1] = '\0'; ret = options.encode ? print_codes(buf, PFM_PLM0|PFM_PLM3, pinfo.max_encoding) : 0; if (!ret) printf("%s\n", buf); } end: free(ainfo); } static void show_event_info_compact(pfm_event_info_t *info) { pfm_event_attr_info_t ainfo; pfm_pmu_info_t pinfo; char buf[MAXBUF]; int i, ret, um = 0; memset(&ainfo, 0, sizeof(ainfo)); memset(&pinfo, 0, sizeof(pinfo)); pinfo.size = sizeof(pinfo); ainfo.size = sizeof(ainfo); ret = pfm_get_pmu_info(info->pmu, &pinfo); if (ret != PFM_SUCCESS) errx(1, "cannot get pmu info: %s", pfm_strerror(ret)); pfm_for_each_event_attr(i, info) { ret = pfm_get_event_attr_info(info->idx, i, options.os, &ainfo); if (ret != PFM_SUCCESS) errx(1, "cannot get attribute info: %s", pfm_strerror(ret)); if (ainfo.type != PFM_ATTR_UMASK) continue; if (!match_ufilters(&ainfo)) continue; snprintf(buf, sizeof(buf)-1, "%s::%s:%s", pinfo.name, info->name, ainfo.name); buf[sizeof(buf)-1] = '\0'; ret = 0; if (options.encode) { ret = print_codes(buf, PFM_PLM0|PFM_PLM3, pinfo.max_encoding); } if (!ret) { printf("%s", buf); if (options.desc) { printf("%s", options.csv_sep); printf("\"%s. 
%s.\"", info->desc, ainfo.desc); } putchar('\n'); } um++; } if (um == 0) { if (!match_efilters(info)) return; snprintf(buf, sizeof(buf)-1, "%s::%s", pinfo.name, info->name); buf[sizeof(buf)-1] = '\0'; if (options.encode) { ret = print_codes(buf, PFM_PLM0|PFM_PLM3, pinfo.max_encoding); if (ret) return; } printf("%s", buf); if (options.desc) { printf("%s", options.csv_sep); printf("\"%s.\"", info->desc); } putchar('\n'); } } int compare_codes(const void *a, const void *b) { const code_info_t *aa = a; const code_info_t *bb = b; uint64_t m = options.mask; if ((aa->code & m) < (bb->code &m)) return -1; if ((aa->code & m) == (bb->code & m)) return 0; return 1; } static void print_event_flags(pfm_event_info_t *info) { int n = 0; if (info->is_precise) { printf("[precise] "); n++; } if (!n) printf("None"); } static void print_attr_flags(pfm_event_attr_info_t *info) { int n = 0; if (info->is_dfl) { printf("[default] "); n++; } if (info->is_precise) { printf("[precise] "); n++; } if (!n) printf("None "); } static void show_event_info(pfm_event_info_t *info) { pfm_event_attr_info_t ainfo; pfm_pmu_info_t pinfo; int mod = 0, um = 0; int i, ret; const char *src; memset(&ainfo, 0, sizeof(ainfo)); memset(&pinfo, 0, sizeof(pinfo)); pinfo.size = sizeof(pinfo); ainfo.size = sizeof(ainfo); if (!match_efilters(info)) return; ret = pfm_get_pmu_info(info->pmu, &pinfo); if (ret) errx(1, "cannot get pmu info: %s", pfm_strerror(ret)); printf("#-----------------------------\n" "IDX : %d\n" "PMU name : %s (%s)\n" "Name : %s\n" "Equiv : %s\n", info->idx, pinfo.name, pinfo.desc, info->name, info->equiv ? info->equiv : "None"); printf("Flags : "); print_event_flags(info); putchar('\n'); printf("Desc : %s\n", info->desc ? 
info->desc : "no description available"); printf("Code : 0x%"PRIx64"\n", info->code); pfm_for_each_event_attr(i, info) { ret = pfm_get_event_attr_info(info->idx, i, options.os, &ainfo); if (ret != PFM_SUCCESS) errx(1, "cannot retrieve event %s attribute info: %s", info->name, pfm_strerror(ret)); if (ainfo.ctrl >= PFM_ATTR_CTRL_MAX) { warnx("event: %s has unsupported attribute source %d", info->name, ainfo.ctrl); ainfo.ctrl = PFM_ATTR_CTRL_UNKNOWN; } src = srcs[ainfo.ctrl]; switch(ainfo.type) { case PFM_ATTR_UMASK: if (!match_ufilters(&ainfo)) continue; printf("Umask-%02u : 0x%02"PRIx64" : %s : [%s] : ", um, ainfo.code, src, ainfo.name); print_attr_flags(&ainfo); putchar(':'); if (ainfo.equiv) printf(" Alias to %s", ainfo.equiv); else printf(" %s", ainfo.desc); putchar('\n'); um++; break; case PFM_ATTR_MOD_BOOL: printf("Modif-%02u : 0x%02"PRIx64" : %s : [%s] : %s (boolean)\n", mod, ainfo.code, src, ainfo.name, ainfo.desc); mod++; break; case PFM_ATTR_MOD_INTEGER: printf("Modif-%02u : 0x%02"PRIx64" : %s : [%s] : %s (integer)\n", mod, ainfo.code, src, ainfo.name, ainfo.desc); mod++; break; default: printf("Attr-%02u : 0x%02"PRIx64" : %s : [%s] : %s\n", i, ainfo.code, ainfo.name, src, ainfo.desc); } } } static int show_info(char *event, regex_t *preg) { pfm_pmu_info_t pinfo; pfm_event_info_t info; int i, j, ret, match = 0, pname; size_t len, l = 0; char *fullname = NULL; memset(&pinfo, 0, sizeof(pinfo)); memset(&info, 0, sizeof(info)); pinfo.size = sizeof(pinfo); info.size = sizeof(info); pname = event_has_pname(event); /* * scan all supported events, incl. 
those * from undetected PMU models */ pfm_for_all_pmus(j) { ret = pfm_get_pmu_info(j, &pinfo); if (ret != PFM_SUCCESS) continue; /* no pmu prefix, just look for detected PMU models */ if (!pname && !pinfo.is_present) continue; for (i = pinfo.first_event; i != -1; i = pfm_get_event_next(i)) { ret = pfm_get_event_info(i, options.os, &info); if (ret != PFM_SUCCESS) errx(1, "cannot get event info: %s", pfm_strerror(ret)); len = strlen(info.name) + strlen(pinfo.name) + 1 + 2; if (len > l) { l = len; fullname = realloc(fullname, l); if (!fullname) err(1, "cannot allocate memory"); } sprintf(fullname, "%s::%s", pinfo.name, info.name); if (regexec(preg, fullname, 0, NULL, 0) == 0) { if (options.compact) if (options.combo) show_event_info_combo(&info); else show_event_info_compact(&info); else show_event_info(&info); match++; } } } if (fullname) free(fullname); return match; } static int show_info_sorted(char *event, regex_t *preg) { pfm_pmu_info_t pinfo; pfm_event_info_t info; unsigned int j; int i, ret, n, match = 0; size_t len, l = 0; char *fullname = NULL; code_info_t *codes; memset(&pinfo, 0, sizeof(pinfo)); memset(&info, 0, sizeof(info)); pinfo.size = sizeof(pinfo); info.size = sizeof(info); pfm_for_all_pmus(j) { ret = pfm_get_pmu_info(j, &pinfo); if (ret != PFM_SUCCESS) continue; codes = malloc(pinfo.nevents * sizeof(*codes)); if (!codes) err(1, "cannot allocate memory\n"); /* scans all supported events */ n = 0; for (i = pinfo.first_event; i != -1; i = pfm_get_event_next(i)) { ret = pfm_get_event_info(i, options.os, &info); if (ret != PFM_SUCCESS) errx(1, "cannot get event info: %s", pfm_strerror(ret)); if (info.pmu != j) continue; codes[n].idx = info.idx; codes[n].code = info.code; n++; } qsort(codes, n, sizeof(*codes), compare_codes); for(i=0; i < n; i++) { ret = pfm_get_event_info(codes[i].idx, options.os, &info); if (ret != PFM_SUCCESS) errx(1, "cannot get event info: %s", pfm_strerror(ret)); len = strlen(info.name) + strlen(pinfo.name) + 1 + 2; if (len > l) { l 
= len; fullname = realloc(fullname, l); if (!fullname) err(1, "cannot allocate memory"); } sprintf(fullname, "%s::%s", pinfo.name, info.name); if (regexec(preg, fullname, 0, NULL, 0) == 0) { if (options.compact) show_event_info_compact(&info); else show_event_info(&info); match++; } } free(codes); } if (fullname) free(fullname); return match; } static void usage(void) { printf("showevtinfo [-L] [-E] [-h] [-s] [-m mask]\n" "-L\t\tlist one event per line (compact mode)\n" "-E\t\tlist one event per line with encoding (compact mode)\n" "-M\t\tdisplay all valid unit masks combination (use with -L or -E)\n" "-h\t\tget help\n" "-s\t\tsort event by PMU and by code based on -m mask\n" "-l\t\tmaximum number of umasks to list all combinations (default: %d)\n" "-F\t\tshow only events and attributes with certain flags (precise,...)\n" "-m mask\t\thexadecimal event code mask, bits to match when sorting\n" "-x sep\t\tuse sep as field separator in compact mode\n" "-D\t\t\tprint event description in compact mode\n" "-O os\t\tshow attributes for the specific operating system\n", COMBO_MAX); } /* * keep: [pmu::]event * drop everything else */ static void drop_event_attributes(char *str) { char *p; p = strchr(str, ':'); if (!p) return; str = p+1; /* keep PMU name */ if (*str == ':') str++; /* stop string at 1st attribute */ p = strchr(str, ':'); if (p) *p = '\0'; } #define EVENT_FLAGS(n, f, l) { .name = n, .ebit = f, .ubit = l } struct attr_flags { const char *name; int ebit; /* bit position in pfm_event_info_t.flags, -1 means ignore */ int ubit; /* bit position in pfm_event_attr_info_t.flags, -1 means ignore */ }; static const struct attr_flags event_flags[]={ EVENT_FLAGS("precise", 0, 1), EVENT_FLAGS("pebs", 0, 1), EVENT_FLAGS("default", -1, 0), EVENT_FLAGS("dfl", -1, 0), EVENT_FLAGS(NULL, 0, 0) }; static void parse_filters(char *arg) { const struct attr_flags *attr; char *p; while (arg) { p = strchr(arg, ','); if (p) *p++ = 0; for (attr = event_flags; attr->name; attr++) { if 
(!strcasecmp(attr->name, arg)) {
				switch(attr->ebit) {
				case 0:
					options.efilter.is_precise = 1;
					break;
				case -1:
					break;
				default:
					errx(1, "unknown event flag %d", attr->ebit);
				}
				switch (attr->ubit) {
				case 0:
					options.ufilter.is_dfl = 1;
					break;
				case 1:
					options.ufilter.is_precise = 1;
					break;
				case -1:
					break;
				default:
					errx(1, "unknown umask flag %d", attr->ubit);
				}
				break;
			}
		}
		arg = p;
	}
}

static const struct {
	char *name;
	pfm_os_t os;
} supported_oses[]={
	{ .name = "none", .os = PFM_OS_NONE },
	{ .name = "raw", .os = PFM_OS_NONE },
	{ .name = "pmu", .os = PFM_OS_NONE },
	{ .name = "perf", .os = PFM_OS_PERF_EVENT},
	{ .name = "perf_ext", .os = PFM_OS_PERF_EVENT_EXT},
	{ .name = NULL, }
};

static const char *pmu_types[]={
	"unknown type",
	"core",
	"uncore",
	"OS generic",
};

static void
setup_os(char *ostr)
{
	int i;

	for (i = 0; supported_oses[i].name; i++) {
		if (!strcmp(supported_oses[i].name, ostr)) {
			options.os = supported_oses[i].os;
			return;
		}
	}
	fprintf(stderr, "unknown OS layer %s, choose from:", ostr);
	for (i = 0; supported_oses[i].name; i++) {
		if (i)
			fputc(',', stderr);
		fprintf(stderr, " %s", supported_oses[i].name);
	}
	fputc('\n', stderr);
	exit(1);
}

int
main(int argc, char **argv)
{
	static char *argv_all[2] = { ".*", NULL };
	pfm_pmu_info_t pinfo;
	char *endptr = NULL;
	char default_sep[2] = "\t";
	char *ostr = NULL;
	char **args;
	int i, match;
	regex_t preg;
	int ret, c;

	memset(&pinfo, 0, sizeof(pinfo));

	pinfo.size = sizeof(pinfo);

	while ((c=getopt(argc, argv,"hELsm:Ml:F:x:DO:")) != -1) {
		switch(c) {
		case 'L':
			options.compact = 1;
			break;
		case 'F':
			parse_filters(optarg);
			break;
		case 'E':
			options.compact = 1;
			options.encode = 1;
			break;
		case 'M':
			options.combo = 1;
			break;
		case 's':
			options.sort = 1;
			break;
		case 'D':
			options.desc = 1;
			break;
		case 'l':
			options.combo_lim = atoi(optarg);
			break;
		case 'x':
			options.csv_sep = optarg;
			break;
		case 'O':
			ostr = optarg;
			break;
		case 'm':
			options.mask = strtoull(optarg, &endptr, 16);
			/* errx() appends its own newline */
			if (*endptr)
				errx(1, "mask must be in hexadecimal");
			break;
		case
'h':
			usage();
			exit(0);
		default:
			errx(1, "unknown option error");
		}
	}
	ret = pfm_initialize();
	if (ret != PFM_SUCCESS)
		errx(1, "cannot initialize libpfm: %s", pfm_strerror(ret));

	if (options.mask == 0)
		options.mask = ~0;

	if (optind == argc) {
		args = argv_all;
	} else {
		args = argv + optind;
	}
	if (!options.csv_sep)
		options.csv_sep = default_sep;

	/* avoid combinatorial explosion */
	if (options.combo_lim == 0)
		options.combo_lim = COMBO_MAX;

	if (ostr)
		setup_os(ostr);
	else
		options.os = PFM_OS_NONE;

	if (!options.compact) {
		int total_supported_events = 0;
		int total_available_events = 0;

		printf("Supported PMU models:\n");
		pfm_for_all_pmus(i) {
			ret = pfm_get_pmu_info(i, &pinfo);
			if (ret != PFM_SUCCESS)
				continue;

			printf("\t[%d, %s, \"%s\"]\n", i, pinfo.name, pinfo.desc);
		}

		printf("Detected PMU models:\n");
		pfm_for_all_pmus(i) {
			ret = pfm_get_pmu_info(i, &pinfo);
			if (ret != PFM_SUCCESS)
				continue;

			if (pinfo.is_present) {
				if (pinfo.type >= PFM_PMU_TYPE_MAX)
					pinfo.type = PFM_PMU_TYPE_UNKNOWN;

				printf("\t[%d, %s, \"%s\", %d events, %d max encoding, %d counters, %s PMU]\n",
				       i,
				       pinfo.name,
				       pinfo.desc,
				       pinfo.nevents,
				       pinfo.max_encoding,
				       pinfo.num_cntrs + pinfo.num_fixed_cntrs,
				       pmu_types[pinfo.type]);

				total_supported_events += pinfo.nevents;
			}
			total_available_events += pinfo.nevents;
		}
		printf("Total events: %d available, %d supported\n", total_available_events, total_supported_events);
	}

	while(*args) {
		/* drop umasks and modifiers */
		drop_event_attributes(*args);
		/* report the offending pattern, not the program name */
		if (regcomp(&preg, *args, REG_ICASE))
			errx(1, "error in regular expression for event \"%s\"", *args);

		if (options.sort)
			match = show_info_sorted(*args, &preg);
		else
			match = show_info(*args, &preg);

		if (match == 0)
			errx(1, "event %s not found", *args);

		/* release the pattern compiled for this argument */
		regfree(&preg);
		args++;
	}

	pfm_terminate();

	return 0;
}
papi-5.3.0/src/libpfm4/examples/Makefile0000600003276200002170000000425212247131123017543 0ustar ralphundrgrad
#
# Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P.
# Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk ifeq ($(CONFIG_PFMLIB_ARCH_CRAYXT),y) CFLAGS += -DCONFIG_PFMLIB_ARCH_CRAYXT endif CFLAGS+= -I. 
-D_GNU_SOURCE LIBS += -lm ifeq ($(SYS),Linux) CFLAGS+= -pthread LIBS += -lrt endif ifeq ($(SYS),WINDOWS) LIBS += -lgnurx endif TARGETS=showevtinfo check_events EXAMPLESDIR=$(DESTDIR)$(DOCDIR)/examples all: $(TARGETS) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done $(TARGETS): %:%.o $(PFMLIB) $(CC) $(CFLAGS) -o $@ $(LDFLAGS) $^ $(LIBS) clean: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done $(RM) -f *.o $(TARGETS) *~ distclean: clean install_examples: $(TARGETS) @echo installing: $(TARGETS) -mkdir -p $(EXAMPLESDIR) $(INSTALL) -m 755 $(TARGETS) $(EXAMPLESDIR) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done # # examples are installed as part of the RPM install, typically in /usr/share/doc/libpfm-X.Y/ # .PHONY: install depend install_examples papi-5.3.0/src/libpfm4/docs/0000700003276200002170000000000012247131123015210 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/docs/Makefile0000600003276200002170000000541212247131123016654 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk .PHONY: all clean distclean depend ARCH_MAN= ifeq ($(CONFIG_PFMLIB_ARCH_X86),y) ARCH_MAN=libpfm_intel_core.3 \ libpfm_intel_x86_arch.3\ libpfm_amd64.3 \ libpfm_amd64_k7.3 \ libpfm_amd64_k8.3 \ libpfm_amd64_fam10h.3 \ libpfm_intel_atom.3 \ libpfm_intel_nhm.3 \ libpfm_intel_nhm_unc.3 \ libpfm_intel_wsm.3 \ libpfm_intel_wsm_unc.3 \ libpfm_intel_snb.3 \ libpfm_intel_snb_unc.3 \ libpfm_intel_ivb.3 \ libpfm_intel_ivb_unc.3 \ libpfm_intel_hsw.3 \ libpfm_intel_snbep_unc_cbo.3 \ libpfm_intel_snbep_unc_ha.3 \ libpfm_intel_snbep_unc_imc.3 \ libpfm_intel_snbep_unc_pcu.3 \ libpfm_intel_snbep_unc_qpi.3 \ libpfm_intel_snbep_unc_ubo.3 \ libpfm_intel_snbep_unc_r2pcie.3 \ libpfm_intel_snbep_unc_r3qpi.3 \ libpfm_intel_knc.3 ifeq ($(CONFIG_PFMLIB_ARCH_I386),y) ARCH_MAN += libpfm_intel_p6.3 libpfm_intel_coreduo.3 endif endif ifeq ($(CONFIG_PFMLIB_ARCH_ARM),y) ARCH_MAN += libpfm_arm_ac15.3 libpfm_arm_ac8.3 libpfm_arm_ac9.3 endif ifeq ($(CONFIG_PFMLIB_ARCH_MIPS),y) ARCH_MAN += libpfm_mips_74k.3 endif GEN_MAN= libpfm.3 \ pfm_find_event.3 \ pfm_get_event_attr_info.3 \ pfm_get_event_info.3 \ pfm_get_event_encoding.3 \ pfm_get_event_next.3 \ pfm_get_pmu_info.3 \ pfm_get_perf_event_encoding.3 \ pfm_get_os_event_encoding.3 \ pfm_get_version.3 \ pfm_initialize.3 \ pfm_terminate.3 \ pfm_strerror.3 MAN=$(GEN_MAN) $(ARCH_MAN) install: -mkdir -p $(DESTDIR)$(MANDIR)/man3 ( cd man3; $(INSTALL) -m 644 $(MAN) $(DESTDIR)$(MANDIR)/man3 ) papi-5.3.0/src/libpfm4/docs/man3/0000700003276200002170000000000012247131123016046 5ustar 
ralphundrgradpapi-5.3.0/src/libpfm4/docs/man3/pfm_get_pmu_info.30000600003276200002170000001064212247131123021454 0ustar ralphundrgrad
.TH LIBPFM 3 "December, 2009" "" "Linux Programmer's Manual"
.SH NAME
pfm_get_pmu_info \- get PMU information
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.BI "int pfm_get_pmu_info(pfm_pmu_t " pmu ", pfm_pmu_info_t *" info ");"
.sp
.SH DESCRIPTION
This function returns in \fBinfo\fR information about a PMU model
designated by its identifier in \fBpmu\fR.

The \fBpfm_pmu_info_t\fR structure is defined as follows:
.nf
typedef struct {
        const char              *name;
        const char              *desc;
        pfm_pmu_t               pmu;
        pfm_pmu_type_t          type;
        int                     size;
        int                     nevents;
        int                     first_event;
        int                     max_encoding;
        int                     num_cntrs;
        int                     num_fixed_cntrs;
        struct {
                int             is_present:1;
                int             is_arch_default:1;
                int             is_core:1;
                int             is_uncore:1;
                int             reserved:28;
        };
} pfm_pmu_info_t;
.fi

The fields of this structure are defined as follows:
.TP
.B name
This is the symbolic name of the PMU. This name
can be used as a prefix in an event string. This is a read-only string.
.TP
.B desc
This is the description of the PMU. This is a read-only string.
.TP
.B pmu
This is the unique PMU identification code. It is identical to the value
passed in \fBpmu\fR and is provided only for completeness.
.TP
.B type
This field contains the type of the PMU. The following types are defined:
.RS
.TP
.B PFM_PMU_TYPE_UNKNOWN
The type of the PMU could not be determined.
.TP
.B PFM_PMU_TYPE_CORE
The PMU is implemented by the processor core.
.TP
.B PFM_PMU_TYPE_UNCORE
The PMU is implemented on the processor die but at the socket level,
i.e., it captures events for all cores.
.PP
.RE
.TP
.B nevents
This is the number of available events for this PMU model based on the
host processor. It is \fBonly\fR valid if the \fBis_present\fR field
is set to 1. Event identifiers are not guaranteed contiguous.
In other words, \fBnevents\fR being equal to 100 does not imply that
event identifiers go from 0 to 99. The iterator function
\fBpfm_get_event_next()\fR must be used to go from one identifier
to the next.
.TP
.B first_event
This field returns the opaque index of the first event for this PMU model. The index
can be used with the \fBpfm_get_event_info()\fR or \fBpfm_get_event_next()\fR functions.
If no event is available, this field contains \fB-1\fR.
.TP
.B num_cntrs
This field contains the number of generic counters supported by the PMU.
A counter is generic if it can count more than one event. When it is not
possible to determine the number of generic counters, this field contains \fB-1\fR.
.TP
.B num_fixed_cntrs
This field contains the number of fixed counters supported by the PMU.
A counter is fixed if it is hardwired to count only one event. When it is not
possible to determine the number of fixed counters, this field contains \fB-1\fR.
.TP
.B size
This field contains the size of the struct passed. This field is used to provide
for extensibility of the struct without compromising backward compatibility.
The value should be set to \fBsizeof(pfm_pmu_info_t)\fR. If, instead, a value of
\fB0\fR is specified, the library assumes the struct passed is identical to the
first ABI version, whose size is \fBPFM_PMU_INFO_ABI0\fR. Thus, if fields were
added after the first ABI, they will not be set by the library. The library
does check that bytes beyond what is implemented are zeroes.
.TP
.B max_encoding
This field returns the maximum number of event codes returned by \fBpfm_get_event_encoding()\fR.
.TP
.B is_present
This field is set to one if the PMU model has been detected on the
host system.
.TP
.B is_dfl
This field is set to one if the PMU is the default PMU for this architecture.
Otherwise this field is zero.
.PP

.SH RETURN

If successful, the function returns \fBPFM_SUCCESS\fR and PMU information
in \fBinfo\fR, otherwise it returns an error code.
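.PP
The \fBsize\fR convention described above can be illustrated with a minimal, self-contained sketch. It does not call libpfm: the names demo_info_t, DEMO_INFO_ABI0 and demo_get_info() are hypothetical, and the check is simplified (a struct larger than the known layout is rejected here, whereas the text above says the library accepts it as long as the extra bytes are zero).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SUCCESS    0
#define DEMO_ERR_INVAL  (-1)

/* Hypothetical info struct: max_encoding was added after the first ABI. */
typedef struct {
	const char *name;
	int size;          /* caller sets this to sizeof(demo_info_t), or 0 */
	int nevents;
	int max_encoding;  /* not part of the first ABI */
} demo_info_t;

/* Size of the first ABI: everything up to, but excluding, max_encoding. */
#define DEMO_INFO_ABI0 offsetof(demo_info_t, max_encoding)

/*
 * Fill info according to the size announced by the caller.
 * size == 0 means "first ABI"; fields added later are left untouched.
 */
int demo_get_info(demo_info_t *info)
{
	size_t sz;

	if (!info)
		return DEMO_ERR_INVAL;

	sz = info->size ? (size_t)info->size : DEMO_INFO_ABI0;

	/* simplification: reject anything outside the known layouts */
	if (sz < DEMO_INFO_ABI0 || sz > sizeof(*info))
		return DEMO_ERR_INVAL;

	info->name = "demo_pmu";
	info->nevents = 42;

	/* only write the newer field if the caller's struct has room for it */
	if (sz >= offsetof(demo_info_t, max_encoding) + sizeof(info->max_encoding))
		info->max_encoding = 2;

	return DEMO_SUCCESS;
}
```

A caller zeroes the struct with memset(), sets size to sizeof(demo_info_t) and calls demo_get_info(), mirroring what showevtinfo does with pfm_pmu_info_t before calling \fBpfm_get_pmu_info()\fR. Leaving size at zero selects the first ABI, so max_encoding stays untouched.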
.SH ERRORS
.TP
.B PFM_ERR_NOINIT
Library has not been initialized properly.
.TP
.B PFM_ERR_NOTSUPP
PMU model is not supported by the library.
.TP
.B PFM_ERR_INVAL
The \fBpmu\fR argument is invalid or \fBinfo\fR is \fBNULL\fR or \fBsize\fR
is not zero.
.SH SEE ALSO
pfm_get_event_next(3)
.SH AUTHOR
Stephane Eranian
.PP
papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_wsm.30000600003276200002170000000604012247131123021466 0ustar ralphundrgrad
.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_wsm - support for Intel Westmere core PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: wsm
.B PMU desc: Intel Westmere
.B PMU name: wsm_dp
.B PMU desc: Intel Westmere DP
.sp
.SH DESCRIPTION
The library supports the Intel Westmere core PMU. Note that this PMU model
covers only each core's PMU, not the socket-level PMU, which is provided
separately. Support is provided for the Intel Core i7 and Core i5 processors
(models 37, 44).

.SH MODIFIERS
The following modifiers are supported on Intel Westmere processors:
.TP
.B u
Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR.
This is a boolean modifier.
.TP
.B k
Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR.
This is a boolean modifier.
.TP
.B i
Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR
occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event
to at least one occurrence. This modifier must be combined with a counter mask modifier (c) with a value
greater than or equal to one. This is a boolean modifier.
.TP
.B c
Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles
in which the number of occurrences of the event is greater than or equal to the threshold.
This is an integer modifier with values in the range [0:255].
.TP
.B t
Measure on both threads at the same time assuming hyper-threading is enabled. This is a boolean modifier.
.TP
.B ldlat
Pass a latency threshold to the MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD event.
This is an integer attribute that must be in the range [3:65535]. It is required
for this event. Note that the event must be used with precise sampling (PEBS).

.SH OFFCORE_RESPONSE events
The library is able to encode the OFFCORE_RESPONSE_0 and OFFCORE_RESPONSE_1 events.
Those are special events because each of them needs a second MSR (0x1a6 and 0x1a7 respectively)
to be programmed for the event to count properly. Thus two values are necessary for each event.
The first value can be programmed on any of the generic counters. The second value goes into
the dedicated MSR (0x1a6 or 0x1a7).

The OFFCORE_RESPONSE events are exposed as normal events with several umasks which are divided
into two groups: request and response. The user must provide \fBat least\fR one umask from each group.
For instance, OFFCORE_RESPONSE_0:ANY_DATA:LOCAL_DRAM.

When using \fBpfm_get_event_encoding()\fR, two 64-bit values are returned. The first value
corresponds to what needs to be programmed into any of the generic counters. The second value
must be programmed into the corresponding dedicated MSR (0x1a6 or 0x1a7).

When using an OS-specific encoding routine, the way the event is encoded is OS specific. Refer to
the corresponding man page for more information.

.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
papi-5.3.0/src/libpfm4/docs/man3/pfm_get_version.30000600003276200002170000000150012247131123021316 0ustar ralphundrgrad
.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual"
.SH NAME
pfm_get_version \- get library version
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.BI "int pfm_get_version(void);"
.sp
.SH DESCRIPTION
This function can be called at any time to get the revision level of the library.
It is not necessary to have invoked \fBpfm_initialize()\fR prior to calling this function.

The revision number is composed of two fields: a major number and a minor number.
Both can be extracted using macros provided in the header file:
.TP
.B PFMLIB_MAJ_VERSION(v)
returns the major number encoded in v.
.TP
.B PFMLIB_MIN_VERSION(v)
returns the minor number encoded in v.
.SH RETURN
The function is always successful, i.e., it always returns the 32-bit version number.
.SH ERRORS
.SH AUTHOR
Stephane Eranian
.PP
papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_nhm.30000600003276200002170000000564212247131123021451 0ustar ralphundrgrad
.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_nhm - support for Intel Nehalem core PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: nhm
.B PMU desc: Intel Nehalem
.B PMU name: nhm_ex
.B PMU desc: Intel Nehalem EX
.sp
.SH DESCRIPTION
The library supports the Intel Nehalem core PMU. Note that this PMU model
covers only each core's PMU, not the socket-level PMU, which is provided
separately. Support is provided for the Intel Core i7 and Core i5 processors.

.SH MODIFIERS
The following modifiers are supported on Intel Nehalem processors:
.TP
.B u
Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR.
This is a boolean modifier.
.TP
.B k
Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR.
This is a boolean modifier.
.TP
.B i
Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR
occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event
to at least one occurrence. This modifier must be combined with a counter mask modifier (c) with a value
greater than or equal to one. This is a boolean modifier.
.TP
.B c
Set the counter mask value. The mask acts as a threshold.
The counter will count the number of cycles
in which the number of occurrences of the event is greater than or equal to the threshold.
This is an integer modifier with values in the range [0:255].
.TP
.B t
Measure on both threads at the same time assuming hyper-threading is enabled. This is a boolean modifier.
.TP
.B ldlat
Pass a latency threshold to the MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD event.
This is an integer attribute that must be in the range [3:65535]. It is required
for this event. Note that the event must be used with precise sampling (PEBS).

.SH OFFCORE_RESPONSE event
The library is able to encode the OFFCORE_RESPONSE_0 event. This is a special event because it
needs a second MSR (0x1a6) to be programmed for the event to count properly. Thus two values
are necessary. The first value can be programmed on any of the generic counters. The second value
goes into the dedicated MSR (0x1a6).

The OFFCORE_RESPONSE event is exposed as a normal event with several umasks which are divided
into two groups: request and response. The user must provide \fBat least\fR one umask from each group.
For instance, OFFCORE_RESPONSE_0:ANY_DATA:LOCAL_DRAM.

When using \fBpfm_get_event_encoding()\fR, two 64-bit values are returned. The first value
corresponds to what needs to be programmed into any of the generic counters. The second value
must be programmed into the dedicated MSR (0x1a6).

When using an OS-specific encoding routine, the way the event is encoded is OS specific. Refer to
the corresponding man page for more information.
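.PP
The two-value convention described above can be sketched as follows. This is only an illustration under stated assumptions: demo_offcore_prog_t and demo_split_offcore() are hypothetical names, not part of libpfm, and the sketch does not program any hardware. In a real program the codes array and its count would come from \fBpfm_get_event_encoding()\fR and the array would be released with free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dedicated MSR for OFFCORE_RESPONSE_0, as given in the text above. */
#define DEMO_MSR_OFFCORE_RSP0 0x1a6u

/* Hypothetical container for the two values that must be programmed. */
typedef struct {
	uint64_t counter_config;   /* goes into any generic counter */
	uint64_t offcore_value;    /* goes into the dedicated MSR */
	uint32_t offcore_msr;      /* address of the dedicated MSR */
} demo_offcore_prog_t;

/*
 * Split the encoding of an OFFCORE_RESPONSE_0 event into its two parts.
 * codes/count are expected to be what an encoding routine returned:
 * exactly two 64-bit values for this kind of event.
 * Returns 0 on success, -1 if the input does not have the expected shape.
 */
int demo_split_offcore(const uint64_t *codes, int count,
		       demo_offcore_prog_t *out)
{
	if (!codes || !out || count != 2)
		return -1;

	out->counter_config = codes[0];   /* first value: generic counter */
	out->offcore_value  = codes[1];   /* second value: dedicated MSR */
	out->offcore_msr    = DEMO_MSR_OFFCORE_RSP0;
	return 0;
}
```

Rejecting any count other than two keeps an ordinary single-value event from being mistaken for an OFFCORE_RESPONSE encoding.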
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snb_unc.30000600003276200002170000000361412247131123022313 0ustar ralphundrgrad
.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_snb_unc - support for Intel Sandy Bridge uncore PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: snb_unc_cbo0, snb_unc_cbo1, snb_unc_cbo2, snb_unc_cbo3
.B PMU desc: Intel Sandy Bridge C-box uncore
.sp
.SH DESCRIPTION
The library supports the Intel Sandy Bridge client part (model 42) uncore PMU.
The support is currently limited to the Coherency Box, the so-called C-Box, for up
to 4 physical cores.

Each physical core has an associated C-Box which it uses to communicate with
the L3 cache. The C-boxes all support the same set of events. However, the Core 0
C-box (snb_unc_cbo0) supports an additional uncore clock ticks event: \fBUNC_CLOCKTICKS\fR.

.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge C-Box uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR
occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event
to at least one occurrence. This modifier must be combined with a counter mask modifier (c) with a value
greater than or equal to one. This is a boolean modifier.
.TP
.B c
Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles
in which the number of occurrences of the event is greater than or equal to the threshold. This is an
integer modifier with values in the range [0:255].
.P
Both \fBUNC_CBO_CACHE_LOOKUP\fR and \fBUNC_CBO_XSNP_RESPONSE\fR require two umasks to be valid.
For \fBUNC_CBO_CACHE_LOOKUP\fR the first umask must be one of the MESI state umasks, the second has to be one
of the filters.
For \fBUNC_CBO_XSNP_RESPONSE\fR the first umask must be one of the snoop types, the second has to be one
of the filters.

.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
papi-5.3.0/src/libpfm4/docs/man3/pfm_get_perf_event_encoding.30000600003276200002170000001233512247131123023644 0ustar ralphundrgrad
.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual"
.SH NAME
pfm_get_perf_event_encoding \- encode event for perf_event API
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib_perf_event.h>
.sp
.BI "int pfm_get_perf_event_encoding(const char *" str ", int " dfl_plm ", struct perf_event_attr *" attr ", char **" fstr ", int *" idx ");"
.sp
.SH DESCRIPTION
This function can be used in conjunction with the perf_events Linux kernel API which
provides access to hardware performance counters, kernel software counters and tracepoints.
The function takes an event string in \fBstr\fR and a default privilege level mask in \fBdfl_plm\fR
and fills out the relevant parts of the perf_events specific data structure in \fBattr\fR.

This function is \fBdeprecated\fR. It is superseded by \fBpfm_get_os_event_encoding()\fR
with the OS argument set to either \fBPFM_OS_PERF_EVENT\fR or \fBPFM_OS_PERF_EVENT_EXT\fR.
Using the newer function provides extended support for perf_events. Certain perf_event
configuration options are only available through this new interface.
The following example illustrates the transition: .nf struct perf_event_attr attr; int ret; memset(&attr, 0, sizeof(attr)); ret = pfm_get_perf_event_encoding("RETIRED_INSTRUCTIONS", PFM_PLM3, &attr, NULL, NULL); if (ret != PFM_SUCCESS) errx(1, "cannot get encoding %s", pfm_strerror(ret)); .fi is equivalent to: .nf #include <perfmon/pfmlib_perf_event.h> struct perf_event_attr attr; pfm_perf_encode_arg_t arg; int ret; memset(&attr, 0, sizeof(attr)); memset(&arg, 0, sizeof(arg)); arg.size = sizeof(arg); arg.attr = &attr; ret = pfm_get_os_event_encoding("RETIRED_INSTRUCTIONS", PFM_PLM3, PFM_OS_PERF_EVENT, &arg); if (ret != PFM_SUCCESS) errx(1, "cannot get encoding %s", pfm_strerror(ret)); .fi The \fBdfl_plm\fR cannot be zero, though it may not necessarily be used by the event. Depending on the event, a combination of the following privilege levels may be used: .TP .B PFM_PLM3 Measure at privilege level 3. This usually corresponds to user level. On X86, it corresponds to privilege levels 3, 2, 1. Check the PMU specific man page to verify if this level is supported by your PMU model. .TP .B PFM_PLM2 Measure at privilege level 2. Check the PMU specific man page to verify if this level is supported by your PMU model. .TP .B PFM_PLM1 Measure at privilege level 1. Check the PMU specific man page to verify if this level is supported by your PMU model. .TP .B PFM_PLM0 Measure at privilege level 0. This usually corresponds to kernel level. Check the PMU specific man page to verify if this level is supported by your PMU model. .TP .B PFM_PLMH Measure at hypervisor privilege level. This is used in conjunction with hardware virtualization. Check the PMU specific man page to verify if this level is supported by your PMU model. .PP If \fBfstr\fR is not NULL, the function will make it point to the fully qualified event string, i.e., a string with the event name, all unit masks set, and the value of all modifiers. 
The library allocates memory to store the event string, but it is the responsibility of the caller to eventually free that string using free(). If \fBidx\fR is not NULL, it returns the corresponding unique event identifier. Only select fields are modified by the function, the others are untouched. The following fields in \fBattr\fR are modified: .TP .B type The type of the event. .TP .B config The encoding of the event. .TP .B exclude_user Whether or not user level execution should be excluded from monitoring. The definition of user is PMU model specific. .TP .B exclude_kernel Whether or not kernel level execution should be excluded from monitoring. The definition of kernel is PMU model specific. .TP .B exclude_hv Whether or not hypervisor level execution should be excluded from monitoring. The definition of hypervisor is PMU model specific. .PP By default, if no privilege level modifier is specified in the event string, the library clears \fBexclude_user\fR, \fBexclude_kernel\fR and \fBexclude_hv\fR, resulting in the event being measured at all levels subject to hardware support. The function works on only one event at a time. For convenience, it accepts event strings with commas. In that case, it translates only the first event, up to the first comma. This is handy when tools are passed events as a comma-separated list. .SH RETURN The function returns in \fBattr\fR the perf_event encoding which corresponds to the event string. If \fBidx\fR is not NULL, then it will contain the unique event identifier upon successful return. The value \fBPFM_SUCCESS\fR is returned if successful, otherwise a negative error code is returned. .SH ERRORS .TP .B PFM_ERR_TOOSMALL The \fBcode\fR argument is too small for the encoding. .TP .B PFM_ERR_INVAL The \fBattr\fR argument is \fBNULL\fR. .TP .B PFM_ERR_NOMEM Not enough memory. .TP .B PFM_ERR_NOTFOUND Event not found. 
.TP .B PFM_ERR_ATTR Invalid event attribute (unit mask or modifier). .TP .B PFM_ERR_ATTR_VAL Invalid modifier value. .TP .B PFM_ERR_ATTR_SET Attribute already set, cannot be changed. .TP .B PFM_ERR_ATTR_UMASK Missing unit mask. .TP .B PFM_ERR_ATTR_FEATCOMB Unit masks or features cannot be combined into a single event. .SH AUTHOR Stephane Eranian .SH SEE ALSO pfm_get_os_event_encoding(3) papi-5.3.0/src/libpfm4/docs/man3/libpfm_amd64_k8.30000600003276200002170000000266712247131123021015 0ustar ralphundrgrad.TH LIBPFM 3 "April, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_amd64_k8 - support for AMD64 K8 processors .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .B PMU name: amd64_k8_revb, amd64_k8_revc, amd64_k8_revd, amd64_k8_reve, amd64_k8_revf, amd64_k8_revg .B PMU desc: AMD64 K8 RevB, AMD64 K8 RevC, AMD64 K8 RevD, AMD64 K8 RevE, AMD64 K8 RevF, AMD64 K8 RevG .sp .SH DESCRIPTION The library supports AMD K8 processors in both 32 and 64-bit modes. They correspond to processor family 15. .SH MODIFIERS The following modifiers are supported on AMD64 K8 processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier. .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255]. 
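The interaction of the \fBc\fR, \fBi\fR and \fBe\fR modifiers described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware logic and not part of the library: the function names are hypothetical, the condition is assumed to be false before the first cycle, edge detection is assumed to count only false-to-true transitions, and the model covers only thresholds of at least 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether one cycle satisfies the counting
 * condition. The threshold test follows the description of the 'c'
 * modifier (occurrences >= cmask); 'invert' flips the result, as the
 * 'i' modifier does.
 */
static bool cycle_is_active(unsigned occurrences, unsigned cmask, bool invert)
{
    bool active = occurrences >= cmask;

    return invert ? !active : active;
}

/*
 * Hypothetical model of a counter programmed with c=cmask and,
 * optionally, the 'i' and 'e' modifiers. occ[n] is the number of
 * occurrences of the event in cycle n.
 *
 * Assumptions (not taken from any hardware manual):
 *  - cmask is in [1:255]; a threshold of 0 is not modelled here;
 *  - the condition is false before the first cycle;
 *  - with 'edge', only false-to-true transitions are counted.
 */
unsigned long model_count(const unsigned *occ, size_t ncycles,
                          unsigned cmask, bool invert, bool edge)
{
    unsigned long total = 0;
    bool prev = false;
    size_t n;

    assert(cmask >= 1 && cmask <= 255);
    assert(occ != NULL || ncycles == 0);

    for (n = 0; n < ncycles; n++) {
        bool cur = cycle_is_active(occ[n], cmask, invert);

        /* without 'edge': count every active cycle;
         * with 'edge': count only the cycle where the condition turns true */
        if (edge ? (cur && !prev) : cur)
            total++;
        prev = cur;
    }
    return total;
}
```

For per-cycle occurrence counts 0, 2, 3, 0, 1, 4 and a threshold of 2, the model counts three cycles without other modifiers, three cycles with inversion, and two transitions with edge detection.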
.SH AUTHORS .nf Stephane Eranian Robert Richter .fi .PP papi-5.3.0/src/libpfm4/docs/man3/pfm_get_event_encoding.30000600003276200002170000001034012247131123022622 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME pfm_get_event_encoding \- get raw event encoding .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "int pfm_get_event_encoding(const char *" str ", int " dfl_plm ", char **" fstr ", int *" idx ", uint64_t **" codes ", int *" count ");" .sp .SH DESCRIPTION This function is used to retrieve the raw event encoding corresponding to the event string in \fBstr\fR. The string may contain unit masks and modifiers. The default privilege level mask is passed in \fBdfl_plm\fR. It may be used depending on the event. This function is \fBdeprecated\fR. It is superseded by \fBpfm_get_os_event_encoding()\fR where the OS is set to \fBPFM_OS_NONE\fR. The encoding is retrieved through the \fBpfm_pmu_encode_arg_t\fR structure. The following example illustrates the transition: .nf int i, ret, count = 0; uint64_t *codes = NULL; ret = pfm_get_event_encoding("RETIRED_INSTRUCTIONS", PFM_PLM3, NULL, NULL, &codes, &count); if (ret != PFM_SUCCESS) errx(1, "cannot get encoding %s", pfm_strerror(ret)); for(i=0; i < count; i++) printf("count[%d]=0x%"PRIx64"\\n", i, codes[i]); free(codes); .fi is equivalent to: .nf pfm_pmu_encode_arg_t arg; int i, ret; memset(&arg, 0, sizeof(arg)); arg.size = sizeof(arg); ret = pfm_get_os_event_encoding("RETIRED_INSTRUCTIONS", PFM_PLM3, PFM_OS_NONE, &arg); if (ret != PFM_SUCCESS) errx(1, "cannot get encoding %s", pfm_strerror(ret)); for(i=0; i < arg.count; i++) printf("count[%d]=0x%"PRIx64"\\n", i, arg.codes[i]); free(arg.codes); .fi The encoding may take several 64-bit integers. The function can use the array passed in \fB*codes\fR if the number of entries passed in \fB*count\fR is big enough. However, if \fB*codes\fR is \fBNULL\fR and \fB*count\fR is 0, the function allocates the memory necessary to store the encoding. 
It is up to the caller to eventually free the memory. The number of 64-bit entries in \fBcodes\fR is reflected in \fB*count\fR upon return regardless of whether \fBcodes\fR was allocated or used as is. If the number of 64-bit integers is greater than one, then the order in which each component is returned is PMU-model specific. Refer to the PMU specific man page. The raw encoding means the encoding as mandated by the underlying PMU model. It may not be directly suitable to pass to a kernel API. You may want to use API-specific library calls to ensure the correct encoding is passed. If \fBfstr\fR is not NULL, it will point to the fully qualified event string upon successful return. The string contains the event name, any umask set, and the value of all the modifiers. It reflects what the encoding will actually measure. The function allocates the memory to store the string. The caller must eventually free the string. Here is an example of how this function could be used: .nf #include <stdio.h> #include <stdlib.h> #include <inttypes.h> #include <err.h> #include <perfmon/pfmlib.h> int main(int argc, char **argv) { uint64_t *codes = NULL; int count = 0; int i, ret; ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "cannot initialize library %s", pfm_strerror(ret)); ret = pfm_get_event_encoding("RETIRED_INSTRUCTIONS", PFM_PLM3, NULL, NULL, &codes, &count); if (ret != PFM_SUCCESS) errx(1, "cannot get encoding %s", pfm_strerror(ret)); for(i=0; i < count; i++) printf("count[%d]=0x%"PRIx64"\\n", i, codes[i]); free(codes); return 0; } .fi .SH RETURN The function returns in \fB*codes\fR the encoding of the event and in \fB*count\fR the number of 64-bit integers to support that encoding. Upon success, \fBPFM_SUCCESS\fR is returned, otherwise a specific error code is returned. .SH ERRORS .TP .B PFM_ERR_TOOSMALL The \fBcodes\fR argument is too small for the encoding. .TP .B PFM_ERR_INVAL The \fBcodes\fR or \fBcount\fR argument is \fBNULL\fR. .TP .B PFM_ERR_NOMEM Not enough memory. .TP .B PFM_ERR_NOTFOUND Event not found. 
.TP .B PFM_ERR_ATTR Invalid event attribute (unit mask or modifier). .TP .B PFM_ERR_ATTR_VAL Invalid modifier value. .TP .B PFM_ERR_ATTR_SET Attribute already set, cannot be changed. .TP .B PFM_ERR_ATTR_UMASK Missing unit mask. .TP .B PFM_ERR_ATTR_FEATCOMB Unit masks or features cannot be combined into a single event. .SH AUTHOR Stephane Eranian .SH SEE ALSO pfm_get_os_event_encoding(3) papi-5.3.0/src/libpfm4/docs/man3/pfm_get_event_attr_info.30000600003276200002170000001407512247131123023032 0ustar ralphundrgrad.TH LIBPFM 3 "December, 2009" "" "Linux Programmer's Manual" .SH NAME pfm_get_event_attr_info \- get event attribute information .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "int pfm_get_event_attr_info(int " idx ", int " attr ", pfm_os_t " os ", pfm_event_attr_info_t *" info ");" .sp .SH DESCRIPTION This function returns in \fBinfo\fR information about the attribute designated by \fBattr\fR for the event specified in \fBidx\fR and the os layer in \fBos\fR. The \fBpfm_os_t\fR enumeration provides the following choices: .TP .B PFM_OS_NONE The returned information pertains only to what the PMU hardware exports. No operating system attribute is taken into account. .TP .B PFM_OS_PERF_EVENT The returned information includes the actual PMU hardware and the additional attributes exported by the perf_events kernel interface. The perf_event attributes pertain only to the PMU hardware. If perf_events is not detected, an error is returned. .TP .B PFM_OS_PERF_EVENT_EXT The returned information includes all of what is already provided by \fBPFM_OS_PERF_EVENT\fR plus all the software attributes controlled by perf_events, such as sampling period and precise sampling. 
.PP The \fBpfm_event_attr_info_t\fR structure is defined as follows: .nf typedef struct { const char *name; const char *desc; const char *equiv; size_t size; uint64_t code; pfm_attr_t type; int idx; pfm_attr_ctrl_t ctrl; int reserved1; struct { int is_dfl:1; int is_precise:1; int reserved:30; }; union { uint64_t dfl_val64; const char *dfl_str; int dfl_bool; int dfl_int; }; } pfm_event_attr_info_t; .fi The fields of this structure are defined as follows: .TP .B name This is the name of the attribute. This is a read-only string. .TP .B desc This is the description of the attribute. This is a read-only string. It may contain multiple sentences. .TP .B equiv Certain attributes may be just variations of other attributes for the same event. They may be provided as handy shortcuts to avoid supplying a long list of attributes. For those attributes, this field is not NULL and contains the complete equivalent attribute string. This string, once appended to the event name, may be used in library calls requiring an event string. .TP .B code This is the raw attribute code. For PFM_ATTR_UMASK, this is the unit mask code. For all other attributes, this is an opaque index. .TP .B type This is the type of the attribute. Attributes represent either sub-events or extra filters that can be applied to the event. Filters (also called modifiers) may be tied to the event or to the PMU register the event is programmed into. The type of an attribute determines how it must be specified. The following types are defined: .RS .TP .B PFM_ATTR_UMASK This is a unit mask, i.e., a sub-event. It is specified using its name. Depending on the event, it may be possible to specify multiple unit masks. .TP .B PFM_ATTR_MOD_BOOL This is a boolean attribute. It has a value of 0, 1, y or n. The value is specified after the equal sign, e.g., foo=1. As a convenience, the equal sign and value may be omitted, in which case this is equivalent to =1. .TP .B PFM_ATTR_MOD_INTEGER This is an integer attribute. 
It has a value which must be passed after the equal sign. The range of valid values depends on the attribute and is usually specified in its description. .PP .RE .TP .B idx This is the attribute index. It is identical to the value of \fBattr\fR passed to the call and is provided for completeness. .TP .B size This field contains the size of the struct passed. This field is used to provide for extensibility of the struct without compromising backward compatibility. The value should be set to \fBsizeof(pfm_event_attr_info_t)\fR. If instead a value of \fB0\fR is specified, the library assumes the struct passed is identical to the first ABI version, whose size is \fBPFM_ATTR_INFO_ABI0\fR. Thus, if fields were added after the first ABI, they will not be set by the library. The library does check that bytes beyond what is implemented are zeroes. .TP .B is_dfl This field indicates whether or not this attribute is set by default. This applies mostly to PFM_ATTR_UMASK. If a unit mask is marked as default, and no unit mask is specified in the event string, then the library uses it by default. Note that there may be multiple defaults per event depending on how unit masks are grouped. .TP .B is_precise This field indicates whether or not this umask supports precise sampling. Precise sampling is a hardware mechanism that avoids instruction address skid when using interrupt-based sampling. On Intel X86 processors, this field indicates that the umask supports Precise Event-Based Sampling (PEBS). .TP .B dfl_val64, dfl_str, dfl_bool, dfl_int This union contains the value of an attribute. For PFM_ATTR_UMASK, this is the unit mask code; for all other types this is the actual value of the attribute. .TP .B ctrl This field indicates which layer or source controls the attribute. The following sources are defined: .RS .TP .B PFM_ATTR_CTRL_UNKNOWN The source controlling the attribute is not known. .TP .B PFM_ATTR_CTRL_PMU The attribute is controlled by the PMU hardware. 
.TP .B PFM_ATTR_CTRL_PERF_EVENT The attribute is controlled by the perf_events kernel interface. .RE .TP .B reserved These fields must be set to zero. .PP .SH RETURN If successful, the function returns \fBPFM_SUCCESS\fR and attribute information in \fBinfo\fR, otherwise it returns an error code. .SH ERRORS .TP .B PFM_ERR_NOINIT Library has not been initialized properly. .TP .B PFM_ERR_INVAL The \fBidx\fR or \fBattr\fR arguments are invalid or \fBinfo\fR is \fBNULL\fR or \fBsize\fR is not zero. .TP .B PFM_ERR_NOTSUPP The requested os layer has not been detected on the host system. .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm4/docs/man3/pfm_get_event_next.30000600003276200002170000000446212247131123022022 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME pfm_get_event_next \- iterate over events .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "int pfm_get_event_next(int " idx ");" .sp .SH DESCRIPTION Events are uniquely identified with opaque integer identifiers. There is no guaranteed order within identifiers. Thus, to list all the events, it is necessary to use iterators. Events are grouped in tables within the library. A table usually corresponds to a PMU model or family. The library contains support for multiple PMU models, thus it has multiple tables. Based on the host hardware and software environments, tables get activated when the library is initialized via \fBpfm_initialize()\fR. Events from activated tables are called active events. Events from non-activated tables are called supported events. Event identifiers are usually retrieved via \fBpfm_find_event()\fR or when encoding events. To iterate over a list of events for a given PMU model, all that is needed is an initial identifier for the PMU. The first event identifier is usually obtained via \fBpfm_get_pmu_info()\fR. The \fBpfm_get_event_next()\fR function returns the identifier of the next supported event after the one passed in \fBidx\fR. 
This iterator stops when the last event for the PMU is passed as argument, in which case the function returns -1. .sp .nf void list_pmu_events(pfm_pmu_t pmu) { pfm_event_info_t info; pfm_pmu_info_t pinfo; int i, ret; memset(&info, 0, sizeof(info)); memset(&pinfo, 0, sizeof(pinfo)); info.size = sizeof(info); pinfo.size = sizeof(pinfo); ret = pfm_get_pmu_info(pmu, &pinfo); if (ret != PFM_SUCCESS) errx(1, "cannot get pmu info"); for (i = pinfo.first_event; i != -1; i = pfm_get_event_next(i)) { ret = pfm_get_event_info(i, PFM_OS_NONE, &info); if (ret != PFM_SUCCESS) errx(1, "cannot get event info"); printf("%s Event: %s::%s\\n", pinfo.is_present ? "Active" : "Supported", pinfo.name, info.name); } } .fi .SH RETURN The function returns the identifier of the next supported event. It returns -1 when the argument is already the last event for the PMU. .SH ERRORS No error code, besides -1, is returned by this function. .SH SEE ALSO pfm_find_event(3) .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_p6.30000600003276200002170000000265212247131123021212 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_p6 - support for Intel P6 based processors .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .B PMU name: pm, ppro, pii, piii, p6 .B PMU desc: Intel Pentium M, Intel Pentium Pro, Intel Pentium II, Intel Pentium III, Intel P6 .sp .SH DESCRIPTION The library supports all Intel P6-based processors all the way back to the Pentium Pro. Although all those processors offer the same PMU architecture, they differ in the events they provide. .SH MODIFIERS The following modifiers are supported on all Intel P6 processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. 
The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier. .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255]. .SH AUTHORS .nf Stephane Eranian .fi .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_x86_arch.30000600003276200002170000000336612247131123022312 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_x86_arch - support for Intel X86 architectural PMU .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .B PMU name: ix86arch .B PMU desc: Intel X86 architectural PMU .sp .SH DESCRIPTION The library supports \fBany\fR processor implementing the Intel architectural PMU. This is a minimal PMU with a variable number of counters but a predefined set of events. It is implemented in all recent processors starting with Intel Core Duo/Core Solo. It acts as a default PMU support in case the library is run on a very recent processor for which the specific support has not yet been implemented. .SH MODIFIERS The following modifiers are supported on the Intel architectural PMU: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier. .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. 
The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255]. .TP .B t Measure on both threads at the same time assuming hyper-threading is enabled. This modifier requires at least version 3 of the architectural PMU. This is a boolean modifier. .SH AUTHORS .nf Stephane Eranian .fi .PP papi-5.3.0/src/libpfm4/docs/man3/pfm_terminate.30000600003276200002170000000104112247131123020762 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME pfm_terminate \- free resources used by library .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "void pfm_terminate(void);" .sp .SH DESCRIPTION This is the last function that a program \fBmust\fR call to free all the resources allocated by the library, e.g., memory. The function is not reentrant; the caller must ensure only one thread at a time is executing it. .SH RETURN There is no return value to this function. .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm.30000600003276200002170000001273112247131123017411 0ustar ralphundrgrad.TH LIBPFM 3 "May, 2010" "" "Linux Programmer's Manual" .SH NAME libpfm \- a helper library to develop monitoring tools .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .SH DESCRIPTION This is a helper library used by applications to program specific performance monitoring events. Those events are typically provided by the hardware or the OS kernel. The most common hardware events are provided by the Performance Monitoring Unit (PMU) of modern processors. They can measure elapsed cycles or the number of cache misses. Software events usually count kernel events such as the number of context switches or page faults. The library groups events based on which source is providing them. The term PMU is generalized to any event source, not just hardware sources. 
The library supports hardware performance events from most common processors, each grouped under a specific PMU name, such as Intel Core or IBM Power 6. Programming events is usually done through a kernel API, such as Oprofile, perfmon, perfctr, or perf_events on Linux. The library provides support for perf_events which is available in the Linux kernel as of v2.6.31. Perf_events supports selected PMU models and several software events. At its core, the library provides a simple translation service, whereby a user specifies an event to measure as a string and the library returns the parameters needed to invoke the kernel API. It is important to realize that the library does \fBnot\fR make the system call to program the event. \fBNote:\fR You must first call \fBpfm_initialize()\fR in order to use any of the other provided functions in the library. The first part of the library provides an event listing and query interface. This can be used to discover the events available on a specific hardware platform. The second part of the library provides a set of functions to obtain event encodings from event strings. Event encoding depends primarily on the underlying hardware but also on the kernel API. The library offers a generic API to address the first situation but it also provides entry points for specific kernel APIs such as perf_events. In that case, it is able to prepare the data structure which must be passed to the kernel to program a specific event. .SH EVENT DETECTION When the library is initialized via \fBpfm_initialize()\fR, it first detects the underlying hardware and software configuration. Based on this information it enables certain PMU support. Multiple event tables may be activated. It is possible to force activation of a specific PMU (group of events) using an environment variable. .SH EVENT STRINGS Events are expressed as strings. Those strings are structured and may contain several components depending on the type of event and the underlying hardware. 
String parsing is always case insensitive. The string structure is defined as follows: .sp .ce .B [pmu::][event_name][:unit_mask][:modifier|:modifier=val] The components are defined as follows: .TP .B pmu Optional name of the PMU (group of events) to which the event belongs. This is useful to disambiguate events in case events from different sources have the same name. If not specified, the first match is used. .TP .B event_name The name of the event. It must be the complete name, partial matches are not accepted. This component is required. .TP .B unit_mask This designates an optional sub-event. Some events can be refined using sub-events. An event may have multiple unit masks and it may or may not be possible to combine them. If more than one unit mask needs to be passed, then the [:unit_mask] pattern can be repeated. .TP .B modifier A modifier is an optional filter which modifies how the event counts. Modifiers have a type and a value. The value is specified after the equal sign. No space is allowed. In case of boolean modifiers, it is possible to omit the value true (1). The presence of the modifier is interpreted as meaning true. Events may support multiple modifiers, in which case the [:modifier|:modifier=val] pattern can be repeated. There is no ordering constraint between modifiers and unit masks. Modifiers may be specified before unit masks and vice-versa. .SH ENVIRONMENT VARIABLES It is possible to enable certain debug features of the library using environment variables. The following variables are defined: .TP .B LIBPFM_VERBOSE Enable verbose output. Value must be 0 or 1. .TP .B LIBPFM_DEBUG Enable debug output. Value must be 0 or 1. .TP .B LIBPFM_DEBUG_STDOUT Redirect verbose and debug output to the standard output file descriptor (stdout). By default, the output is directed to the standard error file descriptor (stderr). .TP .B LIBPFM_FORCE_PMU Force a specific PMU model to be activated. In this mode, only that one model is activated. 
The value of the variable must be the PMU name as returned by the \fBpfm_get_pmu_name()\fR function. Note that for some PMU models, it may be possible to specify additional options, such as specific processor models or stepping. Additional parameters must appear after a comma. For instance, LIBPFM_FORCE_PMU=amd64,16,2,1. .TP .B LIBPFM_ENCODE_INACTIVE Set this variable to 1 to enable encoding of events for PMU models that are supported but not detected. .SH AUTHORS .nf Stephane Eranian Robert Richter .fi .SH SEE ALSO libpfm_amd64_k7(3), libpfm_amd64_k8(3), libpfm_amd64_fam10h(3), libpfm_intel_core(3), libpfm_intel_atom(3), libpfm_intel_p6(3), libpfm_intel_nhm(3), libpfm_intel_nhm_unc(3), pfm_get_perf_event_encoding(3), pfm_initialize(3) .sp Some examples are shipped with the library. papi-5.3.0/src/libpfm4/docs/man3/libpfm_mips_74k.30000600003276200002170000000363012247131123021124 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2011" "" "Linux Programmer's Manual" .SH NAME libpfm_mips_74k - support for MIPS 74k processors .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .B PMU name: mips_74k .B PMU desc: MIPS 74k .sp .SH DESCRIPTION The library supports MIPS 74k processors in big or little endian modes. .SH ENCODINGS On this processor, what is measured by an event depends on the event code and on the counter it is programmed on. Usually the meaning of the event code changes between odd and even indexed counters. For instance, event code \fB0x2\fR means 'PREDICTED_JR31' when programmed on even-indexed counters and it means 'JR_31_MISPREDICTIONS' when programmed on odd-indexed counters. To correctly measure an event, one needs both the event encoding and a list of possible counters. When \fBpfm_get_os_event_encoding()\fR is used with \fBPFM_OS_NONE\fR to return the raw PMU encoding, the library returns two values: the event encoding as per the architecture manual and a bitmask of valid counters to program it on. 
For instance, for 'JR_31_MISPREDICTIONS' the library returns codes[0] = 0x4a, codes[1] = 0xa (bits 1 and 3 set, i.e., supported on counters 1 and 3). The encoding for a specific kernel interface may vary and is handled internally by the library. .SH MODIFIERS The following modifiers are supported on MIPS 74k: .TP .B u Measure at user level. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B e Measure at exception level. This corresponds to \fBPFM_PLM2\fR. This is a boolean modifier. .TP .B s Measure at supervisor level. This corresponds to \fBPFM_PLM1\fR. This is a boolean modifier. It should be noted that those modifiers are available for encoding in raw mode with \fBPFM_OS_NONE\fR but they may not all be present with specific kernel interfaces. .SH AUTHORS .nf Stephane Eranian .fi .PP papi-5.3.0/src/libpfm4/docs/man3/pfm_get_os_event_encoding.30000600003276200002170000001633212247131123023332 0ustar ralphundrgrad.TH LIBPFM 3 "January, 2011" "" "Linux Programmer's Manual" .SH NAME pfm_get_os_event_encoding \- get event encoding for a specific operating system .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "int pfm_get_os_event_encoding(const char *" str ", int " dfl_plm ", pfm_os_t " os ", void *" arg ");" .sp .SH DESCRIPTION This is the key function to retrieve the encoding of an event for a specific operating system interface. The event string passed in \fBstr\fR is parsed and encoded for the operating system specified by \fBos\fR. The event is encoded to monitor at the privilege levels specified by the \fBdfl_plm\fR mask, if supported, otherwise this parameter is ignored. The operating system specific input and output arguments are passed in \fBarg\fR. The event string, \fBstr\fR, may contain sub-event masks (umask) and any other supported modifiers. Only one event is parsed from the string. 
For convenience, it is possible to pass a comma-separated list of events in \fBstr\fR but only the first event is encoded. The following values are supported for \fBos\fR: .TP .B PFM_OS_NONE This value causes the event to be encoded purely as specified by the PMU hardware. The \fBarg\fR argument must be a pointer to a \fBpfm_pmu_encode_arg_t\fR structure which is defined as follows: .nf typedef struct { uint64_t *codes; char **fstr; size_t size; int count; int idx; } pfm_pmu_encode_arg_t; .fi The fields are defined as follows: .RS .TP .B codes A pointer to an array of 64-bit values. On input, if \fBcodes\fR is NULL, then the library allocates whatever is necessary to store the encoding of the event. If \fBcodes\fR is not NULL on input, then \fBcount\fR must reflect its actual number of elements. If \fBcount\fR is big enough, the library stores the encoding at the address provided. Otherwise, an error is returned. .TP .B count On input, the field contains the maximum number of elements in the array \fBcodes\fR. Upon return, it contains the number of actual entries in \fBcodes\fR. If \fBcodes\fR is NULL, then count must be zero. .TP .B fstr If the caller is interested in retrieving the fully qualified event string where all used unit masks and all modifiers are spelled out, this field must be set to a non-null address of a pointer to a string (char **). Upon return, if \fBfstr\fR was not NULL, then the string pointer passed on entry points to the event string. The string is dynamically allocated and needs to eventually be freed. If \fBfstr\fR was NULL on entry, then nothing is returned in this field. The typical calling sequence looks as follows: .nf char *fstr = NULL; pfm_pmu_encode_arg_t arg; int ret; memset(&arg, 0, sizeof(arg)); arg.size = sizeof(arg); arg.fstr = &fstr; ret = pfm_get_os_event_encoding("event", PFM_PLM0|PFM_PLM3, PFM_OS_NONE, &arg); if (ret == PFM_SUCCESS) { printf("fstr=%s\\n", fstr); free(fstr); free(arg.codes); } .fi .TP .B size This field contains the size of the struct passed. 
This field is used to provide for extensibility of the struct without compromising backward compatibility. The value should be set to \fBsizeof(pfm_pmu_encode_arg_t)\fR. If instead a value of \fB0\fR is specified, the library assumes the struct passed is identical to the first ABI version, whose size is \fBPFM_RAW_ENCODE_ABI0\fR. Thus, if fields were added after the first ABI, they will not be set by the library. The library does check that bytes beyond what is implemented are zeroes.
.TP
.B idx
Upon return, this field contains the opaque unique identifier for the event described in \fBstr\fR. This index can be used to retrieve information about the event using \fBpfm_get_event_info()\fR, for instance.
.RE
.TP
.B PFM_OS_PERF_EVENT, PFM_OS_PERF_EVENT_EXT
These values cause the event to be encoded for the perf_event Linux kernel interface (available since 2.6.31). The \fBarg\fR argument must be a pointer to a \fBpfm_perf_encode_arg_t\fR structure. The \fBPFM_OS_PERF_EVENT\fR layer provides the modifiers exported by the underlying PMU hardware, some of which may actually be overridden by the perf_event interface, such as the monitoring privilege levels. \fBPFM_OS_PERF_EVENT_EXT\fR extends \fBPFM_OS_PERF_EVENT\fR to add modifiers controlled only by the perf_event interface, such as sampling period (\fBperiod\fR), frequency (\fBfreq\fR) and exclusive resource access (\fBexcl\fR).
.nf
typedef struct {
    struct perf_event_attr *attr;
    char **fstr;
    size_t size;
    int idx;
    int cpu;
    int flags;
} pfm_perf_encode_arg_t;
.fi
The fields are defined as follows:
.RS
.TP
.B attr
A pointer to a struct perf_event_attr as defined in perf_event.h. This field cannot be NULL on entry. The struct is not completely overwritten by the call. The library only modifies the fields it knows about, thereby allowing a perf_event ABI mismatch between caller and library.
.TP
.B fstr
Same behavior as described for PFM_OS_NONE above.
.TP
.B size
This field contains the size of the struct passed.
This field is used to provide for extensibility of the struct without compromising backward compatibility. The value should be set to \fBsizeof(pfm_perf_encode_arg_t)\fR. If instead a value of \fB0\fR is specified, the library assumes the struct passed is identical to the first ABI version, whose size is \fBPFM_PERF_ENCODE_ABI0\fR. Thus, if fields were added after the first ABI, they will not be set by the library. The library does check that bytes beyond what is implemented are zeroes.
.TP
.B idx
Upon return, this field contains the opaque unique identifier for the event described in \fBstr\fR. This index can be used to retrieve information about the event using \fBpfm_get_event_info()\fR, for instance.
.TP
.B cpu
Not used yet.
.TP
.B flags
Not used yet.
.RE
.PP
Here is an example of how this function could be used with PFM_OS_NONE:
.nf
#include <inttypes.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <perfmon/pfmlib.h>

int main(int argc, char **argv)
{
   pfm_pmu_encode_arg_t raw;
   int i, ret;

   ret = pfm_initialize();
   if (ret != PFM_SUCCESS)
      errx(1, "cannot initialize library %s", pfm_strerror(ret));

   /* zeroed struct: codes = NULL and count = 0, so the library allocates codes */
   memset(&raw, 0, sizeof(raw));

   ret = pfm_get_os_event_encoding("RETIRED_INSTRUCTIONS", PFM_PLM3, PFM_OS_NONE, &raw);
   if (ret != PFM_SUCCESS)
      errx(1, "cannot get encoding %s", pfm_strerror(ret));

   for (i = 0; i < raw.count; i++)
      printf("count[%d]=0x%"PRIx64"\\n", i, raw.codes[i]);

   free(raw.codes);
   return 0;
}
.fi
.SH RETURN
The function returns in \fBarg\fR the encoding of the event for the operating system passed in \fBos\fR. The content of \fBarg\fR depends on the \fBos\fR argument. Upon success, \fBPFM_SUCCESS\fR is returned, otherwise a specific error code is returned.
.SH ERRORS
.TP
.B PFM_ERR_TOOSMALL
The \fBcodes\fR array is too small for the encoding.
.TP
.B PFM_ERR_INVAL
The \fBcode\fR or \fBcount\fR argument is \fBNULL\fR.
.TP
.B PFM_ERR_NOMEM
Not enough memory.
.TP
.B PFM_ERR_NOTFOUND
Event not found.
.TP
.B PFM_ERR_ATTR
Invalid event attribute (unit mask or modifier).
.TP
.B PFM_ERR_ATTR_VAL
Invalid modifier value.
.TP
.B PFM_ERR_ATTR_SET
Attribute already set, cannot be changed.
.TP
.B PFM_ERR_ATTR_UMASK
Missing unit mask.
.TP
.B PFM_ERR_ATTR_FEATCOMB
Unit masks or features cannot be combined into a single event.
.SH AUTHOR
Stephane Eranian
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_r3qpi.3
.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_snbep_unc_r3qpi - support for Intel Sandy Bridge-EP R3QPI uncore PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: snbep_unc_r3qpi0, snbep_unc_r3qpi1
.B PMU desc: Intel Sandy Bridge-EP R3QPI uncore PMU
.sp
.SH DESCRIPTION
The library supports the Intel Sandy Bridge R3QPI uncore PMU. This PMU model only exists on Sandy Bridge model 45. There are two R3QPI PMUs per processor socket.
.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge R3QPI uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count R3QPI cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B t
Set the threshold value. When set to a non-zero value, the counter counts the number of R3QPI cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:15].
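.PP
The constraints listed above (threshold in the range [0:15], edge detection only together with a threshold of at least one) can be checked before an event string is handed to the library. The following is a minimal sketch, not part of libpfm: the helper r3qpi_format_event() is hypothetical, and it assumes the usual colon-separated "name:modifier=value" event string syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a libpfm function.
 * Writes "EVENT[:i=1][:e=1]:t=N" into buf and returns 0, or returns -1
 * if the modifier combination violates the rules listed above or if
 * buf is too small. */
static int r3qpi_format_event(char *buf, size_t len, const char *event,
                              int invert, int edge, int threshold)
{
    int n;

    assert(buf != NULL && event != NULL);

    if (threshold < 0 || threshold > 15)   /* t is an integer in [0:15] */
        return -1;
    if (edge && threshold < 1)             /* e must be combined with t >= 1 */
        return -1;

    n = snprintf(buf, len, "%s%s%s:t=%d", event,
                 invert ? ":i=1" : "", edge ? ":e=1" : "", threshold);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a placeholder event name, r3qpi_format_event(buf, sizeof(buf), "SOME_EVENT", 0, 1, 2) produces the string "SOME_EVENT:e=1:t=2", while requesting edge detection with a threshold of 0 is rejected.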
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_ha.3
.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_snbep_unc_ha - support for Intel Sandy Bridge-EP Home Agent (HA) uncore PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: snbep_unc_ha
.B PMU desc: Intel Sandy Bridge-EP HA uncore PMU
.sp
.SH DESCRIPTION
The library supports the Intel Sandy Bridge Home Agent (HA) uncore PMU. This PMU model only exists on Sandy Bridge model 45. There is only one Home Agent per processor socket.
.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge HA uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count HA cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B t
Set the threshold value. When set to a non-zero value, the counter counts the number of HA cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255].
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/pfm_get_event_info.3
.TH LIBPFM 3 "December, 2009" "" "Linux Programmer's Manual"
.SH NAME
pfm_get_event_info \- get event information
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.BI "int pfm_get_event_info(int " idx ", pfm_os_t " os ", pfm_event_info_t *" info ");"
.sp
.SH DESCRIPTION
This function returns in \fBinfo\fR information about a specific event designated by its opaque unique identifier in \fBidx\fR for the operating system specified in \fBos\fR.
The \fBpfm_event_info_t\fR structure is defined as follows:
.nf
typedef struct {
    const char *name;
    const char *desc;
    const char *equiv;
    size_t size;
    uint64_t code;
    pfm_pmu_t pmu;
    pfm_dtype_t dtype;
    int idx;
    int nattrs;
    struct {
        unsigned int is_precise:1;
        unsigned int reserved_bits:31;
    };
} pfm_event_info_t;
.fi
The fields of this structure are defined as follows:
.TP
.B name
This is the name of the event. This is a read-only string.
.TP
.B desc
This is the description of the event. This is a read-only string. It may contain multiple sentences.
.TP
.B equiv
Certain events may be just variations of actual events. They may be provided as handy shortcuts to avoid supplying a long list of attributes. For those events, this field is not NULL and contains the complete equivalent event string.
.TP
.B code
This is the raw event code. It should not be confused with the encoding of the event. This field represents only the event selection code; it does not include any unit mask or attribute settings.
.TP
.B pmu
This is the identification of the PMU model this event belongs to. It is of type \fBpfm_pmu_t\fR. Using this value and the \fBpfm_get_pmu_info\fR function, it is possible to get PMU information.
.TP
.B dtype
This field returns the representation of the event data. By default, it is \fBPFM_DATA_UINT64\fR.
.TP
.B idx
This is the event unique opaque identifier. It is identical to the idx passed to the call and is provided for completeness.
.TP
.B nattrs
This is the number of attributes supported by this event. Attributes may be unit masks or modifiers. If the event has no attributes, then the value of this field is simply 0.
.TP
.B size
This field contains the size of the struct passed. This field is used to provide for extensibility of the struct without compromising backward compatibility. The value should be set to \fBsizeof(pfm_event_info_t)\fR.
If instead a value of \fB0\fR is specified, the library assumes the struct passed is identical to the first ABI version, whose size is \fBPFM_EVENT_INFO_ABI0\fR. Thus, if fields were added after the first ABI, they will not be set by the library. The library does check that bytes beyond what is implemented are zeroes.
.TP
.B is_precise
This bitfield indicates whether or not the event supports precise sampling. Precise sampling is a hardware mechanism that avoids instruction address skid when using interrupt-based sampling. When the event has umasks, this field means that at least one umask supports precise sampling. On Intel X86 processors, this indicates whether the event supports Precise Event-Based Sampling (PEBS).
.PP
The \fBpfm_os_t\fR enumeration provides the following choices:
.TP
.B PFM_OS_NONE
The returned information pertains only to what the PMU hardware exports. No operating system attributes are taken into account.
.TP
.B PFM_OS_PERF_EVENT
The returned information includes the actual PMU hardware and the additional attributes exported by the perf_events kernel interface. The perf_event attributes pertain only to the PMU hardware. In case perf_events is not detected, an error is returned.
.TP
.B PFM_OS_PERF_EVENT_EXT
The returned information includes all of what is already provided by \fBPFM_OS_PERF_EVENT\fR plus all the software attributes controlled by perf_events, such as sampling period and precise sampling.
.PP
.SH RETURN
If successful, the function returns \fBPFM_SUCCESS\fR and event information in \fBinfo\fR, otherwise it returns an error code.
.SH ERRORS
.TP
.B PFM_ERR_NOINIT
Library has not been initialized properly.
.TP
.B PFM_ERR_INVAL
The \fBidx\fR argument is invalid or \fBinfo\fR is \fBNULL\fR or \fBsize\fR is not zero.
.TP
.B PFM_ERR_NOTSUPP
The requested \fBos\fR is not detected or supported.
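.PP
The handling of the \fBsize\fR field described in the field list above can be illustrated independently of libpfm. The sketch below shows one way such a size-versioned check might look. It is an assumption for illustration, not the library's actual code: struct info_v1, INFO_ABI0 and check_size() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical versioned struct: fields up to `b` form the first ABI,
 * `c` was added in a later version. */
struct info_v1 {
    size_t size;   /* set by the caller, 0 means "first ABI" */
    int a;
    int b;
    int c;         /* added after the first ABI */
};

/* Size of the first ABI version of the struct. */
#define INFO_ABI0 offsetof(struct info_v1, c)

/* Returns the number of bytes the callee may fill in, or 0 if the
 * request is invalid. `known` is the struct size the callee was built
 * with. */
static size_t check_size(const void *arg, size_t size, size_t known)
{
    const unsigned char *p = arg;
    size_t i;

    assert(arg != NULL);

    if (size == 0)
        return INFO_ABI0;           /* caller asks for first-ABI behavior */
    if (size < INFO_ABI0)
        return 0;                   /* smaller than any known version */
    if (size <= known)
        return size;                /* older or same-version caller */
    for (i = known; i < size; i++)  /* newer caller: extra bytes must be 0 */
        if (p[i] != 0)
            return 0;
    return known;
}
```

The design choice illustrated here is that a caller built against a newer, larger struct still works with an older callee as long as it leaves the fields the callee does not know about zeroed.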
.SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm4/docs/man3/pfm_strerror.30000600003276200002170000000224412247131123020662 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME pfm_strerror \- return constant string describing error code .SH SYNOPSIS .nf .B #include .sp .BI "const char *pfm_strerror(int "code); .sp .SH DESCRIPTION This function returns a string which describes the libpfm error value in \fBcode\fR. The string returned by the call is \fBread-only\fR. The function must \fBonly\fR be used with libpfm calls documented to return specific error codes. The value -1 is not considered a specific error code. Strings and \fBpfm_pmu_t\fR return values cannot be used with this function. Typically \fBNULL\fR is returned in case of error for string values, and \fBPFM_PMU_NONE\fR is returned for \fBpfm_pmu_t\fR values. The function is also not designed to handle OS system call errors, i.e., errno values. .SH RETURN The function returns a pointer to the constant string describing the error code. The string is in English. If code is invalid then a default error message is returned. .SH ERRORS If the error code is invalid, then the function returns a pointer to a string which says "unknown error code". .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_coreduo.30000600003276200002170000000274212247131123022325 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_coreduo - support for Intel Core Duo/Solo processors .SH SYNOPSIS .nf .B #include .sp .B PMU name: coreduo .B PMU desc: Intel Core Duo .sp .SH DESCRIPTION The library supports all Intel Yonah-based processors such as Intel Core Duo and Intel Core Solo processors. .SH MODIFIERS The following modifiers are supported on Intel Core Duo processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. 
.TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .SH ENVIRONMENT VARIABLES It is possible to force activation of the Intel Core Duo support using the \fBLIBPFM_FORCE_PMU\fR variable. The PMU name, coreduo, must be passed. No additional options are supported. .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_cbo.30000600003276200002170000000643512247131123023467 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_snbep_unc_cbo - support for Intel Sandy Bridge-EP C-Box uncore PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: snbep_unc_cbo[0-7] .B PMU desc: Intel Sandy Bridge-EP C-Box uncore PMU .sp .SH DESCRIPTION The library supports the Intel Sandy Bridge C-Box (coherency engine) uncore PMU. This PMU model only exists on Sandy Bridge model 45. There is one C-box PMU per physical core. Therefore there are eight identical C-Box PMU instances numbered frmo 0 to 7. On dual-socket systems, the number refers to the C-Box PMU on the socket where the program runs. For instance, if running on CPU8, then snbep_unc_cbo0 refers to the C-Box for physical core 0 on socket 1. Conversely, if running on CPU0, then the same snbep_unc_cbo0 refers to the C-Box for physical core 0 but on socket 0. Each C-Box PMU implements 4 generic counters and a filter register used only with certain events and umasks. 
.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge C-Box uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count C-Box cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B t
Set the threshold value. When set to a non-zero value, the counter counts the number of C-Box cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255].
.TP
.B nf
Node filter. Certain events, such as UNC_C_LLC_LOOKUP and UNC_C_LLC_VICTIMS, provide a \fBNID\fR umask. Sometimes the \fBNID\fR is combined with other filtering capabilities, such as opcodes. The node filter is a bitmask of at most 8 bits. A node corresponds to a processor socket. The legal values therefore depend on the underlying hardware configuration. For dual-socket systems, the bitmask has two valid bits [0:1].
.TP
.B cf
Core filter. This is a 3-bit filter which is used to filter based on the physical core origin of the C-Box request. Possible values are 0-7. If the filter is not specified, then no filtering takes place.
.TP
.B tf
Thread filter. This is a 1-bit filter which is used to filter C-Box requests based on logical processor (hyper-thread) identification. Possible values are 0-1. If the filter is not specified, then no filtering takes place.
.SH Opcode filtering
Certain events, such as UNC_C_TOR_INSERTS, support opcode matching on the C-Box transaction type. To use this feature, first an opcode matching umask must be selected, e.g., MISS_OPCODE. Second, the opcode to match on must be selected via a second umask among the OPC_* umasks.
For instance, UNC_C_TOR_INSERTS:OPCODE:OPC_RFO, counts the number of TOR insertions for RFO transactions. Opcode matching may be combined with node filtering with certain umasks. In general the filtering support is encoded into the umask name, e.g., NID_OPCODE supports both node and opcode filtering. For instance, UNC_C_TOR_INSERTS:NID_OPCODE:OPC_RFO:nf=1. .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_knc.30000600003276200002170000000250412247131123021434 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2012" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_knc - support for Intel Knights Corner .SH SYNOPSIS .nf .B #include .sp .B PMU name: knc .B PMU desc: Intel Knights Corner .sp .SH DESCRIPTION The library supports Intel Knights Corner processors. .SH MODIFIERS The following modifiers are supported on Intel Knights Corner processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .TP .B t Measure on all threads at the same time assuming hyper-threading is enabled. This is a boolean modifier. 
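.PP
The interaction of the \fBc\fR, \fBi\fR and \fBe\fR modifiers described above is easier to see on a small trace. The following sketch is a simplified software model, not libpfm code and not a description of the exact hardware behavior: model_counter() is a hypothetical function, and the handling of a counter mask of zero is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified software model of the c (counter mask), i (invert) and
 * e (edge) modifiers. occ[n] is the number of event occurrences in
 * cycle n. ASSUMPTION: with c == 0 the raw occurrences are summed and
 * i/e are ignored; real hardware may differ in such corner cases. */
static unsigned long model_counter(const int *occ, size_t ncycles,
                                   int c, int i, int e)
{
    unsigned long count = 0;
    int prev = 0;   /* condition state in the previous cycle */
    size_t n;

    assert(occ != NULL || ncycles == 0);

    for (n = 0; n < ncycles; n++) {
        int cond;

        if (c == 0) {                   /* no threshold: plain event count */
            count += (unsigned long)occ[n];
            continue;
        }
        cond = occ[n] >= c;             /* threshold comparison */
        if (i)
            cond = !cond;               /* invert: cycles below the threshold */
        if (e)
            count += (cond && !prev);   /* only false -> true transitions */
        else
            count += cond;              /* every cycle meeting the condition */
        prev = cond;
    }
    return count;
}
```

For the per-cycle occurrence trace {0, 2, 3, 0, 1, 2}, this model yields 8 without a counter mask, 3 with c=2 (three cycles reach the threshold), 2 with c=2 and e set (two rising transitions), and 3 with c=2 and i set (three cycles stay below the threshold).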
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_ivb_unc.3
.TH LIBPFM 3 "June, 2013" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_ivb_unc - support for Intel Ivy Bridge uncore PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: ivb_unc_cbo0, ivb_unc_cbo1, ivb_unc_cbo2, ivb_unc_cbo3
.B PMU desc: Intel Ivy Bridge C-box uncore
.sp
.SH DESCRIPTION
The library supports the Intel Ivy Bridge client part (model 58) uncore PMU. The support is currently limited to the Coherency Box, the so-called C-Box, for up to 4 physical cores. Each physical core has an associated C-Box which it uses to communicate with the L3 cache. The C-Boxes all support the same set of events. However, the Core 0 C-Box (ivb_unc_cbo0) supports an additional uncore clock ticks event: \fBUNC_CLOCKTICKS\fR.
.SH MODIFIERS
The following modifiers are supported on Intel Ivy Bridge C-Box uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a counter mask modifier (c) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B c
Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255].
.P
Both \fBUNC_CBO_CACHE_LOOKUP\fR and \fBUNC_CBO_XSNP_RESPONSE\fR require two umasks to be valid. For \fBUNC_CBO_CACHE_LOOKUP\fR the first umask must be one of the MESI state umasks, the second has to be one of the filters.
For \fBUNC_CBO_XSNP_RESPONSE\fR the first umask must be one of the snoop types, the second has to be one of the filters.
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_arm_ac15.3
.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual"
.SH NAME
libpfm_arm_ac15 - support for ARM Cortex A15 PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: arm_ac15
.B PMU desc: ARM Cortex A15
.sp
.SH DESCRIPTION
The library supports the ARM Cortex A15 core PMU. This PMU supports 6 counters and privilege level filtering.
.SH MODIFIERS
The following modifiers are supported on ARM Cortex A15:
.TP
.B u
Measure at the user level. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier.
.TP
.B k
Measure at the kernel level. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier.
.TP
.B hv
Measure at the hypervisor level. This corresponds to \fBPFM_PLMH\fR. This is a boolean modifier.
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_imc.3
.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_snbep_unc_imc - support for Intel Sandy Bridge-EP Integrated Memory Controller (IMC) uncore PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: snbep_unc_imc[0-3]
.B PMU desc: Intel Sandy Bridge-EP IMC uncore PMU
.sp
.SH DESCRIPTION
The library supports the Intel Sandy Bridge Integrated Memory Controller (IMC) uncore PMU. This PMU model only exists on Sandy Bridge model 45. There are four IMC PMUs per socket.
.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge IMC uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count IMC cycles in which the event is \fBnot\fR occurring.
This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B t
Set the threshold value. When set to a non-zero value, the counter counts the number of IMC cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255].
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_ubo.3
.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_snbep_unc_ubo - support for Intel Sandy Bridge-EP U-Box uncore PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: snbep_unc_ubo
.B PMU desc: Intel Sandy Bridge-EP U-Box uncore PMU
.sp
.SH DESCRIPTION
The library supports the Intel Sandy Bridge system configuration unit (U-Box) uncore PMU. This PMU model only exists on Sandy Bridge model 45. There is only one U-Box PMU per processor socket.
.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge U-Box uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count U-Box cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B t
Set the threshold value. When set to a non-zero value, the counter counts the number of U-Box cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:15].
.TP
.B oi
Invert the meaning of the occupancy event POWER_STATE_OCCUPANCY. The counter will now count PCU cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B oe
Enable edge detection for the occupancy event POWER_STATE_OCCUPANCY. The event now counts only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B ff
Enable frequency band filtering. This modifier applies only to the UNC_P_FREQ_BANDx_CYCLES events, where x is [0-3]. The modifier expects an integer in the range [0-255]. The value is interpreted as a frequency value to be multiplied by 100MHz. Thus if the value is 32, then all cycles where the processor is running at 3.2GHz or more are counted.
.SH Frequency band filtering
There are 4 events which support frequency band filtering, namely, UNC_P_FREQ_BAND0_CYCLES, UNC_P_FREQ_BAND1_CYCLES, UNC_P_FREQ_BAND2_CYCLES, UNC_P_FREQ_BAND3_CYCLES. The frequency filter (available via the ff modifier) is stored in a shared PMU register which holds all 4 possible frequency bands, one per event. However, the library generates the encoding for each event individually because it processes events one at a time. The caller or the underlying kernel interface may have to merge the band filter settings to program the filter register properly.
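.PP
Merging the per-event band settings into one filter value, as mentioned above, might look like the following sketch. It assumes that the shared filter register holds band x in bits [8x+7:8x]; this layout is an assumption to be checked against the hardware documentation. ff_to_mhz() and merge_band_filter() are hypothetical helpers, not libpfm functions.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an ff modifier value to a frequency in MHz (ff * 100 MHz). */
static unsigned int ff_to_mhz(uint8_t ff)
{
    return (unsigned int)ff * 100u;
}

/* Place the 8-bit ff value for UNC_P_FREQ_BANDx_CYCLES (band = x, 0..3)
 * into an existing filter value, leaving the other bands untouched.
 * ASSUMPTION: band x occupies bits [8x+7:8x] of the filter register. */
static uint32_t merge_band_filter(uint32_t filter, unsigned int band, uint8_t ff)
{
    unsigned int shift;

    assert(band < 4);

    shift = band * 8;
    filter &= ~((uint32_t)0xff << shift);   /* clear the old band value */
    filter |= (uint32_t)ff << shift;        /* insert the new one */
    return filter;
}
```

Under these assumptions, a caller monitoring band 0 with ff=32 and band 2 with ff=12 would call merge_band_filter() once per event and program the combined value, rather than letting the second event overwrite the setting of the first.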
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_pcu.3
.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_snbep_unc_pcu - support for Intel Sandy Bridge-EP Power Controller Unit (PCU) uncore PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: snbep_unc_pcu
.B PMU desc: Intel Sandy Bridge-EP PCU uncore PMU
.sp
.SH DESCRIPTION
The library supports the Intel Sandy Bridge Power Controller Unit uncore PMU. This PMU model only exists on Sandy Bridge model 45. There is only one PCU PMU per processor socket.
.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge PCU uncore PMU:
.TP
.B i
Invert the meaning of the event. The counter will now count PCU cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B t
Set the threshold value. When set to a non-zero value, the counter counts the number of PCU cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:15].
.TP
.B ff
Enable frequency band filtering. This modifier applies only to the UNC_P_FREQ_BANDx_CYCLES events, where x is [0-3]. The modifier expects an integer in the range [0-255]. The value is interpreted as a frequency value to be multiplied by 100MHz. Thus if the value is 32, then all cycles where the processor is running at 3.2GHz or more are counted.
.SH Frequency band filtering
There are 4 events which support frequency band filtering, namely, UNC_P_FREQ_BAND0_CYCLES, UNC_P_FREQ_BAND1_CYCLES, UNC_P_FREQ_BAND2_CYCLES, UNC_P_FREQ_BAND3_CYCLES.
The frequency filter (available via the ff modifier) is stored in a shared PMU register which holds all 4 possible frequency bands, one per event. However, the library generates the encoding for each event individually because it processes events one at a time. The caller or the underlying kernel interface may have to merge the band filter settings to program the filter register properly.
.SH AUTHORS
.nf
Stephane Eranian
.fi
.PP
.\" papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snb.3
.TH LIBPFM 3 "January, 2011" "" "Linux Programmer's Manual"
.SH NAME
libpfm_intel_snb - support for Intel Sandy Bridge core PMU
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.sp
.B PMU name: snb
.B PMU desc: Intel Sandy Bridge
.B PMU name: snb_ep
.B PMU desc: Intel Sandy Bridge EP
.sp
.SH DESCRIPTION
The library supports the Intel Sandy Bridge core PMU. It should be noted that this PMU model only covers each core's PMU and not the socket level PMU. For that, refer to the Sandy Bridge uncore PMU support.
.PP
On Sandy Bridge, the number of generic counters depends on the Hyperthreading (HT) mode. When HT is on, only 4 generic counters are available. When HT is off, 8 generic counters are available. The \fBpfm_get_pmu_info()\fR function returns the maximum number of generic counters in \fBnum_cntrs\fR.
.SH MODIFIERS
The following modifiers are supported on Intel Sandy Bridge processors:
.TP
.B u
Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier.
.TP
.B k
Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier.
.TP
.B i
Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier.
.TP
.B e
Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence.
This modifier must be combined with a counter mask modifier (c) with a value greater than or equal to one. This is a boolean modifier.
.TP
.B c
Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255].
.TP
.B t
Measure on both threads at the same time assuming hyper-threading is enabled. This is a boolean modifier.
.TP
.B ldlat
Pass a latency threshold to the MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD event. This is an integer attribute that must be in the range [3:65535]. It is required for this event. Note that the event must be used with precise sampling (PEBS).
.SH OFFCORE_RESPONSE events
Intel Sandy Bridge provides two offcore_response events, like Intel Westmere. They are called OFFCORE_RESPONSE_0 and OFFCORE_RESPONSE_1.
.PP
Those events need special treatment in the performance monitoring infrastructure because each event uses an extra register to store some settings. Thus, in case multiple offcore_response events are monitored simultaneously, the kernel needs to manage the sharing of that extra register.
.PP
The offcore_response events are exposed as normal events by the library. The extra settings are exposed as regular umasks. The library takes care of encoding the events according to the underlying kernel interface.
.PP
On Intel Sandy Bridge, the umasks are divided into three categories: request, supplier and snoop. The user must provide at least one umask for each category. The categories are shown in the umask descriptions.
.PP
There is also the special response umask called \fBANY_RESPONSE\fR. When this umask is used, it overrides any supplier and snoop umasks. In other words, users can specify either \fBANY_RESPONSE\fR \fBOR\fR any combination of supplier + snoop umasks.
.PP
In case no supplier or snoop is specified, the library defaults to using \fBANY_RESPONSE\fR.
For instance, the following are valid event selections: .TP .B OFFCORE_RESPONSE_0:DMND_DATA_RD:ANY_RESPONSE .TP .B OFFCORE_RESPONSE_0:ANY_REQUEST .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:SNOOP_ANY .P But the following are illegal: .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:ANY_RESPONSE .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:SNOOP_ANY:ANY_RESPONSE .SH SEE ALSO libpfm_snb_unc(3) .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_wsm_unc.30000600003276200002170000000246512247131123022342 0ustar ralphundrgrad.TH LIBPFM 3 "February, 2010" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_wsm_unc \- support for Intel Westmere uncore PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: wsm_unc .B PMU desc: Intel Westmere uncore .sp .SH DESCRIPTION The library supports the Intel Westmere uncore PMU as implemented by processors such as Intel Core i7, and Intel Core i5 (models 37, 44). The PMU is located at the socket-level and is therefore shared between the various cores. By construction it can only measure at all privilege levels. .SH MODIFIERS The following modifiers are supported on Intel Westmere processors: .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .TP .B o Causes the queue occupancy counter associated with the event to be cleared (zeroed). This is a boolean modifier. 
.SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_qpi.30000600003276200002170000000243312247131123023507 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_snbep_unc_qpi - support for Intel Sandy Bridge-EP QPI uncore PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: snbep_unc_qpi0, snbep_unc_qpi1 .B PMU desc: Intel Sandy Bridge-EP QPI uncore PMU .sp .SH DESCRIPTION The library supports the Intel Sandy Bridge Power QPI uncore PMU. This PMU model only exists on Sandy Bridge model 45. There are two QPI PMUs per processor socket. .SH MODIFIERS The following modifiers are supported on Intel Sandy Bridge QPI uncore PMU: .TP .B i Invert the meaning of the event. The counter will now count QPI cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater or equal to one. This is a boolean modifier. .TP .B t Set the threshold value. When set to a non-zero value, the counter counts the number of QPI cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/pfm_initialize.30000600003276200002170000000147612247131123021147 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME pfm_initialize \- initialize library .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_initialize(void);" .sp .SH DESCRIPTION This is the first function that a program \fBmust\fR call otherwise the library will not function at all. This function probes the underlying hardware looking for valid PMUs event tables to activate. 
Multiple distinct PMU tables may be activated at the same time. The function must be called only once. .SH RETURN The function returns whether or not it was successful, i.e., at least one PMU was activated. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is an error code. .SH ERRORS .TP .B PFMLIB_ERR_NOTSUPP No PMU was activated. .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_core.30000600003276200002170000000261112247131123021610 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_core - support for Intel Core-based processors .SH SYNOPSIS .nf .B #include .sp .B PMU name: core .B PMU desc: Intel Core .sp .SH DESCRIPTION The library supports all Intel Core-based processors, which include models 15, 23, and 29. .SH MODIFIERS The following modifiers are supported on Intel Core processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier. .TP .B e Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a counter mask modifier (c) with a value greater than or equal to one. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255].
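The interaction of the i, e and c modifiers described above can be sketched with a small software model. This is only an illustration under stated assumptions, not libpfm code and not a hardware specification: the names struct filter and model_count() are hypothetical, the input is an array giving the number of event occurrences in each cycle, and the invert and edge flags are ignored when the counter mask is 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the modifiers; none of this is a libpfm type. */
struct filter {
    unsigned cmask; /* c modifier: threshold, 0 means no threshold */
    bool invert;    /* i modifier: count cycles where the test fails */
    bool edge;      /* e modifier: count only false-to-true transitions */
};

/*
 * occ[n] is the number of occurrences of the event in cycle n.
 * Returns the value a counter would hold after ncycles under this
 * simplified model.
 */
uint64_t model_count(const unsigned *occ, size_t ncycles, struct filter f)
{
    uint64_t total = 0;
    bool prev = false; /* state of the threshold test in the previous cycle */

    for (size_t n = 0; n < ncycles; n++) {
        if (f.cmask == 0) {
            /* No threshold: plain occurrence counting. */
            total += occ[n];
            continue;
        }

        /* Threshold test: occurrences greater than or equal to cmask. */
        bool hit = occ[n] >= f.cmask;
        if (f.invert)
            hit = !hit;

        if (f.edge) {
            /* Count only the cycle where the test becomes true. */
            if (hit && !prev)
                total++;
        } else if (hit) {
            /* Count every cycle where the test is true. */
            total++;
        }
        prev = hit;
    }
    return total;
}
```

In an event string these settings would be written as attributes such as c=2, i or e. The model only shows why edge detection needs a counter mask of at least one: without a threshold there is no per-cycle true/false state whose transitions could be counted.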
.SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_atom.30000600003276200002170000000252412247131123021623 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_atom - support for Intel Atom processors .SH SYNOPSIS .nf .B #include .sp .B PMU name: atom .B PMU desc: Intel Atom .sp .SH DESCRIPTION The library supports all Intel Atom-based processors that includes family 6 model 28. .SH MODIFIERS The following modifiers are supported on Intel Atom processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .TP .B t Measure on both threads at the same time assuming hyper-threading is enabled. This is a boolean modifier. .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_nhm_unc.30000600003276200002170000000243512247131123022313 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_nhm_unc \- support for Intel Nehalem uncore PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: nhm_unc .B PMU desc: Intel Nehalem uncore .sp .SH DESCRIPTION The library supports the Nehalem uncore PMU as implemented by processors such as Intel Core i7, and Intel Core i5. 
The PMU is located at the socket-level and is therefore shared between the various cores. By construction it can only measure at all privilege levels. .SH MODIFIERS The following modifiers are supported on Intel Nehalem processors: .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .TP .B o Causes the queue occupancy counter associated with the event to be cleared (zeroed). This is a boolean modifier. .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_snbep_unc_r2pcie.30000600003276200002170000000244712247131123024107 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_snbep_unc_r2pcie - support for Intel Sandy Bridge-EP R2 PCIe uncore PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: snbep_unc_r2pcie .B PMU desc: Intel Sandy Bridge-EP R2 PCIe uncore PMU .sp .SH DESCRIPTION The library supports the Intel Sandy Bridge R2 PCIe uncore PMU. This PMU model only exists on Sandy Bridge model 45. There is only one R2PCIe PMU per processor socket. .SH MODIFIERS The following modifiers are supported on Intel Sandy Bridge R2PCIe uncore PMU: .TP .B i Invert the meaning of the event. The counter will now count R2 PCIe cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a threshold modifier (t) with a value greater or equal to one. 
This is a boolean modifier. .TP .B t Set the threshold value. When set to a non-zero value, the counter counts the number of R2PCIe cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:15]. .SH AUTHORS .nf Stephane Eranian .fi .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_arm_ac9.30000600003276200002170000000070212247131123020777 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual" .SH NAME libpfm_arm_ac9 - support for ARM Cortex A9 PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: arm_ac9 .B PMU desc: ARM Cortex A9 .sp .SH DESCRIPTION The library supports the ARM Cortex A9 core PMU. This PMU supports 2 counters and has no privilege level filtering. No event modifiers are available. .SH AUTHORS .nf Stephane Eranian .fi .PP papi-5.3.0/src/libpfm4/docs/man3/pfm_find_event.30000600003276200002170000000746712247131123021135 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2009" "" "Linux Programmer's Manual" .SH NAME pfm_find_event \- search for an event .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_find_event(const char *"str ");" .sp .SH DESCRIPTION This function is used to convert an event string passed in \fBstr\fR into an opaque event identifier, i.e., the return value. Events are first manipulated as strings which contain the event name, sub-event names and optional filters and modifiers. This function analyzes the string and tries to find the matching event. The event string is a structured string and it is composed as follows: .TP .B [pmu_name::]event_name[:unit_mask][:modifier|:modifier=val] .PP The various components are separated by \fB:\fR or \fB::\fR. They are defined as follows: .TP .B pmu_name This is an optional prefix to designate a specific PMU model. With the prefix, the event on that PMU which matches the event_name is used.
In case multiple PMU models are activated, identical event names may conflict, whether they mean the same or different things. In that case, it is necessary to fully specify the event with a pmu_name. That string corresponds to what is returned by \fBpfm_get_pmu_name()\fR. .TP .B event_name This is the event name and is required. Event strings are not case sensitive. The event name must match \fBcompletely\fR the actual event name; it cannot be a substring. .TP .B unit_mask The optional unit mask which can be considered like a sub-event of the major event. If an event has unit masks, and there is no default, then at least one unit mask must be passed in the string. Multiple unit masks may be specified for a single event. .TP .B modifier A modifier is an optional filter which is provided by the hardware register hosting the event or by the underlying kernel infrastructure. Typical modifiers include privilege level filters. Some modifiers are simple booleans, in which case just passing their names is equivalent to setting their value to \fBtrue\fR. Other modifiers need a specific value, in which case it is provided after the equal sign. No space is tolerated around the equal sign. The list of modifiers depends on the host PMU and underlying kernel API. They are documented in PMU-specific documentation. Multiple modifiers may be passed. There is no required order between unit masks and modifiers. .PP The library uses the generic term \fBattribute\fR to designate both unit masks and modifiers. Here are a few examples of event strings: .TP .B amd64::RETIRED_INSTRUCTIONS:u Event RETIRED_INSTRUCTIONS on an AMD64 processor, measured at user privilege level only .TP .B RS_UOPS_DISPATCHED:c=1:i:u Event RS_UOPS_DISPATCHED measured at user privilege level only, with the counter mask set to 1 and inverted .PP For the purpose of this function, only the pmu_name and event_name are considered; everything else is parsed, and thus must be valid, but is ignored.
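The structure of an event string can be illustrated with a small stand-alone splitter. This is a minimal sketch under stated assumptions, not the parser used by the library: struct event_parts and split_event_string() are hypothetical names, the buffer sizes are arbitrary, and no check is made that the names exist on any PMU. It separates the optional pmu_name, the event_name and the attributes, and it only looks at the text before the first comma.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8   /* arbitrary limit chosen for this sketch */
#define NAME_LEN  64  /* arbitrary buffer size chosen for this sketch */

/* Hypothetical container for the pieces of one event string. */
struct event_parts {
    char pmu[NAME_LEN];              /* empty when no pmu_name:: prefix */
    char event[NAME_LEN];            /* required event name */
    char attrs[MAX_ATTRS][NAME_LEN]; /* unit masks and modifiers, in order */
    int nattrs;
};

/* Returns 0 on success, -1 if the string does not fit the grammar. */
int split_event_string(const char *str, struct event_parts *out)
{
    char buf[256];
    char *p, *sep;
    size_t len;

    if (str == NULL || out == NULL)
        return -1;
    memset(out, 0, sizeof(*out));

    /* Only the first event of a comma-separated list is considered. */
    len = strcspn(str, ",");
    if (len == 0 || len >= sizeof(buf))
        return -1;
    memcpy(buf, str, len);
    buf[len] = '\0';

    /* Optional pmu_name, terminated by "::". */
    p = buf;
    sep = strstr(p, "::");
    if (sep != NULL) {
        *sep = '\0';
        if (*p == '\0' || strlen(p) >= sizeof(out->pmu))
            return -1;
        strcpy(out->pmu, p);
        p = sep + 2;
    }

    /* Required event_name, up to the first ':' or the end. */
    sep = strchr(p, ':');
    if (sep != NULL)
        *sep = '\0';
    if (*p == '\0' || strlen(p) >= sizeof(out->event))
        return -1;
    strcpy(out->event, p);

    /* Remaining ':'-separated fields are attributes, e.g. "u" or "c=1". */
    while (sep != NULL) {
        p = sep + 1;
        sep = strchr(p, ':');
        if (sep != NULL)
            *sep = '\0';
        if (*p == '\0' || out->nattrs == MAX_ATTRS ||
            strlen(p) >= sizeof(out->attrs[0]))
            return -1;
        strcpy(out->attrs[out->nattrs++], p);
    }
    return 0;
}
```

For the string amd64::RETIRED_INSTRUCTIONS:u this yields the PMU name amd64, the event name RETIRED_INSTRUCTIONS and the single attribute u. Telling a unit mask from a modifier requires the PMU event tables, which is why the sketch keeps both in one attribute list.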
The function searches only for one event per call. As a convenience, the function will identify the event up to the first comma. In other words, if \fBstr\fR is equal to "EVENTA,EVENTB", then the function will only look at EVENTA and will not return an error because of an invalid event string. This is handy when parsing constant event strings containing multiple, comma-separated, events. .SH RETURN The function returns the opaque event identifier that corresponds to the event string. In case of error, a negative error code is returned instead. .SH ERRORS .TP .B PFMLIB_ERR_NOINIT The library has not been initialized properly. .TP .B PFMLIB_ERR_INVAL The event string is NULL. .TP .B PFMLIB_ERR_NOMEM The library ran out of memory. .TP .B PFMLIB_ERR_NOTFOUND The event was not found. .TP .B PFMLIB_ERR_ATTR Invalid event attribute .TP .B PFMLIB_ERR_ATTR_VAL Invalid event attribute value .TP .B PFMLIB_ERR_TOOMANY Too many event attributes passed .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_hsw.30000600003276200002170000000767312247131123021476 0ustar ralphundrgrad.TH LIBPFM 3 "April, 2013" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_hsw - support for Intel Haswell core PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: hsw .B PMU desc: Intel Haswell .sp .SH DESCRIPTION The library supports the Intel Haswell core PMU. Note that this PMU model only covers each core's PMU and not the socket-level PMU. On Haswell, the number of generic counters depends on the Hyperthreading (HT) mode. When HT is on, only 4 generic counters are available. When HT is off, 8 generic counters are available. The \fBpfm_get_pmu_info()\fR function returns the maximum number of generic counters in \fBnum_cntrs\fR. .SH MODIFIERS The following modifiers are supported on Intel Haswell processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier.
.TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier. .TP .B e Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. This modifier must be combined with a counter mask modifier (c) with a value greater than or equal to one. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255]. .TP .B t Measure on both threads at the same time assuming hyper-threading is enabled. This is a boolean modifier. .TP .B ldlat Pass a latency threshold to the MEM_TRANS_RETIRED:LOAD_LATENCY event. This is an integer attribute that must be in the range [3:65535]. It is required for this event. Note that the event must be used with precise sampling (PEBS). .TP .B intx Monitor the event only when executing inside a transactional memory region (in tx). The event does not count otherwise. This is a boolean modifier. Default value is 0. .TP .B intxcp Do not count occurrences of the event when they are inside an aborted transactional memory region. This is a boolean modifier. Default value is 0. .SH OFFCORE_RESPONSE events Intel Haswell provides two offcore_response events. They are called OFFCORE_RESPONSE_0 and OFFCORE_RESPONSE_1. Those events need special treatment in the performance monitoring infrastructure because each event uses an extra register to store some settings. Thus, in case multiple offcore_response events are monitored simultaneously, the kernel needs to manage the sharing of that extra register.
The offcore_response events are exposed as a normal events by the library. The extra settings are exposed as regular umasks. The library takes care of encoding the events according to the underlying kernel interface. On Intel Haswell, the umasks are divided into three categories: request, supplier and snoop. The user must provide at least one umask for each category. The categories are shown in the umask descriptions. There is also the special response umask called \fBANY_RESPONSE\fR. When this umask is used then it overrides any supplier and snoop umasks. In other words, users can specify either \fBANY_RESPONSE\fR \fBOR\fR any combinations of supplier + snoops. In case no supplier or snoop is specified, the library defaults to using \fBANY_RESPONSE\fR. For instance, the following are valid event selections: .TP .B OFFCORE_RESPONSE_0:DMND_DATA_RD:ANY_RESPONSE .TP .B OFFCORE_RESPONSE_0:ANY_REQUEST .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:SNOOP_ANY .P But the following are illegal: .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:ANY_RESPONSE .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:SNOOP_ANY:ANY_RESPONSE .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_amd64_fam10h.30000600003276200002170000000341412247131123021536 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2010" "" "Linux Programmer's Manual" .SH NAME libpfm_amd64_fam10h - support for AMD64 Family 10h processors .SH SYNOPSIS .nf .B #include .sp .B PMU name: amd64_fam10h_barcelona, amd64_fam10h_shanghai, amd64_fam10h_istanbul .B PMU desc: AMD64 Fam10h Barcelona, AMD64 Fam10h Shanghai, AMD64 Fam10h Istanbul .sp .SH DESCRIPTION The library supports AMD Family 10h processors in both 32 and 64-bit modes. They correspond to processor family 16. .SH MODIFIERS The following modifiers are supported on AMD64 Family 10h (16) processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. 
.TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B h Measure at while executing in host mode (when using virtualization). This corresponds to \fBPFM_PLMH\fR. This modifier is available starting with Fam10h. This is a boolean modifier. .TP .B g Measure at while executing in guest mode (when using virtualization). This modifier is available starting with Fam10h. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .SH AUTHORS .nf Stephane Eranian Robert Richter .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_amd64_k7.30000600003276200002170000000242712247131123021006 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2010" "" "Linux Programmer's Manual" .SH NAME libpfm_amd64_k7 - support for AMD64 K7 processors .SH SYNOPSIS .nf .B #include .sp .B PMU name: amd64_k7 .B PMU desc: AMD64 K7 .sp .SH DESCRIPTION The library supports AMD K7 processors in both 32 and 64-bit modes. They correspond to processor family 6. .SH MODIFIERS The following modifiers are supported on AMD64 K7 processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. 
This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater or equal to the threshold. This is an integer modifier with values in the range [0:255]. .SH AUTHORS .nf Stephane Eranian Robert Richter .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_intel_ivb.30000600003276200002170000000727712247131123021455 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual" .SH NAME libpfm_intel_ivb - support for Intel Ivy Bridge core PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: ivb .B PMU desc: Intel Ivy Bridge .B PMU name: ivb_ep .B PMU desc: Intel Ivy Bridge EP .sp .SH DESCRIPTION The library supports the Intel Ivy Bridge core PMU. It should be noted that this PMU model only covers each core's PMU and not the socket level PMU. On Ivy Bridge, the number of generic counters depends on the Hyperthreading (HT) mode. When HT is on, then only 4 generic counters are available. When HT is off, then 8 generic counters are available. The \fBpfm_get_pmu_info()\fR function returns the maximum number of generic counters in \fBnum_cntrs\fr. .SH MODIFIERS The following modifiers are supported on Intel Ivy Bridge processors: .TP .B u Measure at user level which includes privilege levels 1, 2, 3. This corresponds to \fBPFM_PLM3\fR. This is a boolean modifier. .TP .B k Measure at kernel level which includes privilege level 0. This corresponds to \fBPFM_PLM0\fR. This is a boolean modifier. .TP .B i Invert the meaning of the event. The counter will now count cycles in which the event is \fBnot\fR occurring. This is a boolean modifier .TP .B e Enable edge detection, i.e., count only when there is a state transition from no occurrence of the event to at least one occurrence. 
This modifier must be combined with a counter mask modifier (c) with a value greater than or equal to one. This is a boolean modifier. .TP .B c Set the counter mask value. The mask acts as a threshold. The counter will count the number of cycles in which the number of occurrences of the event is greater than or equal to the threshold. This is an integer modifier with values in the range [0:255]. .TP .B t Measure on both threads at the same time assuming hyper-threading is enabled. This is a boolean modifier. .TP .B ldlat Pass a latency threshold to the MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD event. This is an integer attribute that must be in the range [3:65535]. It is required for this event. Note that the event must be used with precise sampling (PEBS). .SH OFFCORE_RESPONSE events Intel Ivy Bridge provides two offcore_response events. They are called OFFCORE_RESPONSE_0 and OFFCORE_RESPONSE_1. Those events need special treatment in the performance monitoring infrastructure because each event uses an extra register to store some settings. Thus, in case multiple offcore_response events are monitored simultaneously, the kernel needs to manage the sharing of that extra register. The offcore_response events are exposed as normal events by the library. The extra settings are exposed as regular umasks. The library takes care of encoding the events according to the underlying kernel interface. On Intel Ivy Bridge, the umasks are divided into three categories: request, supplier and snoop. The user must provide at least one umask for each category. The categories are shown in the umask descriptions. There is also the special response umask called \fBANY_RESPONSE\fR. When this umask is used, it overrides any supplier and snoop umasks. In other words, users can specify either \fBANY_RESPONSE\fR \fBOR\fR any combination of supplier + snoop umasks. In case no supplier or snoop is specified, the library defaults to using \fBANY_RESPONSE\fR.
For instance, the following are valid event selections: .TP .B OFFCORE_RESPONSE_0:DMND_DATA_RD:ANY_RESPONSE .TP .B OFFCORE_RESPONSE_0:ANY_REQUEST .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:SNOOP_ANY .P But the following are illegal: .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:ANY_RESPONSE .TP .B OFFCORE_RESPONSE_0:ANY_RFO:LLC_HITM:SNOOP_ANY:ANY_RESPONSE .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_arm_ac8.30000600003276200002170000000070212247131123020776 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2012" "" "Linux Programmer's Manual" .SH NAME libpfm_arm_ac8 - support for ARM Cortex A8 PMU .SH SYNOPSIS .nf .B #include .sp .B PMU name: arm_ac8 .B PMU desc: ARM Cortex A8 .sp .SH DESCRIPTION The library supports the ARM Cortex A8 core PMU. This PMU supports 2 counters and has no privilege levels filtering. No event modifiers are available. .SH AUTHORS .nf Stephane Eranian .if .PP papi-5.3.0/src/libpfm4/docs/man3/libpfm_amd64.30000600003276200002170000000121512247131123020377 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2010" "" "Linux Programmer's Manual" .SH NAME libpfm_amd64 - support for AMD64 processors .SH SYNOPSIS .nf .B #include .sp .SH DESCRIPTION The library supports all AMD64 processors in both 32 and 64-bit modes. The support is broken down in three groups: .TP .B AMD K7 processors (family 6) .TP .B AMD K8 processors (family 15) .TP .B AMD Family 10h processors (family 16) .sp .TP Each group has a distinct man page. See links below. .SH SEE ALSO libpfm_amd64_k7(3), libpfm_amd64_k8(3), libpfm_amd64_fam10h(3) .SH AUTHORS .nf Stephane Eranian Robert Richter .if .PP papi-5.3.0/src/libpfm4/Makefile0000600003276200002170000000501712247131123015725 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. 
# Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # # # Look in config.mk for options # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi) include config.mk EXAMPLE_DIRS=examples DIRS=lib tests $(EXAMPLE_DIRS) include docs ifeq ($(SYS),Linux) EXAMPLE_DIRS +=perf_examples endif ifneq ($(CONFIG_PFMLIB_NOPYTHON),y) DIRS += python endif TAR=tar --exclude=.git --exclude=.gitignore CURDIR=$(shell basename "$$PWD") PKG=libpfm-4.$(REVISION).$(AGE) TARBALL=$(PKG).tar.gz all: @echo Compiling for \'$(ARCH)\' target @echo Compiling for \'$(SYS)\' system @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done lib: $(MAKE) -C lib clean: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done distclean: clean @(cd debian; $(RM) -f *.log *.debhelper *.substvars; $(RM) -rf libpfm4-dev libpfm4 python-libpfm4 tmp files) $(RM) -f tags depend: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done tar: clean ln -s $$PWD ../$(PKG) && cd .. 
&& $(TAR) -zcf $(TARBALL) $(PKG)/. && rm $(PKG) @echo generated ../$(TARBALL) install: @echo installing in $(DESTDIR) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done install_examples: @set -e ; for d in $(EXAMPLE_DIRS) ; do $(MAKE) -C $$d $@ ; done tags: @echo creating tags $(MAKE) -C lib $@ static: make all CONFIG_PFMLIB_SHARED=n .PHONY: all clean distclean depend tar install install_examples lib static # DO NOT DELETE papi-5.3.0/src/libpfm4/README0000600003276200002170000001233412247131123015145 0ustar ralphundrgrad ------------------------------------------------------------ libpfm-4.x: a helper library to program the performance monitoring events ------------------------------------------------------------ Copyright (c) 2009 Google, Inc Contributed by Stephane Eranian Copyright (c) 2001-2007 Hewlett-Packard Development Company, L.P. Contributed by Stephane Eranian This package provides a library, called libpfm4 which is used to develop monitoring tools exploiting the performance monitoring events such as those provided by the Performance Monitoring Unit (PMU) of modern processors. This is a complete rewrite of libpfm3 and it is NOT backward compatible with it. Libpfm4 helps convert from an event name, expressed as a string, to the event encoding that is either the raw event as documented by HW vendor or the OS-specific encoding. In the latter case, the library is able to prepare the OS-specific data structures needed by the kernel to setup the event. The current libpfm4 provides support for the perf_events interface which was introduced in Linux v2.6.31. Perfmon support is not present yet. The library does not make any performance monitoring system calls. It is portable and supports other operating system environments beyond Linux, such as Mac OS X, and Windows. The library supports many PMUs. 
The current version can handle: - For AMD X86: AMD64 K7, K8 AMD64 Fam10h (Barcelona, Shanghai, Istanbul) AMD64 Fam11h (Turion) AMD64 Fam12h (Llano) AMD64 Fam14h (Bobcat) AMD64 Fam15h (Bulldozer) - For Intel X86: Intel P6 (Pentium II, Pentium Pro, Pentium III, Pentium M) Intel Yonah (Core Duo/Core Solo), Intel Core (Merom, Penryn, Dunnington) Intel Atom Intel Nehalem, Westmere Intel Sandy Bridge Intel Ivy Bridge Intel Haswell Intel Knights Corner Intel architectural perfmon v1, v2, v3 - For ARM: ARMV7 Cortex A8 ARMV7 Cortex A9 ARMV7 Cortex A15 - For SPARC Ultra I, II Ultra III, IIIi, III+ Ultra IV+ Niagara I, Niagara II - For IBM Power 4 Power 5 Power 6 Power 7 Power 8 PPC970 Torrent System z (s390x) - For MIPS Mips 74k WHAT'S THERE ------------- - the library source code including support for all processors listed above - a set of generic examples showing how to list and query events. They are in examples. - a set of examples showing how the library can be used with the perf_events interface. They are in perf_examples. - a set of library header files used to compile the library and perf_examples - man pages for all the library entry points - Python bindings for the library - a SPEC file to build RPMs from the library - the Debian-style config file to build a .deb package from the library INSTALLATION ------------ - edit config.mk to : - update some of the configuration variables - select your compiler options - type make - type make install - The default installation location is /usr/local. You can specify a different install location as follows: $ make PREFIX= install Depending on your install location, you may need to run the 'ldconfig' command or use LD_LIBRARY_PATH when you build and run tools that link to the libpfm4 library. - By default, libpfm library files are installed in /lib.
    If 'make' builds 64-bit libraries on your system, and your target
    architecture expects 64-bit libraries to be located in a directory named
    "lib64", then you should use the LIBDIR variable when installing, as
    follows:
      $ make LIBDIR=/lib64 install
  - To compile and install the Python bindings, you need to go to the python
    sub-directory and type make. Python may not be systematically built.
  - To compile the library for another ABI (e.g. 32-bit x86 on a 64-bit x86
    system), you can pass the ABI flag to the compiler as follows (assuming
    you have the multilib version of gcc):
      $ make OPTIM="-m32 -O2"

PACKAGING
---------
The library comes with the config files necessary to generate RPMs or Debian
packages. The source code produces 3 packages:
  - libpfm : runtime library
  - libpfm-dev: development files (headers, manpages, static library)
  - libpfm-python: Python bindings for the library

To generate the RPMs:
  $ rpmbuild -ba libpfm.spec

To generate the Debian packages:
  $ debuild -i -us -uc -b

You may need to install some extra packages to make Debian package generation
possible.

REQUIREMENTS:
-------------
  - To run the programs in the perf_examples subdir, you MUST be using a Linux
    kernel with perf_events. That means v2.6.31 or later.
  - To compile the Python bindings, you need to have SWIG and the python
    development packages installed.
  - To compile on Windows, you need the MinGW and MSYS compiler environment
    (see www.mingw.org). The environment needs to be augmented with the mingw
    regex user contributed package (mingw-libgnurx-2.5.1.dev.tar.gz).
  - To compile on Mac OS X, you need to have gmake installed.

DOCUMENTATION
-------------
  - man pages for all entry points. It is recommended you start with:
      man libpfm
  - More information can be found on the library web site:
      http://perfmon2.sf.net
papi-5.3.0/src/libpfm4/rules.mk0000600003276200002170000000300412247131124015743 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P.
# Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # # This file is part of libpfm, a performance monitoring support library for # applications on Linux/ia64. # .SUFFIXES: .c .S .o .lo .cpp .S.o: $(CC) $(CFLAGS) -c $*.S .c.o: $(CC) $(CFLAGS) -c $*.c .cpp.o: $(CXX) $(CFLAGS) -c $*.cpp .c.lo: $(CC) -fPIC -DPIC $(CFLAGS) -c $*.c -o $*.lo .S.lo: $(CC) -fPIC -DPIC $(CFLAGS) -c $*.S -o $*.lo papi-5.3.0/src/libpfm4/python/0000700003276200002170000000000012247131124015602 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/python/Makefile0000600003276200002170000000276512247131124017256 0ustar ralphundrgrad# # Copyright (c) 2008 Google, Inc. 
# Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk PYTHON_PREFIX=$(PREFIX) all: CFLAGS="-O2 -g" ./setup.py build install: CFLAGS="-O2 -g" ./setup.py install --prefix=$(DESTDIR)$(PYTHON_PREFIX) clean: $(RM) src/perfmon_int_wrap.c src/perfmon_int.py src/*.pyc $(RM) -r build papi-5.3.0/src/libpfm4/python/README0000600003276200002170000000041012247131124016457 0ustar ralphundrgradRequirements: To use the python bindings, you need the following packages: 1. swig (http://www.swig.org) 2. python-dev (http://www.python.org) 3. module-linux (http://code.google.com/p/module-linux) linux.sched is python package that comes with module-linux. papi-5.3.0/src/libpfm4/python/sys.py0000700003276200002170000000435512247131124017004 0ustar ralphundrgrad#!/usr/bin/env python # # Copyright (c) 2008 Google, Inc. 
# Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), # to deal in the Software without restriction, including without limitation # the rights to use, copy, modify, merge, publish, distribute, sublicense, # and/or sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included # in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL # THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR # OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, # ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR # OTHER DEALINGS IN THE SOFTWARE. # # System wide monitoring example. 
# Copied from syst.c
#
# Run as: ./sys.py -c cpulist -e eventlist

import sys
import os
import optparse
import time
import struct
import perfmon

if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option("-e", "--events", help="Events to use",
                      action="store", dest="events")
    parser.add_option("-c", "--cpulist", help="CPUs to monitor",
                      action="store", dest="cpulist")
    parser.set_defaults(cpulist="0")
    parser.set_defaults(events="PERF_COUNT_HW_CPU_CYCLES")
    (options, args) = parser.parse_args()

    cpus = options.cpulist.split(',')
    cpus = [ int(c) for c in cpus ]

    if options.events:
        events = options.events.split(",")
    else:
        # string exceptions are rejected by Python 2.6+, raise a real one
        raise ValueError("You need to specify events to monitor")

    s = perfmon.SystemWideSession(cpus, events)

    s.start()
    # Measuring loop
    while 1:
        time.sleep(1)
        # read the counts
        for c in cpus:
            for i in range(0, len(events)):
                count = struct.unpack("L", s.read(c, i))[0]
                print """CPU%d: %s\t%lu""" % (c, events[i], count)
papi-5.3.0/src/libpfm4/python/self.py0000700003276200002170000000406212247131124017112 0ustar ralphundrgrad#!/usr/bin/env python # # Copyright (c) 2008 Google, Inc. # Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), # to deal in the Software without restriction, including without limitation # the rights to use, copy, modify, merge, publish, distribute, sublicense, # and/or sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included # in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
# Self monitoring example. Copied from self.c

import os
import optparse
import random
import errno
import struct
import perfmon

if __name__ == '__main__':
    parser = optparse.OptionParser()
    parser.add_option("-e", "--events", help="Events to use",
                      action="store", dest="events")
    parser.set_defaults(events="PERF_COUNT_HW_CPU_CYCLES")
    (options, args) = parser.parse_args()

    if options.events:
        events = options.events.split(",")
    else:
        # string exceptions are rejected by Python 2.6+, raise a real one
        raise ValueError("You need to specify events to monitor")

    s = perfmon.PerThreadSession(int(os.getpid()), events)
    s.start()

    # code to be measured
    #
    # note that this is not identical to what examples/self.c does
    # thus counts will be different in the end
    for i in range(1, 1000000):
        random.random()

    # read the counts
    for i in range(0, len(events)):
        count = struct.unpack("L", s.read(i))[0]
        print """%s\t%lu""" % (events[i], count)
papi-5.3.0/src/libpfm4/python/src/0000700003276200002170000000000012247131124016371 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/python/src/perfmon_int.i0000600003276200002170000000617012247131124021071 0ustar ralphundrgrad/* * * Copyright (c) 2008 Google, Inc. * Contributed by Arun Sharma * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Software.
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. * * Python Bindings for perfmon. */ %module perfmon_int %{ #include #define SWIG #include #include #include static PyObject *libpfm_err; %} %include "typemaps.i" %include "carrays.i" %include "cstring.i" %include /* Convert libpfm errors into exceptions */ %typemap(out) os_err_t { if (result == -1) { PyErr_SetFromErrno(PyExc_OSError); SWIG_fail; } resultobj = SWIG_From_int((int)(result)); }; %typemap(out) pfm_err_t { if (result != PFM_SUCCESS) { PyObject *obj = Py_BuildValue("(i,s)", result, pfm_strerror(result)); PyErr_SetObject(libpfm_err, obj); SWIG_fail; } else { PyErr_Clear(); } resultobj = SWIG_From_int((int)(result)); } /* Generic return structures via pointer output arguments */ %define ptr_argout(T) %typemap(argout) T* output { if (!PyTuple_Check($result)) { PyObject *x = $result; $result = PyTuple_New(1); PyTuple_SET_ITEM($result, 0, x); } PyObject *o = SWIG_NewPointerObj((void *)$1, $descriptor, 0); $result = SWIG_AppendOutput($result, o); } %typemap(in, numinputs=0) T* output { $1 = (T*) malloc(sizeof(T)); memset($1, 0, sizeof(T)); } %extend T { ~T() { free(self); } } %enddef ptr_argout(pfm_pmu_info_t); ptr_argout(pfm_event_info_t); ptr_argout(pfm_event_attr_info_t); %typedef int pid_t; /* Kernel interface */ %include ptr_argout(perf_event_attr_t); /* Library interface */ /* We never set the const char * members. 
So no memory leak */ #pragma SWIG nowarn=451 %include /* OS specific library interface */ extern pfm_err_t pfm_get_perf_event_encoding(const char *str, int dfl_plm, perf_event_attr_t *output, char **fstr, int *idx); %init %{ libpfm_err = PyErr_NewException("perfmon.libpfmError", NULL, NULL); PyDict_SetItemString(d, "libpfmError", libpfm_err); %} papi-5.3.0/src/libpfm4/python/src/session.py0000600003276200002170000000472312247131124020436 0ustar ralphundrgrad# # Copyright (c) 2008 Google, Inc. # Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), # to deal in the Software without restriction, including without limitation # the rights to use, copy, modify, merge, publish, distribute, sublicense, # and/or sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included # in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL # THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR # OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, # ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR # OTHER DEALINGS IN THE SOFTWARE. 
#
from perfmon import *
import os
import sys

# Common base class
class Session:
    def __init__(self, events):
        self.system = System()
        self.event_names = events
        self.events = []
        self.fds = []
        for e in events:
            err, encoding = pfm_get_perf_event_encoding(e, PFM_PLM0 | PFM_PLM3, None, None)
            self.events.append(encoding)

    def __del__(self):
        pass

    def read(self, fd):
        # TODO: determine counter width
        return os.read(fd, 8)

class SystemWideSession(Session):
    def __init__(self, cpus, events):
        self.cpus = cpus
        Session.__init__(self, events)

    def __del__(self):
        Session.__del__(self)

    def start(self):
        self.cpu_fds = []
        for c in self.cpus:
            self.cpu_fds.append([])
            cur_cpu_fds = self.cpu_fds[-1]
            for e in self.events:
                cur_cpu_fds.append(perf_event_open(e, -1, c, -1, 0))

    def read(self, c, i):
        index = self.cpus.index(c)
        return Session.read(self, self.cpu_fds[index][i])

class PerThreadSession(Session):
    def __init__(self, pid, events):
        self.pid = pid
        Session.__init__(self, events)

    def __del__(self):
        Session.__del__(self)

    def start(self):
        for e in self.events:
            self.fds.append(perf_event_open(e, self.pid, -1, -1, 0))

    def read(self, i):
        return Session.read(self, self.fds[i])
papi-5.3.0/src/libpfm4/python/src/pmu.py0000600003276200002170000000561112247131124017551 0ustar ralphundrgrad# # Copyright (c) 2008 Google, Inc. # Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), # to deal in the Software without restriction, including without limitation # the rights to use, copy, modify, merge, publish, distribute, sublicense, # and/or sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included # in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
import os
from perfmon import *

def public_members(self):
    s = "{ "
    for k, v in self.__dict__.iteritems():
        if not k[0] == '_':
            s += "%s : %s, " % (k, v)
    s += " }"
    return s

class System:
    # Use the os that gives us everything
    os = PFM_OS_PERF_EVENT_EXT

    def __init__(self):
        self.ncpus = os.sysconf('SC_NPROCESSORS_ONLN')
        self.pmus = []
        for i in range(0, PFM_PMU_MAX):
            try:
                pmu = PMU(i)
            except:
                pass
            else:
                self.pmus.append(pmu)

    def __repr__(self):
        return public_members(self)

class Event:
    def __init__(self, info):
        self.info = info
        self.__attrs = []

    def __repr__(self):
        return '\n' + public_members(self)

    def __parse_attrs(self):
        info = self.info
        for index in range(0, info.nattrs):
            self.__attrs.append(pfm_get_event_attr_info(info.idx, index, System.os)[1])

    def attrs(self):
        if not self.__attrs:
            self.__parse_attrs()
        return self.__attrs

class PMU:
    def __init__(self, i):
        self.info = pfm_get_pmu_info(i)[1]
        self.__events = []

    def __parse_events(self):
        index = self.info.first_event
        while index != -1:
            self.__events.append(Event(pfm_get_event_info(index, System.os)[1]))
            index = pfm_get_event_next(index)

    def events(self):
        if not self.__events:
            self.__parse_events()
        return self.__events

    def __repr__(self):
        return public_members(self)

if __name__ == '__main__':
    from perfmon import *
    s = System()
    for pmu in s.pmus:
        info = pmu.info
        if info.flags.is_present:
            print info.name, info.size, info.nevents
            for e in pmu.events():
                print e.info.name, e.info.code
                for a in e.attrs():
                    print '\t\t', a.name, a.code
papi-5.3.0/src/libpfm4/python/src/__init__.py0000600003276200002170000000012412247131124020501 0ustar ralphundrgradfrom perfmon_int import * from pmu import * from session import * pfm_initialize() papi-5.3.0/src/libpfm4/python/setup.py0000700003276200002170000000124012247131124017314 0ustar ralphundrgrad#!/usr/bin/env python from distutils.core import setup, Extension from distutils.command.install_data import install_data setup(name='perfmon', version='4.0', author='Arun Sharma', author_email='arun.sharma@google.com', description='libpfm wrapper', packages=['perfmon'], package_dir={ 'perfmon' : 'src' }, py_modules=['perfmon.perfmon_int'], ext_modules=[Extension('perfmon._perfmon_int', sources = ['src/perfmon_int.i'], libraries = ['pfm'], library_dirs = ['../lib'], include_dirs = ['../include'], swig_opts=['-I../include'])]) papi-5.3.0/src/libpfm4/debian/0000700003276200002170000000000012247131123015502 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/debian/control0000600003276200002170000000335012247131123017110 0ustar ralphundrgradSource: libpfm4 Priority: extra Maintainer: Stephane Eranian Build-Depends: debhelper (>= 7), dpatch, python (>= 2.4), python-support, python-dev (>= 2.4), swig Standards-Version: 3.8.4 Section: libs Homepage: http://perfmon2.sourceforge.net/ Package: libpfm4-dev Section: libdevel Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends} Description: A library to program the performance monitoring events Libpfm4 helps convert from an event name, expressed as a string, to the event encoding. The encoding can then be used with specific OS interfaces. Libpfm4 also provides OS-specific interfaces to directly setup OS-specific data structures to be passed to the kernel. The current libpfm4, for instance, provides support for the perf_events interface which was introduced in Linux v2.6.31. 
Package: libpfm4 Section: libs Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends} Description: A library to program the performance monitoring events Libpfm4 helps convert from an event name, expressed as a string, to the event encoding. The encoding can then be used with specific OS interfaces. Libpfm4 also provides OS-specific interfaces to directly setup OS-specific data structures to be passed to the kernel. The current libpfm4, for instance, provides support for the perf_events interface which was introduced in Linux v2.6.31. Package: python-libpfm4 Depends: libpfm4, python, ${shlibs:Depends}, ${misc:Depends} Architecture: any Section: python Description: Python bindings for libpfm4 This package allows you to write simple python scripts that monitor various hardware performance monitoring events. It may be more efficient to use this approach instead of parsing the output of other tools. papi-5.3.0/src/libpfm4/debian/compat0000600003276200002170000000000212247131123016702 0ustar ralphundrgrad7 papi-5.3.0/src/libpfm4/debian/libpfm4-dev.install0000600003276200002170000000003512247131123021203 0ustar ralphundrgradusr/include/* usr/lib/lib*.a papi-5.3.0/src/libpfm4/debian/docs0000600003276200002170000000000712247131123016354 0ustar ralphundrgradREADME papi-5.3.0/src/libpfm4/debian/pyversions0000600003276200002170000000000512247131123017643 0ustar ralphundrgrad2.4- papi-5.3.0/src/libpfm4/debian/README0000600003276200002170000000026012247131123016362 0ustar ralphundrgradThe Debian Package libpfm4 ---------------------------- libpfm4 packaging tested on Ubuntu Lucid (amd64). 
-- Arun Sharma Mon, 21 Jun 2010 15:17:22 -0700 papi-5.3.0/src/libpfm4/debian/README.source0000600003276200002170000000013112247131123017656 0ustar ralphundrgradSources were slightly modified to compile with -Werror -Arun Sharma (aruns@google.com) papi-5.3.0/src/libpfm4/debian/copyright0000600003276200002170000000270312247131123017441 0ustar ralphundrgradThis work was packaged for Debian by: Arun Sharma on Mon, 21 Jun 2010 15:17:22 -0700 It was downloaded from: git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4 Upstream Author(s): Stephane Eranian Packaging by: Copyright (C) 2010 Arun Sharma Library and packaging released under the following license: Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
papi-5.3.0/src/libpfm4/debian/python-libpfm4.install0000600003276200002170000000012612247131123021747 0ustar ralphundrgradusr/lib/python*/site-packages/perfmon/*.py usr/lib/python*/site-packages/perfmon/*.so papi-5.3.0/src/libpfm4/debian/libpfm4-dev.manpages0000600003276200002170000000001612247131123021327 0ustar ralphundrgraddocs/man3/*.3 papi-5.3.0/src/libpfm4/debian/changelog0000600003276200002170000000262212247131123017360 0ustar ralphundrgradlibpfm4 (4.0) unstable; urgency=low * Intel IVB-EP support * Intel IVB updates support * Intel SNB updates support * Intel SNB-EP uncore support * ldlat support (PEBS-LL) * New Intel Atom support * bug fixes -- Stephane Eranian Fri, 08 Jun 2013 18:45:01 +0200 libpfm4 (3.0) unstable; urgency=low * ARM Cortex A15 support * updated Intel Sandy Bridge core PMU events * Intel Sandy Bridge desktop (model 42) uncore PMU support * Intel Ivy Bridge support * full perf_events generic event support * updated perf_examples * enabled Intel Nehalem/Westmere uncore PMU support * AMD Llano processor support (Fam 12h) * AMD Turion processor support (Fam 11h) * Intel Atom Cedarview processor support * Win32 compilation support * perf_events excl attribute * perf_events generic hw event aliases support * many bug fixes -- Stephane Eranian Mon, 27 Aug 2012 17:45:22 +0200 libpfm4 (2.0) unstable; urgency=low * updated event tables for Intel X86 processors * new AMD Fam15h support * new MIPS 74k support * updated ARM Cortex A8/A9 support * 30% size reduction for Intel/AMD X86 event tables * bug fixes and other improvements -- Stephane Eranian Fri, 7 Oct 2011 15:55:22 +0200 libpfm4 (1.0) unstable; urgency=low * Initial Release. -- Arun Sharma Mon, 21 Jun 2010 15:17:22 -0700 papi-5.3.0/src/libpfm4/debian/rules0000700003276200002170000000127012247131123016562 0ustar ralphundrgrad#!/usr/bin/make -f # -*- makefile -*- # Sample debian/rules that uses debhelper. # This file was originally written by Joey Hess and Craig Small.
# As a special exception, when this file is copied by dh-make into a # dh-make output file, you may use that output file without restriction. # This special exception was added by Craig Small in version 0.37 of dh-make. # Uncomment this to turn on verbose mode. #export DH_VERBOSE=1 include /usr/share/dpatch/dpatch.make build: patch-stamp clean: unpatch override_dh_auto_install: build dh_auto_test dh_testroot dh_prep dh_installdirs make install DESTDIR=$(CURDIR)/debian/tmp PREFIX=/usr CONFIG_PFMLIB_NOPYTHON=n LDCONFIG=true %: dh $@ papi-5.3.0/src/libpfm4/debian/source/0000700003276200002170000000000012247131123017002 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/debian/source/format0000600003276200002170000000001412247131123020212 0ustar ralphundrgrad3.0 (quilt) papi-5.3.0/src/libpfm4/debian/libpfm4-dev.dirs0000600003276200002170000000002412247131123020474 0ustar ralphundrgradusr/lib usr/include papi-5.3.0/src/libpfm4/debian/libpfm4.install0000600003276200002170000000002212247131123020423 0ustar ralphundrgradusr/lib/lib*.so.* papi-5.3.0/src/libpfm4/perf_examples/0000700003276200002170000000000012247131124017113 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/perf_examples/rtop.c0000600003276200002170000003006212247131124020246 0ustar ralphundrgrad/* rtop.c - a simple PMU-based CPU utilization tool * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2004-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define RTOP_VERSION "0.2" /* * max number of cpus (threads) supported */ #define RTOP_MAX_CPUS 2048 /* MUST BE power of 2 */ #define RTOP_CPUMASK_BITS (sizeof(unsigned long)<<3) #define RTOP_CPUMASK_COUNT (RTOP_MAX_CPUS/RTOP_CPUMASK_BITS) #define RTOP_CPUMASK_SET(m, g) ((m)[(g)/RTOP_CPUMASK_BITS] |= (1UL << ((g) % RTOP_CPUMASK_BITS))) #define RTOP_CPUMASK_CLEAR(m, g) ((m)[(g)/RTOP_CPUMASK_BITS] &= ~(1UL << ((g) % RTOP_CPUMASK_BITS))) #define RTOP_CPUMASK_ISSET(m, g) ((m)[(g)/RTOP_CPUMASK_BITS] & (1UL << ((g) % RTOP_CPUMASK_BITS))) typedef unsigned long rtop_cpumask_t[RTOP_CPUMASK_COUNT]; typedef struct { struct { int opt_verbose; int opt_delay; /* refresh delay in second */ int opt_delay_set; } program_opt_flags; rtop_cpumask_t cpu_mask; /* which CPUs to use in system wide mode */ long online_cpus; long selected_cpus; unsigned long cpu_mhz; } program_options_t; #define opt_verbose program_opt_flags.opt_verbose #define opt_delay program_opt_flags.opt_delay #define opt_delay_set program_opt_flags.opt_delay_set static program_options_t options; static struct termios saved_tty; static int time_to_quit; static int term_rows, term_cols; static void get_term_size(void) { int ret; struct winsize ws; ret = ioctl(1, TIOCGWINSZ, &ws); if (ret) err(1, "cannot determine screen size"); if (ws.ws_row > 10) { term_cols = ws.ws_col; term_rows = ws.ws_row; } else { term_cols = 80; term_rows = 24; } if (term_rows < options.selected_cpus) errx(1, "you need at least %ld rows on your terminal to display all CPUs", options.selected_cpus); } static void sigwinch_handler(int n) { get_term_size(); } static void setup_screen(void) { int ret; ret = tcgetattr(0, &saved_tty); if (ret == -1) errx(1, "cannot save tty settings\n"); get_term_size(); initscr(); nocbreak(); resizeterm(term_rows, term_cols); } static void 
close_screen(void) { endwin(); tcsetattr(0, TCSAFLUSH, &saved_tty); } static void fatal_errorw(char *fmt, ...) { va_list ap; close_screen(); va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } static void sigint_handler(int n) { time_to_quit = 1; } static unsigned long find_cpu_speed(void) { FILE *fp1; unsigned long f1 = 0, f2 = 0; char buffer[128], *p, *value; memset(buffer, 0, sizeof(buffer)); fp1 = fopen("/proc/cpuinfo", "r"); if (fp1 == NULL) return 0; for (;;) { buffer[0] = '\0'; p = fgets(buffer, 127, fp1); if (p == NULL) break; /* skip blank lines */ if (*p == '\n') continue; p = strchr(buffer, ':'); if (p == NULL) break; /* * p+2: +1 = space, +2= first character * strlen()-1 gets rid of \n */ *p = '\0'; value = p+2; value[strlen(value)-1] = '\0'; if (!strncasecmp("cpu MHz", buffer, 7)) { float fl; sscanf(value, "%f", &fl); f1 = lroundf(fl); break; } if (!strncasecmp("BogoMIPS", buffer, 8)) { float fl; sscanf(value, "%f", &fl); f2 = lroundf(fl); } } fclose(fp1); return f1 == 0 ? f2 : f1; } static void setup_signals(void) { struct sigaction act; sigset_t my_set; /* * SIGINT is an asynchronous signal * sent to the process (not a specific thread). POSIX states * that one and only one thread will execute the handler. This * could be any thread that does not have the signal blocked.
*/ /* * install SIGINT handler */ memset(&act,0,sizeof(act)); sigemptyset(&my_set); act.sa_handler = sigint_handler; sigaction (SIGINT, &act, 0); /* * install SIGWINCH handler */ memset(&act,0,sizeof(act)); sigemptyset(&my_set); act.sa_handler = sigwinch_handler; sigaction (SIGWINCH, &act, 0); } static struct option rtop_cmd_options[]={ { "help", 0, 0, 1 }, { "version", 0, 0, 2 }, { "delay", 0, 0, 3 }, { "cpu-list", 1, 0, 4 }, { "verbose", 0, &options.opt_verbose, 1 }, { 0, 0, 0, 0} }; #define MAX_EVENTS 2 typedef struct { uint64_t prev_values[MAX_EVENTS]; int fd[MAX_EVENTS]; int cpu; } cpudesc_t; /* * { u64 nr; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 value; * { u64 id; } && PERF_FORMAT_ID * } cntr[nr]; */ typedef struct { uint64_t nr; uint64_t time_enabled; uint64_t time_running; uint64_t values[2]; } rtop_grp_t; static void mainloop(void) { struct perf_event_attr ev[MAX_EVENTS]; unsigned long itc_delta; cpudesc_t *cpus; int i, j = 0, k, ncpus = 0; int num, ret; ncpus = options.selected_cpus; cpus = calloc(ncpus, sizeof(cpudesc_t)); if (!cpus) err(1, "cannot allocate file descriptors"); memset(ev, 0, sizeof(ev)); /* measure user cycles */ ev[0].type = PERF_TYPE_HARDWARE; ev[0].config = PERF_COUNT_HW_CPU_CYCLES; ev[0].read_format = PERF_FORMAT_SCALE|PERF_FORMAT_GROUP; ev[0].exclude_kernel = 1; ev[0].disabled = 1; ev[0].pinned = 0; /* measure kernel cycles */ ev[1].type = PERF_TYPE_HARDWARE; ev[1].config = PERF_COUNT_HW_CPU_CYCLES; ev[1].exclude_user = 1; ev[1].disabled = 1; ev[1].pinned = 0; num = 2; for(i=0, k = 0; ncpus; i++) { if (RTOP_CPUMASK_ISSET(options.cpu_mask, i) == 0) continue; cpus[k].cpu = i; cpus[k].fd[0] = -1; for(j=0 ; j < num; j++) { cpus[k].fd[j] = perf_event_open(ev+j, -1, i, cpus[k].fd[0], 0); if (cpus[k].fd[j] == -1) fatal_errorw("cannot open event %d on CPU%d: %s\n", j, i, strerror(errno)); } ncpus--; k++; } ncpus = options.selected_cpus; itc_delta = options.opt_delay * 
options.cpu_mhz * 1000000; for(i=0; i < ncpus; i++) for(j=0; j < num; j++) ioctl(cpus[i].fd[j], PERF_EVENT_IOC_ENABLE, 0); for(;time_to_quit == 0;) { sleep(options.opt_delay); move(0, 0); for(i=0; i < ncpus; i++) { uint64_t values[MAX_EVENTS]; uint64_t raw_values[5]; double k_cycles, u_cycles, i_cycles, ratio; /* * given our events are in the same group, we can do a * group read and get both counts + scaling information */ ret = read(cpus[i].fd[0], raw_values, sizeof(raw_values)); if (ret != sizeof(raw_values)) fatal_errorw("cannot read group counts on CPU%d\n", cpus[i].cpu); printw("nr=%"PRIu64"\n", raw_values[0]); printw("ena=%"PRIu64"\n", raw_values[1]); printw("run=%"PRIu64"\n", raw_values[2]); raw_values[0] = raw_values[3]; values[0] = perf_scale(raw_values); raw_values[0] = raw_values[4]; values[1] = perf_scale(raw_values); ratio = perf_scale_ratio(raw_values); k_cycles = (double)(values[1] - cpus[i].prev_values[1])*100.0/ (double)itc_delta; u_cycles = (double)(values[0] - cpus[i].prev_values[0])*100.0/ (double)itc_delta; i_cycles = 100.0 - (k_cycles + u_cycles); cpus[i].prev_values[0] = values[0]; cpus[i].prev_values[1] = values[1]; /* * adjust for rounding errors */ if (i_cycles < 0.0) i_cycles = 0.0; if (i_cycles > 100.0) i_cycles = 100.0; if (k_cycles > 100.0) k_cycles = 100.0; if (u_cycles > 100.0) u_cycles = 100.0; /* print the real CPU number, not the index into cpus[] */ printw("CPU%-2d %6.2f%% usr %6.2f%% sys %6.2f%% idle (scaling ratio %.2f%%)\n", cpus[i].cpu, u_cycles, k_cycles, i_cycles, ratio*100.0); } refresh(); } for(i=0; i < ncpus; i++) for(j=0; j < num; j++) close(cpus[i].fd[j]); free(cpus); } void populate_cpumask(char *cpu_list) { char *p; unsigned long start_cpu, end_cpu = 0; unsigned long i, count = 0; options.online_cpus = sysconf(_SC_NPROCESSORS_ONLN); if (options.online_cpus == -1) errx(1, "cannot figure out the number of online processors"); if (cpu_list == NULL) { if (options.online_cpus >= RTOP_MAX_CPUS) errx(1, "rtop can only handle up to %u CPUs", RTOP_MAX_CPUS); for(i=0; i < 
options.online_cpus; i++) RTOP_CPUMASK_SET(options.cpu_mask, i); options.selected_cpus = options.online_cpus; return; } while(isdigit(*cpu_list)) { p = NULL; start_cpu = strtoul(cpu_list, &p, 0); /* auto-detect base */ if (start_cpu == ULONG_MAX || (*p != '\0' && *p != ',' && *p != '-')) goto invalid; if (p && *p == '-') { cpu_list = ++p; p = NULL; end_cpu = strtoul(cpu_list, &p, 0); /* auto-detect base */ if (end_cpu == ULONG_MAX || (*p != '\0' && *p != ',')) goto invalid; if (end_cpu < start_cpu) goto invalid_range; } else { end_cpu = start_cpu; } if (start_cpu >= RTOP_MAX_CPUS || end_cpu >= RTOP_MAX_CPUS) goto too_big; for (; start_cpu <= end_cpu; start_cpu++) { if (start_cpu >= options.online_cpus) goto not_online; /* XXX: assume contiguous range of CPUs */ if (RTOP_CPUMASK_ISSET(options.cpu_mask, start_cpu)) continue; RTOP_CPUMASK_SET(options.cpu_mask, start_cpu); count++; } if (*p) ++p; cpu_list = p; } options.selected_cpus = count; return; invalid: errx(1, "invalid cpu list argument: %s", cpu_list); /* no return */ not_online: errx(1, "cpu %lu is not online", start_cpu); /* no return */ invalid_range: errx(1, "cpu range %lu - %lu is invalid", start_cpu, end_cpu); /* no return */ too_big: errx(1, "rtop is limited to %u CPUs", RTOP_MAX_CPUS); /* no return */ } static void usage(void) { printf( "usage: rtop [options]:\n" "-h, --help\t\t\tdisplay this help and exit\n" "-v, --verbose\t\t\tverbose output\n" "-V, --version\t\t\tshow version and exit\n" "-d nsec, --delay=nsec\t\tnumber of seconds between refresh (default=1s)\n" "--cpu-list=cpu1,cpu2\t\tlist of CPUs to monitor(default=all)\n" ); } int main(int argc, char **argv) { int c; char *cpu_list = NULL; //if (geteuid()) err(1, "perf_event requires root privileges to create system-wide measurments\n"); while ((c=getopt_long(argc, argv,"+vhVd:", rtop_cmd_options, 0)) != -1) { switch(c) { case 0: continue; /* fast path for options */ case 'v': options.opt_verbose = 1; break; case 1: case 'h': usage(); exit(0); 
case 2: case 'V': printf("rtop version " RTOP_VERSION " Date: " __DATE__ "\n" "Copyright (C) 2009 Google, Inc\n"); exit(0); case 3: case 'd': options.opt_delay = atoi(optarg); if (options.opt_delay < 0) errx(1, "invalid delay, must be >= 0"); options.opt_delay_set = 1; break; case 4: if (*optarg == '\0') errx(1, "--cpu-list needs an argument\n"); cpu_list = optarg; break; default: errx(1, "unknown option\n"); } } /* * default refresh delay */ if (options.opt_delay_set == 0) options.opt_delay = 1; options.cpu_mhz = find_cpu_speed(); populate_cpumask(cpu_list); setup_signals(); setup_screen(); mainloop(); close_screen(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/self_pipe.c0000600003276200002170000001312312247131124021227 0ustar ralphundrgrad/* * self_pipe.c - dual process ping-pong example to stress PMU context switch of one process * * Copyright (c) 2008 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include <sys/types.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <stdarg.h> #include <errno.h> #include <unistd.h> #include <string.h> #include <signal.h> #include <syscall.h> #include <sched.h> #include <sys/prctl.h> #include <err.h> #include <perfmon/pfmlib_perf_event.h> #include "perf_util.h" static struct { const char *events; int cpu; int delay; } options; int pin_cpu(pid_t pid, unsigned int cpu) { cpu_set_t mask; CPU_ZERO(&mask); CPU_SET(cpu, &mask); return sched_setaffinity(pid, sizeof(mask), &mask); } static volatile int quit; void sig_handler(int n) { quit = 1; } static void do_child(int fr, int fw) { char c; ssize_t ret; for(;;) { ret = read(fr, &c, 1); if (ret < 0) break; ret = write(fw, "c", 1); if (ret < 0) break; } printf("child exited\n"); exit(0); } static void measure(void) { perf_event_desc_t *fds = NULL; int num_fds = 0; uint64_t values[3]; ssize_t n; int i, ret; int pr[2], pw[2]; pid_t pid; char cc = '0'; ret = pfm_initialize(); if (ret != PFM_SUCCESS) err(1, "cannot initialize libpfm"); if (options.cpu == -1) { srandom(getpid()); options.cpu = random() % sysconf(_SC_NPROCESSORS_ONLN); } ret = pipe(pr); if (ret) err(1, "cannot create read pipe"); ret = pipe(pw); if (ret) err(1, "cannot create write pipe"); ret = perf_setup_list_events(options.events, &fds, &num_fds); if (ret || !num_fds) exit(1); for(i=0; i < num_fds; i++) { fds[i].hw.disabled = 1; fds[i].hw.read_format = PERF_FORMAT_SCALE; fds[i].fd = perf_event_open(&fds[i].hw, 0, -1, -1, 0); if (fds[i].fd == -1) err(1, "cannot open event %d", i); } /* * Pin to the selected CPU, inherited by child process. 
That will enforce * the ping-ponging and thus stress the PMU context switch * which is what we want */ ret = pin_cpu(getpid(), options.cpu); if (ret) err(1, "cannot pin to CPU%d", options.cpu); printf("Both processes pinned to CPU%d, running for %d seconds\n", options.cpu, options.delay); /* * create second process which is not monitoring at the moment */ switch(pid=fork()) { case -1: err(1, "cannot create child"); exit(1); /* not reached */ case 0: /* do not inherit session fd */ for(i=0; i < num_fds; i++) close(fds[i].fd); /* pr[]: write master, read child */ /* pw[]: read master, write child */ close(pr[1]); close(pw[0]); do_child(pr[0], pw[1]); exit(1); } close(pr[0]); close(pw[1]); /* * Let's roll now */ prctl(PR_TASK_PERF_EVENTS_ENABLE); signal(SIGALRM, sig_handler); alarm(options.delay); /* * ping pong loop */ while(!quit) { n = write(pr[1], "c", 1); if (n < 1) err(1, "write failed"); n = read(pw[0], &cc, 1); if (n < 1) err(1, "read failed"); } prctl(PR_TASK_PERF_EVENTS_DISABLE); for(i=0; i < num_fds; i++) { uint64_t val; double ratio; ret = read(fds[i].fd, values, sizeof(values)); if (ret == -1) err(1,"pfm_read error"); if (ret != sizeof(values)) errx(1, "did not read correct amount %d", ret); val = perf_scale(values); ratio = perf_scale_ratio(values); if (ratio == 1.0) printf("%20"PRIu64" %s\n", val, fds[i].name); else if (ratio == 0.0) printf("%20"PRIu64" %s (did not run: competing session)\n", val, fds[i].name); else printf("%20"PRIu64" %s (scaled from %.2f%% of time)\n", val, fds[i].name, ratio*100.0); } /* * kill child process: kill() takes the pid first, then the signal */ kill(pid, SIGKILL); /* * close pipes */ close(pr[1]); close(pw[0]); /* * and destroy our session */ for(i=0; i < num_fds; i++) close(fds[i].fd); perf_free_fds(fds, num_fds); /* free libpfm resources cleanly */ pfm_terminate(); } static void usage(void) { printf("usage: self_pipe [-h] [-c cpu] [-d delay] [-e event1,event2,...]\n"); } int main(int argc, char **argv) { int c; options.cpu = -1; options.delay = -1; while 
((c=getopt(argc, argv,"he:c:d:")) != -1) { switch(c) { case 'e': options.events = optarg; break; case 'c': options.cpu = atoi(optarg); break; case 'd': options.delay = atoi(optarg); break; case 'h': usage(); exit(0); default: errx(1, "unknown error"); } } if (!options.events) options.events = "cycles,instructions"; if (options.delay == -1) options.delay = 10; measure(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/evt2raw.c0000600003276200002170000000563412247131124020663 0ustar ralphundrgrad/* * evt2raw.c - example which converts an event string (event + modifiers) to * a raw event code usable by the perf tool. * * Copyright (c) 2010 IBM Corp. * Contributed by Corey Ashford * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <inttypes.h> #include <err.h> #include <perfmon/pfmlib_perf_event.h> static void usage(void) { printf("usage: evt2raw [-v] <event>\n" "<event> is the symbolic event, including modifiers, to " "translate to a raw code.\n"); } #define MAX_MODIFIER_CHARS 5 /* u,k,h plus the colon and null terminator */ int main(int argc, char **argv) { int ret, c, verbose = 0; struct perf_event_attr pea; char *event_str, *fstr = NULL; char modifiers[MAX_MODIFIER_CHARS]; if (argc < 2) { usage(); return 1; } while ( (c=getopt(argc, argv, "hv")) != -1) { switch(c) { case 'h': usage(); exit(0); case 'v': verbose = 1; break; default: exit(1); } } /* an event string is required after the options */ if (optind >= argc) { usage(); return 1; } event_str = argv[optind]; ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "Internal error: pfm_initialize returned %s", pfm_strerror(ret)); /* clear the attr so that unset fields are not read as garbage */ memset(&pea, 0, sizeof(pea)); pea.size = sizeof(struct perf_event_attr); ret = pfm_get_perf_event_encoding(event_str, PFM_PLM0|PFM_PLM3|PFM_PLMH, &pea, &fstr, NULL); if (ret != PFM_SUCCESS) errx(1, "Error: pfm_get_perf_encoding returned %s", pfm_strerror(ret)); if (pea.type != PERF_TYPE_RAW) errx(1, "Error: %s is not a raw hardware event", event_str); modifiers[0] = '\0'; if (pea.exclude_user | pea.exclude_kernel | pea.exclude_hv) { strcat(modifiers, ":"); if (!pea.exclude_user) strcat(modifiers, "u"); if (!pea.exclude_kernel) strcat(modifiers, "k"); if (!pea.exclude_hv) strcat(modifiers, "h"); } if (verbose) printf("r%"PRIx64"%s\t%s\n", pea.config, modifiers, fstr); else printf("r%"PRIx64"%s\n", pea.config, modifiers); if (fstr) free(fstr); return 0; } papi-5.3.0/src/libpfm4/perf_examples/self_basic.c0000600003276200002170000000766412247131124021370 0ustar ralphundrgrad/* * self_basic.c - example of a simple self monitoring task no-helper * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, 
copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include <sys/types.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <stdarg.h> #include <errno.h> #include <unistd.h> #include <string.h> #include <locale.h> #include <sys/ioctl.h> #include <err.h> #include <perfmon/pfmlib_perf_event.h> #define N 30 static unsigned long fib(unsigned long n) { if (n == 0) return 0; if (n == 1) return 1; return fib(n-1)+fib(n-2); } int main(int argc, char **argv) { struct perf_event_attr attr; int fd, ret; uint64_t count = 0, values[3]; setlocale(LC_ALL, ""); /* * Initialize libpfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "cannot initialize library: %s", pfm_strerror(ret)); memset(&attr, 0, sizeof(attr)); /* * 1st argument: event string * 2nd argument: default privilege level (used if not specified in the event string) * 3rd argument: the perf_event_attr to initialize */ ret = pfm_get_perf_event_encoding("cycles", PFM_PLM0|PFM_PLM3, &attr, NULL, NULL); if (ret != PFM_SUCCESS) errx(1, "cannot find encoding: %s", pfm_strerror(ret)); /* * request timing information because event may be multiplexed * and thus it may not count all the time. 
The scaling information * will be used to scale the raw count as if the event had run all * along */ attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING; /* do not start immediately after perf_event_open() */ attr.disabled = 1; /* * create the event and attach to self * Note that it attaches only to the main thread, there is no inheritance * to threads that may be created subsequently. * * if multithreaded, then getpid() must be replaced by gettid() */ fd = perf_event_open(&attr, getpid(), -1, -1, 0); if (fd < 0) err(1, "cannot create event"); /* * start counting now */ ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0); if (ret) err(1, "ioctl(enable) failed"); printf("Fibonacci(%d)=%lu\n", N, fib(N)); /* * stop counting */ ret = ioctl(fd, PERF_EVENT_IOC_DISABLE, 0); if (ret) err(1, "ioctl(disable) failed"); /* * read the count + scaling values * * It is not necessary to stop an event to read its value */ ret = read(fd, values, sizeof(values)); if (ret != sizeof(values)) err(1, "cannot read results"); /* * scale count * * values[0] = raw count * values[1] = TIME_ENABLED * values[2] = TIME_RUNNING */ if (values[2]) count = (uint64_t)((double)values[0] * values[1]/values[2]); printf("count=%'"PRIu64"\n", count); close(fd); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/self_smpl_multi.c0000600003276200002170000002641412247131124022466 0ustar ralphundrgrad/* * * self_smpl_multi.c - multi-thread self-sampling program * * Copyright (c) 2009 Google, Inc * Modified by Stephane Eranian * * Based on: * Copyright (c) 2008 Mark W. Krentel * Contributed by Mark W. 
Krentel * Modified by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * Test perfmon overflow without PAPI. * * Create a new thread, launch perfmon overflow counters in both * threads, print the number of interrupts per thread and per second, * and look for anomalous interrupts. Look for mismatched thread * ids, bad message type, or failed pfm_restart(). * * self_smpl_multi is a test program to stress signal delivery in the context * of a multi-threaded self-sampling program, which is common with PAPI and HPC. * * There is an issue with existing kernels (as of 2.6.30), which do not provide * a reliable way of having the signal delivered to the thread in which the * counter overflow occurred. This is problematic for many self-monitoring * programs. * * This program demonstrates the issue by tracking the number of times * the signal goes to the wrong thread. 
The bad behavior is exacerbated * if the monitored threads, themselves, already use signals. Here we * use SIGUSR1. * * Note that kernel developers have been made aware of this problem and * a fix has been proposed. It introduces a new F_SETOWN_EX command to * fcntl(). */ #include <sys/types.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <unistd.h> #include <string.h> #include <signal.h> #include <fcntl.h> #include <pthread.h> #include <sys/time.h> #include <sys/mman.h> #include <sys/ioctl.h> #include <sys/syscall.h> #include <err.h> #include <perfmon/pfmlib_perf_event.h> #include "perf_util.h" #define PROGRAM_TIME 8 #define THRESHOLD 20000000 static int program_time = PROGRAM_TIME; static int threshold = THRESHOLD; static int signum = SIGIO; static pthread_barrier_t barrier; static int buffer_pages = 1; #define MAX_THR 128 /* * the following definitions come * from the F_SETOWN_EX patch from Peter Zijlstra * Check out: http://lkml.org/lkml/2009/8/4/128 */ #ifndef F_SETOWN_EX #define F_SETOWN_EX 15 #define F_GETOWN_EX 16 #define F_OWNER_TID 0 #define F_OWNER_PID 1 #define F_OWNER_PGRP 2 struct f_owner_ex { int type; pid_t pid; }; #endif struct over_args { int fd; pid_t tid; int id; perf_event_desc_t *fds; }; struct over_args fd2ov[MAX_THR]; long count[MAX_THR]; long total[MAX_THR]; long iter[MAX_THR]; long mismatch[MAX_THR]; long bad_msg[MAX_THR]; long bad_restart[MAX_THR]; int fown; static __thread int myid; /* TLS */ static __thread perf_event_desc_t *fds; /* TLS */ static __thread int num_fds; /* TLS */ pid_t gettid(void) { return (pid_t)syscall(__NR_gettid); } void user_callback(int m) { count[m]++; total[m]++; } void do_cycles(void) { struct timeval start, last, now; unsigned long x, sum; gettimeofday(&start, NULL); last = start; count[myid] = 0; total[myid] = 0; iter[myid] = 0; do { sum = 1; for (x = 1; x < 250000; x++) { /* signal pending to private queue because of * pthread_kill(), i.e., tkill() */ if ((x % 5000) == 0) pthread_kill(pthread_self(), SIGUSR1); sum += x; } iter[myid]++; gettimeofday(&now, NULL); if (now.tv_sec > last.tv_sec) { printf("%ld: myid = %3d, fd = %3d, count = %4ld, iter = %4ld, rate = %ld/Kiter\n", now.tv_sec 
- start.tv_sec, myid, fd2ov[myid].fd, count[myid], iter[myid], (1000 * count[myid])/iter[myid]); count[myid] = 0; iter[myid] = 0; last = now; } } while (now.tv_sec < start.tv_sec + program_time); } #define DPRINT(str) \ printf("(%s) si->fd = %d, ov->self = 0x%lx, self = 0x%lx\n", \ str, fd, (unsigned long)ov->self, (unsigned long)self) void sigusr1_handler(int sig, siginfo_t *info, void *context) { } /* * a signal handler cannot safely invoke printf() */ void sigio_handler(int sig, siginfo_t *info, void *context) { perf_event_desc_t *fdx; struct perf_event_header ehdr; struct over_args *ov; int fd, i, ret; pid_t tid; /* * positive si_code indicate kernel generated signal * which is normal for SIGIO */ if (info->si_code < 0) errx(1, "signal not generated by kernel"); /* * SIGPOLL = SIGIO * expect POLL_HUP instead of POLL_IN because we are * in one-shot mode (IOC_REFRESH) */ if (info->si_code != POLL_HUP) errx(1, "signal not generated by SIGIO: %d", info->si_code); fd = info->si_fd; tid = gettid(); for(i=0; i < MAX_THR; i++) if (fd2ov[i].fd == fd) break; if (i == MAX_THR) errx(1, "bad info.si_fd: %d", fd); ov = &fd2ov[i]; /* * current thread id may not always match the id * associated with the file descriptor * * We need to use the other's thread fds info * otherwise, it is going to get stuck with no * more samples generated */ if (tid != ov->tid) { mismatch[myid]++; fdx = ov->fds; } else { fdx = fds; } /* * read sample header */ ret = perf_read_buffer(fdx+0, &ehdr, sizeof(ehdr)); if (ret) { errx(1, "cannot read event header"); } /* * message we do not handle */ if (ehdr.type != PERF_RECORD_SAMPLE) { bad_msg[myid]++; goto skip; } user_callback(myid); skip: /* mark sample as consumed */ perf_skip_buffer(fdx+0, ehdr.size); /* * re-arm period, next notification after wakeup_events */ ret = ioctl(fd, PERF_EVENT_IOC_REFRESH, 1); if (ret) err(1, "cannot refresh"); } void overflow_start(char *name) { struct f_owner_ex fown_ex; struct over_args *ov; size_t pgsz; int ret, fd, 
flags; fds = NULL; num_fds = 0; ret = perf_setup_list_events("cycles", &fds, &num_fds); if (ret || !num_fds) errx(1, "cannot monitor event"); pgsz = sysconf(_SC_PAGESIZE); ov = &fd2ov[myid]; /* do not enable now */ fds[0].hw.disabled = 1; /* notify after 1 sample */ fds[0].hw.wakeup_events = 1; fds[0].hw.sample_type = PERF_SAMPLE_IP; fds[0].hw.sample_period = threshold; fds[0].hw.read_format = 0; fds[0].fd = fd = perf_event_open(&fds[0].hw, gettid(), -1, -1, 0); if (fd == -1) err(1, "cannot attach event %s", fds[0].name); ov->fd = fd; ov->tid = gettid(); ov->id = myid; ov->fds = fds; flags = fcntl(fd, F_GETFL, 0); if (fcntl(fd, F_SETFL, flags | O_ASYNC) < 0) err(1, "fcntl SETFL failed"); fown_ex.type = F_OWNER_TID; fown_ex.pid = gettid(); ret = fcntl(fd, (fown ? F_SETOWN_EX : F_SETOWN), (fown ? (unsigned long)&fown_ex: (unsigned long)gettid())); if (ret) err(1, "fcntl SETOWN failed"); if (fcntl(fd, F_SETSIG, signum) < 0) err(1, "fcntl SETSIG failed"); fds[0].buf = mmap(NULL, (buffer_pages + 1)* pgsz, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (fds[0].buf == MAP_FAILED) err(1, "cannot mmap buffer"); fds[0].pgmsk = (buffer_pages * pgsz) - 1; printf("launch %s: fd: %d, tid: %d\n", name, fd, ov->tid); /* * activate event for wakeup_events (samples) */ ret = ioctl(fd, PERF_EVENT_IOC_REFRESH , 1); if (ret == -1) err(1, "cannot refresh"); } void overflow_stop(void) { int ret; ret = ioctl(fd2ov[myid].fd, PERF_EVENT_IOC_DISABLE, 0); if (ret) err(1, "cannot stop"); } void * my_thread(void *v) { int retval = 0; myid = (unsigned long)v; pthread_barrier_wait(&barrier); overflow_start("side"); do_cycles(); overflow_stop(); perf_free_fds(fds, num_fds); pthread_exit((void *)&retval); } static void usage(void) { printf("self_smpl_multi [-t secs] [-p period] [-s signal] [-f] [-n threads]\n" "-t secs: duration of the run in seconds\n" "-p period: sampling period in CPU cycles\n" "-s signal: signal to use (default: SIGIO)\n" "-n thread: number of threads to create (default: 1)\n" 
"-f : use F_SETOWN_EX for correct delivery of signal to thread (default: off)\n"); } /* * Program args: program_time, threshold, signum. */ int main(int argc, char **argv) { struct sigaction sa; pthread_t allthr[MAX_THR]; sigset_t set, old, new; int i, ret, max_thr = 1; while ((i=getopt(argc, argv, "t:p:s:fhn:")) != EOF) { switch(i) { case 'h': usage(); return 0; case 't': program_time = atoi(optarg); break; case 'p': threshold = atoi(optarg); break; case 's': signum = atoi(optarg); break; case 'f': fown = 1; break; case 'n': max_thr = atoi(optarg); if (max_thr >= MAX_THR) errx(1, "no more than %d threads", MAX_THR); break; default: errx(1, "invalid option"); } } printf("program_time = %d, threshold = %d, signum = %d fcntl(%s), threads = %d\n", program_time, threshold, signum, fown ? "F_SETOWN_EX" : "F_SETOWN", max_thr); for (i = 0; i < MAX_THR; i++) { mismatch[i] = 0; bad_msg[i] = 0; bad_restart[i] = 0; } memset(&sa, 0, sizeof(sa)); sigemptyset(&set); sa.sa_sigaction = sigusr1_handler; sa.sa_mask = set; sa.sa_flags = SA_SIGINFO; if (sigaction(SIGUSR1, &sa, NULL) != 0) errx(1, "sigaction failed"); memset(&sa, 0, sizeof(sa)); sigemptyset(&set); sa.sa_sigaction = sigio_handler; sa.sa_mask = set; sa.sa_flags = SA_SIGINFO; if (sigaction(signum, &sa, NULL) != 0) errx(1, "sigaction failed"); if (pfm_initialize() != PFM_SUCCESS) errx(1, "pfm_initialize failed"); /* * +1 because main thread is also using the barrier */ pthread_barrier_init(&barrier, 0, max_thr+1); for(i=0; i < max_thr; i++) { ret = pthread_create(allthr+i, NULL, my_thread, (void *)(unsigned long)i); if (ret) err(1, "pthread_create failed"); } myid = i; sigemptyset(&set); sigemptyset(&new); sigaddset(&set, SIGIO); sigaddset(&new, SIGIO); if (pthread_sigmask(SIG_BLOCK, &set, NULL)) err(1, "cannot mask SIGIO in main thread"); ret = sigprocmask(SIG_SETMASK, NULL, &old); if (ret) err(1, "sigprocmask failed"); if (sigismember(&old, SIGIO)) { warnx("program started with SIGIO masked, unmasking it now\n"); ret = 
sigprocmask(SIG_UNBLOCK, &new, NULL); if (ret) err(1, "sigprocmask failed"); } pthread_barrier_wait(&barrier); printf("\n\n"); for (i = 0; i < max_thr; i++) { pthread_join(allthr[i], NULL); } printf("\n\n"); for (i = 0; i < max_thr; i++) { printf("myid = %3d, fd = %3d, total = %4ld, mismatch = %ld, " "bad_msg = %ld, bad_restart = %ld\n", fd2ov[i].id, fd2ov[i].fd, total[i], mismatch[i], bad_msg[i], bad_restart[i]); } /* free libpfm resources cleanly */ pfm_terminate(); return (0); } papi-5.3.0/src/libpfm4/perf_examples/Makefile0000600003276200002170000000525512247131124020564 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. 
include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk DIRS= ifeq ($(ARCH),ia64) #DIRS +=ia64 endif ifeq ($(ARCH),x86_64) DIRS += x86 endif ifeq ($(ARCH),i386) DIRS += x86 endif ifeq ($(CONFIG_PFMLIB_ARCH_CRAYXT),y) CFLAGS += -DCONFIG_PFMLIB_ARCH_CRAYXT endif CFLAGS+= -I. -D_GNU_SOURCE -pthread PERF_EVENT_HDR=$(TOPDIR)/include/perfmon/pfmlib_perf_event.h LPC_UTILS=perf_util.o LPC_UTILS_HDR=perf_util.h TARGETS+=self self_basic self_count task task_attach_timeout syst \ notify_self notify_group task_smpl self_smpl_multi \ self_pipe syst_count task_cpu syst_smpl evt2raw EXAMPLESDIR=$(DOCDIR)/perf_examples all: $(TARGETS) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done # # Many systems don't have ncurses-devel installed # rtop: rtop.o $(PFMLIB) -$(CC) $(CFLAGS) $(LDFLAGS) -D_GNU_SOURCE -o $@ $^ $(LIBS) -lpthread -lncurses -lm $(TARGETS): %:%.o $(LPC_UTILS) $(PFMLIB) $(PERF_EVENT_HDR) $(CC) $(CFLAGS) -o $@ $(LDFLAGS) $< $(LPC_UTILS) $(PFMLIB) $(LIBS) $(LPC_UTILS): $(LPC_UTILS_HDR) clean: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done $(RM) -f *.o $(TARGETS) *~ distclean: clean install_examples: $(TARGETS) @echo installing: $(TARGETS) -mkdir -p $(EXAMPLESDIR) $(INSTALL) -m 755 $(TARGETS) $(EXAMPLESDIR) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done # # examples are installed as part of the RPM install, typically in /usr/share/doc/libpfm-X.Y/ # .PHONY: install depend install_examples papi-5.3.0/src/libpfm4/perf_examples/perf_util.h0000600003276200002170000001111312247131124021254 0ustar ralphundrgrad/* * perf_util.h - helper functions for perf_events * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the 
Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef __PERF_UTIL_H__ #define __PERF_UTIL_H__ #include <sys/types.h> #include <inttypes.h> #include <err.h> #include <perfmon/pfmlib_perf_event.h> typedef struct { struct perf_event_attr hw; uint64_t values[3]; uint64_t prev_values[3]; char *name; uint64_t id; /* event id kernel */ void *buf; size_t pgmsk; int group_leader; int fd; int max_fds; int idx; /* opaque libpfm event identifier */ char *fstr; /* fstr from library, must be freed */ } perf_event_desc_t; /* handy shortcut */ #define PERF_FORMAT_SCALE (PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING) extern int perf_setup_argv_events(const char **argv, perf_event_desc_t **fd, int *num_fds); extern int perf_setup_list_events(const char *events, perf_event_desc_t **fd, int *num_fds); extern int perf_read_buffer(perf_event_desc_t *hw, void *buf, size_t sz); extern void perf_free_fds(perf_event_desc_t *fds, int num_fds); extern void perf_skip_buffer(perf_event_desc_t *hw, size_t sz); static inline int perf_read_buffer_32(perf_event_desc_t *hw, void *buf) { return perf_read_buffer(hw, buf, sizeof(uint32_t)); } static inline int perf_read_buffer_64(perf_event_desc_t *hw, void *buf) { return perf_read_buffer(hw, buf, sizeof(uint64_t)); } /* * values[0] = raw count * values[1] = TIME_ENABLED * values[2] = TIME_RUNNING */ static inline uint64_t 
perf_scale(uint64_t *values) { uint64_t res = 0; if (!values[2] && !values[1] && values[0]) warnx("WARNING: time_running = 0 = time_enabled, raw count not zero\n"); if (values[2] > values[1]) warnx("WARNING: time_running > time_enabled\n"); if (values[2]) res = (uint64_t)((double)values[0] * values[1]/values[2]); return res; } static inline uint64_t perf_scale_delta(uint64_t *values, uint64_t *prev_values) { double pval[3], val[3]; uint64_t res = 0; if (!values[2] && !values[1] && values[0]) warnx("WARNING: time_running = 0 = time_enabled, raw count not zero\n"); if (values[2] > values[1]) warnx("WARNING: time_running > time_enabled\n"); if (values[2] - prev_values[2]) { /* convert everything to double to avoid overflows! */ pval[0] = prev_values[0]; pval[1] = prev_values[1]; pval[2] = prev_values[2]; val[0] = values[0]; val[1] = values[1]; val[2] = values[2]; res = (uint64_t)(((val[0] - pval[0]) * (val[1] - pval[1])/ (val[2] - pval[2]))); } return res; } /* * TIME_RUNNING/TIME_ENABLED */ static inline double perf_scale_ratio(uint64_t *values) { if (!values[1]) return 0.0; return values[2]*1.0/values[1]; } static inline int perf_fd2event(perf_event_desc_t *fds, int num_events, int fd) { int i; for(i=0; i < num_events; i++) if (fds[i].fd == fd) return i; return -1; } /* * id = PERF_FORMAT_ID */ static inline int perf_id2event(perf_event_desc_t *fds, int num_events, uint64_t id) { int j; for(j=0; j < num_events; j++) if (fds[j].id == id) return j; return -1; } static inline int perf_is_group_leader(perf_event_desc_t *fds, int idx) { return fds[idx].group_leader == idx; } extern int perf_get_group_nevents(perf_event_desc_t *fds, int num, int leader); extern int perf_display_sample(perf_event_desc_t *fds, int num_fds, int idx, struct perf_event_header *ehdr, FILE *fp); extern uint64_t display_lost(perf_event_desc_t *hw, perf_event_desc_t *fds, int num_fds, FILE *fp); extern void display_exit(perf_event_desc_t *hw, FILE *fp); extern void display_freq(int mode, 
perf_event_desc_t *hw, FILE *fp); #endif papi-5.3.0/src/libpfm4/perf_examples/notify_self.c0000600003276200002170000001630412247131124021606 0ustar ralphundrgrad/* * notify_self.c - example of how you can use overflow notifications * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define SMPL_PERIOD 2400000000ULL static volatile unsigned long notification_received; static perf_event_desc_t *fds = NULL; static int num_fds = 0; static int buffer_pages = 1; /* size of buffer payload (must be power of 2)*/ static void sigio_handler(int n, siginfo_t *info, void *uc) { struct perf_event_header ehdr; int ret, id; /* * a positive si_code indicates a kernel-generated signal, * which is normal for SIGIO */ if (info->si_code < 0) errx(1, "signal not generated by kernel"); /* * SIGPOLL = SIGIO * expect POLL_HUP instead of POLL_IN because we are * in one-shot mode (IOC_REFRESH) */ if (info->si_code != POLL_HUP) errx(1, "signal not generated by SIGIO"); id = perf_fd2event(fds, num_fds, info->si_fd); if (id == -1) errx(1, "no event associated with fd=%d", info->si_fd); ret = perf_read_buffer(fds+id, &ehdr, sizeof(ehdr)); if (ret) errx(1, "cannot read event header"); if (ehdr.type != PERF_RECORD_SAMPLE) { warnx("unexpected sample type=%d, skipping\n", ehdr.type); /* the header has already been consumed, skip only the record body */ perf_skip_buffer(fds+id, ehdr.size - sizeof(ehdr)); goto skip; } printf("Notification:%lu ", notification_received); ret = perf_display_sample(fds, num_fds, 0, &ehdr, stdout); /* * increment our notification counter */ notification_received++; skip: /* * rearm the counter for one more shot */ ret = ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1); if (ret == -1) err(1, "cannot refresh"); } /* * infinite loop waiting for notification to get out */ void busyloop(void) { /* * busy loop to burn CPU cycles */ for(;notification_received < 20;) ; } int main(int argc, char **argv) { struct sigaction act; sigset_t new, old; uint64_t *val; size_t sz, pgsz; int ret, i; setlocale(LC_ALL, ""); ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "Cannot initialize library: %s", pfm_strerror(ret)); pgsz = sysconf(_SC_PAGESIZE); /* * Install the signal handler (SIGIO) * need SA_SIGINFO 
because we need the fd * in the signal handler */ memset(&act, 0, sizeof(act)); act.sa_sigaction = sigio_handler; act.sa_flags = SA_SIGINFO; sigaction (SIGIO, &act, 0); sigemptyset(&old); sigemptyset(&new); sigaddset(&new, SIGIO); ret = sigprocmask(SIG_SETMASK, NULL, &old); if (ret) err(1, "sigprocmask failed"); if (sigismember(&old, SIGIO)) { warnx("program started with SIGIO masked, unmasking it now\n"); ret = sigprocmask(SIG_UNBLOCK, &new, NULL); if (ret) err(1, "sigprocmask failed"); } /* * allocates fd for us */ ret = perf_setup_list_events("cycles," "instructions", &fds, &num_fds); if (ret || (num_fds == 0)) exit(1); fds[0].fd = -1; for(i=0; i < num_fds; i++) { /* want a notification for each sample added to the buffer */ fds[i].hw.disabled = !i; if (!i) { fds[i].hw.wakeup_events = 1; fds[i].hw.sample_type = PERF_SAMPLE_IP|PERF_SAMPLE_READ|PERF_SAMPLE_PERIOD; fds[i].hw.sample_period = SMPL_PERIOD; /* read() returns event identification for signal handler */ fds[i].hw.read_format = PERF_FORMAT_GROUP|PERF_FORMAT_ID|PERF_FORMAT_SCALE; } fds[i].fd = perf_event_open(&fds[i].hw, 0, -1, fds[0].fd, 0); if (fds[i].fd == -1) err(1, "cannot attach event %s", fds[i].name); } sz = (3+2*num_fds)*sizeof(uint64_t); val = malloc(sz); if (!val) err(1, "cannot allocate memory"); /* * On overflow, the non lead events are stored in the sample. * However we need some key to figure out the order in which they * were laid out in the buffer. The file descriptor does not * work for this. Instead, we extract a unique ID for each event. * That id will be part of the sample for each event value. * Therefore we will be able to match values to events * * PERF_FORMAT_ID: returns unique 64-bit identifier in addition * to event value. 
*/ if (fds[0].fd == -1) errx(1, "cannot create event 0"); ret = read(fds[0].fd, val, sz); if (ret == -1) err(1, "cannot read id %zu", sizeof(val)); /* * we are using PERF_FORMAT_GROUP, therefore the structure * of val is as follows: * * { u64 nr; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 value; * { u64 id; } && PERF_FORMAT_ID * } cntr[nr]; * We are skipping the first 3 values (nr, time_enabled, time_running) * and then for each event we get a pair of values. */ for(i=0; i < num_fds; i++) { fds[i].id = val[2*i+1+3]; printf("%"PRIu64" %s\n", fds[i].id, fds[i].name); } fds[0].buf = mmap(NULL, (buffer_pages+1)*pgsz, PROT_READ|PROT_WRITE, MAP_SHARED, fds[0].fd, 0); if (fds[0].buf == MAP_FAILED) err(1, "cannot mmap buffer"); fds[0].pgmsk = (buffer_pages * pgsz) - 1; /* * setup asynchronous notification on the file descriptor */ ret = fcntl(fds[0].fd, F_SETFL, fcntl(fds[0].fd, F_GETFL, 0) | O_ASYNC); if (ret == -1) err(1, "cannot set ASYNC"); /* * necessary if we want to get the file descriptor for * which the SIGIO is sent in siginfo->si_fd. 
* SA_SIGINFO in itself is not enough */ ret = fcntl(fds[0].fd, F_SETSIG, SIGIO); if (ret == -1) err(1, "cannot setsig"); /* * get ownership of the descriptor */ ret = fcntl(fds[0].fd, F_SETOWN, getpid()); if (ret == -1) err(1, "cannot setown"); /* * enable the group for one period */ ret = ioctl(fds[0].fd, PERF_EVENT_IOC_REFRESH , 1); if (ret == -1) err(1, "cannot refresh"); busyloop(); ret = ioctl(fds[0].fd, PERF_EVENT_IOC_DISABLE, 1); if (ret == -1) err(1, "cannot disable"); /* * destroy our session */ for(i=0; i < num_fds; i++) close(fds[i].fd); perf_free_fds(fds, num_fds); free(val); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/task.c0000600003276200002170000002112612247131124020225 0ustar ralphundrgrad/* * task_inherit.c - example of a task counting event in a tree of child processes * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define MAX_GROUPS 256 typedef struct { const char *events[MAX_GROUPS]; int num_groups; int format_group; int inherit; int print; int pin; pid_t pid; } options_t; static options_t options; static volatile int quit; int child(char **arg) { /* * execute the requested command */ execvp(arg[0], arg); errx(1, "cannot exec: %s\n", arg[0]); /* not reached */ } static void read_groups(perf_event_desc_t *fds, int num) { uint64_t *values = NULL; size_t new_sz, sz = 0; int i, evt; ssize_t ret; /* * { u64 nr; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 value; * { u64 id; } && PERF_FORMAT_ID * } cntr[nr]; * } && PERF_FORMAT_GROUP * * we do not use FORMAT_ID in this program */ for (evt = 0; evt < num; ) { int num_evts_to_read; if (options.format_group) { num_evts_to_read = perf_get_group_nevents(fds, num, evt); new_sz = sizeof(uint64_t) * (3 + num_evts_to_read); } else { num_evts_to_read = 1; new_sz = sizeof(uint64_t) * 3; } if (new_sz > sz) { sz = new_sz; values = realloc(values, sz); } if (!values) err(1, "cannot allocate memory for values\n"); ret = read(fds[evt].fd, values, new_sz); if (ret != (ssize_t)new_sz) { /* unsigned */ if (ret == -1) err(1, "cannot read values event %s", fds[evt].name); /* likely pinned and could not be loaded */ warnx("could not read event %d, tried to read %zu bytes, but got %zd", evt, new_sz, ret); } /* * propagate to save area */ for (i = evt; i < (evt + num_evts_to_read); i++) { if (options.format_group) values[0] = values[3 + (i - evt)]; /* * scaling because we may be sharing the PMU and * thus may be multiplexed */ fds[i].values[0] = values[0]; fds[i].values[1] = values[1]; fds[i].values[2] = values[2]; } evt += num_evts_to_read; } if (values) free(values); } static void print_counts(perf_event_desc_t *fds, int num) { double ratio; uint64_t val, delta; int i; 
read_groups(fds, num); for(i=0; i < num; i++) { val = perf_scale(fds[i].values); delta = perf_scale_delta(fds[i].values, fds[i].prev_values); ratio = perf_scale_ratio(fds[i].values); /* separate groups */ if (perf_is_group_leader(fds, i)) putchar('\n'); if (options.print) printf("%'20"PRIu64" %'20"PRIu64" %s (%.2f%% scaling, ena=%'"PRIu64", run=%'"PRIu64")\n", val, delta, fds[i].name, (1.0-ratio)*100.0, fds[i].values[1], fds[i].values[2]); else printf("%'20"PRIu64" %s (%.2f%% scaling, ena=%'"PRIu64", run=%'"PRIu64")\n", val, fds[i].name, (1.0-ratio)*100.0, fds[i].values[1], fds[i].values[2]); fds[i].prev_values[0] = fds[i].values[0]; fds[i].prev_values[1] = fds[i].values[1]; fds[i].prev_values[2] = fds[i].values[2]; } } static void sig_handler(int n) { quit = 1; } int parent(char **arg) { perf_event_desc_t *fds = NULL; int status, ret, i, num_fds = 0, grp, group_fd; int ready[2], go[2]; char buf; pid_t pid; go[0] = go[1] = -1; if (pfm_initialize() != PFM_SUCCESS) errx(1, "libpfm initialization failed"); for (grp = 0; grp < options.num_groups; grp++) { int ret; ret = perf_setup_list_events(options.events[grp], &fds, &num_fds); if (ret || !num_fds) exit(1); } pid = options.pid; if (!pid) { ret = pipe(ready); if (ret) err(1, "cannot create pipe ready"); ret = pipe(go); if (ret) err(1, "cannot create pipe go"); /* * Create the child task */ if ((pid=fork()) == -1) err(1, "Cannot fork process"); /* * and launch the child code * * The pipe is used to avoid a race condition * between fork() and exec(). We need the pid * of the new task but we want to start measuring * at the first user level instruction. Thus we * need to prevent exec until we have attached * the events. 
*/ if (pid == 0) { close(ready[0]); close(go[1]); /* * let the parent know we exist */ close(ready[1]); if (read(go[0], &buf, 1) == -1) err(1, "unable to read go_pipe"); exit(child(arg)); } close(ready[1]); close(go[0]); if (read(ready[0], &buf, 1) == -1) err(1, "unable to read child_ready_pipe"); close(ready[0]); } for(i=0; i < num_fds; i++) { int is_group_leader; /* boolean */ is_group_leader = perf_is_group_leader(fds, i); if (is_group_leader) { /* this is the group leader */ group_fd = -1; } else { group_fd = fds[fds[i].group_leader].fd; } /* * create leader disabled with enable_on_exec */ if (!options.pid) { fds[i].hw.disabled = is_group_leader; fds[i].hw.enable_on_exec = is_group_leader; } fds[i].hw.read_format = PERF_FORMAT_SCALE; /* request timing information necessary for scaling counts */ if (is_group_leader && options.format_group) fds[i].hw.read_format |= PERF_FORMAT_GROUP; if (options.inherit) fds[i].hw.inherit = 1; if (options.pin && is_group_leader) fds[i].hw.pinned = 1; fds[i].fd = perf_event_open(&fds[i].hw, pid, -1, group_fd, 0); if (fds[i].fd == -1) { warn("cannot attach event%d %s", i, fds[i].name); goto error; } } if (!options.pid && go[1] > -1) close(go[1]); if (options.print) { if (!options.pid) { while(waitpid(pid, &status, WNOHANG) == 0) { sleep(1); print_counts(fds, num_fds); } } else { while(quit == 0) { sleep(1); print_counts(fds, num_fds); } } } else { if (!options.pid) waitpid(pid, &status, 0); else pause(); print_counts(fds, num_fds); } for(i=0; i < num_fds; i++) close(fds[i].fd); perf_free_fds(fds, num_fds); /* free libpfm resources cleanly */ pfm_terminate(); return 0; error: free(fds); if (!options.pid) kill(pid, SIGKILL); /* kill(2) takes the pid first, then the signal */ /* free libpfm resources cleanly */ pfm_terminate(); return -1; } static void usage(void) { printf("usage: task [-h] [-i] [-f] [-p] [-P] [-t pid] [-e event1,event2,...] 
cmd\n" "-h\t\tget help\n" "-i\t\tinherit across fork\n" "-f\t\tuse PERF_FORMAT_GROUP for reading up counts (experimental, not working)\n" "-p\t\tprint counts every second\n" "-P\t\tpin events\n" "-t pid\tmeasure existing pid\n" "-e ev,ev\tgroup of events to measure (multiple -e switches are allowed)\n" ); } int main(int argc, char **argv) { int c; setlocale(LC_ALL, ""); while ((c=getopt(argc, argv,"+he:ifpPt:")) != -1) { switch(c) { case 'e': if (options.num_groups < MAX_GROUPS) { options.events[options.num_groups++] = optarg; } else { errx(1, "you cannot specify more than %d groups.\n", MAX_GROUPS); } break; case 'f': options.format_group = 1; break; case 'p': options.print = 1; break; case 'P': options.pin = 1; break; case 'i': options.inherit = 1; break; case 't': options.pid = atoi(optarg); break; case 'h': usage(); exit(0); default: errx(1, "unknown error"); } } if (options.num_groups == 0) { options.events[0] = "cycles,instructions"; options.num_groups = 1; } if (!argv[optind] && !options.pid) errx(1, "you must specify a command to execute or a thread to attach to\n"); signal(SIGINT, sig_handler); return parent(argv+optind); } papi-5.3.0/src/libpfm4/perf_examples/notify_group.c0000600003276200002170000001257712247131124022021 0ustar ralphundrgrad/* * notify_group.c - self-sampling multiple events in one group * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define SMPL_PERIOD 2400000000ULL typedef struct { uint64_t ip; } sample_t; static volatile unsigned long notification_received; static perf_event_desc_t *fds; static int num_fds; static int buffer_pages = 1; /* size of buffer payload (must be power of 2) */ static void sigio_handler(int n, siginfo_t *info, struct sigcontext *sc) { struct perf_event_header ehdr; uint64_t ip; int id, ret; id = perf_fd2event(fds, num_fds, info->si_fd); if (id == -1) errx(1, "cannot find event for descriptor %d", info->si_fd); ret = perf_read_buffer(fds+id, &ehdr, sizeof(ehdr)); if (ret) errx(1, "cannot read event header"); if (ehdr.type != PERF_RECORD_SAMPLE) { warnx("unknown event type %d, skipping", ehdr.type); perf_skip_buffer(fds+id, ehdr.size - sizeof(ehdr)); goto skip; } ret = perf_read_buffer(fds+id, &ip, sizeof(ip)); if (ret) errx(1, "cannot read IP"); notification_received++; printf("Notification %lu: 0x%"PRIx64" fd=%d %s\n", notification_received, ip, info->si_fd, fds[id].name); skip: /* * rearm the counter for one more shot */ ret = ioctl(info->si_fd, PERF_EVENT_IOC_REFRESH, 1); if (ret == -1) err(1, "cannot refresh"); } /* * infinite loop waiting for notification to get out */ void busyloop(void) { /* * busy loop to burn CPU cycles */ for(;notification_received < 1024;) ; } int main(int argc, char **argv) { struct sigaction act; sigset_t new, old; size_t pgsz; int 
ret, i; ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "Cannot initialize library: %s", pfm_strerror(ret)); pgsz = sysconf(_SC_PAGESIZE); /* * Install the signal handler (SIGIO) */ memset(&act, 0, sizeof(act)); act.sa_sigaction = (void *)sigio_handler; act.sa_flags = SA_SIGINFO; sigaction (SIGIO, &act, 0); sigemptyset(&old); sigemptyset(&new); sigaddset(&new, SIGIO); ret = sigprocmask(SIG_SETMASK, NULL, &old); if (ret) err(1, "sigprocmask failed"); if (sigismember(&old, SIGIO)) { warnx("program started with SIGIO masked, unmasking it now\n"); ret = sigprocmask(SIG_UNBLOCK, &new, NULL); if (ret) err(1, "sigprocmask failed"); } /* * allocates fd for us */ ret = perf_setup_list_events("cycles," "instructions," "cycles", &fds, &num_fds); if (ret || !num_fds) exit(1); fds[0].fd = -1; for(i=0; i < num_fds; i++) { /* want a notification for each sample added to the buffer */ fds[i].hw.disabled = !!i; printf("i=%d disabled=%d\n", i, fds[i].hw.disabled); fds[i].hw.wakeup_events = 1; fds[i].hw.sample_type = PERF_SAMPLE_IP; fds[i].hw.sample_period = SMPL_PERIOD; fds[i].fd = perf_event_open(&fds[i].hw, 0, -1, fds[0].fd, 0); if (fds[i].fd == -1) { warn("cannot attach event %s", fds[i].name); goto error; } fds[i].buf = mmap(NULL, (buffer_pages + 1)*pgsz, PROT_READ|PROT_WRITE, MAP_SHARED, fds[i].fd, 0); if (fds[i].buf == MAP_FAILED) err(1, "cannot mmap buffer"); /* * setup asynchronous notification on the file descriptor */ ret = fcntl(fds[i].fd, F_SETFL, fcntl(fds[i].fd, F_GETFL, 0) | O_ASYNC); if (ret == -1) err(1, "cannot set ASYNC"); /* * necessary if we want to get the file descriptor for * which the SIGIO is sent for in siginfo->si_fd. 
* SA_SIGINFO in itself is not enough */ ret = fcntl(fds[i].fd, F_SETSIG, SIGIO); if (ret == -1) err(1, "cannot setsig"); /* * get ownership of the descriptor */ ret = fcntl(fds[i].fd, F_SETOWN, getpid()); if (ret == -1) err(1, "cannot setown"); fds[i].pgmsk = (buffer_pages * pgsz) - 1; } for(i=0; i < num_fds; i++) { ret = ioctl(fds[i].fd, PERF_EVENT_IOC_REFRESH , 1); if (ret == -1) err(1, "cannot refresh"); } busyloop(); prctl(PR_TASK_PERF_EVENTS_DISABLE); error: /* * destroy our session */ for(i=0; i < num_fds; i++) if (fds[i].fd > -1) close(fds[i].fd); perf_free_fds(fds, num_fds); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/perf_util.c0000600003276200002170000003761312247131124021264 0ustar ralphundrgrad/* * perf_util.c - helper functions for perf_events * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #include "perf_util.h" /* the **fd parameter must point to a null pointer on the first call * max_fds and num_fds must both point to a zero value on the first call * The return value is success (0) vs. failure (non-zero) */ int perf_setup_argv_events(const char **argv, perf_event_desc_t **fds, int *num_fds) { perf_event_desc_t *fd; pfm_perf_encode_arg_t arg; int new_max, ret, num, max_fds; int group_leader; if (!(argv && fds && num_fds)) return -1; fd = *fds; if (fd) { max_fds = fd[0].max_fds; if (max_fds < 2) return -1; num = *num_fds; } else { max_fds = num = 0; /* bootstrap */ } group_leader = num; while(*argv) { if (num == max_fds) { if (max_fds == 0) new_max = 2; else new_max = max_fds << 1; if (new_max < max_fds) { warn("too many entries"); goto error; } fd = realloc(fd, new_max * sizeof(*fd)); if (!fd) { warn("cannot allocate memory"); goto error; } /* reset newly allocated chunk */ memset(fd + max_fds, 0, (new_max - max_fds) * sizeof(*fd)); max_fds = new_max; /* update max size */ fd[0].max_fds = max_fds; } /* ABI compatibility, set before calling libpfm */ fd[num].hw.size = sizeof(fd[num].hw); memset(&arg, 0, sizeof(arg)); arg.attr = &fd[num].hw; arg.fstr = &fd[num].fstr; /* fd[].fstr is NULL */ ret = pfm_get_os_event_encoding(*argv, PFM_PLM0|PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg); if (ret != PFM_SUCCESS) { warnx("event %s: %s", *argv, pfm_strerror(ret)); goto error; } fd[num].name = strdup(*argv); fd[num].group_leader = group_leader; fd[num].idx = arg.idx; num++; argv++; } *num_fds = num; *fds = fd; return 0; error: perf_free_fds(fd, num); return -1; } int perf_setup_list_events(const char *ev, perf_event_desc_t **fd, int *num_fds) { const char **argv; char *p, *q, *events; int i, ret, num = 0; if (!(ev && fd && num_fds)) return -1; events = strdup(ev); if (!events) return -1; q = events; while((p = strchr(q, ','))) { num++; q = p + 1; } num++; num++; /* terminator */ argv = 
malloc(num * sizeof(char *)); if (!argv) { free(events); return -1; } i = 0; q = events; while((p = strchr(q, ','))) { *p = '\0'; argv[i++] = q; q = p + 1; } argv[i++] = q; argv[i] = NULL; ret = perf_setup_argv_events(argv, fd, num_fds); free(argv); free(events); /* strdup in perf_setup_argv_events() */ return ret; } void perf_free_fds(perf_event_desc_t *fds, int num_fds) { int i; for (i = 0 ; i < num_fds; i++) { free(fds[i].name); free(fds[i].fstr); } free(fds); } int perf_get_group_nevents(perf_event_desc_t *fds, int num, int idx) { int leader; int i; if (idx < 0 || idx >= num) return 0; leader = fds[idx].group_leader; for (i = leader + 1; i < num; i++) { if (fds[i].group_leader != leader) { /* This is a new group leader, so the previous * event was the final event of the preceding * group. */ return i - leader; } } return i - leader; } int perf_read_buffer(perf_event_desc_t *hw, void *buf, size_t sz) { struct perf_event_mmap_page *hdr = hw->buf; size_t pgmsk = hw->pgmsk; void *data; unsigned long tail; size_t avail_sz, m, c; /* * data points to beginning of buffer payload */ data = ((void *)hdr)+sysconf(_SC_PAGESIZE); /* * position of tail within the buffer payload */ tail = hdr->data_tail & pgmsk; /* * size of what is available * * data_head, data_tail never wrap around */ avail_sz = hdr->data_head - hdr->data_tail; if (sz > avail_sz) return -1; /* * sz <= avail_sz, we can satisfy the request */ /* * c = size till end of buffer * * buffer payload size is necessarily * a power of two, so we can do: */ c = pgmsk + 1 - tail; /* * min with requested size */ m = c < sz ? 
c : sz; /* copy beginning */ memcpy(buf, data+tail, m); /* * copy wrapped around leftover */ if ((sz - m) > 0) memcpy(buf+m, data, sz - m); //printf("\nhead=%lx tail=%lx new_tail=%lx sz=%zu\n", hdr->data_head, hdr->data_tail, hdr->data_tail+sz, sz); hdr->data_tail += sz; return 0; } void perf_skip_buffer(perf_event_desc_t *hw, size_t sz) { struct perf_event_mmap_page *hdr = hw->buf; if ((hdr->data_tail + sz) > hdr->data_head) sz = hdr->data_head - hdr->data_tail; hdr->data_tail += sz; } static size_t __perf_handle_raw(perf_event_desc_t *hw) { size_t sz = 0; uint32_t raw_sz, i; char *buf; int ret; ret = perf_read_buffer_32(hw, &raw_sz); if (ret) { warnx("cannot read raw size"); return -1; } sz += sizeof(raw_sz); printf("\n\tRAWSZ:%u\n", raw_sz); buf = malloc(raw_sz); if (!buf) { warn("cannot allocate raw buffer"); return -1; } ret = perf_read_buffer(hw, buf, raw_sz); if (ret) { warnx("cannot read raw data"); free(buf); return -1; } if (raw_sz) putchar('\t'); for(i=0; i < raw_sz; i++) { printf("0x%02x ", buf[i] & 0xff ); if (((i+1) % 16) == 0) printf("\n\t"); } if (raw_sz) putchar('\n'); free(buf); return sz + raw_sz; } static int perf_display_branch_stack(perf_event_desc_t *desc, FILE *fp) { struct perf_branch_entry b; uint64_t nr, n; int ret; ret = perf_read_buffer(desc, &n, sizeof(n)); if (ret) errx(1, "cannot read branch stack nr"); fprintf(fp, "\n\tBRANCH_STACK:%"PRIu64"\n", n); nr = n; /* * from most recent to least recent take branch */ while (nr--) { ret = perf_read_buffer(desc, &b, sizeof(b)); if (ret) errx(1, "cannot read branch stack entry"); fprintf(fp, "\tFROM:0x%016"PRIx64" TO:0x%016"PRIx64" MISPRED:%c\n", b.from, b.to, !(b.mispred || b.predicted) ? '-': (b.mispred ? 
'Y' :'N')); } return (int)(n * sizeof(b) + sizeof(n)); } static int perf_display_regs_user(perf_event_desc_t *hw, FILE *fp) { return 0; } static int perf_display_stack_user(perf_event_desc_t *hw, FILE *fp) { uint64_t nr; char buf[512]; size_t sz; int ret; /* the stack size field is a u64: read sizeof(nr), not the size of the hw pointer */ ret = perf_read_buffer(hw, &nr, sizeof(nr)); if (ret) errx(1, "cannot read user stack size"); fprintf(fp, "USER_STACK: SZ:%"PRIu64"\n", nr); /* consume content */ while (nr) { sz = nr; if (sz > sizeof(buf)) sz = sizeof(buf); ret = perf_read_buffer(hw, buf, sz); if (ret) errx(1, "cannot read user stack content"); nr -= sz; } return 0; } int perf_display_sample(perf_event_desc_t *fds, int num_fds, int idx, struct perf_event_header *ehdr, FILE *fp) { perf_event_desc_t *hw; struct { uint32_t pid, tid; } pid; struct { uint64_t value, id; } grp; uint64_t time_enabled, time_running; size_t sz; uint64_t type, fmt; uint64_t val64; const char *str; int ret, e; if (!fds || !fp || !ehdr || num_fds < 0 || idx < 0 || idx >= num_fds) return -1; sz = ehdr->size - sizeof(*ehdr); hw = fds+idx; type = hw->hw.sample_type; fmt = hw->hw.read_format; /* * the sample_type information is laid down * based on the PERF_RECORD_SAMPLE format specified * in the perf_event.h header file. 
* That order is different from the enum perf_event_sample_format */ if (type & PERF_SAMPLE_IP) { const char *xtra = " "; ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx("cannot read IP"); return -1; } /* * MISC_EXACT_IP indicates that kernel is returning * the IIP of an instruction which caused the event, i.e., * no skid */ if (hw->hw.precise_ip && (ehdr->misc & PERF_RECORD_MISC_EXACT_IP)) xtra = " (exact) "; fprintf(fp, "IIP:%#016"PRIx64"%s", val64, xtra); sz -= sizeof(val64); } if (type & PERF_SAMPLE_TID) { ret = perf_read_buffer(hw, &pid, sizeof(pid)); if (ret) { warnx( "cannot read PID"); return -1; } fprintf(fp, "PID:%d TID:%d ", pid.pid, pid.tid); sz -= sizeof(pid); } if (type & PERF_SAMPLE_TIME) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read time"); return -1; } fprintf(fp, "TIME:%'"PRIu64" ", val64); sz -= sizeof(val64); } if (type & PERF_SAMPLE_ADDR) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read addr"); return -1; } fprintf(fp, "ADDR:%#016"PRIx64" ", val64); sz -= sizeof(val64); } if (type & PERF_SAMPLE_ID) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read id"); return -1; } fprintf(fp, "ID:%"PRIu64" ", val64); sz -= sizeof(val64); } if (type & PERF_SAMPLE_STREAM_ID) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read stream_id"); return -1; } fprintf(fp, "STREAM_ID:%"PRIu64" ", val64); sz -= sizeof(val64); } if (type & PERF_SAMPLE_CPU) { struct { uint32_t cpu, reserved; } cpu; ret = perf_read_buffer(hw, &cpu, sizeof(cpu)); if (ret) { warnx( "cannot read cpu"); return -1; } fprintf(fp, "CPU:%u ", cpu.cpu); sz -= sizeof(cpu); } if (type & PERF_SAMPLE_PERIOD) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read period"); return -1; } fprintf(fp, "PERIOD:%'"PRIu64" ", val64); sz -= sizeof(val64); } /* struct read_format { * { u64 value; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 id; } 
&& PERF_FORMAT_ID * } && !PERF_FORMAT_GROUP * * { u64 nr; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 value; * { u64 id; } && PERF_FORMAT_ID * } cntr[nr]; * } && PERF_FORMAT_GROUP * }; */ if (type & PERF_SAMPLE_READ) { uint64_t values[3]; uint64_t nr; if (fmt & PERF_FORMAT_GROUP) { ret = perf_read_buffer_64(hw, &nr); if (ret) { warnx( "cannot read nr"); return -1; } sz -= sizeof(nr); time_enabled = time_running = 1; if (fmt & PERF_FORMAT_TOTAL_TIME_ENABLED) { ret = perf_read_buffer_64(hw, &time_enabled); if (ret) { warnx( "cannot read timing info"); return -1; } sz -= sizeof(time_enabled); } if (fmt & PERF_FORMAT_TOTAL_TIME_RUNNING) { ret = perf_read_buffer_64(hw, &time_running); if (ret) { warnx( "cannot read timing info"); return -1; } sz -= sizeof(time_running); } fprintf(fp, "ENA=%'"PRIu64" RUN=%'"PRIu64" NR=%"PRIu64"\n", time_enabled, time_running, nr); values[1] = time_enabled; values[2] = time_running; while(nr--) { grp.id = -1; ret = perf_read_buffer_64(hw, &grp.value); if (ret) { warnx( "cannot read group value"); return -1; } sz -= sizeof(grp.value); if (fmt & PERF_FORMAT_ID) { ret = perf_read_buffer_64(hw, &grp.id); if (ret) { warnx( "cannot read leader id"); return -1; } sz -= sizeof(grp.id); } e = perf_id2event(fds, num_fds, grp.id); if (e == -1) str = "unknown sample event"; else str = fds[e].name; values[0] = grp.value; grp.value = perf_scale(values); fprintf(fp, "\t%'"PRIu64" %s (%"PRIu64"%s)\n", grp.value, str, grp.id, time_running != time_enabled ? 
", scaled":""); } } else { time_enabled = time_running = 0; /* * this program does not use FORMAT_GROUP when there is only one event */ ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read value"); return -1; } sz -= sizeof(val64); if (fmt & PERF_FORMAT_TOTAL_TIME_ENABLED) { ret = perf_read_buffer_64(hw, &time_enabled); if (ret) { warnx( "cannot read timing info"); return -1; } sz -= sizeof(time_enabled); } if (fmt & PERF_FORMAT_TOTAL_TIME_RUNNING) { ret = perf_read_buffer_64(hw, &time_running); if (ret) { warnx( "cannot read timing info"); return -1; } sz -= sizeof(time_running); } if (fmt & PERF_FORMAT_ID) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read leader id"); return -1; } sz -= sizeof(val64); } fprintf(fp, "ENA=%'"PRIu64" RUN=%'"PRIu64"\n", time_enabled, time_running); values[0] = val64; values[1] = time_enabled; values[2] = time_running; val64 = perf_scale(values); fprintf(fp, "\t%'"PRIu64" %s %s\n", val64, fds[0].name, time_running != time_enabled ? 
", scaled":""); } } if (type & PERF_SAMPLE_CALLCHAIN) { uint64_t nr, ip; ret = perf_read_buffer_64(hw, &nr); if (ret) { warnx( "cannot read callchain nr"); return -1; } sz -= sizeof(nr); while(nr--) { ret = perf_read_buffer_64(hw, &ip); if (ret) { warnx( "cannot read ip"); return -1; } sz -= sizeof(ip); fprintf(fp, "\t0x%"PRIx64"\n", ip); } } if (type & PERF_SAMPLE_RAW) { ret = __perf_handle_raw(hw); if (ret == -1) return -1; sz -= ret; } if (type & PERF_SAMPLE_BRANCH_STACK) { ret = perf_display_branch_stack(hw, fp); sz -= ret; } if (type & PERF_SAMPLE_REGS_USER) { ret = perf_display_regs_user(hw, fp); sz -= ret; } if (type & PERF_SAMPLE_STACK_USER) { ret = perf_display_stack_user(hw, fp); sz -= ret; } if (type & PERF_SAMPLE_WEIGHT) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read weight"); return -1; } fprintf(fp, "WEIGHT:%'"PRIu64" ", val64); sz -= sizeof(val64); } if (type & PERF_SAMPLE_DATA_SRC) { ret = perf_read_buffer_64(hw, &val64); if (ret) { warnx( "cannot read data src"); return -1; } fprintf(fp, "DATA_SRC:%'"PRIu64" ", val64); sz -= sizeof(val64); } /* * if we have some data left, it is because there is more * than what we know about. In fact, it is more complicated * because we may have the right size but wrong layout. But * that's the best we can do. 
     */
    if (sz) {
        warnx("did not correctly parse sample leftover=%zu", sz);
        perf_skip_buffer(hw, sz);
    }

    fputc('\n',fp);
    return 0;
}

uint64_t
display_lost(perf_event_desc_t *hw, perf_event_desc_t *fds, int num_fds, FILE *fp)
{
    struct { uint64_t id, lost; } lost;
    const char *str;
    int e, ret;

    ret = perf_read_buffer(hw, &lost, sizeof(lost));
    if (ret) {
        warnx("cannot read lost info");
        return 0;
    }

    e = perf_id2event(fds, num_fds, lost.id);
    if (e == -1)
        str = "unknown lost event";
    else
        str = fds[e].name;

    /* the format string must consume both arguments: lost count and event name */
    fprintf(fp, "<<<LOST %"PRIu64" SAMPLES FOR EVENT %s>>>\n", lost.lost, str);

    return lost.lost;
}

void
display_exit(perf_event_desc_t *hw, FILE *fp)
{
    struct { pid_t pid, ppid, tid, ptid; } grp;
    int ret;

    ret = perf_read_buffer(hw, &grp, sizeof(grp));
    if (ret) {
        warnx("cannot read exit info");
        return;
    }

    fprintf(fp,"[%d] exited\n", grp.pid);
}

void
display_freq(int mode, perf_event_desc_t *hw, FILE *fp)
{
    struct { uint64_t time, id, stream_id; } thr;
    int ret;

    ret = perf_read_buffer(hw, &thr, sizeof(thr));
    if (ret) {
        warnx("cannot read throttling info");
        return;
    }

    fprintf(fp, "%s value=%"PRIu64" event ID=%"PRIu64"\n",
        mode ? "Throttled" : "Unthrottled",
        thr.id,
        thr.stream_id);
}
papi-5.3.0/src/libpfm4/perf_examples/syst_count.c
/*
 * syst_count.c - example of a simple system wide monitoring program
 *
 * Copyright (c) 2010 Google, Inc
 * Contributed by Stephane Eranian
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define MAX_GROUPS 256 #define MAX_PATH 1024 #ifndef STR # define _STR(x) #x # define STR(x) _STR(x) #endif typedef struct { const char *events[MAX_GROUPS]; int nevents[MAX_GROUPS]; /* #events per group */ int num_groups; int delay; int excl; int pin; int interval; int cpu; char *cgroup_name; } options_t; static options_t options; static perf_event_desc_t **all_fds; static int cgroupfs_find_mountpoint(char *buf, size_t maxlen) { FILE *fp; char mountpoint[MAX_PATH+1], tokens[MAX_PATH+1], type[MAX_PATH+1]; char *token, *saved_ptr = NULL; int found = 0; fp = fopen("/proc/mounts", "r"); if (!fp) return -1; /* * in order to handle split hierarchy, we need to scan /proc/mounts * and inspect every cgroupfs mount point to find one that has * perf_event subsystem */ while (fscanf(fp, "%*s %"STR(MAX_PATH)"s %"STR(MAX_PATH)"s %" STR(MAX_PATH)"s %*d %*d\n", mountpoint, type, tokens) == 3) { if (!strcmp(type, "cgroup")) { token = strtok_r(tokens, ",", &saved_ptr); while (token != NULL) { if (!strcmp(token, "perf_event")) { found = 1; break; } token = strtok_r(NULL, ",", &saved_ptr); } } if (found) break; } fclose(fp); if (!found) return -1; if (strlen(mountpoint) < maxlen) { strcpy(buf, mountpoint); return 0; } return -1; } int open_cgroup(char *name) { char path[MAX_PATH+1]; char mnt[MAX_PATH+1]; int cfd; if (cgroupfs_find_mountpoint(mnt, MAX_PATH+1)) errx(1, "cannot 
find cgroup fs mount point");

    snprintf(path, MAX_PATH, "%s/%s", mnt, name);

    cfd = open(path, O_RDONLY);
    if (cfd == -1)
        warn("no access to cgroup %s\n", name);

    return cfd;
}

void
setup_cpu(int cpu, int cfd)
{
    perf_event_desc_t *fds = NULL;
    int old_total, total = 0, num;
    int i, j, n, ret, is_lead, group_fd;
    unsigned long flags;
    pid_t pid;

    for(i=0, j=0; i < options.num_groups; i++) {
        old_total = total;
        ret = perf_setup_list_events(options.events[i], &fds, &total);
        if (ret)
            errx(1, "cannot setup events\n");

        all_fds[cpu] = fds;

        num = total - old_total;

        options.nevents[i] = num;

        for(n=0; n < num; n++, j++) {

            is_lead = perf_is_group_leader(fds, j);
            if (is_lead) {
                fds[j].hw.disabled = 1;
                group_fd = -1;
            } else {
                fds[j].hw.disabled = 0;
                group_fd = fds[fds[j].group_leader].fd;
            }
            fds[j].hw.size = sizeof(struct perf_event_attr);

            if (options.cgroup_name) {
                /* in cgroup mode, the pid argument carries the cgroup directory fd */
                flags = PERF_FLAG_PID_CGROUP;
                pid = cfd;
                //fds[j].hw.cgroup = 1;
                //fds[j].hw.cgroup_fd = cfd;
            } else {
                flags = 0;
                pid = -1;
            }

            if (options.pin && is_lead)
                fds[j].hw.pinned = 1;

            if (options.excl && is_lead)
                fds[j].hw.exclusive = 1;

            /* request timing information necessary for scaling counts */
            fds[j].hw.read_format = PERF_FORMAT_SCALE;
            fds[j].fd = perf_event_open(&fds[j].hw, pid, cpu, group_fd, flags);
            if (fds[j].fd == -1) {
                if (errno == EACCES)
                    err(1, "you need to be root to run system-wide on this machine");

                warn("cannot attach event %s to CPU%d, skipping it", fds[j].name, cpu);
                goto error;
            }
        }
    }
    return;
error:
    for (i=0; i < j; i++) {
        if (fds[i].fd > -1)
            close(fds[i].fd);
        fds[i].fd = -1;
    }
}

void start_cpu(int c)
{
    perf_event_desc_t *fds = NULL;
    int j, ret, n = 0;

    fds = all_fds[c];

    if (fds[0].fd == -1)
        return;

    for(j=0; j < options.num_groups; j++) {
        /* group leader always first in each group; n indexes the leader */
        ret = ioctl(fds[n].fd, PERF_EVENT_IOC_ENABLE, 0);
        if (ret)
            err(1, "cannot enable event %s\n", fds[n].name);
        n += options.nevents[j];
    }
}

void stop_cpu(int c)
{
    perf_event_desc_t *fds = NULL;
    int j, ret, n = 0;

    fds = all_fds[c];

    if
(fds[0].fd == -1) return; for(j=0; j < options.num_groups; j++) { /* group leader always first in each group */ ret = ioctl(fds[n].fd, PERF_EVENT_IOC_DISABLE, 0); if (ret) err(1, "cannot disable event %s\n", fds[j].name); n += options.nevents[j]; } } void read_cpu(int c) { perf_event_desc_t *fds; uint64_t val, delta; double ratio; int i, j, n, ret; fds = all_fds[c]; if (fds[0].fd == -1) { printf("CPU%d not monitored\n", c); return; } for(i=0, j = 0; i < options.num_groups; i++) { for(n = 0; n < options.nevents[i]; n++, j++) { ret = read(fds[j].fd, fds[j].values, sizeof(fds[j].values)); if (ret != sizeof(fds[j].values)) { if (ret == -1) err(1, "cannot read event %s : %d", fds[j].name, ret); else { warnx("CPU%d G%-2d could not read event %s, read=%d", c, i, fds[j].name, ret); continue; } } /* * scaling because we may be sharing the PMU and * thus may be multiplexed */ delta = perf_scale_delta(fds[j].values, fds[j].prev_values); val = perf_scale(fds[j].values); ratio = perf_scale_ratio(fds[j].values); printf("CPU%-3d G%-2d %'20"PRIu64" %'20"PRIu64" %s (scaling %.2f%%, ena=%'"PRIu64", run=%'"PRIu64") %s\n", c, i, val, delta, fds[j].name, (1.0-ratio)*100, fds[j].values[1], fds[j].values[2], options.cgroup_name ? 
                options.cgroup_name : "");

            fds[j].prev_values[0] = fds[j].values[0];
            fds[j].prev_values[1] = fds[j].values[1];
            fds[j].prev_values[2] = fds[j].values[2];

            if (fds[j].values[2] > fds[j].values[1])
                errx(1, "WARNING: time_running > time_enabled %"PRIu64"\n", fds[j].values[2] - fds[j].values[1]);
        }
    }
}

void close_cpu(int c)
{
    perf_event_desc_t *fds = NULL;
    int i, j;
    int total = 0;

    fds = all_fds[c];

    if (fds[0].fd == -1)
        return;

    for(i=0; i < options.num_groups; i++) {
        /* total is the index of the first event of group i */
        for(j=0; j < options.nevents[i]; j++)
            close(fds[total + j].fd);
        total += options.nevents[i];
    }
    perf_free_fds(fds, total);
}

void
measure(void)
{
    int c, cmin, cmax, ncpus;
    int cfd = -1;

    cmin = 0;
    cmax = (int)sysconf(_SC_NPROCESSORS_ONLN);
    ncpus = cmax;

    if (options.cpu != -1) {
        cmin = options.cpu;
        cmax = cmin + 1;
    }

    all_fds = malloc(ncpus * sizeof(perf_event_desc_t *));
    if (!all_fds)
        err(1, "cannot allocate memory for all_fds");

    if (options.cgroup_name) {
        cfd = open_cgroup(options.cgroup_name);
        if (cfd == -1)
            exit(1);
    }

    for(c=cmin ; c < cmax; c++)
        setup_cpu(c, cfd);

    if (options.cgroup_name)
        close(cfd);

    printf("<press CTRL-C to quit before %ds time limit>\n", options.delay);
    /*
     * FIX this for hotplug CPU
     */

    if (options.interval) {
        struct timespec tv;
        int delay;

        for (delay = 1 ; delay <= options.delay; delay++) {

            for(c=cmin ; c < cmax; c++)
                start_cpu(c);

            if (0) {
                tv.tv_sec = 0;
                tv.tv_nsec = 100000000;
                nanosleep(&tv, NULL);
            } else
                sleep(1);

            for(c=cmin ; c < cmax; c++)
                stop_cpu(c);

            for(c = cmin; c < cmax; c++) {
                printf("# %'ds -----\n", delay);
                read_cpu(c);
            }
        }

    } else {
        for(c=cmin ; c < cmax; c++)
            start_cpu(c);

        sleep(options.delay);

        if (0)
            for(c=cmin ; c < cmax; c++)
                stop_cpu(c);

        for(c = cmin; c < cmax; c++) {
            printf("# -----\n");
            read_cpu(c);
        }
    }

    for(c = cmin; c < cmax; c++)
        close_cpu(c);

    free(all_fds);
}

static void
usage(void)
{
    printf("usage: syst [-c cpu] [-x] [-h] [-p] [-d delay] [-P] [-G cgroup name] [-e event1,event2,...]\n");
}

int
main(int argc, char **argv)
{
    int c, ret;

    setlocale(LC_ALL, "");

    options.cpu = -1;

    while ((c=getopt(argc, argv,"hc:e:d:xPpG:"))
!= -1) { switch(c) { case 'x': options.excl = 1; break; case 'p': options.interval = 1; break; case 'e': if (options.num_groups < MAX_GROUPS) { options.events[options.num_groups++] = optarg; } else { errx(1, "you cannot specify more than %d groups.\n", MAX_GROUPS); } break; case 'c': options.cpu = atoi(optarg); break; case 'd': options.delay = atoi(optarg); break; case 'P': options.pin = 1; break; case 'h': usage(); exit(0); case 'G': options.cgroup_name = optarg; break; default: errx(1, "unknown error"); } } if (!options.delay) options.delay = 20; if (!options.events[0]) { options.events[0] = "cycles,instructions"; options.num_groups = 1; } ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "libpfm initialization failed: %s\n", pfm_strerror(ret)); measure(); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/task_attach_timeout.c0000600003276200002170000001165012247131124023320 0ustar ralphundrgrad/* * task_attach_timeout.c - attach to another task for monitoring for a short while * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" typedef struct { char *events; int delay; int print; int group; int pinned; } options_t; static options_t options; static void print_counts(perf_event_desc_t *fds, int num, int do_delta) { ssize_t ret; int i; /* * now simply read the results. */ for(i=0; i < num; i++) { uint64_t val; double ratio; ret = read(fds[i].fd, fds[i].values, sizeof(fds[i].values)); if (ret < (ssize_t)sizeof(fds[i].values)) { if (ret == -1) err(1, "cannot read values event %s", fds[i].name); else warnx("could not read event%d", i); } val = perf_scale(fds[i].values); ratio = perf_scale_ratio(fds[i].values); val = do_delta ? 
perf_scale_delta(fds[i].values, fds[i].prev_values) : val; fds[i].prev_values[0] = fds[i].values[0]; fds[i].prev_values[1] = fds[i].values[1]; fds[i].prev_values[2] = fds[i].values[2]; if (ratio == 1.0) printf("%20"PRIu64" %s\n", val, fds[i].name); else if (ratio == 0.0) printf("%20"PRIu64" %s (did not run: incompatible events, too many events in a group, competing session)\n", val, fds[i].name); else printf("%20"PRIu64" %s (scaled from %.2f%% of time)\n", val, fds[i].name, ratio*100.0); } } int measure(pid_t pid) { perf_event_desc_t *fds = NULL; int i, ret, num_fds = 0; char fn[32]; if (pfm_initialize() != PFM_SUCCESS) errx(1, "libpfm initialization failed\n"); ret = perf_setup_list_events(options.events, &fds, &num_fds); if (ret || (num_fds == 0)) exit(1); fds[0].fd = -1; for(i=0; i < num_fds; i++) { fds[i].hw.disabled = 0; /* start immediately */ /* request timing information necessary for scaling counts */ fds[i].hw.read_format = PERF_FORMAT_SCALE; fds[i].hw.pinned = !i && options.pinned; fds[i].fd = perf_event_open(&fds[i].hw, pid, -1, (options.group? fds[0].fd : -1), 0); if (fds[i].fd == -1) errx(1, "cannot attach event %s", fds[i].name); } /* * no notification is generated by perf_counters * when the monitored thread exits. Thus we need * to poll /proc/ to detect it has disappeared, * otherwise we have to wait until the end of the * timeout */ sprintf(fn, "/proc/%d/status", pid); while(access(fn, F_OK) == 0 && options.delay) { sleep(1); options.delay--; if (options.print) print_counts(fds, num_fds, 1); } if (options.delay) warn("thread %d terminated before timeout", pid); if (!options.print) print_counts(fds, num_fds, 0); for(i=0; i < num_fds; i++) close(fds[i].fd); perf_free_fds(fds, num_fds); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } static void usage(void) { printf("usage: task_attach_timeout [-h] [-p] [-P] [-g] [-d delay] [-e event1,event2,...] 
pid\n"); } int main(int argc, char **argv) { int c; while ((c=getopt(argc, argv,"he:vd:pgP")) != -1) { switch(c) { case 'e': options.events = optarg; break; case 'p': options.print = 1; break; case 'P': options.pinned = 1; break; case 'g': options.group = 1; break; case 'd': options.delay = atoi(optarg); break; case 'h': usage(); exit(0); default: errx(1, "unknown error"); } } if (!options.events) options.events = strdup("cycles,instructions"); if (options.delay < 1) options.delay = 10; if (!argv[optind]) errx(1, "you must specify pid to attach to\n"); return measure(atoi(argv[optind])); } papi-5.3.0/src/libpfm4/perf_examples/self.c0000600003276200002170000001044712247131124020220 0ustar ralphundrgrad/* * self.c - example of a simple self monitoring task * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2002-2007 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" static const char *gen_events[]={ "cycles", "instructions", NULL }; static volatile int quit; void sig_handler(int n) { quit = 1; } void noploop(void) { for(;quit == 0;); } static void print_counts(perf_event_desc_t *fds, int num_fds, const char *msg) { uint64_t val; uint64_t values[3]; double ratio; int i; ssize_t ret; /* * now read the results. We use pfp_event_count because * libpfm guarantees that counters for the events always * come first. */ memset(values, 0, sizeof(values)); for (i = 0; i < num_fds; i++) { ret = read(fds[i].fd, values, sizeof(values)); if (ret < (ssize_t)sizeof(values)) { if (ret == -1) err(1, "cannot read results: %s", strerror(errno)); else warnx("could not read event%d", i); } /* * scaling is systematic because we may be sharing the PMU and * thus may be multiplexed */ val = perf_scale(values); ratio = perf_scale_ratio(values); printf("%s %'20"PRIu64" %s (%.2f%% scaling, raw=%'"PRIu64", ena=%'"PRIu64", run=%'"PRIu64")\n", msg, val, fds[i].name, (1.0-ratio)*100.0, values[0], values[1], values[2]); } } int main(int argc, char **argv) { perf_event_desc_t *fds = NULL; int i, ret, num_fds = 0; setlocale(LC_ALL, ""); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "Cannot initialize library: %s", pfm_strerror(ret)); ret = perf_setup_argv_events(argc > 1 ? 
(const char **)argv+1 : gen_events, &fds, &num_fds); if (ret || !num_fds) errx(1, "cannot setup events"); fds[0].fd = -1; for(i=0; i < num_fds; i++) { /* request timing information necessary for scaling */ fds[i].hw.read_format = PERF_FORMAT_SCALE; fds[i].hw.disabled = 1; /* do not start now */ /* each event is in an independent group (multiplexing likely) */ fds[i].fd = perf_event_open(&fds[i].hw, 0, -1, -1, 0); if (fds[i].fd == -1) err(1, "cannot open event %d", i); } signal(SIGALRM, sig_handler); /* * enable all counters attached to this thread and created by it */ ret = prctl(PR_TASK_PERF_EVENTS_ENABLE); if (ret) err(1, "prctl(enable) failed"); print_counts(fds, num_fds, "INITIAL: "); alarm(10); noploop(); /* * disable all counters attached to this thread */ ret = prctl(PR_TASK_PERF_EVENTS_DISABLE); if (ret) err(1, "prctl(disable) failed"); printf("Final counts:\n"); print_counts(fds, num_fds, "FINAL: "); for (i = 0; i < num_fds; i++) close(fds[i].fd); perf_free_fds(fds, num_fds); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/x86/0000700003276200002170000000000012247131124017540 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/perf_examples/x86/Makefile0000600003276200002170000000366212247131124021211 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. 
# # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/../.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk CFLAGS+= -I. -D_GNU_SOURCE -I.. LIBS += -lm ifeq ($(SYS),Linux) CFLAGS+= -pthread endif TARGETS= ifeq ($(SYS),Linux) LPC_UTILS=../perf_util.o TARGETS += bts_smpl endif EXAMPLESDIR=$(DOCDIR)/perf_examples/x86 all: $(TARGETS) $(TARGETS): %:%.o $(LPC_UTILS) $(PFMLIB) $(CC) $(CFLAGS) -o $@ $(LDFLAGS) $^ $(LIBS) clean: $(RM) -f *.o $(TARGETS) *~ distclean: clean install_examples: $(TARGETS) @echo installing: $(TARGETS) -mkdir -p $(DESTDIR)$(EXAMPLESDIR) $(INSTALL) -m 755 $(TARGETS) $(TARGET_GEN) $(DESTDIR)$(EXAMPLESDIR) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done .PHONY: install depend install_examples papi-5.3.0/src/libpfm4/perf_examples/x86/bts_smpl.c0000600003276200002170000001544412247131124021541 0ustar ralphundrgrad/* * bts_smpl.c - example of Intel Branch Trace Stack sampling * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2003-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define SMPL_PERIOD 24000000ULL typedef struct { int opt_no_show; int opt_inherit; int mmap_pages; } options_t; static jmp_buf jbuf; static uint64_t collected_samples, lost_samples; static perf_event_desc_t *fds; static int num_fds; static options_t options; static struct option the_options[]={ { "help", 0, 0, 1}, { "no-show", 0, &options.opt_no_show, 1}, { 0, 0, 0, 0} }; static void cld_handler(int n) { longjmp(jbuf, 1); } int child(char **arg) { /* * force the task to stop before executing the first * user level instruction */ ptrace(PTRACE_TRACEME, 0, NULL, NULL); execvp(arg[0], arg); /* not reached */ return -1; } struct timeval last_read, this_read; static void process_smpl_buf(perf_event_desc_t *hw) { struct perf_event_header ehdr; int ret; for(;;) { ret = perf_read_buffer(hw, &ehdr, sizeof(ehdr)); if (ret) return; /* nothing to read */ switch(ehdr.type) { case PERF_RECORD_SAMPLE: perf_display_sample(fds, num_fds, hw - fds, &ehdr, stdout); collected_samples++; break; case PERF_RECORD_EXIT: display_exit(hw, stdout); break; case PERF_RECORD_LOST: display_lost(hw, fds, num_fds, stdout); break; case PERF_RECORD_THROTTLE: display_freq(1, hw, stdout); break; case PERF_RECORD_UNTHROTTLE: display_freq(0, hw, stdout); break; default: printf("unknown sample type %d sz=%d\n", ehdr.type, ehdr.size); perf_skip_buffer(hw, ehdr.size - sizeof(ehdr)); } } } int mainloop(char **arg) { static uint64_t ovfl_count; /* static to avoid setjmp issue */ struct pollfd pollfds[1]; size_t map_size = 0; sigset_t bmask; pid_t pid; uint64_t val[2]; int status, ret; if (pfm_initialize() != PFM_SUCCESS) errx(1, "libpfm initialization failed\n"); map_size = (options.mmap_pages+1)*getpagesize(); /* * does allocate fds */ ret = perf_setup_list_events("branches:u", &fds, &num_fds); if (ret || 
!num_fds) errx(1, "cannot setup event"); memset(pollfds, 0, sizeof(pollfds)); /* * Create the child task */ if ((pid=fork()) == -1) err(1, "cannot fork process\n"); if (pid == 0) exit(child(arg)); /* * wait for the child to exec */ ret = waitpid(pid, &status, WUNTRACED); if (ret == -1) err(1, "waitpid failed"); if (WIFEXITED(status)) errx(1, "task %s [%d] exited already status %d\n", arg[0], pid, WEXITSTATUS(status)); fds[0].fd = -1; fds[0].hw.disabled = 0; /* start immediately */ if (options.opt_inherit) fds[0].hw.inherit = 1; fds[0].hw.sample_type = PERF_SAMPLE_IP|PERF_SAMPLE_ADDR; /* * BTS only supported at user level */ if (fds[0].hw.exclude_user ||fds[0].hw.exclude_kernel == 0) errx(1, "BTS currently supported only at the user level\n"); /* * period MUST be one to trigger BTS: tracing not sampling anymore */ fds[0].hw.sample_period = 1; fds[0].hw.exclude_kernel = 1; fds[0].hw.exclude_hv = 1; fds[0].hw.read_format |= PERF_FORMAT_ID; fds[0].fd = perf_event_open(&fds[0].hw, pid, -1, -1, 0); if (fds[0].fd == -1) err(1, "cannot attach event %s", fds[0].name); fds[0].buf = mmap(NULL, map_size, PROT_READ|PROT_WRITE, MAP_SHARED, fds[0].fd, 0); if (fds[0].buf == MAP_FAILED) err(1, "cannot mmap buffer"); /* does not include header page */ fds[0].pgmsk = (options.mmap_pages*getpagesize())-1; ret = read(fds[0].fd, val, sizeof(val)); if (ret == -1) err(1, "cannot read id %zu", sizeof(val)); fds[0].id = val[1]; printf("%"PRIu64" %s\n", fds[0].id, fds[0].name); /* * effectively activate monitoring */ ptrace(PTRACE_DETACH, pid, NULL, 0); signal(SIGCHLD, cld_handler); pollfds[0].fd = fds[0].fd; pollfds[0].events = POLLIN; if (setjmp(jbuf) == 1) goto terminate_session; sigemptyset(&bmask); sigaddset(&bmask, SIGCHLD); /* * core loop */ for(;;) { ret = poll(pollfds, 1, -1); if (ret < 0 && errno == EINTR) break; ovfl_count++; ret = sigprocmask(SIG_SETMASK, &bmask, NULL); if (ret) err(1, "setmask"); process_smpl_buf(&fds[0]); ret = sigprocmask(SIG_UNBLOCK, &bmask, NULL); if (ret) 
err(1, "unblock"); } terminate_session: /* * cleanup child */ wait4(pid, &status, 0, NULL); close(fds[0].fd); /* check for partial event buffer */ process_smpl_buf(&fds[0]); munmap(fds[0].buf, map_size); free(fds); printf("%"PRIu64" samples collected in %"PRIu64" poll events, %"PRIu64" lost samples\n", collected_samples, ovfl_count, lost_samples); return 0; } static void usage(void) { printf("usage: bts_smpl [-h] [--help] [-i] [-m mmap_pages] cmd\n"); } int main(int argc, char **argv) { int c; while ((c=getopt_long(argc, argv,"+hm:p:if", the_options, 0)) != -1) { switch(c) { case 0: continue; case 'i': options.opt_inherit = 1; break; case 'm': if (options.mmap_pages) errx(1, "mmap pages already set\n"); options.mmap_pages = atoi(optarg); break; case 'h': usage(); exit(0); default: errx(1, "unknown option"); } } if (argv[optind] == NULL) errx(1, "you must specify a command to execute\n"); if (!options.mmap_pages) options.mmap_pages = 4; return mainloop(argv+optind); } papi-5.3.0/src/libpfm4/perf_examples/self_count.c0000600003276200002170000001373112247131124021427 0ustar ralphundrgrad/* * self_count.c - example of a simple self monitoring using mmapped page * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" static const char *gen_events[]={ "cycles", NULL }; static volatile int quit; void sig_handler(int n) { quit = 1; } #if defined(__x86_64__) || defined(__i386__) #ifdef __x86_64__ #define DECLARE_ARGS(val, low, high) unsigned low, high #define EAX_EDX_VAL(val, low, high) ((low) | ((uint64_t )(high) << 32)) #define EAX_EDX_ARGS(val, low, high) "a" (low), "d" (high) #define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high) #else #define DECLARE_ARGS(val, low, high) unsigned long long val #define EAX_EDX_VAL(val, low, high) (val) #define EAX_EDX_ARGS(val, low, high) "A" (val) #define EAX_EDX_RET(val, low, high) "=A" (val) #endif #define barrier() __asm__ __volatile__("": : :"memory") static inline int rdpmc(struct perf_event_mmap_page *hdr, uint64_t *value) { int counter = hdr->index - 1; DECLARE_ARGS(val, low, high); if (counter < 0) return -1; asm volatile("rdpmc" : EAX_EDX_RET(val, low, high) : "c" (counter)); *value = EAX_EDX_VAL(val, low, high); return 0; } #else /* * Default barrier macro. * Given this is architecture specific, it must be defined when * libpfm is ported to new architecture. The default macro below * simply does nothing. */ #define barrier() {} /* * Default function to read counter directly from user level mode. 
* Given this is architecture specific, it must be defined when * libpfm is ported to new architecture. The default routine below * simply fails and the caller falls backs to syscall. */ static inline int rdpmc(struct perf_event_mmap_page *hdr, uint64_t *value) { int counter = hdr->index - 1; if (counter < 0) return -1; printf("your architecture does not have a way to read counters from user mode\n"); return -1; } #endif /* * our test code (function cannot be made static otherwise it is optimized away) */ unsigned long fib(unsigned long n) { if (n == 0) return 0; if (n == 1) return 2; return fib(n-1)+fib(n-2); } uint64_t read_count(perf_event_desc_t *fds) { struct perf_event_mmap_page *hdr; uint64_t values[3]; uint64_t offset = 0; uint64_t val; unsigned int seq; ssize_t ret; int idx = -1; hdr = fds->buf; do { seq = hdr->lock; barrier(); /* try reading directly from user mode */ if (!rdpmc(hdr, &values[0])) { offset = hdr->offset; values[1] = hdr->time_enabled; values[2] = hdr->time_running; ret = 0; } else { offset = 0; idx = -1; ret = read(fds->fd, values, sizeof(values)); if (ret < (ssize_t)sizeof(values)) errx(1, "cannot read values"); break; } barrier(); } while (hdr->lock != seq); printf("raw=0x%"PRIx64 " offset=0x%"PRIx64", ena=%"PRIu64 " run=%"PRIu64" idx=%d\n", values[0], offset, values[1], values[2], idx); values[0] += offset; val = perf_scale(values); return val; } int main(int argc, char **argv) { perf_event_desc_t *fds = NULL; long lret; size_t pgsz; uint64_t val; int i, ret, num_fds = 0; int n = 30; lret = sysconf(_SC_PAGESIZE); if (lret < 0) err(1, "cannot get page size"); pgsz = (size_t)lret; /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "Cannot initialize library: %s", pfm_strerror(ret)); ret = perf_setup_argv_events(argc > 1 ? 
(const char **)argv+1 : gen_events, &fds, &num_fds); if (ret || !num_fds) errx(1, "cannot setup events"); fds[0].fd = -1; for(i=0; i < num_fds; i++) { /* request timing information necessary for scaling */ fds[i].hw.read_format = PERF_FORMAT_SCALE; fds[i].hw.disabled = 0; //fds[i].fd = perf_event_open(&fds[i].hw, 0, -1, fds[0].fd, 0); fds[i].fd = perf_event_open(&fds[i].hw, 0, -1, -1, 0); if (fds[i].fd == -1) err(1, "cannot open event %d", i); fds[i].buf = mmap(NULL, pgsz, PROT_READ, MAP_SHARED, fds[i].fd, 0); if (fds[i].buf == MAP_FAILED) err(1, "cannot mmap page"); } signal(SIGALRM, sig_handler); /* * enable all counters attached to this thread */ ioctl(fds[0].fd, PERF_EVENT_IOC_ENABLE, 0); alarm(10); for(;quit == 0;) { for (i=0; i < num_fds; i++) { val = read_count(&fds[i]); printf("%20"PRIu64" %s\n", val, fds[i].name); } fib(n); n += 5; if (n > 35) n = 30; } /* * disable all counters attached to this thread */ ioctl(fds[0].fd, PERF_EVENT_IOC_DISABLE, 0); for (i=0; i < num_fds; i++) { munmap(fds[i].buf, pgsz); close(fds[i].fd); } perf_free_fds(fds, num_fds); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/syst_smpl.c0000700003276200002170000002305512247131124021324 0ustar ralphundrgrad/* * syst_smpl.c - example of a system-wide sampling * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define SMPL_PERIOD 240000000ULL #define MAX_PATH 1024 #ifndef STR # define _STR(x) #x # define STR(x) _STR(x) #endif typedef struct { int opt_no_show; int mmap_pages; int cpu; int pin; int delay; char *events; char *cgroup; } options_t; static jmp_buf jbuf; static uint64_t collected_samples, lost_samples; static perf_event_desc_t *fds; static int num_fds; static options_t options; static size_t sz, pgsz; static size_t map_size; static struct option the_options[]={ { "help", 0, 0, 1}, { "no-show", 0, &options.opt_no_show, 1}, { 0, 0, 0, 0} }; static const char *gen_events = "cycles,instructions"; static void process_smpl_buf(perf_event_desc_t *hw) { struct perf_event_header ehdr; int ret; for(;;) { ret = perf_read_buffer(hw, &ehdr, sizeof(ehdr)); if (ret) return; /* nothing to read */ switch(ehdr.type) { case PERF_RECORD_SAMPLE: ret = perf_display_sample(fds, num_fds, hw - fds, &ehdr, stdout); if (ret) errx(1, "cannot parse sample"); collected_samples++; break; case PERF_RECORD_EXIT: display_exit(hw, stdout); break; case PERF_RECORD_LOST: lost_samples += display_lost(hw, fds, num_fds, stdout); break; case PERF_RECORD_THROTTLE: display_freq(1, hw, stdout); break; case PERF_RECORD_UNTHROTTLE: display_freq(0, hw, stdout); break; default: printf("unknown sample type %d\n", ehdr.type); perf_skip_buffer(hw, ehdr.size - 
sizeof(ehdr)); } } } int setup_cpu(int cpu, int fd) { uint64_t *val; int ret, flags; int i, pid; /* * does allocate fds */ ret = perf_setup_list_events(options.events, &fds, &num_fds); if (ret || !num_fds) errx(1, "cannot setup event list"); if (!fds[0].hw.sample_period) errx(1, "need to set sampling period or freq on first event, use :period= or :freq="); fds[0].fd = -1; for(i=0; i < num_fds; i++) { fds[i].hw.disabled = !i; /* start immediately */ if (options.cgroup) { flags = PERF_FLAG_PID_CGROUP; pid = fd; } else { flags = 0; pid = -1; } if (options.pin) fds[i].hw.pinned = 1; if (fds[i].hw.sample_period) { /* * set notification threshold to be halfway through the buffer */ if (fds[i].hw.sample_period) { fds[i].hw.wakeup_watermark = (options.mmap_pages*pgsz) / 2; fds[i].hw.watermark = 1; } fds[i].hw.sample_type = PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_READ|PERF_SAMPLE_TIME|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID|PERF_SAMPLE_CPU; printf("%s period=%"PRIu64" freq=%d\n", fds[i].name, fds[i].hw.sample_period, fds[i].hw.freq); fds[i].hw.read_format = PERF_FORMAT_SCALE; if (num_fds > 1) fds[i].hw.read_format |= PERF_FORMAT_GROUP|PERF_FORMAT_ID; if (fds[i].hw.freq) fds[i].hw.sample_type |= PERF_SAMPLE_PERIOD; } fds[i].fd = perf_event_open(&fds[i].hw, pid, cpu, fds[0].fd, flags); if (fds[i].fd == -1) { if (fds[i].hw.precise_ip) err(1, "cannot attach event %s: precise mode may not be supported", fds[i].name); err(1, "cannot attach event %s", fds[i].name); } } /* * kernel adds the header page to the size of the mmapped region */ fds[0].buf = mmap(NULL, map_size, PROT_READ|PROT_WRITE, MAP_SHARED, fds[0].fd, 0); if (fds[0].buf == MAP_FAILED) err(1, "cannot mmap buffer"); /* does not include header page */ fds[0].pgmsk = (options.mmap_pages*pgsz)-1; /* * send samples for all events to first event's buffer */ for (i = 1; i < num_fds; i++) { if (!fds[i].hw.sample_period) continue; ret = ioctl(fds[i].fd, PERF_EVENT_IOC_SET_OUTPUT, fds[0].fd); if (ret) err(1, "cannot 
redirect sampling output"); } /* * we are using PERF_FORMAT_GROUP, therefore the structure * of val is as follows: * * { u64 nr; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 value; * { u64 id; } && PERF_FORMAT_ID * } cntr[nr]; * } * We are skipping the first 3 values (nr, time_enabled, time_running) * and then for each event we get a pair of values. */ if (num_fds > 1 && fds[0].fd > -1) { ssize_t sret; sz = (3+2*num_fds)*sizeof(uint64_t); val = malloc(sz); if (!val) err(1, "cannot allocate memory"); sret = read(fds[0].fd, val, sz); /* a short or failed read means the ids are unusable */ if (sret != (ssize_t)sz) err(1, "cannot read id %zu", sizeof(val)); for(i=0; i < num_fds; i++) { fds[i].id = val[2*i+1+3]; printf("%"PRIu64" %s\n", fds[i].id, fds[i].name); } free(val); } return 0; } static void start_cpu(void) { int ret; ret = ioctl(fds[0].fd, PERF_EVENT_IOC_ENABLE, 0); if (ret) err(1, "cannot start counter"); } static const char *cgroupfs_find_mountpoint(void) { static char cgroup_mountpoint[MAX_PATH+1]; FILE *fp; int found = 0; char type[64]; fp = fopen("/proc/mounts", "r"); if (!fp) return NULL; /* field width for type must leave room for the terminating NUL in type[64] */ while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %63s %*s %*d %*d\n", cgroup_mountpoint, type) == 2) { found = !strcmp(type, "cgroup"); if (found) break; } fclose(fp); return found ? 
cgroup_mountpoint : NULL; } int open_cgroup(char *name) { char path[MAX_PATH+1]; const char *mnt; int cfd; mnt = cgroupfs_find_mountpoint(); if (!mnt) errx(1, "cannot find cgroup fs mount point"); snprintf(path, MAX_PATH, "%s/%s", mnt, name); cfd = open(path, O_RDONLY); if (cfd == -1) warn("no access to cgroup %s\n", name); return cfd; } static void handler(int n) { longjmp(jbuf, 1); } int mainloop(char **arg) { static uint64_t ovfl_count = 0; /* static to avoid setjmp issue */ struct pollfd pollfds[1]; int ret; int fd = -1; int i; if (pfm_initialize() != PFM_SUCCESS) errx(1, "libpfm initialization failed\n"); pgsz = sysconf(_SC_PAGESIZE); map_size = (options.mmap_pages+1)*pgsz; if (options.cgroup) { fd = open_cgroup(options.cgroup); if (fd == -1) err(1, "cannot open cgroup file %s\n", options.cgroup); } setup_cpu(options.cpu, fd); /* done with cgroup */ if (fd != -1) close(fd); signal(SIGALRM, handler); signal(SIGINT, handler); pollfds[0].fd = fds[0].fd; pollfds[0].events = POLLIN; printf("monitoring on CPU%d, session ending in %ds\n", options.cpu, options.delay); if (setjmp(jbuf) == 1) goto terminate_session; start_cpu(); alarm(options.delay); /* * core loop */ for(;;) { ret = poll(pollfds, 1, -1); if (ret < 0 && errno == EINTR) break; ovfl_count++; process_smpl_buf(&fds[0]); } terminate_session: for(i=0; i < num_fds; i++) close(fds[i].fd); /* check for partial event buffer */ process_smpl_buf(&fds[0]); munmap(fds[0].buf, map_size); perf_free_fds(fds, num_fds); printf("%"PRIu64" samples collected in %"PRIu64" poll events, %"PRIu64" lost samples\n", collected_samples, ovfl_count, lost_samples); return 0; } static void usage(void) { printf("usage: syst_smpl [-h] [-P] [--help] [-m mmap_pages] [-f] [-e event1,...,eventn] [-c cpu] [-d seconds]\n"); } int main(int argc, char **argv) { int c; setlocale(LC_ALL, ""); options.cpu = -1; options.delay = -1; while ((c=getopt_long(argc, argv,"hPe:m:c:d:G:", the_options, 0)) != -1) { switch(c) { case 0: continue; case 'e': if 
(options.events) errx(1, "events specified twice\n"); options.events = optarg; break; case 'm': if (options.mmap_pages) errx(1, "mmap pages already set\n"); options.mmap_pages = atoi(optarg); break; case 'P': options.pin = 1; break; case 'd': options.delay = atoi(optarg); break; case 'G': options.cgroup = optarg; break; case 'c': options.cpu = atoi(optarg); break; case 'h': usage(); exit(0); default: errx(1, "unknown option"); } } if (!options.events) options.events = strdup(gen_events); if (!options.mmap_pages) options.mmap_pages = 1; if (options.cpu == -1) options.cpu = random() % sysconf(_SC_NPROCESSORS_ONLN); if (options.delay == -1) options.delay = 10; if (options.mmap_pages > 1 && ((options.mmap_pages) & 0x1)) errx(1, "number of pages must be power of 2\n"); return mainloop(argv+optind); } papi-5.3.0/src/libpfm4/perf_examples/task_cpu.c0000600003276200002170000002273312247131124021101 0ustar ralphundrgrad/* * task_cpu.c - example of per-thread remote monitoring with per-cpu breakdown * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define MAX_GROUPS 256 #define MAX_CPUS 64 typedef struct { const char *events[MAX_GROUPS]; int num_groups; int format_group; int inherit; int print; int pin; int ncpus; pid_t pid; } options_t; static options_t options; static volatile int quit; int child(char **arg) { /* * execute the requested command */ execvp(arg[0], arg); errx(1, "cannot exec: %s\n", arg[0]); /* not reached */ } static void read_groups(perf_event_desc_t *fds, int num) { uint64_t *values = NULL; size_t new_sz, sz = 0; int i, evt; ssize_t ret; /* * { u64 nr; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 value; * { u64 id; } && PERF_FORMAT_ID * } cntr[nr]; * } && PERF_FORMAT_GROUP * * we do not use FORMAT_ID in this program */ for (evt = 0; evt < num; ) { int num_evts_to_read; if (options.format_group) { num_evts_to_read = perf_get_group_nevents(fds, num, evt); new_sz = sizeof(uint64_t) * (3 + num_evts_to_read); } else { num_evts_to_read = 1; new_sz = sizeof(uint64_t) * 3; } if (new_sz > sz) { sz = new_sz; values = realloc(values, sz); } if (!values) err(1, "cannot allocate memory for values\n"); ret = read(fds[evt].fd, values, new_sz); if (ret != (ssize_t)new_sz) { /* unsigned */ if (ret == -1) err(1, "cannot read values event %s", fds[evt].name); /* likely pinned and could not be loaded */ warnx("could not read event %d, tried to read %zu bytes, but got %zd", evt, new_sz, ret); } /* * propagate to save area */ for (i = evt; i < (evt + num_evts_to_read); i++) { if (options.format_group) values[0] = values[3 + (i - evt)]; /* * scaling because we may 
be sharing the PMU and * thus may be multiplexed */ fds[i].values[0] = values[0]; fds[i].values[1] = values[1]; fds[i].values[2] = values[2]; } evt += num_evts_to_read; } if (values) free(values); } static void print_counts(perf_event_desc_t *fds, int num, int cpu) { double ratio; uint64_t val, delta; int i; read_groups(fds, num); for(i=0; i < num; i++) { val = perf_scale(fds[i].values); delta = perf_scale_delta(fds[i].values, fds[i].prev_values); ratio = perf_scale_ratio(fds[i].values); /* separate groups */ if (perf_is_group_leader(fds, i)) putchar('\n'); if (options.print) printf("CPU%-2d %'20"PRIu64" %'20"PRIu64" %s (%.2f%% scaling, ena=%'"PRIu64", run=%'"PRIu64")\n", cpu, val, delta, fds[i].name, (1.0-ratio)*100.0, fds[i].values[1], fds[i].values[2]); else printf("CPU%-2d %'20"PRIu64" %s (%.2f%% scaling, ena=%'"PRIu64", run=%'"PRIu64")\n", cpu, val, fds[i].name, (1.0-ratio)*100.0, fds[i].values[1], fds[i].values[2]); } } static void sig_handler(int n) { quit = 1; } int parent(char **arg) { perf_event_desc_t *fds, *fds_cpus[MAX_CPUS]; int status, ret, i, num_fds = 0, grp, group_fd; int ready[2], go[2], cpu; char buf; pid_t pid; go[0] = go[1] = -1; if (pfm_initialize() != PFM_SUCCESS) errx(1, "libpfm initialization failed"); if (options.ncpus >= MAX_CPUS) errx(1, "maximum number of cpus exceeded (%d)", MAX_CPUS); memset(fds_cpus, 0, sizeof(fds_cpus)); for (cpu=0; cpu < options.ncpus; cpu++) { for (grp = 0; grp < options.num_groups; grp++) { num_fds = 0; ret = perf_setup_list_events(options.events[grp], &fds_cpus[cpu], &num_fds); if (ret || !num_fds) exit(1); } } pid = options.pid; if (!pid) { ret = pipe(ready); if (ret) err(1, "cannot create pipe ready"); ret = pipe(go); if (ret) err(1, "cannot create pipe go"); /* * Create the child task */ if ((pid=fork()) == -1) err(1, "Cannot fork process"); /* * and launch the child code * * The pipe is used to avoid a race condition * between fork() and exec(). 
We need the pid * of the new task but we want to start measuring * at the first user level instruction. Thus we * need to prevent exec until we have attached * the events. */ if (pid == 0) { close(ready[0]); close(go[1]); /* * let the parent know we exist */ close(ready[1]); if (read(go[0], &buf, 1) == -1) err(1, "unable to read go_pipe"); exit(child(arg)); } close(ready[1]); close(go[0]); if (read(ready[0], &buf, 1) == -1) err(1, "unable to read child_ready_pipe"); close(ready[0]); } for (cpu=0; cpu < options.ncpus; cpu++) { fds = fds_cpus[cpu]; for(i=0; i < num_fds; i++) { int is_group_leader; /* boolean */ is_group_leader = perf_is_group_leader(fds, i); if (is_group_leader) { /* this is the group leader */ group_fd = -1; } else { group_fd = fds[fds[i].group_leader].fd; } /* * create leader disabled with enable_on_exec */ if (!options.pid) { fds[i].hw.disabled = is_group_leader; fds[i].hw.enable_on_exec = is_group_leader; } fds[i].hw.read_format = PERF_FORMAT_SCALE; /* request timing information necessary for scaling counts */ if (is_group_leader && options.format_group) fds[i].hw.read_format |= PERF_FORMAT_GROUP; if (options.inherit) fds[i].hw.inherit = 1; if (options.pin && is_group_leader) fds[i].hw.pinned = 1; fds[i].fd = perf_event_open(&fds[i].hw, pid, cpu, group_fd, 0); if (fds[i].fd == -1) { warn("cannot attach event%d %s", i, fds[i].name); goto error; } } } if (!options.pid && go[1] > -1) close(go[1]); if (options.print) { if (!options.pid) { while(waitpid(pid, &status, WNOHANG) == 0) { sleep(1); for (cpu=0; cpu < options.ncpus; cpu++) { fds = fds_cpus[cpu]; print_counts(fds, num_fds, cpu); } } } else { while(quit == 0) { sleep(1); for (cpu=0; cpu < options.ncpus; cpu++) { fds = fds_cpus[cpu]; print_counts(fds, num_fds, cpu); } } } } else { if (!options.pid) waitpid(pid, &status, 0); else { pause(); for (cpu=0; cpu < options.ncpus; cpu++) { fds = fds_cpus[cpu]; for(i=0; i < num_fds; i++) ioctl(fds[i].fd, PERF_EVENT_IOC_DISABLE, 0); } } for (cpu=0; cpu < 
options.ncpus; cpu++) { fds = fds_cpus[cpu]; print_counts(fds, num_fds, cpu); } } for (cpu=0; cpu < options.ncpus; cpu++) { fds = fds_cpus[cpu]; for(i=0; i < num_fds; i++) close(fds[i].fd); perf_free_fds(fds, num_fds); } /* free libpfm resources cleanly */ pfm_terminate(); return 0; error: free(fds); /* kill() takes the pid first, then the signal */ if (!options.pid) kill(pid, SIGKILL); /* free libpfm resources cleanly */ pfm_terminate(); return -1; } static void usage(void) { printf("usage: task_cpu [-h] [-i] [-f] [-p] [-P] [-t pid] [-e event1,event2,...] cmd\n" "-h\t\tget help\n" "-i\t\tinherit across fork\n" "-f\t\tuse PERF_FORMAT_GROUP for reading up counts (experimental, not working)\n" "-p\t\tprint counts every second\n" "-P\t\tpin events\n" "-t pid\tmeasure existing pid\n" "-e ev,ev\tgroup of events to measure (multiple -e switches are allowed)\n" ); } int main(int argc, char **argv) { int c; setlocale(LC_ALL, ""); while ((c=getopt(argc, argv,"+he:ifpPt:")) != -1) { switch(c) { case 'e': if (options.num_groups < MAX_GROUPS) { options.events[options.num_groups++] = optarg; } else { errx(1, "you cannot specify more than %d groups.\n", MAX_GROUPS); } break; case 'f': options.format_group = 1; break; case 'p': options.print = 1; break; case 'P': options.pin = 1; break; case 'i': options.inherit = 1; break; case 't': options.pid = atoi(optarg); break; case 'h': usage(); exit(0); default: errx(1, "unknown error"); } } options.ncpus = sysconf(_SC_NPROCESSORS_ONLN); if (options.ncpus < 1) errx(1, "cannot determine number of online processors"); if (options.num_groups == 0) { options.events[0] = "cycles,instructions"; options.num_groups = 1; } if (!argv[optind] && !options.pid) errx(1, "you must specify a command to execute or a thread to attach to\n"); signal(SIGINT, sig_handler); return parent(argv+optind); } papi-5.3.0/src/libpfm4/perf_examples/syst.c0000600003276200002170000001264612247131124020274 0ustar ralphundrgrad/* * syst.c - example of a simple system wide monitoring program * * Copyright (c) 2002-2006 
Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" typedef struct { const char *events; int delay; int excl; int cpu; int group; } options_t; static options_t options; static perf_event_desc_t **all_fds; static int *num_fds; void setup_cpu(int cpu) { perf_event_desc_t *fds; int i, ret; ret = perf_setup_list_events(options.events, &all_fds[cpu], &num_fds[cpu]); /* num_fds is a per-cpu array: check this cpu's event count, not the pointer */ if (ret || (num_fds[cpu] == 0)) errx(1, "cannot setup events\n"); fds = all_fds[cpu]; /* temp */ fds[0].fd = -1; for(i=0; i < num_fds[cpu]; i++) { if (options.excl && ((options.group && !i) || (!options.group))) fds[i].hw.exclusive = 1; fds[i].hw.disabled = options.group ? 
!i : 1; /* request timing information necessary for scaling counts */ fds[i].hw.read_format = PERF_FORMAT_SCALE; fds[i].fd = perf_event_open(&fds[i].hw, -1, cpu, (options.group ? fds[0].fd : -1), 0); if (fds[i].fd == -1) err(1, "cannot attach event to CPU%d %s", cpu, fds[i].name); } } void measure(void) { perf_event_desc_t *fds; long lret; int c, cmin, cmax, ncpus; int i, ret, l; printf("\n", options.delay); cmin = 0; lret = sysconf(_SC_NPROCESSORS_ONLN); if (lret < 0) err(1, "cannot get number of online processors"); cmax = (int)lret; ncpus = cmax; if (options.cpu != -1) { cmin = options.cpu; cmax = cmin + 1; } all_fds = calloc(ncpus, sizeof(perf_event_desc_t *)); num_fds = calloc(ncpus, sizeof(int)); if (!all_fds || !num_fds) err(1, "cannot allocate memory for internal structures"); for(c=cmin ; c < cmax; c++) setup_cpu(c); /* * FIX this for hotplug CPU */ for(c=cmin ; c < cmax; c++) { fds = all_fds[c]; if (options.group) ret = ioctl(fds[0].fd, PERF_EVENT_IOC_ENABLE, 0); else for(i=0; i < num_fds[c]; i++) { ret = ioctl(fds[i].fd, PERF_EVENT_IOC_ENABLE, 0); if (ret) err(1, "cannot enable event %s\n", fds[i].name); } } for(l=0; l < options.delay; l++) { sleep(1); puts("------------------------"); for(c = cmin; c < cmax; c++) { fds = all_fds[c]; for(i=0; i < num_fds[c]; i++) { uint64_t val, delta; double ratio; ret = read(fds[i].fd, fds[i].values, sizeof(fds[i].values)); if (ret != sizeof(fds[i].values)) { if (ret == -1) err(1, "cannot read event %d:%d", i, ret); else warnx("could not read event%d", i); } /* * scaling because we may be sharing the PMU and * thus may be multiplexed */ val = perf_scale(fds[i].values); ratio = perf_scale_ratio(fds[i].values); delta = perf_scale_delta(fds[i].values, fds[i].prev_values); printf("CPU%d val=%-20"PRIu64" %-20"PRIu64" raw=%"PRIu64" ena=%"PRIu64" run=%"PRIu64" ratio=%.2f %s\n", c, val, delta, fds[i].values[0], fds[i].values[1], fds[i].values[2], ratio, fds[i].name); fds[i].prev_values[0] = fds[i].values[0]; 
fds[i].prev_values[1] = fds[i].values[1]; fds[i].prev_values[2] = fds[i].values[2]; } } } for(c = cmin; c < cmax; c++) { fds = all_fds[c]; for(i=0; i < num_fds[c]; i++) close(fds[i].fd); perf_free_fds(fds, num_fds[c]); } } static void usage(void) { printf("usage: syst [-c cpu] [-x] [-h] [-d delay] [-g] [-e event1,event2,...]\n"); } int main(int argc, char **argv) { int c, ret; options.cpu = -1; while ((c=getopt(argc, argv,"hc:e:d:gx")) != -1) { switch(c) { case 'x': options.excl = 1; break; case 'e': options.events = optarg; break; case 'c': options.cpu = atoi(optarg); break; case 'g': options.group = 1; break; case 'd': options.delay = atoi(optarg); break; case 'h': usage(); exit(0); default: errx(1, "unknown error"); } } if (!options.delay) options.delay = 20; if (!options.events) options.events = "cycles,instructions"; ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "libpfm initialization failed: %s\n", pfm_strerror(ret)); measure(); /* free libpfm resources cleanly */ pfm_terminate(); return 0; } papi-5.3.0/src/libpfm4/perf_examples/task_smpl.c0000600003276200002170000002420712247131124021263 0ustar ralphundrgrad/* * task_smpl.c - example of a task sampling another one using a randomized sampling period * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2003-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "perf_util.h" #define SMPL_PERIOD 240000000ULL typedef struct { int opt_no_show; int opt_inherit; int mem_mode; int branch_mode; int cpu; int mmap_pages; char *events; FILE *output_file; } options_t; static jmp_buf jbuf; static uint64_t collected_samples, lost_samples; static perf_event_desc_t *fds; static int num_fds; static options_t options; static struct option the_options[]={ { "help", 0, 0, 1}, { "no-show", 0, &options.opt_no_show, 1}, { 0, 0, 0, 0} }; static char *gen_events = "cycles,instructions"; static void cld_handler(int n) { longjmp(jbuf, 1); } int child(char **arg) { execvp(arg[0], arg); /* not reached */ return -1; } struct timeval last_read, this_read; static void process_smpl_buf(perf_event_desc_t *hw) { struct perf_event_header ehdr; int ret; for(;;) { ret = perf_read_buffer(hw, &ehdr, sizeof(ehdr)); if (ret) return; /* nothing to read */ if (options.opt_no_show) { perf_skip_buffer(hw, ehdr.size - sizeof(ehdr)); continue; } switch(ehdr.type) { case PERF_RECORD_SAMPLE: collected_samples++; ret = perf_display_sample(fds, num_fds, hw - fds, &ehdr, options.output_file); if (ret) errx(1, "cannot parse sample"); break; case PERF_RECORD_EXIT: display_exit(hw, options.output_file); break; case PERF_RECORD_LOST: lost_samples += display_lost(hw, fds, num_fds, options.output_file); break; case PERF_RECORD_THROTTLE: 
display_freq(1, hw, options.output_file); break; case PERF_RECORD_UNTHROTTLE: display_freq(0, hw, options.output_file); break; default: printf("unknown sample type %d\n", ehdr.type); perf_skip_buffer(hw, ehdr.size - sizeof(ehdr)); } } } int mainloop(char **arg) { static uint64_t ovfl_count; /* static to avoid setjmp issue */ struct pollfd pollfds[1]; sigset_t bmask; int go[2], ready[2]; uint64_t *val; size_t sz, pgsz; size_t map_size = 0; pid_t pid; int status, ret; int i; char buf; if (pfm_initialize() != PFM_SUCCESS) errx(1, "libpfm initialization failed\n"); pgsz = sysconf(_SC_PAGESIZE); map_size = (options.mmap_pages+1)*pgsz; /* * does allocate fds */ ret = perf_setup_list_events(options.events, &fds, &num_fds); if (ret || !num_fds) errx(1, "cannot setup event list"); memset(pollfds, 0, sizeof(pollfds)); ret = pipe(ready); if (ret) err(1, "cannot create pipe ready"); ret = pipe(go); if (ret) err(1, "cannot create pipe go"); /* * Create the child task */ if ((pid=fork()) == -1) err(1, "cannot fork process\n"); if (pid == 0) { close(ready[0]); close(go[1]); /* * let the parent know we exist */ close(ready[1]); if (read(go[0], &buf, 1) == -1) err(1, "unable to read go_pipe"); exit(child(arg)); } close(ready[1]); close(go[0]); if (read(ready[0], &buf, 1) == -1) err(1, "unable to read child_ready_pipe"); close(ready[0]); fds[0].fd = -1; if (!fds[0].hw.sample_period) errx(1, "need to set sampling period or freq on first event, use :period= or :freq="); for(i=0; i < num_fds; i++) { if (i == 0) { fds[i].hw.disabled = 1; fds[i].hw.enable_on_exec = 1; /* start immediately */ } else fds[i].hw.disabled = 0; if (options.opt_inherit) fds[i].hw.inherit = 1; if (fds[i].hw.sample_period) { /* * set notification threshold to be halfway through the buffer */ fds[i].hw.wakeup_watermark = (options.mmap_pages*pgsz) / 2; fds[i].hw.watermark = 1; fds[i].hw.sample_type = PERF_SAMPLE_IP|PERF_SAMPLE_TID|PERF_SAMPLE_READ|PERF_SAMPLE_TIME|PERF_SAMPLE_PERIOD|PERF_SAMPLE_STREAM_ID; 
fprintf(options.output_file,"%s period=%"PRIu64" freq=%d\n", fds[i].name, fds[i].hw.sample_period, fds[i].hw.freq); fds[i].hw.read_format = PERF_FORMAT_SCALE; if (num_fds > 1) fds[i].hw.read_format |= PERF_FORMAT_GROUP|PERF_FORMAT_ID; if (fds[i].hw.freq) fds[i].hw.sample_type |= PERF_SAMPLE_PERIOD; if (options.mem_mode) fds[i].hw.sample_type |= PERF_SAMPLE_WEIGHT | PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_ADDR; if (options.branch_mode) { fds[i].hw.sample_type |= PERF_SAMPLE_BRANCH_STACK; fds[i].hw.branch_sample_type = PERF_SAMPLE_BRANCH_ANY; } } fds[i].fd = perf_event_open(&fds[i].hw, pid, options.cpu, fds[0].fd, 0); if (fds[i].fd == -1) { if (fds[i].hw.precise_ip) err(1, "cannot attach event %s: precise mode may not be supported", fds[i].name); err(1, "cannot attach event %s", fds[i].name); } } /* * kernel adds the header page to the size of the mmapped region */ fds[0].buf = mmap(NULL, map_size, PROT_READ|PROT_WRITE, MAP_SHARED, fds[0].fd, 0); if (fds[0].buf == MAP_FAILED) err(1, "cannot mmap buffer"); /* does not include header page */ fds[0].pgmsk = (options.mmap_pages*pgsz)-1; /* * send samples for all events to first event's buffer */ for (i = 1; i < num_fds; i++) { if (!fds[i].hw.sample_period) continue; ret = ioctl(fds[i].fd, PERF_EVENT_IOC_SET_OUTPUT, fds[0].fd); if (ret) err(1, "cannot redirect sampling output"); } /* * we are using PERF_FORMAT_GROUP, therefore the structure * of val is as follows: * * { u64 nr; * { u64 time_enabled; } && PERF_FORMAT_ENABLED * { u64 time_running; } && PERF_FORMAT_RUNNING * { u64 value; * { u64 id; } && PERF_FORMAT_ID * } cntr[nr]; * We are skipping the first 3 values (nr, time_enabled, time_running) * and then for each event we get a pair of values. 
*/ if (num_fds > 1 && fds[0].fd > -1) { sz = (3+2*num_fds)*sizeof(uint64_t); val = malloc(sz); if (!val) err(1, "cannot allocate memory"); ret = read(fds[0].fd, val, sz); if (ret == -1) err(1, "cannot read id %zu", sizeof(val)); for(i=0; i < num_fds; i++) { fds[i].id = val[2*i+1+3]; fprintf(options.output_file,"%"PRIu64" %s\n", fds[i].id, fds[i].name); } free(val); } pollfds[0].fd = fds[0].fd; pollfds[0].events = POLLIN; for(i=0; i < num_fds; i++) { ret = ioctl(fds[i].fd, PERF_EVENT_IOC_ENABLE, 0); if (ret) err(1, "cannot enable event %s\n", fds[i].name); } signal(SIGCHLD, cld_handler); close(go[1]); if (setjmp(jbuf) == 1) goto terminate_session; sigemptyset(&bmask); sigaddset(&bmask, SIGCHLD); /* * core loop */ for(;;) { ret = poll(pollfds, 1, -1); if (ret < 0 && errno == EINTR) break; ovfl_count++; ret = sigprocmask(SIG_SETMASK, &bmask, NULL); if (ret) err(1, "setmask"); process_smpl_buf(&fds[0]); ret = sigprocmask(SIG_UNBLOCK, &bmask, NULL); if (ret) err(1, "unblock"); } terminate_session: /* * cleanup child */ wait4(pid, &status, 0, NULL); for(i=0; i < num_fds; i++) close(fds[i].fd); /* check for partial event buffer */ process_smpl_buf(&fds[0]); munmap(fds[0].buf, map_size); perf_free_fds(fds, num_fds); fprintf(options.output_file, "%"PRIu64" samples collected in %"PRIu64" poll events, %"PRIu64" lost samples\n", collected_samples, ovfl_count, lost_samples); /* free libpfm resources cleanly */ pfm_terminate(); fclose(options.output_file); return 0; } static void usage(void) { printf("usage: task_smpl [-h] [--help] [-i] [-c cpu] [-m mmap_pages] [-M] [-b] [-o output_file] [-e event1,...,eventn] cmd\n"); } int main(int argc, char **argv) { int c; setlocale(LC_ALL, ""); options.cpu = -1; options.output_file=stdout; while ((c=getopt_long(argc, argv,"+he:m:ic:o:Mb", the_options, 0)) != -1) { switch(c) { case 0: continue; case 'e': if (options.events) errx(1, "events specified twice\n"); options.events = optarg; break; case 'i': options.opt_inherit = 1; break; case 
'm':
				if (options.mmap_pages)
					errx(1, "mmap pages already set\n");
				options.mmap_pages = atoi(optarg);
				break;
			case 'M':
				options.mem_mode = 1;
				break;
			case 'b':
				options.branch_mode = 1;
				break;
			case 'c':
				options.cpu = atoi(optarg);
				break;
			case 'o':
				options.output_file=fopen(optarg,"w");
				if (options.output_file==NULL) {
					printf("Invalid filename %s\n", optarg);
					exit(0);
				}
				break;
			case 'h':
				usage();
				exit(0);
			default:
				errx(1, "unknown option");
		}
	}

	if (argv[optind] == NULL)
		errx(1, "you must specify a command to execute\n");

	if (!options.events)
		options.events = strdup(gen_events);

	if (!options.mmap_pages)
		options.mmap_pages = 1;

	/* a power of 2 has exactly one bit set, so n & (n - 1) must be 0 */
	if (options.mmap_pages > 1 && (options.mmap_pages & (options.mmap_pages - 1)))
		errx(1, "number of pages must be power of 2\n");

	return mainloop(argv+optind);
}

papi-5.3.0/src/libpfm4/include/Makefile
#
# Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P.
# Contributed by Stephane Eranian
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
# of the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/..

include $(TOPDIR)/config.mk
include $(TOPDIR)/rules.mk

HEADERS= perfmon/pfmlib.h \
	 perfmon/perf_event.h \
	 perfmon/pfmlib_perf_event.h

dir:
	-mkdir -p $(DESTDIR)$(INCDIR)/perfmon

install: dir
	$(INSTALL) -m 644 $(HEADERS) $(DESTDIR)$(INCDIR)/perfmon

.PHONY: all clean distclean depend dir

papi-5.3.0/src/libpfm4/include/perfmon/err.h
/*
 * err.h: substitute header for compiling on Windows with MinGW
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
#ifndef __PFM_ERR_H__
#define __PFM_ERR_H__

#ifndef PFMLIB_WINDOWS
#include <err.h>
#else /* PFMLIB_WINDOWS */
/* headers needed by the substitute macros below */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define warnx(...) do { \
	fprintf (stderr, __VA_ARGS__); \
	fprintf (stderr, "\n"); \
} while (0)

#define errx(code, ...) do { \
	fprintf (stderr, __VA_ARGS__); \
	fprintf (stderr, "\n"); \
	exit (code); \
} while (0)

#define err(code, ...) do { \
	fprintf (stderr, __VA_ARGS__); \
	fprintf (stderr, " : %s\n", strerror(errno)); \
	exit (code); \
} while (0)
#endif

#endif /* __PFM_ERR_H__ */

papi-5.3.0/src/libpfm4/include/perfmon/pfmlib.h
/*
 * Copyright (c) 2009 Google, Inc
 * Contributed by Stephane Eranian
 *
 * Based on:
 * Copyright (c) 2001-2007 Hewlett-Packard Development Company, L.P.
 * Contributed by Stephane Eranian
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/ #ifndef __PFMLIB_H__ #define __PFMLIB_H__ #ifdef __cplusplus extern "C" { #endif #include #include #include #include #define LIBPFM_VERSION (4 << 16 | 0) #define PFM_MAJ_VERSION(v) ((v)>>16) #define PFM_MIN_VERSION(v) ((v) & 0xffff) /* * ABI revision level */ #define LIBPFM_ABI_VERSION 0 /* * priv level mask (for dfl_plm) */ #define PFM_PLM0 0x01 /* kernel */ #define PFM_PLM1 0x02 /* not yet used */ #define PFM_PLM2 0x04 /* not yet used */ #define PFM_PLM3 0x08 /* priv level 3, 2, 1 (x86) */ #define PFM_PLMH 0x10 /* hypervisor */ /* * Performance Event Source * * The source is what is providing events. * It can be: * - Hardware Performance Monitoring Unit (PMU) * - a particular kernel subsystem * * Identifiers are guaranteed constant across libpfm revisions * * New sources must be added at the end before PFM_PMU_MAX */ typedef enum { PFM_PMU_NONE= 0, /* no PMU */ PFM_PMU_GEN_IA64, /* Intel IA-64 architected PMU */ PFM_PMU_ITANIUM, /* Intel Itanium */ PFM_PMU_ITANIUM2, /* Intel Itanium 2 */ PFM_PMU_MONTECITO, /* Intel Dual-Core Itanium 2 9000 */ PFM_PMU_AMD64, /* AMD AMD64 (obsolete) */ PFM_PMU_I386_P6, /* Intel PIII (P6 core) */ PFM_PMU_INTEL_NETBURST, /* Intel Netburst (Pentium 4) */ PFM_PMU_INTEL_NETBURST_P, /* Intel Netburst Prescott (Pentium 4) */ PFM_PMU_COREDUO, /* Intel Core Duo/Core Solo */ PFM_PMU_I386_PM, /* Intel Pentium M */ PFM_PMU_INTEL_CORE, /* Intel Core */ PFM_PMU_INTEL_PPRO, /* Intel Pentium Pro */ PFM_PMU_INTEL_PII, /* Intel Pentium II */ PFM_PMU_INTEL_ATOM, /* Intel Atom */ PFM_PMU_INTEL_NHM, /* Intel Nehalem core PMU */ PFM_PMU_INTEL_NHM_EX, /* Intel Nehalem-EX core PMU */ PFM_PMU_INTEL_NHM_UNC, /* Intel Nehalem uncore PMU */ PFM_PMU_INTEL_X86_ARCH, /* Intel X86 architectural PMU */ PFM_PMU_MIPS_20KC, /* MIPS 20KC */ PFM_PMU_MIPS_24K, /* MIPS 24K */ PFM_PMU_MIPS_25KF, /* MIPS 25KF */ PFM_PMU_MIPS_34K, /* MIPS 34K */ PFM_PMU_MIPS_5KC, /* MIPS 5KC */ PFM_PMU_MIPS_74K, /* MIPS 74K */ PFM_PMU_MIPS_R10000, /* MIPS R10000 */ PFM_PMU_MIPS_R12000, 
/* MIPS R12000 */
	PFM_PMU_MIPS_RM7000,		/* MIPS RM7000 */
	PFM_PMU_MIPS_RM9000,		/* MIPS RM9000 */
	PFM_PMU_MIPS_SB1,		/* MIPS SB1/SB1A */
	PFM_PMU_MIPS_VR5432,		/* MIPS VR5432 */
	PFM_PMU_MIPS_VR5500,		/* MIPS VR5500 */
	PFM_PMU_MIPS_ICE9A,		/* SiCortex ICE9A */
	PFM_PMU_MIPS_ICE9B,		/* SiCortex ICE9B */
	PFM_PMU_POWERPC,		/* POWERPC */
	PFM_PMU_CELL,			/* IBM CELL */

	PFM_PMU_SPARC_ULTRA12,		/* UltraSPARC I, II, IIi, and IIe */
	PFM_PMU_SPARC_ULTRA3,		/* UltraSPARC III */
	PFM_PMU_SPARC_ULTRA3I,		/* UltraSPARC IIIi and IIIi+ */
	PFM_PMU_SPARC_ULTRA3PLUS,	/* UltraSPARC III+ and IV */
	PFM_PMU_SPARC_ULTRA4PLUS,	/* UltraSPARC IV+ */
	PFM_PMU_SPARC_NIAGARA1,		/* Niagara-1 */
	PFM_PMU_SPARC_NIAGARA2,		/* Niagara-2 */

	PFM_PMU_PPC970,			/* IBM PowerPC 970(FX,GX) */
	PFM_PMU_PPC970MP,		/* IBM PowerPC 970MP */
	PFM_PMU_POWER3,			/* IBM POWER3 */
	PFM_PMU_POWER4,			/* IBM POWER4 */
	PFM_PMU_POWER5,			/* IBM POWER5 */
	PFM_PMU_POWER5p,		/* IBM POWER5+ */
	PFM_PMU_POWER6,			/* IBM POWER6 */
	PFM_PMU_POWER7,			/* IBM POWER7 */

	PFM_PMU_PERF_EVENT,		/* perf_event PMU */
	PFM_PMU_INTEL_WSM,		/* Intel Westmere single-socket (Clarkdale) */
	PFM_PMU_INTEL_WSM_DP,		/* Intel Westmere dual-socket (Westmere-EP, Gulftown) */
	PFM_PMU_INTEL_WSM_UNC,		/* Intel Westmere uncore PMU */

	PFM_PMU_AMD64_K7,		/* AMD AMD64 K7 */
	PFM_PMU_AMD64_K8_REVB,		/* AMD AMD64 K8 RevB */
	PFM_PMU_AMD64_K8_REVC,		/* AMD AMD64 K8 RevC */
	PFM_PMU_AMD64_K8_REVD,		/* AMD AMD64 K8 RevD */
	PFM_PMU_AMD64_K8_REVE,		/* AMD AMD64 K8 RevE */
	PFM_PMU_AMD64_K8_REVF,		/* AMD AMD64 K8 RevF */
	PFM_PMU_AMD64_K8_REVG,		/* AMD AMD64 K8 RevG */
	PFM_PMU_AMD64_FAM10H_BARCELONA,	/* AMD AMD64 Fam10h Barcelona RevB */
	PFM_PMU_AMD64_FAM10H_SHANGHAI,	/* AMD AMD64 Fam10h Shanghai RevC */
	PFM_PMU_AMD64_FAM10H_ISTANBUL,	/* AMD AMD64 Fam10h Istanbul RevD */

	PFM_PMU_ARM_CORTEX_A8,		/* ARM Cortex A8 */
	PFM_PMU_ARM_CORTEX_A9,		/* ARM Cortex A9 */

	PFM_PMU_TORRENT,		/* IBM Torrent hub chip */

	PFM_PMU_INTEL_SNB,		/* Intel Sandy Bridge (single socket) */
	PFM_PMU_AMD64_FAM14H_BOBCAT,	/* AMD AMD64 Fam14h Bobcat */
PFM_PMU_AMD64_FAM15H_INTERLAGOS,/* AMD AMD64 Fam15h Interlagos */ PFM_PMU_INTEL_SNB_EP, /* Intel SandyBridge EP */ PFM_PMU_AMD64_FAM12H_LLANO, /* AMD AMD64 Fam12h Llano */ PFM_PMU_AMD64_FAM11H_TURION, /* AMD AMD64 Fam11h Turion */ PFM_PMU_INTEL_IVB, /* Intel IvyBridge */ PFM_PMU_ARM_CORTEX_A15, /* ARM Cortex A15 */ PFM_PMU_INTEL_SNB_UNC_CB0, /* Intel SandyBridge C-box 0 uncore PMU */ PFM_PMU_INTEL_SNB_UNC_CB1, /* Intel SandyBridge C-box 1 uncore PMU */ PFM_PMU_INTEL_SNB_UNC_CB2, /* Intel SandyBridge C-box 2 uncore PMU */ PFM_PMU_INTEL_SNB_UNC_CB3, /* Intel SandyBridge C-box 3 uncore PMU */ PFM_PMU_INTEL_SNBEP_UNC_CB0, /* Intel SandyBridge-EP C-Box core 0 uncore */ PFM_PMU_INTEL_SNBEP_UNC_CB1, /* Intel SandyBridge-EP C-Box core 1 uncore */ PFM_PMU_INTEL_SNBEP_UNC_CB2, /* Intel SandyBridge-EP C-Box core 2 uncore */ PFM_PMU_INTEL_SNBEP_UNC_CB3, /* Intel SandyBridge-EP C-Box core 3 uncore */ PFM_PMU_INTEL_SNBEP_UNC_CB4, /* Intel SandyBridge-EP C-Box core 4 uncore */ PFM_PMU_INTEL_SNBEP_UNC_CB5, /* Intel SandyBridge-EP C-Box core 5 uncore */ PFM_PMU_INTEL_SNBEP_UNC_CB6, /* Intel SandyBridge-EP C-Box core 6 uncore */ PFM_PMU_INTEL_SNBEP_UNC_CB7, /* Intel SandyBridge-EP C-Box core 7 uncore */ PFM_PMU_INTEL_SNBEP_UNC_HA, /* Intel SandyBridge-EP HA uncore */ PFM_PMU_INTEL_SNBEP_UNC_IMC0, /* Intel SandyBridge-EP IMC socket 0 uncore */ PFM_PMU_INTEL_SNBEP_UNC_IMC1, /* Intel SandyBridge-EP IMC socket 1 uncore */ PFM_PMU_INTEL_SNBEP_UNC_IMC2, /* Intel SandyBridge-EP IMC socket 2 uncore */ PFM_PMU_INTEL_SNBEP_UNC_IMC3, /* Intel SandyBridge-EP IMC socket 3 uncore */ PFM_PMU_INTEL_SNBEP_UNC_PCU, /* Intel SandyBridge-EP PCU uncore */ PFM_PMU_INTEL_SNBEP_UNC_QPI0, /* Intel SandyBridge-EP QPI link 0 uncore */ PFM_PMU_INTEL_SNBEP_UNC_QPI1, /* Intel SandyBridge-EP QPI link 1 uncore */ PFM_PMU_INTEL_SNBEP_UNC_UBOX, /* Intel SandyBridge-EP U-Box uncore */ PFM_PMU_INTEL_SNBEP_UNC_R2PCIE, /* Intel SandyBridge-EP R2PCIe uncore */ PFM_PMU_INTEL_SNBEP_UNC_R3QPI0, /* Intel SandyBridge-EP R3QPI 
0 uncore */ PFM_PMU_INTEL_SNBEP_UNC_R3QPI1, /* Intel SandyBridge-EP R3QPI 1 uncore */ PFM_PMU_INTEL_KNC, /* Intel Knights Corner (Xeon Phi) */ PFM_PMU_S390X_CPUM_CF, /* s390x: CPU-M counter facility */ PFM_PMU_ARM_1176, /* ARM 1176 */ PFM_PMU_INTEL_IVB_EP, /* Intel IvyBridge EP */ PFM_PMU_INTEL_HSW, /* Intel Haswell */ PFM_PMU_INTEL_IVB_UNC_CB0, /* Intel IvyBridge C-box 0 uncore PMU */ PFM_PMU_INTEL_IVB_UNC_CB1, /* Intel IvyBridge C-box 1 uncore PMU */ PFM_PMU_INTEL_IVB_UNC_CB2, /* Intel IvyBridge C-box 2 uncore PMU */ PFM_PMU_INTEL_IVB_UNC_CB3, /* Intel IvyBridge C-box 3 uncore PMU */ PFM_PMU_POWER8, /* IBM POWER8 */ /* MUST ADD NEW PMU MODELS HERE */ PFM_PMU_MAX /* end marker */ } pfm_pmu_t; typedef enum { PFM_PMU_TYPE_UNKNOWN=0, /* unknown PMU type */ PFM_PMU_TYPE_CORE, /* processor core PMU */ PFM_PMU_TYPE_UNCORE, /* processor socket-level PMU */ PFM_PMU_TYPE_OS_GENERIC,/* generic OS-provided PMU */ PFM_PMU_TYPE_MAX } pfm_pmu_type_t; typedef enum { PFM_ATTR_NONE=0, /* no attribute */ PFM_ATTR_UMASK, /* unit mask */ PFM_ATTR_MOD_BOOL, /* register modifier */ PFM_ATTR_MOD_INTEGER, /* register modifier */ PFM_ATTR_RAW_UMASK, /* raw umask (not user visible) */ PFM_ATTR_MAX /* end-marker */ } pfm_attr_t; /* * define additional event data types beyond historic uint64 * what else can fit in 64 bits? 
 */
typedef enum {
	PFM_DTYPE_UNKNOWN=0,	/* unknown */
	PFM_DTYPE_UINT64,	/* uint64 */
	PFM_DTYPE_INT64,	/* int64 */
	PFM_DTYPE_DOUBLE,	/* IEEE double precision float */
	PFM_DTYPE_FIXED,	/* 32.32 fixed point */
	PFM_DTYPE_RATIO,	/* 32/32 integer ratio */
	PFM_DTYPE_CHAR8,	/* 8 char unterminated string */

	PFM_DTYPE_MAX		/* end-marker */
} pfm_dtype_t;

/*
 * event attribute control: which layer controls
 * the attribute. It can be the PMU hardware or an OS API
 */
typedef enum {
	PFM_ATTR_CTRL_UNKNOWN = 0,	/* unknown */
	PFM_ATTR_CTRL_PMU,		/* PMU hardware */
	PFM_ATTR_CTRL_PERF_EVENT,	/* perf_events kernel interface */

	PFM_ATTR_CTRL_MAX
} pfm_attr_ctrl_t;

/*
 * OS layer
 * Used when querying event or attribute information
 */
typedef enum {
	PFM_OS_NONE = 0,	/* only PMU */
	PFM_OS_PERF_EVENT,	/* perf_events PMU attribute subset + PMU */
	PFM_OS_PERF_EVENT_EXT,	/* perf_events all attributes + PMU */

	PFM_OS_MAX,
} pfm_os_t;

/* SWIG doesn't deal well with anonymous nested structures */
#ifdef SWIG
#define SWIG_NAME(x) x
#else
#define SWIG_NAME(x)
#endif /* SWIG */

/*
 * special data type for libpfm error value used to help
 * with Python support and in particular for SWIG. By using
 * a specific type we can detect library calls and trap errors
 * in one SWIG statement as opposed to having to keep track of
 * each call individually. Programs can use 'int' safely for
 * the return value.
*/ typedef int pfm_err_t; /* error if !PFM_SUCCESS */ typedef int os_err_t; /* error if a syscall fails */ typedef struct { const char *name; /* event name */ const char *desc; /* event description */ size_t size; /* struct sizeof */ pfm_pmu_t pmu; /* PMU identification */ pfm_pmu_type_t type; /* PMU type */ int nevents; /* how many events for this PMU */ int first_event; /* opaque index of first event */ int max_encoding; /* max number of uint64_t to encode an event */ int num_cntrs; /* number of generic counters */ int num_fixed_cntrs;/* number of fixed counters */ struct { unsigned int is_present:1; /* present on host system */ unsigned int is_dfl:1; /* is architecture default PMU */ unsigned int reserved_bits:30; } SWIG_NAME(flags); } pfm_pmu_info_t; typedef struct { const char *name; /* event name */ const char *desc; /* event description */ const char *equiv; /* event is equivalent to */ size_t size; /* struct sizeof */ uint64_t code; /* event raw code (not encoding) */ pfm_pmu_t pmu; /* which PMU */ pfm_dtype_t dtype; /* data type of event value */ int idx; /* unique event identifier */ int nattrs; /* number of attributes */ int reserved; /* for future use */ struct { unsigned int is_precise:1; /* precise sampling (Intel X86=PEBS) */ unsigned int reserved_bits:31; } SWIG_NAME(flags); } pfm_event_info_t; typedef struct { const char *name; /* attribute symbolic name */ const char *desc; /* attribute description */ const char *equiv; /* attribute is equivalent to */ size_t size; /* struct sizeof */ uint64_t code; /* attribute code */ pfm_attr_t type; /* attribute type */ int idx; /* attribute opaque index */ pfm_attr_ctrl_t ctrl; /* what is providing attr */ struct { unsigned int is_dfl:1; /* is default umask */ unsigned int is_precise:1; /* Intel X86: supports PEBS */ unsigned int reserved_bits:30; } SWIG_NAME(flags); union { uint64_t dfl_val64; /* default 64-bit value */ const char *dfl_str; /* default string value */ int dfl_bool; /* default boolean value */ 
int dfl_int; /* default integer value */ } SWIG_NAME(defaults); } pfm_event_attr_info_t; /* * use with PFM_OS_NONE for pfm_get_os_event_encoding() */ typedef struct { uint64_t *codes; /* out/in: event codes array */ char **fstr; /* out/in: fully qualified event string */ size_t size; /* sizeof struct */ int count; /* out/in: # of elements in array */ int idx; /* out: unique event identifier */ } pfm_pmu_encode_arg_t; #if __WORDSIZE == 64 #define PFM_PMU_INFO_ABI0 56 #define PFM_EVENT_INFO_ABI0 64 #define PFM_ATTR_INFO_ABI0 64 #define PFM_RAW_ENCODE_ABI0 32 #else #define PFM_PMU_INFO_ABI0 44 #define PFM_EVENT_INFO_ABI0 48 #define PFM_ATTR_INFO_ABI0 48 #define PFM_RAW_ENCODE_ABI0 20 #endif /* * initialization, configuration, errors */ extern pfm_err_t pfm_initialize(void); extern void pfm_terminate(void); extern const char *pfm_strerror(int code); extern int pfm_get_version(void); /* * PMU API */ extern pfm_err_t pfm_get_pmu_info(pfm_pmu_t pmu, pfm_pmu_info_t *output); /* * event API */ extern int pfm_get_event_next(int idx); extern int pfm_find_event(const char *str); extern pfm_err_t pfm_get_event_info(int idx, pfm_os_t os, pfm_event_info_t *output); /* * event encoding API * * content of args depends on value of os (refer to man page) */ extern pfm_err_t pfm_get_os_event_encoding(const char *str, int dfl_plm, pfm_os_t os, void *args); /* * attribute API */ extern pfm_err_t pfm_get_event_attr_info(int eidx, int aidx, pfm_os_t os, pfm_event_attr_info_t *output); /* * library validation API */ extern pfm_err_t pfm_pmu_validate(pfm_pmu_t pmu_id, FILE *fp); /* * older encoding API */ extern pfm_err_t pfm_get_event_encoding(const char *str, int dfl_plm, char **fstr, int *idx, uint64_t **codes, int *count); /* * error codes */ #define PFM_SUCCESS 0 /* success */ #define PFM_ERR_NOTSUPP -1 /* function not supported */ #define PFM_ERR_INVAL -2 /* invalid parameters */ #define PFM_ERR_NOINIT -3 /* library was not initialized */ #define PFM_ERR_NOTFOUND -4 /* event not found 
 */
#define PFM_ERR_FEATCOMB	-5	/* invalid combination of features */
#define PFM_ERR_UMASK		-6	/* invalid or missing unit mask */
#define PFM_ERR_NOMEM		-7	/* out of memory */
#define PFM_ERR_ATTR		-8	/* invalid event attribute */
#define PFM_ERR_ATTR_VAL	-9	/* invalid event attribute value */
#define PFM_ERR_ATTR_SET	-10	/* attribute value already set */
#define PFM_ERR_TOOMANY		-11	/* too many parameters */
#define PFM_ERR_TOOSMALL	-12	/* parameter is too small */

/*
 * event, attribute iterators
 * must be used because there is no guarantee that indexes are contiguous
 *
 * for pmu, simply iterate over pfm_pmu_t enum and use
 * pfm_get_pmu_info() and the is_present field
 */
#define pfm_for_each_event_attr(x, z) \
	for((x)=0; (x) < (z)->nattrs; (x) = (x)+1)

#define pfm_for_all_pmus(x) \
	for((x)= 0 ; (x) < PFM_PMU_MAX; (x)++)

#ifdef __cplusplus /* extern C */
}
#endif

#endif /* __PFMLIB_H__ */

papi-5.3.0/src/libpfm4/include/perfmon/perf_event.h
/*
 * Copyright (c) 2011 Google, Inc
 * Contributed by Stephane Eranian
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef __PERFMON_PERF_EVENT_H__ #define __PERFMON_PERF_EVENT_H__ #include #include /* for syscall numbers */ #include #include /* for syscall stub macros */ #include /* for _IO */ #include /* for prctl() comamnds */ #ifdef __cplusplus extern "C" { #endif /* * avoid clashes with actual kernel header file */ #if !(defined(_LINUX_PERF_EVENT_H) || defined(_UAPI_LINUX_PERF_EVENT_H)) /* * attr->type field values */ enum perf_type_id { PERF_TYPE_HARDWARE = 0, PERF_TYPE_SOFTWARE = 1, PERF_TYPE_TRACEPOINT = 2, PERF_TYPE_HW_CACHE = 3, PERF_TYPE_RAW = 4, PERF_TYPE_BREAKPOINT = 5, PERF_TYPE_MAX }; /* * attr->config values for generic HW PMU events * * they get mapped onto actual events by the kernel */ enum perf_hw_id { PERF_COUNT_HW_CPU_CYCLES = 0, PERF_COUNT_HW_INSTRUCTIONS = 1, PERF_COUNT_HW_CACHE_REFERENCES = 2, PERF_COUNT_HW_CACHE_MISSES = 3, PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4, PERF_COUNT_HW_BRANCH_MISSES = 5, PERF_COUNT_HW_BUS_CYCLES = 6, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND = 7, PERF_COUNT_HW_STALLED_CYCLES_BACKEND = 8, PERF_COUNT_HW_REF_CPU_CYCLES = 9, PERF_COUNT_HW_MAX }; /* * attr->config values for generic HW cache events * * they get mapped onto actual events by the kernel */ enum perf_hw_cache_id { PERF_COUNT_HW_CACHE_L1D = 0, PERF_COUNT_HW_CACHE_L1I = 1, PERF_COUNT_HW_CACHE_LL = 2, PERF_COUNT_HW_CACHE_DTLB = 3, PERF_COUNT_HW_CACHE_ITLB = 4, PERF_COUNT_HW_CACHE_BPU = 5, PERF_COUNT_HW_CACHE_NODE = 6, PERF_COUNT_HW_CACHE_MAX }; enum perf_hw_cache_op_id { PERF_COUNT_HW_CACHE_OP_READ = 0, PERF_COUNT_HW_CACHE_OP_WRITE = 1, PERF_COUNT_HW_CACHE_OP_PREFETCH = 2, PERF_COUNT_HW_CACHE_OP_MAX }; enum perf_hw_cache_op_result_id { PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0, PERF_COUNT_HW_CACHE_RESULT_MISS = 1, 
PERF_COUNT_HW_CACHE_RESULT_MAX }; /* * attr->config values for SW events */ enum perf_sw_ids { PERF_COUNT_SW_CPU_CLOCK = 0, PERF_COUNT_SW_TASK_CLOCK = 1, PERF_COUNT_SW_PAGE_FAULTS = 2, PERF_COUNT_SW_CONTEXT_SWITCHES = 3, PERF_COUNT_SW_CPU_MIGRATIONS = 4, PERF_COUNT_SW_PAGE_FAULTS_MIN = 5, PERF_COUNT_SW_PAGE_FAULTS_MAJ = 6, PERF_COUNT_SW_ALIGNMENT_FAULTS = 7, PERF_COUNT_SW_EMULATION_FAULTS = 8, PERF_COUNT_SW_MAX }; /* * attr->sample_type values */ enum perf_event_sample_format { PERF_SAMPLE_IP = 1U << 0, PERF_SAMPLE_TID = 1U << 1, PERF_SAMPLE_TIME = 1U << 2, PERF_SAMPLE_ADDR = 1U << 3, PERF_SAMPLE_READ = 1U << 4, PERF_SAMPLE_CALLCHAIN = 1U << 5, PERF_SAMPLE_ID = 1U << 6, PERF_SAMPLE_CPU = 1U << 7, PERF_SAMPLE_PERIOD = 1U << 8, PERF_SAMPLE_STREAM_ID = 1U << 9, PERF_SAMPLE_RAW = 1U << 10, PERF_SAMPLE_BRANCH_STACK = 1U << 11, PERF_SAMPLE_REGS_USER = 1U << 12, PERF_SAMPLE_STACK_USER = 1U << 13, PERF_SAMPLE_WEIGHT = 1U << 14, PERF_SAMPLE_DATA_SRC = 1U << 15, PERF_SAMPLE_MAX = 1U << 16, }; /* * branch_sample_type values */ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_USER = 1U << 0, PERF_SAMPLE_BRANCH_KERNEL = 1U << 1, PERF_SAMPLE_BRANCH_HV = 1U << 2, PERF_SAMPLE_BRANCH_ANY = 1U << 3, PERF_SAMPLE_BRANCH_ANY_CALL = 1U << 4, PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << 5, PERF_SAMPLE_BRANCH_IND_CALL = 1U << 6, PERF_SAMPLE_BRANCH_MAX = 1U << 7, }; enum perf_sample_regs_abi { PERF_SAMPLE_REGS_ABI_NONE = 0, PERF_SAMPLE_REGS_ABI_32 = 1, PERF_SAMPLE_REGS_ABI_64 = 2, }; /* * attr->read_format values */ enum perf_event_read_format { PERF_FORMAT_TOTAL_TIME_ENABLED = 1U << 0, PERF_FORMAT_TOTAL_TIME_RUNNING = 1U << 1, PERF_FORMAT_ID = 1U << 2, PERF_FORMAT_GROUP = 1U << 3, PERF_FORMAT_MAX = 1U << 4, }; #define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */ #define PERF_ATTR_SIZE_VER1 72 /* add: config2 */ #define PERF_ATTR_SIZE_VER2 80 /* add: branch_sample_type */ /* * SWIG doesn't deal well with anonymous nested structures * so we add names for the nested structure only 
when swig * is used. */ #ifdef SWIG #define SWIG_NAME(x) x #else #define SWIG_NAME(x) #endif /* SWIG */ /* * perf_event_attr struct passed to perf_event_open() */ typedef struct perf_event_attr { uint32_t type; uint32_t size; uint64_t config; union { uint64_t sample_period; uint64_t sample_freq; } SWIG_NAME(sample); uint64_t sample_type; uint64_t read_format; uint64_t disabled : 1, inherit : 1, pinned : 1, exclusive : 1, exclude_user : 1, exclude_kernel : 1, exclude_hv : 1, exclude_idle : 1, mmap : 1, comm : 1, freq : 1, inherit_stat : 1, enable_on_exec : 1, task : 1, watermark : 1, precise_ip : 2, mmap_data : 1, sample_id_all : 1, exclude_host : 1, exclude_guest : 1, exclude_callchain_kernel : 1, exclude_callchain_user : 1, __reserved_1 : 41; union { uint32_t wakeup_events; uint32_t wakeup_watermark; } SWIG_NAME(wakeup); uint32_t bp_type; union { uint64_t bp_addr; uint64_t config1; /* extend config */ } SWIG_NAME(bpa); union { uint64_t bp_len; uint64_t config2; /* extend config1 */ } SWIG_NAME(bpb); uint64_t branch_sample_type; uint64_t sample_regs_user; uint32_t sample_stack_user; uint32_t __reserved_2; } perf_event_attr_t; struct perf_branch_entry { uint64_t from; uint64_t to; uint64_t mispred:1, /* target mispredicted */ predicted:1,/* target predicted */ reserved:62; }; /* * branch stack layout: * nr: number of taken branches stored in entries[] * * Note that nr can vary from sample to sample * branches (to, from) are stored from most recent * to least recent, i.e., entries[0] contains the most * recent branch. 
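A minimal sketch of walking the entries in the order described above. The struct is a local stand-in that mirrors perf_branch_entry so the block compiles without this header, and count_mispredicted and most_recent_target are hypothetical helpers, not part of libpfm or the kernel interface. With a real sample one would pass the entries and nr fields of the branch stack:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Local stand-in mirroring perf_branch_entry.
struct branch_entry {
	uint64_t from;		// branch source address
	uint64_t to;		// branch target address
	uint64_t mispred:1,	// target mispredicted
		 predicted:1,	// target predicted
		 reserved:62;
};

// Count mispredicted branches among the nr entries of one sample.
static inline uint64_t count_mispredicted(const struct branch_entry *entries,
					  uint64_t nr)
{
	uint64_t i, n = 0;

	assert(entries != NULL || nr == 0);
	for (i = 0; i < nr; i++)	// entries[0] is the most recent branch
		if (entries[i].mispred)
			n++;
	return n;
}

// Store the target of the most recent branch in *to.
// Returns 1 on success, 0 when the sample carries no branches.
static inline int most_recent_target(const struct branch_entry *entries,
				     uint64_t nr, uint64_t *to)
{
	assert(to != NULL);
	if (nr == 0)
		return 0;
	*to = entries[0].to;
	return 1;
}
```

Because nr can differ between samples, both helpers take it as an argument instead of assuming a fixed depth.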
*/ struct perf_branch_stack { uint64_t nr; struct perf_branch_entry entries[0]; }; /* * perf_events ioctl commands, use with event fd */ #define PERF_EVENT_IOC_ENABLE _IO ('$', 0) #define PERF_EVENT_IOC_DISABLE _IO ('$', 1) #define PERF_EVENT_IOC_REFRESH _IO ('$', 2) #define PERF_EVENT_IOC_RESET _IO ('$', 3) #define PERF_EVENT_IOC_PERIOD _IOW('$', 4, uint64_t) #define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5) #define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *) /* * ioctl() 3rd argument */ enum perf_event_ioc_flags { PERF_IOC_FLAG_GROUP = 1U << 0, }; /* * mmapped sampling buffer layout * occupies a 4kb page */ struct perf_event_mmap_page { uint32_t version; uint32_t compat_version; uint32_t lock; uint32_t index; int64_t offset; uint64_t time_enabled; uint64_t time_running; union { uint64_t capabilities; uint64_t cap_usr_time:1, cap_usr_rdpmc:1, cap_____res:62; } SWIG_NAME(rdmap_cap); uint16_t pmc_width; uint16_t time_shift; uint32_t time_mult; uint64_t time_offset; uint64_t __reserved[120]; uint64_t data_head; uint64_t data_tail; }; /* * sampling buffer event header */ struct perf_event_header { uint32_t type; uint16_t misc; uint16_t size; }; /* * event header misc field values */ #define PERF_EVENT_MISC_CPUMODE_MASK (3 << 0) #define PERF_EVENT_MISC_CPUMODE_UNKNOWN (0 << 0) #define PERF_EVENT_MISC_KERNEL (1 << 0) #define PERF_EVENT_MISC_USER (2 << 0) #define PERF_EVENT_MISC_HYPERVISOR (3 << 0) #define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0) #define PERF_RECORD_MISC_GUEST_USER (5 << 0) #define PERF_RECORD_MISC_EXACT (1 << 14) #define PERF_RECORD_MISC_EXACT_IP (1 << 14) #define PERF_RECORD_MISC_EXT_RESERVED (1 << 15) /* * header->type values */ enum perf_event_type { PERF_RECORD_MMAP = 1, PERF_RECORD_LOST = 2, PERF_RECORD_COMM = 3, PERF_RECORD_EXIT = 4, PERF_RECORD_THROTTLE = 5, PERF_RECORD_UNTHROTTLE = 6, PERF_RECORD_FORK = 7, PERF_RECORD_READ = 8, PERF_RECORD_SAMPLE = 9, PERF_RECORD_MAX }; enum perf_callchain_context { PERF_CONTEXT_HV = (uint64_t)-32, 
PERF_CONTEXT_KERNEL = (uint64_t)-128, PERF_CONTEXT_USER = (uint64_t)-512, PERF_CONTEXT_GUEST = (uint64_t)-2048, PERF_CONTEXT_GUEST_KERNEL = (uint64_t)-2176, PERF_CONTEXT_GUEST_USER = (uint64_t)-2560, PERF_CONTEXT_MAX = (uint64_t)-4095, }; /* * flags for perf_event_open() */ #define PERF_FLAG_FD_NO_GROUP (1U << 0) #define PERF_FLAG_FD_OUTPUT (1U << 1) #define PERF_FLAG_PID_CGROUP (1U << 2) #endif /* _LINUX_PERF_EVENT_H */ #ifndef __NR_perf_event_open #ifdef __x86_64__ # define __NR_perf_event_open 298 #endif #ifdef __i386__ # define __NR_perf_event_open 336 #endif #ifdef __powerpc__ # define __NR_perf_event_open 319 #endif #ifdef __arm__ #if defined(__ARM_EABI__) || defined(__thumb__) # define __NR_perf_event_open 364 #else # define __NR_perf_event_open (0x900000+364) #endif #endif #ifdef __mips__ #if _MIPS_SIM == _MIPS_SIM_ABI32 # define __NR_perf_event_open __NR_Linux + 333 #elif _MIPS_SIM == _MIPS_SIM_ABI64 # define __NR_perf_event_open __NR_Linux + 292 #else /* if _MIPS_SIM == MIPS_SIM_NABI32 */ # define __NR_perf_event_open __NR_Linux + 296 #endif #endif #endif /* __NR_perf_event_open */ /* * perf_event_open() syscall stub */ static inline int perf_event_open( struct perf_event_attr *hw_event_uptr, pid_t pid, int cpu, int group_fd, unsigned long flags) { return syscall( __NR_perf_event_open, hw_event_uptr, pid, cpu, group_fd, flags); } /* * compensate for some distros which do not * have recent enough linux/prctl.h */ #ifndef PR_TASK_PERF_EVENTS_DISABLE #define PR_TASK_PERF_EVENTS_ENABLE 32 #define PR_TASK_PERF_EVENTS_DISABLE 31 #endif /* handle case of older system perf_event.h included before this file */ #ifndef PERF_MEM_OP_NA union perf_mem_data_src { uint64_t val; struct { uint64_t mem_op:5, /* type of opcode */ mem_lvl:14, /* memory hierarchy level */ mem_snoop:5, /* snoop mode */ mem_lock:2, /* lock instr */ mem_dtlb:7, /* tlb access */ mem_rsvd:31; }; }; /* type of opcode (load/store/prefetch,code) */ #define PERF_MEM_OP_NA 0x01 /* not available */ 
#define PERF_MEM_OP_LOAD	0x02 /* load instruction */
#define PERF_MEM_OP_STORE	0x04 /* store instruction */
#define PERF_MEM_OP_PFETCH	0x08 /* prefetch */
#define PERF_MEM_OP_EXEC	0x10 /* code (execution) */
#define PERF_MEM_OP_SHIFT	0

/* memory hierarchy (memory level, hit or miss) */
#define PERF_MEM_LVL_NA		0x01  /* not available */
#define PERF_MEM_LVL_HIT	0x02  /* hit level */
#define PERF_MEM_LVL_MISS	0x04  /* miss level */
#define PERF_MEM_LVL_L1		0x08  /* L1 */
#define PERF_MEM_LVL_LFB	0x10  /* Line Fill Buffer */
#define PERF_MEM_LVL_L2		0x20  /* L2 */
#define PERF_MEM_LVL_L3		0x40  /* L3 */
#define PERF_MEM_LVL_LOC_RAM	0x80  /* Local DRAM */
#define PERF_MEM_LVL_REM_RAM1	0x100 /* Remote DRAM (1 hop) */
#define PERF_MEM_LVL_REM_RAM2	0x200 /* Remote DRAM (2 hops) */
#define PERF_MEM_LVL_REM_CCE1	0x400 /* Remote Cache (1 hop) */
#define PERF_MEM_LVL_REM_CCE2	0x800 /* Remote Cache (2 hops) */
#define PERF_MEM_LVL_IO		0x1000 /* I/O memory */
#define PERF_MEM_LVL_UNC	0x2000 /* Uncached memory */
#define PERF_MEM_LVL_SHIFT	5

/* snoop mode */
#define PERF_MEM_SNOOP_NA	0x01 /* not available */
#define PERF_MEM_SNOOP_NONE	0x02 /* no snoop */
#define PERF_MEM_SNOOP_HIT	0x04 /* snoop hit */
#define PERF_MEM_SNOOP_MISS	0x08 /* snoop miss */
#define PERF_MEM_SNOOP_HITM	0x10 /* snoop hit modified */
#define PERF_MEM_SNOOP_SHIFT	19

/* locked instruction */
#define PERF_MEM_LOCK_NA	0x01 /* not available */
#define PERF_MEM_LOCK_LOCKED	0x02 /* locked transaction */
#define PERF_MEM_LOCK_SHIFT	24

/* TLB access */
#define PERF_MEM_TLB_NA		0x01 /* not available */
#define PERF_MEM_TLB_HIT	0x02 /* hit level */
#define PERF_MEM_TLB_MISS	0x04 /* miss level */
#define PERF_MEM_TLB_L1		0x08 /* L1 */
#define PERF_MEM_TLB_L2		0x10 /* L2 */
#define PERF_MEM_TLB_WK		0x20 /* Hardware Walker */
#define PERF_MEM_TLB_OS		0x40 /* OS fault handler */
#define PERF_MEM_TLB_SHIFT	26

/* uint64_t rather than the kernel-only u64, which user space does not define */
#define PERF_MEM_S(a, s) \
	(((uint64_t)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

#endif /* PERF_MEM_OP_NA */

#ifdef __cplusplus /*
extern C */ } #endif #endif /* __PERFMON_PERF_EVENT_H__ */ papi-5.3.0/src/libpfm4/include/perfmon/pfmlib_perf_event.h0000600003276200002170000000443412247131123023217 0ustar ralphundrgrad/* * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
 */ #ifndef __PFMLIB_PERF_EVENTS_H__ #define __PFMLIB_PERF_EVENTS_H__ #include <perfmon/pfmlib.h> #include <perfmon/perf_event.h> #ifdef __cplusplus extern "C" { #endif /* * use with PFM_OS_PERF, PFM_OS_PERF_EXT for pfm_get_os_event_encoding() */ typedef struct { struct perf_event_attr *attr; /* in/out: perf_event struct pointer */ char **fstr; /* out/in: fully qualified event string */ size_t size; /* sizeof struct */ int idx; /* out: opaque event identifier */ int cpu; /* out: cpu to program */ int flags; /* out: perf_event_open() flags */ int pad0; /* explicit 64-bit mode padding */ } pfm_perf_encode_arg_t; #if __WORDSIZE == 64 #define PFM_PERF_ENCODE_ABI0 40 /* includes 4-byte padding */ #else #define PFM_PERF_ENCODE_ABI0 28 #endif /* * old interface, maintained for backward compatibility with older versions of * the library. New code should use pfm_get_os_event_encoding() instead. */ extern pfm_err_t pfm_get_perf_event_encoding(const char *str, int dfl_plm, struct perf_event_attr *output, char **fstr, int *idx); #ifdef __cplusplus /* extern C */ } #endif #endif /* __PFMLIB_PERF_EVENTS_H__ */ papi-5.3.0/src/libpfm4/libpfm.spec0000600003276200002170000000651512247131124016417 0ustar ralphundrgrad%{!?with_python: %global with_python 1} %define python_sitearch %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib(1)") %define python_prefix %(python -c "import sys; print sys.prefix") Name: libpfm Version: 4.4.0 Release: 1%{?dist} Summary: Library to encode performance events for use by perf tool Group: System Environment/Libraries License: MIT URL: http://perfmon2.sourceforge.net/ Source0: http://sourceforge.net/projects/perfmon2/files/libpfm4/%{name}-%{version}.tar.gz %if %{with_python} BuildRequires: python-devel BuildRequires: python-setuptools-devel BuildRequires: swig %endif BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) %description libpfm4 is a library to help encode events for use with operating system kernel performance monitoring interfaces. 
The current version provides support for the perf_events interface available in upstream Linux kernels since v2.6.31. %package devel Summary: Development library to encode performance events for perf_events based tools Group: Development/Libraries Requires: %{name} = %{version} %description devel Development library and header files to create performance monitoring applications for the perf_events interface. %if %{with_python} %package python Summary: Python bindings for libpfm and perf_event_open system call Group: Development/Languages Requires: %{name} = %{version} %description python Python bindings for libpfm4 and perf_event_open system call. %endif %prep %setup -q %build %if %{with_python} %global python_config CONFIG_PFMLIB_NOPYTHON=n %else %global python_config CONFIG_PFMLIB_NOPYTHON=y %endif make %{python_config} %{?_smp_mflags} %install rm -rf $RPM_BUILD_ROOT %if %{with_python} %global python_config CONFIG_PFMLIB_NOPYTHON=n %else %global python_config CONFIG_PFMLIB_NOPYTHON=y %endif make \ PREFIX=$RPM_BUILD_ROOT%{_prefix} \ LIBDIR=$RPM_BUILD_ROOT%{_libdir} \ PYTHON_PREFIX=$RPM_BUILD_ROOT/%{python_prefix} \ %{python_config} \ LDCONFIG=/bin/true \ install %clean rm -fr $RPM_BUILD_ROOT %post -p /sbin/ldconfig %postun -p /sbin/ldconfig %files %defattr(644,root,root,755) %doc README %attr(755,root,root) %{_libdir}/lib*.so* %files devel %defattr(644,root,root,755) %{_includedir}/* %{_mandir}/man3/* %{_libdir}/lib*.a %if %{with_python} %files python %defattr(644,root,root,755) %attr(755,root,root) %{python_sitearch}/* %endif %changelog * Wed Nov 13 2013 Lukas Berk 4.4.0-1 - Intel IVB-EP support - Intel IVB updates support - Intel SNB updates support - Intel SNB-EP uncore support - ldlat support (PEBS-LL) - New Intel Atom support - bug fixes * Tue Aug 28 2012 Stephane Eranian 4.3.0-1 - ARM Cortex A15 support - updated Intel Sandy Bridge core PMU events - Intel Sandy Bridge desktop (model 42) uncore PMU support - Intel Ivy Bridge support - full perf_events generic 
event support - updated perf_examples - enabled Intel Nehalem/Westmere uncore PMU support - AMD Llano processor support (Fam 12h) - AMD Turion processor support (Fam 11h) - Intel Atom Cedarview processor support - Win32 compilation support - perf_events excl attribute - perf_events generic hw event aliases support - many bug fixes * Wed Mar 14 2012 William Cohen 4.2.0-2 - Some spec file fixup. * Wed Jan 12 2011 Arun Sharma 4.2.0-0 Initial revision papi-5.3.0/src/libpfm4/config.mk0000600003276200002170000001243512247131123016065 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # # This file is part of libpfm, a performance monitoring support library for # applications on Linux. # # # This file defines the global compilation settings. 
# It is included by every Makefile # # SYS := $(shell uname -s) ARCH := $(shell uname -m) ifeq (i686,$(findstring i686,$(ARCH))) override ARCH=i386 endif ifeq (i586,$(findstring i586,$(ARCH))) override ARCH=i386 endif ifeq (i486,$(findstring i486,$(ARCH))) override ARCH=i386 endif ifeq (i386,$(findstring i386,$(ARCH))) override ARCH=i386 endif ifeq (i86pc,$(findstring i86pc,$(ARCH))) override ARCH=i386 endif ifeq ($(ARCH),x86_64) override ARCH=x86_64 endif ifeq ($(ARCH),amd64) override ARCH=x86_64 endif ifeq (ppc,$(findstring ppc,$(ARCH))) override ARCH=powerpc endif ifeq (sparc64,$(findstring sparc64,$(ARCH))) override ARCH=sparc endif ifeq (armv6,$(findstring armv6,$(ARCH))) override ARCH=arm endif ifeq (armv7,$(findstring armv7,$(ARCH))) override ARCH=arm endif ifeq (mips64,$(findstring mips64,$(ARCH))) override ARCH=mips endif ifeq (mips,$(findstring mips,$(ARCH))) override ARCH=mips endif ifeq (MINGW,$(findstring MINGW,$(SYS))) override SYS=WINDOWS endif # # CONFIG_PFMLIB_SHARED: y=compile static and shared versions, n=static only # CONFIG_PFMLIB_DEBUG: enable debugging output support # CONFIG_PFMLIB_NOPYTHON: do not generate the python support, incompatible # with PFMLIB_SHARED=n # CONFIG_PFMLIB_SHARED?=y CONFIG_PFMLIB_DEBUG?=y CONFIG_PFMLIB_NOPYTHON?=y # # Cell Broadband Engine is reported as PPC but needs special handling. # ifeq ($(SYS),Linux) MACHINE := $(shell grep -q 'Cell Broadband Engine' /proc/cpuinfo && echo cell) ifeq (cell,$(MACHINE)) override ARCH=cell endif endif # # Library version # VERSION=4 REVISION=4 AGE=0 # # Where should things (lib, headers, man) go in the end. # PREFIX=/usr/local LIBDIR=$(PREFIX)/lib INCDIR=$(PREFIX)/include MANDIR=$(PREFIX)/share/man DOCDIR=$(PREFIX)/share/doc/libpfm-$(VERSION).$(REVISION).$(AGE) # # System header files # # SYSINCDIR : where to find standard header files (default to .) SYSINCDIR=. 
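#
# Example (a sketch, not taken from the upstream file): the defaults set
# above can be overridden on the make command line, which is how
# libpfm.spec passes PREFIX, LIBDIR and CONFIG_PFMLIB_NOPYTHON to the build.
# The path /opt/libpfm below is only a placeholder.
#
#   make CONFIG_PFMLIB_SHARED=n CONFIG_PFMLIB_DEBUG=n
#   make PREFIX=/opt/libpfm install
#
# LIBDIR, INCDIR, MANDIR and DOCDIR are defined in terms of $(PREFIX), so
# overriding PREFIX alone relocates all of them.
#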
# # Configuration Parameters for libpfm library # ifeq ($(ARCH),ia64) CONFIG_PFMLIB_ARCH_IA64=y endif ifeq ($(ARCH),x86_64) CONFIG_PFMLIB_ARCH_X86_64=y CONFIG_PFMLIB_ARCH_X86=y endif ifeq ($(ARCH),i386) CONFIG_PFMLIB_ARCH_I386=y CONFIG_PFMLIB_ARCH_X86=y endif ifeq ($(ARCH),mips) CONFIG_PFMLIB_ARCH_MIPS=y endif ifeq ($(ARCH),powerpc) CONFIG_PFMLIB_ARCH_POWERPC=y endif ifeq ($(ARCH),sparc) CONFIG_PFMLIB_ARCH_SPARC=y endif ifeq ($(ARCH),arm) CONFIG_PFMLIB_ARCH_ARM=y endif ifeq ($(ARCH),s390x) CONFIG_PFMLIB_ARCH_S390X=y endif ifeq ($(XTPE_COMPILE_TARGET),linux) CONFIG_PFMLIB_ARCH_CRAYXT=y endif ifeq ($(ARCH),cell) CONFIG_PFMLIB_CELL=y endif # # you shouldn't have to touch anything beyond this point # # # The entire package can be compiled using # icc the Intel Itanium Compiler (7.x, 8.x, 9.x) # or GNU C #CC=icc CC?=gcc LIBS= INSTALL=install LDCONFIG=ldconfig LN?=ln -sf PFMINCDIR=$(TOPDIR)/include PFMLIBDIR=$(TOPDIR)/lib # # -Wextra: to enable extra compiler sanity checks (e.g., signed vs. unsigned) # -Wno-unused-parameter: to avoid warnings on unused foo(void *this) parameter # DBG?=-g -Wall -Werror -Wextra -Wno-unused-parameter ifeq ($(SYS),Darwin) # older gcc-4.2 does not like -Wextra and some of our initialization code # Xcode uses a gcc version which is too old for some static initializers CC=clang DBG?=-g -Wall -Werror LDCONFIG=true endif ifeq ($(SYS),FreeBSD) # gcc-4.2 does not like -Wextra and some of our initialization code DBG=-g -Wall -Werror endif CFLAGS+=$(OPTIM) $(DBG) -I$(SYSINCDIR) -I$(PFMINCDIR) MKDEP=makedepend PFMLIB=$(PFMLIBDIR)/libpfm.a ifeq ($(CONFIG_PFMLIB_DEBUG),y) CFLAGS += -DCONFIG_PFMLIB_DEBUG endif CTAGS?=ctags # # Python is for use with perf_events # so it only works on Linux # ifneq ($(SYS),Linux) CONFIG_PFMLIB_NOPYTHON=y endif # # mark that we are compiling on Linux # ifeq ($(SYS),Linux) CFLAGS+= -DCONFIG_PFMLIB_OS_LINUX endif # # compile examples statically if library is # compiled static # not compatible with python support, so disable for 
now # ifeq ($(CONFIG_PFMLIB_SHARED),n) LDFLAGS+= -static CONFIG_PFMLIB_NOPYTHON=y endif ifeq ($(SYS),WINDOWS) CFLAGS +=-DPFMLIB_WINDOWS endif papi-5.3.0/src/libpfm4/tests/0000700003276200002170000000000012247131124015423 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/tests/validate_x86.c0000600003276200002170000017500712247131124020101 0ustar ralphundrgrad/* * validate_x86.c - validate event tables + encodings * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
 */ #include <sys/types.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <stdarg.h> #include <errno.h> #include <unistd.h> #include <string.h> #include <perfmon/pfmlib.h> #define MAX_ENCODING 8 #define SRC_LINE .line = __LINE__ typedef struct { const char *name; const char *fstr; uint64_t codes[MAX_ENCODING]; int ret, count; int line; } test_event_t; static const test_event_t x86_test_events[]={ { SRC_LINE, .name = "core::INST_RETIRED:ANY_P", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5300c0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:ANY_P", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5300c0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:DEAD", .ret = PFM_ERR_ATTR, /* cannot know if it is umask or mod */ .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:u:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5100c0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:u=0:k=1:u=1", .ret = PFM_ERR_ATTR_SET, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:c=1:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d300c0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:c=1:i=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d300c0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:c=2", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x25300c0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:c=320", .ret = PFM_ERR_ATTR_VAL, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:t=1", .ret = PFM_ERR_ATTR, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "core::L2_LINES_IN", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x537024ull, }, { SRC_LINE, .name = "core::L2_LINES_IN:SELF", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x537024ull, .fstr = "core::L2_LINES_IN:SELF:ANY:k=1:u=1:e=0:i=0:c=0", }, { SRC_LINE, .name = "core::L2_LINES_IN:SELF:BOTH_CORES", .ret = PFM_ERR_FEATCOMB, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "core::L2_LINES_IN:SELF:PREFETCH", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x535024ull, }, { SRC_LINE, .name = 
"core::L2_LINES_IN:SELF:PREFETCH:ANY", .ret = PFM_ERR_FEATCOMB, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "core::RS_UOPS_DISPATCHED_NONE", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d300a0ull, }, { SRC_LINE, .name = "core::RS_UOPS_DISPATCHED_NONE:c=2", .ret = PFM_ERR_ATTR_SET, .count = 1, .codes[0] = 0ull, }, { SRC_LINE, .name = "core::branch_instructions_retired", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5300c4ull, .fstr = "core::BR_INST_RETIRED:ANY:k=1:u=1:e=0:i=0:c=0" }, { SRC_LINE, .name = "nhm::branch_instructions_retired", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5300c4ull, .fstr = "nhm::BR_INST_RETIRED:ALL_BRANCHES:k=1:u=1:e=0:i=0:c=0:t=0" }, { SRC_LINE, .name = "wsm::BRANCH_INSTRUCTIONS_RETIRED", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5300c4ull, /* architected encoding, guaranteed to exist */ .fstr = "wsm::BR_INST_RETIRED:ALL_BRANCHES:k=1:u=1:e=0:i=0:c=0:t=0" }, { SRC_LINE, .name = "nhm::ARITH:DIV:k", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d60114ull, .fstr = "nhm::ARITH:CYCLES_DIV_BUSY:k=1:u=0:e=1:i=1:c=1:t=0", }, { SRC_LINE, .name = "nhm::ARITH:CYCLES_DIV_BUSY:k=1:u=1:e=1:i=1:c=1:t=0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d70114ull, .fstr = "nhm::ARITH:CYCLES_DIV_BUSY:k=1:u=1:e=1:i=1:c=1:t=0", }, { SRC_LINE, .name = "wsm::UOPS_EXECUTED:CORE_STALL_COUNT:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1f53fb1ull, .fstr = "wsm::UOPS_EXECUTED:CORE_STALL_CYCLES:k=0:u=1:e=1:i=1:c=1:t=1", }, { SRC_LINE, .name = "wsm::UOPS_EXECUTED:CORE_STALL_COUNT:u:t=0", .ret = PFM_ERR_ATTR_SET, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "wsm_unc::unc_qmc_writes:full_any:partial_any", .ret = PFM_ERR_FEATCOMB, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "wsm_unc::unc_qmc_writes", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x50072full, .fstr = "wsm_unc::UNC_QMC_WRITES:FULL_ANY:e=0:i=0:c=0:o=0", }, { SRC_LINE, .name = "wsm_unc::unc_qmc_writes:full_any", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x50072full, 
.fstr = "wsm_unc::UNC_QMC_WRITES:FULL_ANY:e=0:i=0:c=0:o=0", }, { SRC_LINE, .name = "wsm_unc::unc_qmc_writes:full_ch0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x50012full, .fstr = "wsm_unc::UNC_QMC_WRITES:FULL_CH0:e=0:i=0:c=0:o=0", }, { SRC_LINE, .name = "wsm_unc::unc_qmc_writes:partial_any", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x50382full, .fstr = "wsm_unc::UNC_QMC_WRITES:PARTIAL_ANY:e=0:i=0:c=0:o=0", }, { SRC_LINE, .name = "wsm_unc::unc_qmc_writes:partial_ch0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x50082full, .fstr = "wsm_unc::UNC_QMC_WRITES:PARTIAL_CH0:e=0:i=0:c=0:o=0", }, { SRC_LINE, .name = "wsm_unc::unc_qmc_writes:partial_ch0:partial_ch1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x50182full, .fstr = "wsm_unc::UNC_QMC_WRITES:PARTIAL_CH0:PARTIAL_CH1:e=0:i=0:c=0:o=0", }, { SRC_LINE, .name = "amd64_fam10h_barcelona::DISPATCHED_FPU", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x533f00ull, .fstr = "amd64_fam10h_barcelona::DISPATCHED_FPU:ALL:k=1:u=1:e=0:i=0:c=0:h=0:g=0" }, { SRC_LINE, .name = "amd64_fam10h_barcelona::DISPATCHED_FPU:k:u=0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x523f00ull, .fstr = "amd64_fam10h_barcelona::DISPATCHED_FPU:ALL:k=1:u=0:e=0:i=0:c=0:h=0:g=0" }, { SRC_LINE, .name = "amd64_fam10h_barcelona::DISPATCHED_FPU:OPS_ADD:OPS_MULTIPLY", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530300ull, .fstr = "amd64_fam10h_barcelona::DISPATCHED_FPU:OPS_ADD:OPS_MULTIPLY:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam10h_barcelona::L2_CACHE_MISS:ALL:DATA", .ret = PFM_ERR_FEATCOMB, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "amd64_fam10h_barcelona::MEMORY_CONTROLLER_REQUESTS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x10053fff0ull, .fstr = "amd64_fam10h_barcelona::MEMORY_CONTROLLER_REQUESTS:ALL:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_k8_revb::RETURN_STACK_OVERFLOWS:g=1:u", .ret = PFM_ERR_ATTR, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = 
"amd64_k8_revb::RETURN_STACK_HITS:e=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x570088ull, .fstr = "amd64_k8_revb::RETURN_STACK_HITS:k=1:u=1:e=1:i=0:c=0", }, { SRC_LINE, .name = "amd64_k8_revb::PROBE:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x533fecull, .fstr = "amd64_k8_revb::PROBE:ALL:k=1:u=1:e=0:i=0:c=0", }, { SRC_LINE, .name = "amd64_k8_revc::PROBE:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x533fecull, .fstr = "amd64_k8_revc::PROBE:ALL:k=1:u=1:e=0:i=0:c=0", }, { SRC_LINE, .name = "amd64_k8_revd::PROBE:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x537fecull, .fstr = "amd64_k8_revd::PROBE:ALL:k=1:u=1:e=0:i=0:c=0" }, { SRC_LINE, .name = "amd64_k8_reve::PROBE:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x537fecull, .fstr = "amd64_k8_reve::PROBE:ALL:k=1:u=1:e=0:i=0:c=0" }, { SRC_LINE, .name = "amd64_k8_revf::PROBE:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x537fecull, .fstr = "amd64_k8_revf::PROBE:ALL:k=1:u=1:e=0:i=0:c=0" }, { SRC_LINE, .name = "amd64_k8_revg::PROBE:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x537fecull, .fstr = "amd64_k8_revg::PROBE:ALL:k=1:u=1:e=0:i=0:c=0" }, { SRC_LINE, .name = "amd64_fam10h_barcelona::L1_DTLB_MISS_AND_L2_DTLB_HIT:L2_1G_TLB_HIT", .ret = PFM_ERR_ATTR, .count = 0, .codes[0] = 0ull, }, { SRC_LINE, .name = "amd64_fam10h_barcelona::L1_DTLB_MISS_AND_L2_DTLB_HIT:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530345ull, .fstr = "amd64_fam10h_barcelona::L1_DTLB_MISS_AND_L2_DTLB_HIT:ALL:k=1:u=1:e=0:i=0:c=0:h=0:g=0" }, { SRC_LINE, .name = "amd64_fam10h_shanghai::L1_DTLB_MISS_AND_L2_DTLB_HIT:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530745ull, .fstr = "amd64_fam10h_shanghai::L1_DTLB_MISS_AND_L2_DTLB_HIT:ALL:k=1:u=1:e=0:i=0:c=0:h=0:g=0" }, { SRC_LINE, .name = "amd64_fam10h_istanbul::L1_DTLB_MISS_AND_L2_DTLB_HIT:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530745ull, .fstr = "amd64_fam10h_istanbul::L1_DTLB_MISS_AND_L2_DTLB_HIT:ALL:k=1:u=1:e=0:i=0:c=0:h=0:g=0" }, { 
SRC_LINE, .name = "amd64_fam10h_barcelona::READ_REQUEST_TO_L3_CACHE", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x40053f7e0ull, .fstr = "amd64_fam10h_barcelona::READ_REQUEST_TO_L3_CACHE:ANY_READ:ALL_CORES:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam10h_shanghai::READ_REQUEST_TO_L3_CACHE", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x40053f7e0ull, .fstr = "amd64_fam10h_shanghai::READ_REQUEST_TO_L3_CACHE:ANY_READ:ALL_CORES:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "core::RAT_STALLS:ANY:u:c=1,cycles", /* must cut at comma */ .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1510fd2ull, .fstr = "core::RAT_STALLS:ANY:k=0:u=1:e=0:i=0:c=1" }, { SRC_LINE, .name = "wsm::mem_uncore_retired:remote_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53200f, .fstr = "wsm::MEM_UNCORE_RETIRED:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm_dp::mem_uncore_retired:remote_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53100f, .fstr = "wsm_dp::MEM_UNCORE_RETIRED:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::mem_uncore_retired:local_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53100f, .fstr = "wsm::MEM_UNCORE_RETIRED:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm_dp::mem_uncore_retired:local_dram", .ret = PFM_ERR_ATTR, .count = 1, .codes[0] = 0, }, { SRC_LINE, .name = "nhm::mem_uncore_retired:uncacheable", .ret = PFM_ERR_ATTR, .count = 1, .codes[0] = 0, }, { SRC_LINE, .name = "nhm::mem_uncore_retired:l3_data_miss_unknown", .ret = PFM_ERR_ATTR, .count = 1, .codes[0] = 0, }, { SRC_LINE, .name = "nhm_ex::mem_uncore_retired:uncacheable", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53800f, .fstr = "nhm_ex::MEM_UNCORE_RETIRED:UNCACHEABLE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm_ex::mem_uncore_retired:l3_data_miss_unknown", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53010f, .fstr = "nhm_ex::MEM_UNCORE_RETIRED:L3_DATA_MISS_UNKNOWN:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, 
.name = "nhm::mem_uncore_retired:local_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53200f, .fstr = "nhm::MEM_UNCORE_RETIRED:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm_ex::mem_uncore_retired:local_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53200f, .fstr = "nhm_ex::MEM_UNCORE_RETIRED:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_0:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5201b7, .codes[1] = 0xffff, .fstr = "wsm::OFFCORE_RESPONSE_0:ANY_REQUEST:ANY_RESPONSE:k=1:u=0:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_0:local_dram", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x20ff, .fstr = "wsm::OFFCORE_RESPONSE_0:ANY_REQUEST:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_0:PF_IFETCH", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0xff40, .fstr = "wsm::OFFCORE_RESPONSE_0:PF_IFETCH:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_0:ANY_DATA:LOCAL_DRAM", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x2033, .fstr = "wsm::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_0:DMND_RFO:DMND_DATA_RD:LOCAL_DRAM:REMOTE_DRAM", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x6003, .fstr = "wsm::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:LOCAL_DRAM:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0" }, { SRC_LINE, .name = "wsm::offcore_response_1:local_dram", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x20ff, .fstr = "wsm::OFFCORE_RESPONSE_1:ANY_REQUEST:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_1:PF_IFETCH", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0xff40, .fstr = "wsm::OFFCORE_RESPONSE_1:PF_IFETCH:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = 
"wsm::offcore_response_1:ANY_DATA:LOCAL_DRAM", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x2033, .fstr = "wsm::OFFCORE_RESPONSE_1:DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_1:DMND_RFO:DMND_DATA_RD:LOCAL_DRAM:REMOTE_DRAM", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x6003, .fstr = "wsm::OFFCORE_RESPONSE_1:DMND_DATA_RD:DMND_RFO:LOCAL_DRAM:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::offcore_response_0:ANY_LLC_MISS", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0xf8ff, .fstr = "wsm::OFFCORE_RESPONSE_0:ANY_REQUEST:REMOTE_CACHE_HITM:REMOTE_CACHE_FWD:LOCAL_DRAM:REMOTE_DRAM:NON_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm_dp::offcore_response_0:ANY_LLC_MISS", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0xf8ff, .fstr = "wsm_dp::OFFCORE_RESPONSE_0:ANY_REQUEST:REMOTE_CACHE_HITM:LOCAL_DRAM_AND_REMOTE_CACHE_HIT:REMOTE_DRAM:OTHER_LLC_MISS:NON_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm_dp::offcore_response_0:LOCAL_CACHE", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x7ff, .fstr = "wsm_dp::OFFCORE_RESPONSE_0:ANY_REQUEST:UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm_dp::offcore_response_0:ANY_CACHE_DRAM", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x7fff, .fstr = "wsm_dp::OFFCORE_RESPONSE_0:ANY_REQUEST:UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:REMOTE_CACHE_HITM:LOCAL_DRAM_AND_REMOTE_CACHE_HIT:REMOTE_DRAM:OTHER_LLC_MISS:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm::offcore_response_0:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5201b7, .codes[1] = 0xffff, .fstr = "nhm::OFFCORE_RESPONSE_0:ANY_REQUEST:ANY_RESPONSE:k=1:u=0:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm::offcore_response_0:local_dram", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 
0x5301b7, .codes[1] = 0x40ff, .fstr = "nhm::OFFCORE_RESPONSE_0:ANY_REQUEST:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm::offcore_response_0:any_llc_miss", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0xf8ff, .fstr = "nhm::OFFCORE_RESPONSE_0:ANY_REQUEST:REMOTE_CACHE_HITM:REMOTE_CACHE_FWD:REMOTE_DRAM:LOCAL_DRAM:NON_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm::offcore_response_0:any_dram", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x60ff, .fstr = "nhm::OFFCORE_RESPONSE_0:ANY_REQUEST:REMOTE_DRAM:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm::offcore_response_0:PF_IFETCH", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0xff40, .fstr = "nhm::OFFCORE_RESPONSE_0:PF_IFETCH:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm::offcore_response_0:ANY_DATA:LOCAL_DRAM", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x4033, .fstr = "nhm::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "nhm::offcore_response_0:DMND_RFO:DMND_DATA_RD:LOCAL_DRAM:REMOTE_DRAM", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x6003, .fstr = "nhm::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:REMOTE_DRAM:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "amd64_k8_revg::DISPATCHED_FPU:0xff:k", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x52ff00ull, .fstr = "amd64_k8_revg::DISPATCHED_FPU:0xff:k=1:u=0:e=0:i=0:c=0" }, { SRC_LINE, .name = "amd64_k8_revg::DISPATCHED_FPU:0x4ff", .ret = PFM_ERR_ATTR, .count = 0, }, { SRC_LINE, .name = "amd64_fam10h_barcelona::DISPATCHED_FPU:0x4ff:u", .ret = PFM_ERR_ATTR }, { SRC_LINE, .name = "amd64_fam10h_barcelona::DISPATCHED_FPU:0xff:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x51ff00ull, .fstr = "amd64_fam10h_barcelona::DISPATCHED_FPU:0xff:k=0:u=1:e=0:i=0:c=0:h=0:g=0" }, { SRC_LINE, .name = "wsm::inst_retired:0xff:k", .ret = 
PFM_SUCCESS, .count = 1, .codes[0] = 0x52ffc0, .fstr = "wsm::INST_RETIRED:0xff:k=1:u=0:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "wsm::uops_issued:0xff:stall_cycles", .ret = PFM_ERR_ATTR, .count = 0, }, { SRC_LINE, .name = "wsm::uops_issued:0xff:0xf1", .ret = PFM_ERR_ATTR, .count = 0, }, { SRC_LINE, .name = "wsm::uops_issued:0xff=", .ret = PFM_ERR_ATTR_VAL, .count = 0, }, { SRC_LINE, .name = "wsm::uops_issued:123", .ret = PFM_ERR_ATTR, .count = 0, }, { SRC_LINE, .name = "wsm::uops_issued:0xfff", .ret = PFM_ERR_ATTR, .count = 0, }, { SRC_LINE, .name = "netburst::global_power_events", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x2600020f, .codes[1] = 0x3d000, .fstr = "netburst::global_power_events:RUNNING:k=1:u=1:e=0:cmpl=0:thr=0", }, { SRC_LINE, .name = "netburst::global_power_events:RUNNING:u:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x2600020f, .codes[1] = 0x3d000, .fstr = "netburst::global_power_events:RUNNING:k=1:u=1:e=0:cmpl=0:thr=0", }, { SRC_LINE, .name = "netburst::global_power_events:RUNNING:e", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x2600020f, .codes[1] = 0x107d000, .fstr = "netburst::global_power_events:RUNNING:k=1:u=1:e=1:cmpl=0:thr=0", }, { SRC_LINE, .name = "netburst::global_power_events:RUNNING:cmpl:e:u", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x26000205, .codes[1] = 0x10fd000, .fstr = "netburst::global_power_events:RUNNING:k=0:u=1:e=1:cmpl=1:thr=0", }, { SRC_LINE, .name = "netburst::global_power_events:RUNNING:cmpl:thr=8:u", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x26000205, .codes[1] = 0x8fd000, .fstr = "netburst::global_power_events:RUNNING:k=0:u=1:e=0:cmpl=1:thr=8", }, { SRC_LINE, .name = "netburst::global_power_events:RUNNING:cmpl:thr=32:u", .ret = PFM_ERR_ATTR_VAL, .count = 0, }, { SRC_LINE, .name = "netburst::instr_completed:nbogus", .ret = PFM_ERR_NOTFOUND, .count = 0, }, { SRC_LINE, .name = "netburst_p::instr_completed:nbogus", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0xe00020f, .codes[1] = 0x39000, .fstr = 
"netburst_p::instr_completed:NBOGUS:k=1:u=1:e=0:cmpl=0:thr=0", }, { SRC_LINE, .name = "snb::cpl_cycles:ring0_trans:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x155015c, .fstr = "snb::CPL_CYCLES:RING0:k=0:u=1:e=1:i=0:c=1:t=0", }, { SRC_LINE, .name = "snb::cpl_cycles:ring0_trans:e=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x157015cull, }, { SRC_LINE, .name = "snb::OFFCORE_REQUESTS_OUTSTanding:ALL_DATA_RD_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1530860, .fstr = "snb::OFFCORE_REQUESTS_OUTSTANDING:ALL_DATA_RD:k=1:u=1:e=0:i=0:c=1:t=0", }, { SRC_LINE, .name = "snb::uops_issued:core_stall_cycles:u:k", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1f3010e, .fstr = "snb::UOPS_ISSUED:ANY:k=1:u=1:e=0:i=1:c=1:t=1", }, { SRC_LINE, .name = "snb::LLC_REFERences:k:t=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x724f2e, .fstr = "snb::LAST_LEVEL_CACHE_REFERENCES:k=1:u=0:e=0:i=0:c=0:t=1", }, { SRC_LINE, .name = "snb::ITLB:0x1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5301ae, .fstr = "snb::ITLB:0x1:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:DMND_RFO:ANY_RESPONSE", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x10002, .fstr = "snb::OFFCORE_RESPONSE_0:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:DMND_RFO:ANY_REQUEST", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x18fff, }, { SRC_LINE, .name = "snb::offcore_response_0:DMND_RFO", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x10002, .fstr = "snb::OFFCORE_RESPONSE_0:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:any_response", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x18fff, .fstr = "snb::OFFCORE_RESPONSE_0:ANY_REQUEST:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:NO_SUPP:SNP_NONE:PF_RFO:PF_IFETCH", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x80020060, .fstr = 
"snb::OFFCORE_RESPONSE_0:PF_RFO:PF_IFETCH:NO_SUPP:SNP_NONE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_1:DMND_RFO:ANY_RESPONSE", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x10002, .fstr = "snb::OFFCORE_RESPONSE_1:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_1:DMND_RFO:ANY_REQUEST", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x18fff, }, { SRC_LINE, .name = "snb::offcore_response_1:DMND_RFO", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x10002, .fstr = "snb::OFFCORE_RESPONSE_1:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_1:any_response", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x18fff, .fstr = "snb::OFFCORE_RESPONSE_1:ANY_REQUEST:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_1:NO_SUPP:SNP_NONE:PF_RFO:PF_IFETCH", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x80020060, .fstr = "snb::OFFCORE_RESPONSE_1:PF_RFO:PF_IFETCH:NO_SUPP:SNP_NONE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_1:ANY_REQUEST:LLC_MISS_LOCAL_DRAM", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x3f80408fffull, .fstr = "snb::OFFCORE_RESPONSE_1:DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER:MISS_DRAM:SNP_ANY:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "amd64_fam14h_bobcat::MAB_REQUESTS:DC_BUFFER_0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530068, .fstr = "amd64_fam14h_bobcat::MAB_REQUESTS:DC_BUFFER_0:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam14h_bobcat::MAB_REQUESTS:DC_BUFFER_0:DC_BUFFER_1", .ret = PFM_ERR_FEATCOMB, }, { SRC_LINE, .name = "amd64_fam14h_bobcat::MAB_REQUESTS:DC_BUFFER_0:IC_BUFFER_0", .ret = PFM_ERR_FEATCOMB, }, { SRC_LINE, .name = "amd64_fam14h_bobcat::MAB_REQUESTS:ANY_DC_BUFFER", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530b68, .fstr = 
"amd64_fam14h_bobcat::MAB_REQUESTS:ANY_DC_BUFFER:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam14h_bobcat::MAB_REQUESTS:ANY_IC_BUFFER", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530a68, .fstr = "amd64_fam14h_bobcat::MAB_REQUESTS:ANY_IC_BUFFER:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam14h_bobcat::MAB_REQUESTS:ANY_IC_BUFFER:IC_BUFFER_1", .ret = PFM_ERR_FEATCOMB, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:e", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "core::INST_RETIRED:ANY_P:e:c=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15700c0ull, }, { SRC_LINE, .name = "atom::INST_RETIRED:ANY_P:e", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5700c0ull, }, { SRC_LINE, .name = "atom::INST_RETIRED:ANY_P:e:c=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15700c0ull, }, { SRC_LINE, .name = "nhm::INST_RETIRED:ANY_P:e", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "nhm::INST_RETIRED:ANY_P:e:c=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15700c0ull, }, { SRC_LINE, .name = "wsm::INST_RETIRED:ANY_P:e", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "wsm::INST_RETIRED:ANY_P:e:c=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15700c0ull, }, { SRC_LINE, .name = "snb::INST_RETIRED:ANY_P:e", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snb::INST_RETIRED:ANY_P:e:c=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15700c0ull, }, { SRC_LINE, .name = "snb::INST_RETIRED:ANY_P:e:c=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15700c0ull, }, { SRC_LINE, .name = "snb::offcore_response_0:any_request", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7ull, .codes[1]= 0x18fffull, .fstr = "snb::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:dmnd_data_rd", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7ull, .codes[1]=0x10001ull, .fstr = 
"snb::OFFCORE_RESPONSE_0:DMND_DATA_RD:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:dmnd_data_rd:llc_hite", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7ull, .codes[1]=0x3f80080001ull, .fstr = "snb::OFFCORE_RESPONSE_0:DMND_DATA_RD:LLC_HITE:SNP_ANY:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:dmnd_data_rd:llc_hite:snp_any", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7ull, .codes[1]=0x3f80080001ull, .fstr = "snb::OFFCORE_RESPONSE_0:DMND_DATA_RD:LLC_HITE:SNP_NONE:SNP_NOT_NEEDED:SNP_MISS:SNP_NO_FWD:SNP_FWD:HITM:NON_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:dmnd_data_rd:llc_hite:hitm", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7ull, .codes[1]=0x1000080001ull, .fstr = "snb::OFFCORE_RESPONSE_0:DMND_DATA_RD:LLC_HITE:HITM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:dmnd_data_rd:any_response", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7ull, .codes[1]=0x10001ull, .fstr = "snb::OFFCORE_RESPONSE_0:DMND_DATA_RD:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::offcore_response_0:dmnd_data_rd:any_response:snp_any", .ret = PFM_ERR_FEATCOMB, }, { SRC_LINE, .name = "snb::offcore_response_0:dmnd_data_rd:any_response:llc_hitmesf", .ret = PFM_ERR_FEATCOMB, }, { SRC_LINE, .name = "snb::offcore_response_0:any_response", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7ull, .codes[1]=0x18fffull, .fstr = "snb::OFFCORE_RESPONSE_0:ANY_REQUEST:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "amd64_fam11h_turion::MAB_REQUESTS:DC_BUFFER_0", .ret = PFM_ERR_NOTFOUND, }, { SRC_LINE, .name = "amd64_fam11h_turion::RETIRED_INSTRUCTIONS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5300c0, .fstr = "amd64_fam11h_turion::RETIRED_INSTRUCTIONS:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam11h_turion::RETIRED_UOPS:k", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5200c1, .fstr = 
"amd64_fam11h_turion::RETIRED_UOPS:k=1:u=0:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam11h_turion::CPU_CLK_UNHALTED", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530076, .fstr = "amd64_fam11h_turion::CPU_CLK_UNHALTED:k=1:u=1:e=0:i=0:c=0:h=0:g=0", }, { SRC_LINE, .name = "amd64_fam11h_turion::RETIRED_UOPS:c=1:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d300c1, .fstr = "amd64_fam11h_turion::RETIRED_UOPS:k=1:u=1:e=0:i=1:c=1:h=0:g=0", }, { SRC_LINE, .name = "ivb::ARITH:FPU_DIV", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1570114, .fstr = "ivb::ARITH:FPU_DIV_ACTIVE:k=1:u=1:e=1:i=0:c=1:t=0", }, { SRC_LINE, .name = "ivb::INST_RETIRED:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5301c0, .fstr = "ivb::INST_RETIRED:ALL:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::INST_RETIRED:ALL:k", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5201c0, .fstr = "ivb::INST_RETIRED:ALL:k=1:u=0:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::INST_RETIRED:ALL:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5101c0, .fstr = "ivb::INST_RETIRED:ALL:k=0:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::TLB_ACCESS:LOAD_STLB_HIT", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53045f, .fstr = "ivb::DTLB_LOAD_ACCESS:STLB_HIT:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::TLB_ACCESS:STLB_HIT", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53045f, .fstr = "ivb::DTLB_LOAD_ACCESS:STLB_HIT:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::DTLB_LOAD_ACCESS:STLB_HIT", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53045f, .fstr = "ivb::DTLB_LOAD_ACCESS:STLB_HIT:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::MOVE_ELIMINATION:INT_NOT_ELIMINATED", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530158, .fstr = "ivb::MOVE_ELIMINATION:INT_NOT_ELIMINATED:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::RESOURCE_STALLS:SB:RS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530ca2, .fstr = "ivb::RESOURCE_STALLS:RS:SB:k=1:u=1:e=0:i=0:c=0:t=0", }, { 
SRC_LINE, .name = "ivb::RESOURCE_STALLS:ROB:RS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5314a2, .fstr = "ivb::RESOURCE_STALLS:RS:ROB:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::UOPS_EXECUTED:THREAD", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5301b1, .fstr = "ivb::UOPS_EXECUTED:THREAD:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::UOPS_EXECUTED:THREAD:e:c=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15701b1, .fstr = "ivb::UOPS_EXECUTED:THREAD:k=1:u=1:e=1:i=0:c=1:t=0", }, { SRC_LINE, .name = "ivb::UOPS_EXECUTED:THREAD:e", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "ivb::UOPS_EXECUTED:THREAD:c=1:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d301b1, .fstr = "ivb::UOPS_EXECUTED:THREAD:k=1:u=1:e=0:i=1:c=1:t=0", }, { SRC_LINE, .name = "ivb::CPU_CLK_UNHALTED:REF_P", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53013c, .fstr = "ivb::CPU_CLK_UNHALTED:REF_XCLK:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::DTLB_LOAD_MISSES:DEMAND_LD_MISS_CAUSES_A_WALK", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x538108, .fstr = "ivb::DTLB_LOAD_MISSES:MISS_CAUSES_A_WALK:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::offcore_response_0:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5201b7, .codes[1] = 0x18fff, .fstr = "ivb::OFFCORE_RESPONSE_0:ANY_REQUEST:ANY_RESPONSE:k=1:u=0:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::offcore_response_0:LLC_MISS_LOCAL", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x3f80408fffull, .fstr = "ivb::OFFCORE_RESPONSE_0:ANY_REQUEST:LLC_MISS_LOCAL:SNP_ANY:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::DTLB_LOAD_MISSES:STLB_HIT", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53045f, .fstr = "ivb::DTLB_LOAD_MISSES:STLB_HIT:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::DTLB_LOAD_MISSES:LARGE_WALK_COMPLETED:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x518808, .fstr = "ivb::DTLB_LOAD_MISSES:LARGE_WALK_COMPLETED:k=0:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = 
"snb::l2_lines_in:i:i=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xd301f1, .fstr = "snb::L2_LINES_IN:I:k=1:u=1:e=0:i=1:c=0:t=0", }, { SRC_LINE, .name = "snb::l2_lines_in:i=1:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xd301f1, .fstr = "snb::L2_LINES_IN:I:k=1:u=1:e=0:i=1:c=0:t=0", }, { SRC_LINE, .name = "snb::l2_lines_in:i:i=0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5301f1, .fstr = "snb::L2_LINES_IN:I:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::l2_lines_in:e:e=0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5304f1, .fstr = "snb::L2_LINES_IN:E:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb::l2_lines_in:e:e=1", .ret = PFM_ERR_ATTR, .count = 0, }, { SRC_LINE, .name = "snb::l2_lines_in:e:e=1:c=10", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xa5704f1, .fstr = "snb::L2_LINES_IN:E:k=1:u=1:e=1:i=0:c=10:t=0", }, { SRC_LINE, .name = "snb_unc_cbo0::unc_clockticks", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5000ff, .fstr = "snb_unc_cbo0::UNC_CLOCKTICKS", }, { SRC_LINE, .name = "snb_unc_cbo1::unc_clockticks", .ret = PFM_ERR_NOTFOUND }, { SRC_LINE, .name = "snb_unc_cbo2::unc_clockticks", .ret = PFM_ERR_NOTFOUND }, { SRC_LINE, .name = "snb_unc_cbo3::unc_clockticks", .ret = PFM_ERR_NOTFOUND }, { SRC_LINE, .name = "snb_unc_cbo1::UNC_CBO_CACHE_LOOKUP:STATE_MESI:READ_FILTER:c=1:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1d01f34, .fstr = "snb_unc_cbo1::UNC_CBO_CACHE_LOOKUP:STATE_MESI:READ_FILTER:e=0:i=1:c=1", }, { SRC_LINE, .name = "snbep_unc_cbo1::UNC_C_CLOCKTICKS:u", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_CLOCKTICKS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x00, .fstr = "snbep_unc_cbo0::UNC_C_CLOCKTICKS", }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:DATA_READ", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x334, .codes[1] = 0x7c0000, .fstr = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:DATA_READ:STATE_MESIF:e=0:i=0:t=0:tf=0:nf=0", }, { SRC_LINE, .name = 
"snbep_unc_cbo0::UNC_C_LLC_LOOKUP:DATA_READ:nf=1", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:DATA_READ:tid=1", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:NID:STATE_M", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:NID:nf=3", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x4134, .codes[1] = 0x7c0c00, .fstr = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:NID:STATE_MESIF:e=0:i=0:t=0:tf=0:nf=3", }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:NID:STATE_M:nf=3", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x4134, .codes[1] = 0x200c00, .fstr = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:NID:STATE_M:e=0:i=0:t=0:tf=0:nf=3", }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_LLC_LOOKUP:NID:nf=3:tid=1", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:OPCODE", .ret = PFM_ERR_UMASK, }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:WB", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1035, .fstr = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:WB:e=0:i=0:t=0:tf=0:nf=0", }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:OPCODE:OPC_PCIWILF", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x135, .codes[1] = 0xca000000, .fstr = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:OPCODE:OPC_PCIWILF:e=0:i=0:t=0:tf=0:nf=0", }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:OPCODE:OPC_PCIWILF:nf=1", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:NID_OPCODE:OPC_PCIRDCUR:nf=1", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x4135, .codes[1] = 0xcf000400, .fstr = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:NID_OPCODE:OPC_PCIRDCUR:e=0:i=0:t=0:tf=0:nf=1", }, { SRC_LINE, .name = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:OPC_RFO:NID_OPCODE:nf=1", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x4135, .codes[1] = 0xc0000400, .fstr = "snbep_unc_cbo0::UNC_C_TOR_INSERTS:NID_OPCODE:OPC_RFO:e=0:i=0:t=0:tf=0:nf=1", }, { SRC_LINE, .name = "snbep_unc_ha::UNC_H_CLOCKTICKS", .ret = 
PFM_SUCCESS, .count = 1, .codes[0] = 0x0, .fstr = "snbep_unc_ha::UNC_H_CLOCKTICKS:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_ha::UNC_H_REQUESTS:READS:t=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1000301, .fstr = "snbep_unc_ha::UNC_H_REQUESTS:READS:e=0:i=0:t=1", }, { SRC_LINE, .name = "snbep_unc_imc0::UNC_M_CLOCKTICKS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xff, .fstr = "snbep_unc_imc0::UNC_M_CLOCKTICKS", }, { SRC_LINE, .name = "snbep_unc_imc0::UNC_M_CLOCKTICKS:t=1", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_imc0::UNC_M_CAS_COUNT:RD", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x0304, .fstr = "snbep_unc_imc0::UNC_M_CAS_COUNT:RD:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_imc0::UNC_M_POWER_CKE_CYCLES:RANK0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x183, .fstr = "snbep_unc_imc0::UNC_M_POWER_CKE_CYCLES:RANK0:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_imc0::UNC_M_CAS_COUNT:WR", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xc04, .fstr = "snbep_unc_imc0::UNC_M_CAS_COUNT:WR:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_CLOCKTICKS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x0, .fstr = "snbep_unc_pcu::UNC_P_CLOCKTICKS:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_CLOCKTICKS:t=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1000000, .fstr = "snbep_unc_pcu::UNC_P_CLOCKTICKS:e=0:i=0:t=1", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_CORE0_TRANSITION_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x200003, .fstr = "snbep_unc_pcu::UNC_P_CORE0_TRANSITION_CYCLES:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND1_CYCLES", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND2_CYCLES", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND3_CYCLES", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:ff=32", .ret = 
PFM_SUCCESS, .count = 2, .codes[0] = 0xb, .codes[1] = 0x20, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:e=0:i=0:t=0:ff=32", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND1_CYCLES:ff=16", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0xc, .codes[1] = 0x1000, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND1_CYCLES:e=0:i=0:t=0:ff=16", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND2_CYCLES:ff=8", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0xd, .codes[1] = 0x80000, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND2_CYCLES:e=0:i=0:t=0:ff=8", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND3_CYCLES:ff=40", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0xe, .codes[1] = 0x28000000, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND3_CYCLES:e=0:i=0:t=0:ff=40", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:ff=32:e", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x4000b, .codes[1] = 0x20, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:e=1:i=0:t=0:ff=32", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:ff=32:i", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x80000b, .codes[1] = 0x20, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:e=0:i=1:t=0:ff=32", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:ff=32:e:i", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x84000b, .codes[1] = 0x20, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:e=1:i=1:t=0:ff=32", }, { SRC_LINE, .name = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:ff=32:e:i:t=4", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x484000b, .codes[1] = 0x20, .fstr = "snbep_unc_pcu::UNC_P_FREQ_BAND0_CYCLES:e=1:i=1:t=4:ff=32", }, { SRC_LINE, .name = "SNBEP_UNC_PCU::UNC_P_POWER_STATE_OCCUPANCY:CORES_C0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x4080, .fstr = "snbep_unc_pcu::UNC_P_POWER_STATE_OCCUPANCY:CORES_C0:e=0:i=0:t=0" }, { SRC_LINE, .name = "SNBEP_UNC_PCU::UNC_P_POWER_STATE_OCCUPANCY:CORES_C3", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x8080, .fstr = 
"snbep_unc_pcu::UNC_P_POWER_STATE_OCCUPANCY:CORES_C3:e=0:i=0:t=0", }, { SRC_LINE, .name = "SNBEP_UNC_PCU::UNC_P_POWER_STATE_OCCUPANCY:CORES_C6", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xc080, .fstr = "snbep_unc_pcu::UNC_P_POWER_STATE_OCCUPANCY:CORES_C6:e=0:i=0:t=0" }, { SRC_LINE, .name = "SNBEP_UNC_PCU::UNC_P_POWER_STATE_OCCUPANCY:CORES_C0:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x40004080, .fstr = "snbep_unc_pcu::UNC_P_POWER_STATE_OCCUPANCY:CORES_C0:e=0:i=1:t=0" }, { SRC_LINE, .name = "snbep_unc_qpi0::UNC_Q_CLOCKTICKS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x14, .fstr = "snbep_unc_qpi0::UNC_Q_CLOCKTICKS:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_qpi0::UNC_Q_RXL_FLITS_G0:DATA", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x201, .fstr = "snbep_unc_qpi0::UNC_Q_RXL_FLITS_G0:DATA:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_qpi0::UNC_Q_RXL_FLITS_G0:IDLE:t=1:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1800101, .fstr = "snbep_unc_qpi0::UNC_Q_RXL_FLITS_G0:IDLE:e=0:i=1:t=1", }, { SRC_LINE, .name = "snbep_unc_qpi0::UNC_Q_TXL_FLITS_G0:DATA", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x200, .fstr = "snbep_unc_qpi0::UNC_Q_TXL_FLITS_G0:DATA:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_qpi0::UNC_Q_RXL_FLITS_G1:HOM", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x200602, .fstr = "snbep_unc_qpi0::UNC_Q_RXL_FLITS_G1:HOM:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_qpi0::UNC_Q_TXL_FLITS_G1:HOM", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x200600, .fstr = "snbep_unc_qpi0::UNC_Q_TXL_FLITS_G1:HOM:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_ubo::UNC_U_LOCK_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x44, .fstr = "snbep_unc_ubo::UNC_U_LOCK_CYCLES:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_r2pcie::UNC_R2_CLOCKTICKS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1, .fstr = "snbep_unc_r2pcie::UNC_R2_CLOCKTICKS:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_r2pcie::UNC_R2_RING_AD_USED:ANY", .ret = PFM_SUCCESS, .count = 1, 
.codes[0] = 0xf07, .fstr = "snbep_unc_r2pcie::UNC_R2_RING_AD_USED:ANY:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_r3qpi0::UNC_R3_CLOCKTICKS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1, .fstr = "snbep_unc_r3qpi0::UNC_R3_CLOCKTICKS:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_r3qpi0::UNC_R3_TXR_CYCLES_FULL:e=0:i=0:t=0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x25, .fstr = "snbep_unc_r3qpi0::UNC_R3_TXR_CYCLES_FULL:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_r3qpi1::UNC_R3_CLOCKTICKS", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1, .fstr = "snbep_unc_r3qpi1::UNC_R3_CLOCKTICKS:e=0:i=0:t=0", }, { SRC_LINE, .name = "snbep_unc_r3qpi1::UNC_R3_TXR_CYCLES_FULL:e=0:i=0:t=0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x25, .fstr = "snbep_unc_r3qpi1::UNC_R3_TXR_CYCLES_FULL:e=0:i=0:t=0", }, { SRC_LINE, .name = "knc::cpu_clk_unhalted", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53002a, .fstr = "knc::CPU_CLK_UNHALTED:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "knc::instructions_executed", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530016, .fstr = "knc::INSTRUCTIONS_EXECUTED:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "knc::vpu_data_read", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x532000, .fstr = "knc::VPU_DATA_READ:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "knc::vpu_data_read:t:c=1:i", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1f32000, .fstr = "knc::VPU_DATA_READ:k=1:u=1:e=0:i=1:c=1:t=1", }, { SRC_LINE, .name = "snb_ep::cpl_cycles:ring0_trans:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x155015c, .fstr = "snb_ep::CPL_CYCLES:RING0:k=0:u=1:e=1:i=0:c=1:t=0", }, { SRC_LINE, .name = "snb_ep::cpl_cycles:ring0_trans:e=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x157015cull, }, { SRC_LINE, .name = "snb_ep::OFFCORE_REQUESTS_OUTSTanding:ALL_DATA_RD_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1530860, .fstr = "snb_ep::OFFCORE_REQUESTS_OUTSTANDING:ALL_DATA_RD:k=1:u=1:e=0:i=0:c=1:t=0", }, { SRC_LINE, .name = 
"snb_ep::uops_issued:core_stall_cycles:u:k", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1f3010e, .fstr = "snb_ep::UOPS_ISSUED:ANY:k=1:u=1:e=0:i=1:c=1:t=1", }, { SRC_LINE, .name = "snb_ep::LLC_REFERences:k:t=1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x724f2e, .fstr = "snb_ep::LAST_LEVEL_CACHE_REFERENCES:k=1:u=0:e=0:i=0:c=0:t=1", }, { SRC_LINE, .name = "snb_ep::ITLB:0x1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5301ae, .fstr = "snb_ep::ITLB:0x1:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::mem_load_uops_llc_miss_retired:local_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5301d3, .fstr = "snb_ep::MEM_LOAD_UOPS_LLC_MISS_RETIRED:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::mem_load_uops_llc_miss_retired:remote_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5304d3, .fstr = "snb_ep::MEM_LOAD_UOPS_LLC_MISS_RETIRED:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_0:DMND_RFO:ANY_RESPONSE", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x10002, .fstr = "snb_ep::OFFCORE_RESPONSE_0:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_0:DMND_RFO:ANY_REQUEST", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x18fff, }, { SRC_LINE, .name = "snb_ep::offcore_response_0:DMND_RFO", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x10002, .fstr = "snb_ep::OFFCORE_RESPONSE_0:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_0:any_response", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x18fff, .fstr = "snb_ep::OFFCORE_RESPONSE_0:ANY_REQUEST:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_0:NO_SUPP:SNP_NONE:PF_RFO:PF_IFETCH", .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x80020060, .fstr = "snb_ep::OFFCORE_RESPONSE_0:PF_RFO:PF_IFETCH:NO_SUPP:SNP_NONE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = 
"snb_ep::offcore_response_1:DMND_RFO:ANY_RESPONSE", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x10002, .fstr = "snb_ep::OFFCORE_RESPONSE_1:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_1:DMND_RFO:ANY_REQUEST", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x18fff, }, { SRC_LINE, .name = "snb_ep::offcore_response_1:DMND_RFO", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x10002, .fstr = "snb_ep::OFFCORE_RESPONSE_1:DMND_RFO:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_1:any_response", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x18fff, .fstr = "snb_ep::OFFCORE_RESPONSE_1:ANY_REQUEST:ANY_RESPONSE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_1:NO_SUPP:SNP_NONE:PF_RFO:PF_IFETCH", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x80020060, .fstr = "snb_ep::OFFCORE_RESPONSE_1:PF_RFO:PF_IFETCH:NO_SUPP:SNP_NONE:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_1:ANY_REQUEST:LLC_MISS_LOCAL_DRAM", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x3f80408fffull, .fstr = "snb_ep::OFFCORE_RESPONSE_1:DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER:LLC_MISS_LOCAL_DRAM:SNP_ANY:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::offcore_response_1:ANY_REQUEST:LLC_MISS_REMOTE_DRAM", .count = 2, .codes[0] = 0x5301bb, .codes[1] = 0x3fff808fffull, .fstr = "snb_ep::OFFCORE_RESPONSE_1:DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER:LLC_MISS_REMOTE_DRAM:SNP_ANY:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "snb_ep::mem_trans_retired:latency_above_threshold", .ret = PFM_SUCCESS, .count = 2, .codes[0]=0x5301cd, .codes[1] = 3, .fstr = "snb_ep::MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = 
"snb_ep::mem_trans_retired:latency_above_threshold:ldlat=2", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "snb_ep::mem_trans_retired:latency_above_threshold:ldlat=3", .ret = PFM_SUCCESS, .count = 2, .codes[0]=0x5301cd, .codes[1] = 3, .fstr = "snb_ep::MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "snb_ep::mem_trans_retired:latency_above_threshold:ldlat=1000000", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "snb::mem_trans_retired:latency_above_threshold", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301cd, .codes[1] = 3, .fstr = "snb::MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "snb::mem_trans_retired:latency_above_threshold:ldlat=2", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "snb::mem_trans_retired:latency_above_threshold:ldlat=3", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301cd, .codes[1] = 3, .fstr = "snb::MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "snb::mem_trans_retired:latency_above_threshold:ldlat=1000000", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "ivb::mem_trans_retired:latency_above_threshold", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301cd, .codes[1] = 3, .fstr = "ivb::MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "ivb::mem_trans_retired:latency_above_threshold:ldlat=2", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "ivb::mem_trans_retired:latency_above_threshold:ldlat=3:u", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5101cd, .codes[1] = 3, .fstr = "ivb::MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD:k=0:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "ivb::mem_trans_retired:latency_above_threshold:ldlat=1000000", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "nhm::mem_inst_retired:latency_above_threshold", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x53100b, .codes[1] = 3, .fstr = 
"nhm::MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "nhm::mem_inst_retired:latency_above_threshold:ldlat=2", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "nhm::mem_inst_retired:latency_above_threshold:ldlat=3", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x53100b, .codes[1] = 3, .fstr = "nhm::MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "nhm::mem_inst_retired:latency_above_threshold:ldlat=1000000", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "wsm::mem_inst_retired:latency_above_threshold", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x53100b, .codes[1] = 3, .fstr = "wsm::MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "wsm::mem_inst_retired:latency_above_threshold:ldlat=2", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "wsm::mem_inst_retired:latency_above_threshold:ldlat=3", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x53100b, .codes[1] = 3, .fstr = "wsm::MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3", }, { SRC_LINE, .name = "wsm::mem_inst_retired:latency_above_threshold:ldlat=1000000", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:NOP_DW_SENT", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5308f6, .fstr = "amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:NOP_DW_SENT:SUBLINK_0", }, { SRC_LINE, .name = "amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:ALL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x533ff6, .fstr = "amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:ALL:SUBLINK_0", }, { SRC_LINE, .name = "amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:ALL:SUBLINK_1", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x53bff6, .fstr = "amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:ALL:SUBLINK_1", }, { SRC_LINE, .name = 
"amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:COMMAND_DW_SENT:DATA_DW_SENT", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5303f6, .fstr = "amd64_fam15h_interlagos::LINK_TRANSMIT_BANDWIDTH_LINK_0:COMMAND_DW_SENT:DATA_DW_SENT:SUBLINK_0", }, { SRC_LINE, .name = "amd64_fam15h_interlagos::DISPATCHED_FPU_OPS:0x4ff:u", .ret = PFM_ERR_ATTR }, { SRC_LINE, .name = "amd64_fam15h_interlagos::DISPATCHED_FPU_OPS:0xff:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x51ff00ull, .fstr = "amd64_fam15h_interlagos::DISPATCHED_FPU_OPS:0xff:k=0:u=1:e=0:i=0:c=0:h=0:g=0" }, { SRC_LINE, .name = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:read_block_modify:core_3", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x4005334e0ull, .fstr = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:READ_BLOCK_MODIFY:CORE_3", }, { SRC_LINE, .name = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x40053f7e0ull, .fstr = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:READ_BLOCK_ANY:ANY_CORE", }, { SRC_LINE, .name = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:READ_BLOCK_EXCLUSIVE:PREFETCH:READ_BLOCK_MODIFY:core_4", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x400534de0ull, .fstr = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:READ_BLOCK_EXCLUSIVE:READ_BLOCK_MODIFY:PREFETCH:CORE_4", }, { SRC_LINE, .name = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:read_block_any:prefetch:core_1", .ret = PFM_ERR_FEATCOMB, /* must use individual umasks to combine with prefetch */ }, { SRC_LINE, .name = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:read_block_any:prefetch:core_1:core_3", .ret = PFM_ERR_FEATCOMB, /* core umasks cannot be combined */ }, { SRC_LINE, .name = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:prefetch:core_0", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x4005308e0ull, .fstr = "amd64_fam15h_interlagos::READ_REQUEST_TO_L3_CACHE:PREFETCH:CORE_0", }, { SRC_LINE, .name = 
"ivb_ep::mem_load_uops_llc_miss_retired:local_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5303d3, .fstr = "ivb_ep::MEM_LOAD_UOPS_LLC_MISS_RETIRED:LOCAL_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb_ep::mem_load_uops_llc_miss_retired:remote_dram", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x530cd3, .fstr = "ivb_ep::MEM_LOAD_UOPS_LLC_MISS_RETIRED:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb_ep::mem_load_uops_llc_miss_retired:remote_hitm", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5310d3, .fstr = "ivb_ep::MEM_LOAD_UOPS_LLC_MISS_RETIRED:REMOTE_HITM:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb_ep::mem_load_uops_llc_miss_retired:remote_fwd", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5320d3, .fstr = "ivb_ep::MEM_LOAD_UOPS_LLC_MISS_RETIRED:REMOTE_FWD:k=1:u=1:e=0:i=0:c=0:t=0", }, { SRC_LINE, .name = "ivb::mem_load_uops_llc_miss_retired:remote_dram", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "ivb_ep::offcore_response_0:any_request:LLC_MISS_REMOTE", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301b7, .codes[1] = 0x3fff808fffULL, .fstr = "ivb_ep::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER:LLC_MISS_REMOTE_DRAM:SNP_ANY:k=1:u=1:e=0:i=0:c=0:t=0" }, { SRC_LINE, .name = "hsw::mem_trans_retired:latency_above_threshold:ldlat=3:u", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5101cd, .codes[1] = 3, .fstr = "hsw::MEM_TRANS_RETIRED:LOAD_LATENCY:k=0:u=1:e=0:i=0:c=0:t=0:ldlat=3:intx=0:intxcp=0", }, { SRC_LINE, .name = "hsw::mem_trans_retired:latency_above_threshold:ldlat=1000000", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "hsw::mem_trans_retired:load_latency", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x5301cd, .codes[1] = 3, .fstr = "hsw::MEM_TRANS_RETIRED:LOAD_LATENCY:k=1:u=1:e=0:i=0:c=0:t=0:ldlat=3:intx=0:intxcp=0", }, { SRC_LINE, .name = "hsw::mem_trans_retired:load_latency:ldlat=1000000", .ret = 
PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "hsw::mem_trans_retired:latency_above_threshold:ldlat=2:intx=0:intxcp=0", .ret = PFM_ERR_ATTR_VAL, }, { SRC_LINE, .name = "hsw::inst_Retired:any_p:intx", .count = 1, .codes[0] = 0x1005300c0, .fstr = "hsw::INST_RETIRED:ANY_P:k=1:u=1:e=0:i=0:c=0:t=0:intx=1:intxcp=0", }, { SRC_LINE, .name = "hsw::inst_Retired:any_p:intx:intxcp", .count = 1, .codes[0] = 0x3005300c0, .fstr = "hsw::INST_RETIRED:ANY_P:k=1:u=1:e=0:i=0:c=0:t=0:intx=1:intxcp=1", }, { SRC_LINE, .name = "hsw::inst_Retired:any_p:intx=0:intxcp", .count = 1, .codes[0] = 0x2005300c0, .fstr = "hsw::INST_RETIRED:ANY_P:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=1", }, { SRC_LINE, .name = "hsw::cycle_activity:cycles_l2_pending", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x15301a3, .fstr = "hsw::CYCLE_ACTIVITY:CYCLES_L2_PENDING:k=1:u=1:e=0:i=0:t=0:intx=0:intxcp=0", }, { SRC_LINE, .name = "hsw::cycle_activity:cycles_l2_pending:c=8", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "hsw::hle_retired:aborted", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5304c8, .fstr = "hsw::HLE_RETIRED:ABORTED:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0", }, { SRC_LINE, .name = "hsw::rtm_retired:aborted", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5304c9, .fstr = "hsw::RTM_RETIRED:ABORTED:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0", }, { SRC_LINE, .name = "ivb_unc_cbo0::unc_clockticks", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5000ff, .fstr = "ivb_unc_cbo0::UNC_CLOCKTICKS", }, { SRC_LINE, .name = "ivb_unc_cbo1::unc_clockticks", .ret = PFM_ERR_NOTFOUND }, }; #define NUM_TEST_EVENTS (int)(sizeof(x86_test_events)/sizeof(test_event_t)) static int check_pmu_supported(const char *evt) { pfm_pmu_info_t info; char *p; int i, ret; memset(&info, 0, sizeof(info)); info.size = sizeof(info); /* look for pmu_name::.... 
*/ p = strchr(evt, ':'); if (!p) return 1; if (*(p+1) != ':') return 1; pfm_for_all_pmus(i) { ret = pfm_get_pmu_info(i, &info); if (ret != PFM_SUCCESS) continue; if (!strncmp(info.name, evt, p - evt)) return 1; } /* PMU not there */ return 0; } static int check_test_events(FILE *fp) { const test_event_t *e; char *fstr; uint64_t *codes; int count, i, j; int ret, errors = 0; for (i=0, e = x86_test_events; i < NUM_TEST_EVENTS; i++, e++) { codes = NULL; count = 0; fstr = NULL; ret = pfm_get_event_encoding(e->name, PFM_PLM0 | PFM_PLM3, &fstr, NULL, &codes, &count); if (ret != e->ret) { if (ret == PFM_ERR_NOTFOUND && !check_pmu_supported(e->name)) { fprintf(fp,"Line %d, Event%d %s, skipped because no PMU support\n", e->line, i, e->name); continue; } fprintf(fp,"Line %d, Event%d %s, ret=%s(%d) expected %s(%d)\n", e->line, i, e->name, pfm_strerror(ret), ret, pfm_strerror(e->ret), e->ret); errors++; } else { if (ret != PFM_SUCCESS) { if (fstr) { fprintf(fp,"Line %d, Event%d %s, expected fstr NULL but it is not\n", e->line, i, e->name); errors++; } if (count != 0) { fprintf(fp,"Line %d, Event%d %s, expected count=0 instead of %d\n", e->line, i, e->name, count); errors++; } if (codes) { fprintf(fp,"Line %d, Event%d %s, expected codes[] NULL but it is not\n", e->line, i, e->name); errors++; } } else { if (count != e->count) { fprintf(fp,"Line %d, Event%d %s, count=%d expected %d\n", e->line, i, e->name, count, e->count); errors++; } for (j=0; j < count; j++) { if (codes[j] != e->codes[j]) { fprintf(fp,"Line %d, Event%d %s, codes[%d]=%#"PRIx64" expected %#"PRIx64"\n", e->line, i, e->name, j, codes[j], e->codes[j]); errors++; } } if (e->fstr && strcmp(fstr, e->fstr)) { fprintf(fp,"Line %d, Event%d %s, fstr=%s expected %s\n", e->line, i, e->name, fstr, e->fstr); errors++; } } } if (codes) free(codes); if (fstr) free(fstr); } printf("\t %d x86 events: %d errors\n", i, errors); return errors; } int validate_arch(FILE *fp) { return check_test_events(fp); } 
papi-5.3.0/src/libpfm4/tests/validate.c0000600003276200002170000001717512247131124017375 0ustar ralphundrgrad/* * validate.c - validate event tables + encodings * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include #include #include #include #include #include #include #include #include #include #ifdef __linux__ #include #endif #define __weak_func __attribute__((weak)) #ifdef PFMLIB_WINDOWS int set_env_var(const char *var, char *value, int ov) { size_t len; char *str; int ret; len = strlen(var) + 1 + strlen(value) + 1; str = malloc(len); if (!str) return PFM_ERR_NOMEM; sprintf(str, "%s=%s", var, value); ret = putenv(str); free(str); return ret ? 
PFM_ERR_INVAL : PFM_SUCCESS; } #else static inline int set_env_var(const char *var, char *value, int ov) { return setenv(var, value, ov); } #endif __weak_func int validate_arch(FILE *fp) { return 0; } static struct { int valid_intern; int valid_arch; } options; static void usage(void) { printf("validate [-c] [-a] [-A]\n" "-c\trun the library validate events\n" "-a\trun architecture specific event tests\n" "-A\trun all tests\n" "-h\tget help\n"); } static int validate_event_tables(void) { pfm_pmu_info_t pinfo; int i, ret, errors = 0; memset(&pinfo, 0, sizeof(pinfo)); pinfo.size = sizeof(pinfo); pfm_for_all_pmus(i) { ret = pfm_get_pmu_info(i, &pinfo); if (ret != PFM_SUCCESS) continue; printf("\tchecking %s (%d events): ", pinfo.name, pinfo.nevents); fflush(stdout); ret = pfm_pmu_validate(i, stdout); if (ret != PFM_SUCCESS && ret != PFM_ERR_NOTSUPP) { printf("Failed\n"); errors++; } else if (ret == PFM_ERR_NOTSUPP) { printf("N/A\n"); } else { printf("Passed\n"); } } return errors; } #if __WORDSIZE == 64 #define STRUCT_MULT 8 #else #define STRUCT_MULT 4 #endif #define MAX_FIELDS 32 typedef struct { const char *name; size_t sz; } field_desc_t; typedef struct { const char *name; size_t sz; size_t bitfield_sz; size_t abi_sz; field_desc_t fields[MAX_FIELDS]; } struct_desc_t; #define LAST_STRUCT { .name = NULL, } #define FIELD(n, st) \ { .name = #n, \ .sz = sizeof(((st *)(0))->n), \ } #define LAST_FIELD { .name = NULL, } static const struct_desc_t pfmlib_structs[]={ { .name = "pfm_pmu_info_t", .sz = sizeof(pfm_pmu_info_t), .bitfield_sz = 4, .abi_sz = PFM_PMU_INFO_ABI0, .fields= { FIELD(name, pfm_pmu_info_t), FIELD(desc, pfm_pmu_info_t), FIELD(size, pfm_pmu_info_t), FIELD(pmu, pfm_pmu_info_t), FIELD(type, pfm_pmu_info_t), FIELD(nevents, pfm_pmu_info_t), FIELD(first_event, pfm_pmu_info_t), FIELD(max_encoding, pfm_pmu_info_t), FIELD(num_cntrs, pfm_pmu_info_t), FIELD(num_fixed_cntrs, pfm_pmu_info_t), LAST_FIELD }, }, { .name = "pfm_event_info_t", .sz = sizeof(pfm_event_info_t), 
.bitfield_sz = 4, .abi_sz = PFM_EVENT_INFO_ABI0, .fields= { FIELD(name, pfm_event_info_t), FIELD(desc, pfm_event_info_t), FIELD(equiv, pfm_event_info_t), FIELD(size, pfm_event_info_t), FIELD(code, pfm_event_info_t), FIELD(pmu, pfm_event_info_t), FIELD(dtype, pfm_event_info_t), FIELD(idx, pfm_event_info_t), FIELD(nattrs, pfm_event_info_t), FIELD(reserved, pfm_event_info_t), LAST_FIELD }, }, { .name = "pfm_event_attr_info_t", .sz = sizeof(pfm_event_attr_info_t), .bitfield_sz = 4+8, .abi_sz = PFM_ATTR_INFO_ABI0, .fields= { FIELD(name, pfm_event_attr_info_t), FIELD(desc, pfm_event_attr_info_t), FIELD(equiv, pfm_event_attr_info_t), FIELD(size, pfm_event_attr_info_t), FIELD(code, pfm_event_attr_info_t), FIELD(type, pfm_event_attr_info_t), FIELD(idx, pfm_event_attr_info_t), FIELD(ctrl, pfm_event_attr_info_t), LAST_FIELD }, }, { .name = "pfm_pmu_encode_arg_t", .sz = sizeof(pfm_pmu_encode_arg_t), .abi_sz = PFM_RAW_ENCODE_ABI0, .fields= { FIELD(codes, pfm_pmu_encode_arg_t), FIELD(fstr, pfm_pmu_encode_arg_t), FIELD(size, pfm_pmu_encode_arg_t), FIELD(count, pfm_pmu_encode_arg_t), FIELD(idx, pfm_pmu_encode_arg_t), LAST_FIELD }, }, #ifdef __linux__ { .name = "pfm_perf_encode_arg_t", .sz = sizeof(pfm_perf_encode_arg_t), .bitfield_sz = 0, .abi_sz = PFM_PERF_ENCODE_ABI0, .fields= { FIELD(attr, pfm_perf_encode_arg_t), FIELD(fstr, pfm_perf_encode_arg_t), FIELD(size, pfm_perf_encode_arg_t), FIELD(idx, pfm_perf_encode_arg_t), FIELD(cpu, pfm_perf_encode_arg_t), FIELD(flags, pfm_perf_encode_arg_t), FIELD(pad0, pfm_perf_encode_arg_t), LAST_FIELD }, }, #endif LAST_STRUCT }; static int validate_structs(void) { const struct_desc_t *d; const field_desc_t *f; size_t sz; int errors = 0; int abi = LIBPFM_ABI_VERSION; printf("\tlibpfm ABI version : %d\n", abi); for (d = pfmlib_structs; d->name; d++) { printf("\t%s : ", d->name); if (d->abi_sz != d->sz) { printf("Failed (struct size does not correspond to ABI size %zu vs. 
%zu)\n", d->abi_sz, d->sz); errors++; } if (d->sz % STRUCT_MULT) { printf("Failed (wrong mult size=%zu)\n", d->sz); errors++; } sz = d->bitfield_sz; for (f = d->fields; f->name; f++) { sz += f->sz; } if (sz != d->sz) { printf("Failed (invisible padding of %zu bytes)\n", d->sz - sz); errors++; continue; } printf("Passed\n"); } return errors; } int main(int argc, char **argv) { int ret, c, errors = 0; while ((c=getopt(argc, argv,"hcaA")) != -1) { switch(c) { case 'c': options.valid_intern = 1; break; case 'a': options.valid_arch = 1; break; case 'A': options.valid_arch = 1; options.valid_intern = 1; break; case 'h': usage(); exit(0); default: errx(1, "unknown option error"); } } /* to allow encoding of events from non detected PMU models */ ret = set_env_var("LIBPFM_ENCODE_INACTIVE", "1", 1); if (ret != PFM_SUCCESS) errx(1, "cannot force inactive encoding"); ret = pfm_initialize(); if (ret != PFM_SUCCESS) errx(1, "cannot initialize libpfm: %s", pfm_strerror(ret)); /* run everything by default */ if (!(options.valid_intern || options.valid_arch)) { options.valid_intern = 1; options.valid_arch = 1; } printf("Libpfm structure tests:\n"); errors += validate_structs(); if (options.valid_intern) { printf("Libpfm internal table tests:\n"); errors += validate_event_tables(); } if (options.valid_arch) { printf("Architecture specific tests:\n"); errors += validate_arch(stderr); } pfm_terminate(); if (errors) printf("Total %d errors\n", errors); else printf("All tests passed\n"); return errors; } papi-5.3.0/src/libpfm4/tests/Makefile0000600003276200002170000000370112247131124017066 0ustar ralphundrgrad# # Copyright (c) 2010 Google, Inc # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or 
sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk SRCS=validate.c ifeq ($(CONFIG_PFMLIB_ARCH_X86),y) SRCS += validate_x86.c endif ifeq ($(CONFIG_PFMLIB_ARCH_MIPS),y) SRCS += validate_mips.c endif ifeq ($(CONFIG_PFMLIB_ARCH_ARM),y) SRCS += validate_arm.c endif ifeq ($(CONFIG_PFMLIB_ARCH_POWERPC),y) SRCS += validate_power.c endif CFLAGS+= -I. 
-D_GNU_SOURCE LIBS += -lm ifeq ($(SYS),Linux) CFLAGS+= -pthread endif OBJS=$(SRCS:.c=.o) TARGETS=validate all: $(TARGETS) validate: $(OBJS) $(PFMLIB) $(CC) $(CFLAGS) -o $@ $(LDFLAGS) $^ $(LIBS) clean: $(RM) -f *.o $(TARGETS) *~ distclean: clean # # examples are installed as part of the RPM install, typically in /usr/share/doc/libpfm-X.Y/ # .PHONY: install depend install_examples papi-5.3.0/src/libpfm4/tests/validate_arm.c0000600003276200002170000001243212247131124020223 0ustar ralphundrgrad/* * validate_arm.c - validate ARM event tables + encodings * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* */ #include #include #include #include #include #include #include #include #include #define MAX_ENCODING 1 #define SRC_LINE .line = __LINE__ typedef struct { const char *name; const char *fstr; uint64_t codes[MAX_ENCODING]; int ret, count, line; } test_event_t; static const test_event_t arm_test_events[]={ { SRC_LINE, .name = "arm_ac8::NEON_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x5a, .fstr = "arm_ac8::NEON_CYCLES", }, { SRC_LINE, .name = "arm_ac8::NEON_CYCLES:k", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "arm_ac8::CPU_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xff, .fstr = "arm_ac8::CPU_CYCLES", }, { SRC_LINE, .name = "arm_ac8::CPU_CYCLES_HALTED", .ret = PFM_ERR_NOTFOUND, }, { SRC_LINE, .name = "arm_ac9::CPU_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xff, .fstr = "arm_ac9::CPU_CYCLES", }, { SRC_LINE, .name = "arm_ac9::DMB_DEP_STALL_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x86, .fstr = "arm_ac9::DMB_DEP_STALL_CYCLES", }, { SRC_LINE, .name = "arm_ac9::CPU_CYCLES:u", .ret = PFM_ERR_ATTR, }, { SRC_LINE, .name = "arm_ac9::JAVA_HW_BYTECODE_EXEC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x40, .fstr = "arm_ac9::JAVA_HW_BYTECODE_EXEC", }, { SRC_LINE, .name = "arm_ac15::CPU_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x8000011, .fstr = "arm_ac15::CPU_CYCLES:k=1:u=1:hv=0", }, { SRC_LINE, .name = "arm_ac15::CPU_CYCLES:k", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x88000011, .fstr = "arm_ac15::CPU_CYCLES:k=1:u=0:hv=0", }, { SRC_LINE, .name = "arm_ac15::CPU_CYCLES:k:u", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x8000011, .fstr = "arm_ac15::CPU_CYCLES:k=1:u=1:hv=0", }, { SRC_LINE, .name = "arm_ac15::INST_RETIRED", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x8000008, .fstr = "arm_ac15::INST_RETIRED:k=1:u=1:hv=0", }, { SRC_LINE, .name = "arm_1176::CPU_CYCLES", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xff, .fstr = "arm_1176::CPU_CYCLES", }, { SRC_LINE, .name = "arm_1176::CPU_CYCLES:k", .ret = 
PFM_ERR_ATTR, }, { SRC_LINE, .name = "arm_1176::INSTR_EXEC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x07, .fstr = "arm_1176::INSTR_EXEC", }, }; #define NUM_TEST_EVENTS (int)(sizeof(arm_test_events)/sizeof(test_event_t)) static int check_test_events(FILE *fp) { const test_event_t *e; char *fstr; uint64_t *codes; int count, i, j; int ret, errors = 0; for (i = 0, e = arm_test_events; i < NUM_TEST_EVENTS; i++, e++) { codes = NULL; count = 0; fstr = NULL; ret = pfm_get_event_encoding(e->name, PFM_PLM0 | PFM_PLM3, &fstr, NULL, &codes, &count); if (ret != e->ret) { fprintf(fp,"Event%d %s, ret=%s(%d) expected %s(%d)\n", i, e->name, pfm_strerror(ret), ret, pfm_strerror(e->ret), e->ret); errors++; } else { if (ret != PFM_SUCCESS) { if (fstr) { fprintf(fp,"Event%d %s, expected fstr NULL but it is not\n", i, e->name); errors++; } if (count != 0) { fprintf(fp,"Event%d %s, expected count=0 instead of %d\n", i, e->name, count); errors++; } if (codes) { fprintf(fp,"Event%d %s, expected codes[] NULL but it is not\n", i, e->name); errors++; } } else { if (count != e->count) { fprintf(fp,"Event%d %s, count=%d expected %d\n", i, e->name, count, e->count); errors++; } for (j=0; j < count; j++) { if (codes[j] != e->codes[j]) { fprintf(fp,"Event%d %s, codes[%d]=%#"PRIx64" expected %#"PRIx64"\n", i, e->name, j, codes[j], e->codes[j]); errors++; } } if (e->fstr && strcmp(fstr, e->fstr)) { fprintf(fp,"Event%d %s, fstr=%s expected %s\n", i, e->name, fstr, e->fstr); errors++; } } } if (codes) free(codes); if (fstr) free(fstr); } printf("\t %d ARM events: %d errors\n", i, errors); return errors; } int validate_arch(FILE *fp) { return check_test_events(fp); } papi-5.3.0/src/libpfm4/tests/validate_mips.c0000600003276200002170000001347212247131124020421 0ustar ralphundrgrad/* * validate_mips.c - validate MIPS event tables + encodings * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this 
software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ #include #include #include #include #include #include #include #include #include #define MAX_ENCODING 2 #define SRC_LINE .line = __LINE__ typedef struct { const char *name; const char *fstr; uint64_t codes[MAX_ENCODING]; int ret, count, line; } test_event_t; static const test_event_t mips_test_events[]={ { SRC_LINE, .name = "mips_74k::cycles", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0xa, .codes[1] = 0xf, .fstr = "mips_74k::CYCLES:k=1:u=1:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::cycles:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x2, .codes[1] = 0xf, .fstr = "mips_74k::CYCLES:k=1:u=0:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::cycles:u", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x8, .codes[1] = 0xf, .fstr = "mips_74k::CYCLES:k=0:u=1:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::cycles:s", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x4, .codes[1] = 0xf, .fstr = "mips_74k::CYCLES:k=0:u=0:s=1:e=0", }, { SRC_LINE, .name = "mips_74k::cycles:e", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x1, .codes[1] = 
0xf, .fstr = "mips_74k::CYCLES:k=0:u=0:s=0:e=1", }, { SRC_LINE, .name = "mips_74k::cycles:u:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0xa, .codes[1] = 0xf, .fstr = "mips_74k::CYCLES:k=1:u=1:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::instructions", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x2a, .codes[1] = 0xf, .fstr = "mips_74k::INSTRUCTIONS:k=1:u=1:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::instructions:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x22, .codes[1] = 0xf, .fstr = "mips_74k::INSTRUCTIONS:k=1:u=0:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::instructions:u", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x28, .codes[1] = 0xf, .fstr = "mips_74k::INSTRUCTIONS:k=0:u=1:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::instructions:s", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x24, .codes[1] = 0xf, .fstr = "mips_74k::INSTRUCTIONS:k=0:u=0:s=1:e=0", }, { SRC_LINE, .name = "mips_74k::instructions:e", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x21, .codes[1] = 0xf, .fstr = "mips_74k::INSTRUCTIONS:k=0:u=0:s=0:e=1", }, { SRC_LINE, .name = "mips_74k::instructions:u:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x2a, .codes[1] = 0xf, .fstr = "mips_74k::INSTRUCTIONS:k=1:u=1:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::PREDICTED_JR_31:u:k", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x4a, .codes[1] = 0x5, .fstr = "mips_74k::PREDICTED_JR_31:k=1:u=1:s=0:e=0", }, { SRC_LINE, .name = "mips_74k::JR_31_MISPREDICTIONS:s:e", .ret = PFM_SUCCESS, .count = 2, .codes[0] = 0x45, .codes[1] = 0xa, .fstr = "mips_74k::JR_31_MISPREDICTIONS:k=0:u=0:s=1:e=1", }, }; #define NUM_TEST_EVENTS (int)(sizeof(mips_test_events)/sizeof(test_event_t)) static int check_test_events(FILE *fp) { const test_event_t *e; char *fstr; uint64_t *codes; int count, i, j; int ret, errors = 0; for (i = 0, e = mips_test_events; i < NUM_TEST_EVENTS; i++, e++) { codes = NULL; count = 0; fstr = NULL; ret = pfm_get_event_encoding(e->name, PFM_PLM0 | PFM_PLM3, &fstr, NULL, &codes, &count); if (ret != 
e->ret) { fprintf(fp,"Event%d %s, ret=%s(%d) expected %s(%d)\n", i, e->name, pfm_strerror(ret), ret, pfm_strerror(e->ret), e->ret); errors++; } else { if (ret != PFM_SUCCESS) { if (fstr) { fprintf(fp,"Event%d %s, expected fstr NULL but it is not\n", i, e->name); errors++; } if (count != 0) { fprintf(fp,"Event%d %s, expected count=0 instead of %d\n", i, e->name, count); errors++; } if (codes) { fprintf(fp,"Event%d %s, expected codes[] NULL but it is not\n", i, e->name); errors++; } } else { if (count != e->count) { fprintf(fp,"Event%d %s, count=%d expected %d\n", i, e->name, count, e->count); errors++; } for (j=0; j < count; j++) { if (codes[j] != e->codes[j]) { fprintf(fp,"Event%d %s, codes[%d]=%#"PRIx64" expected %#"PRIx64"\n", i, e->name, j, codes[j], e->codes[j]); errors++; } } if (e->fstr && strcmp(fstr, e->fstr)) { fprintf(fp,"Event%d %s, fstr=%s expected %s\n", i, e->name, fstr, e->fstr); errors++; } } } if (codes) free(codes); if (fstr) free(fstr); } printf("\t %d MIPS events: %d errors\n", i, errors); return errors; } int validate_arch(FILE *fp) { return check_test_events(fp); } papi-5.3.0/src/libpfm4/tests/validate_power.c0000600003276200002170000001240712247131124020602 0ustar ralphundrgrad/* * validate_power.c - validate PowerPC event tables + encodings * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ #include #include #include #include #include #include #include #include #include #define MAX_ENCODING 1 #define SRC_LINE .line = __LINE__ typedef struct { const char *name; const char *fstr; uint64_t codes[MAX_ENCODING]; int ret, count, line; } test_event_t; static const test_event_t ppc_test_events[]={ { SRC_LINE, .name = "ppc970::PM_CYC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x7, .fstr = "ppc970::PM_CYC", }, { SRC_LINE, .name = "ppc970::PM_INST_DISP", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x320, .fstr = "ppc970::PM_INST_DISP", }, { SRC_LINE, .name = "ppc970mp::PM_CYC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x7, .fstr = "ppc970mp::PM_CYC", }, { SRC_LINE, .name = "ppc970mp::PM_INST_DISP", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x320, .fstr = "ppc970mp::PM_INST_DISP", }, { SRC_LINE, .name = "power4::PM_CYC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x7, .fstr = "power4::PM_CYC", }, { SRC_LINE, .name = "power4::PM_INST_DISP", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x221, .fstr = "power4::PM_INST_DISP", }, { SRC_LINE, .name = "power5::PM_CYC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xf, .fstr = "power5::PM_CYC", }, { SRC_LINE, .name = "power5::PM_INST_DISP", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x300009, .fstr = "power5::PM_INST_DISP", }, { SRC_LINE, .name = "power5p::PM_CYC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0xf, .fstr = "power5p::PM_CYC", }, { SRC_LINE, .name = "power5p::PM_INST_DISP", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x300009, 
.fstr = "power5p::PM_INST_DISP", }, { SRC_LINE, .name = "power6::PM_INST_CMPL", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x2, .fstr = "power6::PM_INST_CMPL", }, { SRC_LINE, .name = "power6::PM_THRD_CONC_RUN_INST", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x300026, .fstr = "power6::PM_THRD_CONC_RUN_INST", }, { SRC_LINE, .name = "power7::PM_CYC", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x1e, .fstr = "power7::PM_CYC", }, { SRC_LINE, .name = "power7::PM_INST_DISP", .ret = PFM_SUCCESS, .count = 1, .codes[0] = 0x200f2, .fstr = "power7::PM_INST_DISP", }, }; #define NUM_TEST_EVENTS (int)(sizeof(ppc_test_events)/sizeof(test_event_t)) static int check_test_events(FILE *fp) { const test_event_t *e; char *fstr; uint64_t *codes; int count, i, j; int ret, errors = 0; for (i = 0, e = ppc_test_events; i < NUM_TEST_EVENTS; i++, e++) { codes = NULL; count = 0; fstr = NULL; ret = pfm_get_event_encoding(e->name, PFM_PLM0 | PFM_PLM3, &fstr, NULL, &codes, &count); if (ret != e->ret) { fprintf(fp,"Event%d %s, ret=%s(%d) expected %s(%d)\n", i, e->name, pfm_strerror(ret), ret, pfm_strerror(e->ret), e->ret); errors++; } else { if (ret != PFM_SUCCESS) { if (fstr) { fprintf(fp,"Event%d %s, expected fstr NULL but it is not\n", i, e->name); errors++; } if (count != 0) { fprintf(fp,"Event%d %s, expected count=0 instead of %d\n", i, e->name, count); errors++; } if (codes) { fprintf(fp,"Event%d %s, expected codes[] NULL but it is not\n", i, e->name); errors++; } } else { if (count != e->count) { fprintf(fp,"Event%d %s, count=%d expected %d\n", i, e->name, count, e->count); errors++; } for (j=0; j < count; j++) { if (codes[j] != e->codes[j]) { fprintf(fp,"Event%d %s, codes[%d]=%#"PRIx64" expected %#"PRIx64"\n", i, e->name, j, codes[j], e->codes[j]); errors++; } } if (e->fstr && strcmp(fstr, e->fstr)) { fprintf(fp,"Event%d %s, fstr=%s expected %s\n", i, e->name, fstr, e->fstr); errors++; } } } if (codes) free(codes); if (fstr) free(fstr); } printf("\t %d PowerPC events: %d errors\n", i, 
errors); return errors; } int validate_arch(FILE *fp) { return check_test_events(fp); } papi-5.3.0/src/libpfm4/lib/0000700003276200002170000000000012247131124015027 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/lib/pfmlib_cell.c0000600003276200002170000004354512247131124017460 0ustar ralphundrgrad/* * pfmlib_cell.c : support for the Cell PMU family * * Copyright (c) 2007 TOSHIBA CORPORATION based on code from * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include /* public headers */ #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_cell_priv.h" /* architecture private */ #include "cell_events.h" /* PMU private */ #define SIGNAL_TYPE_CYCLES 0 #define PM_COUNTER_CTRL_CYLES 0x42C00000U #define PFM_CELL_NUM_PMCS 24 #define PFM_CELL_EVENT_MIN 1 #define PFM_CELL_EVENT_MAX 8 #define PMX_MIN_NUM 1 #define PMX_MAX_NUM 8 #define PFM_CELL_16BIT_CNTR_EVENT_MAX 8 #define PFM_CELL_32BIT_CNTR_EVENT_MAX 4 #define COMMON_REG_NUMS 8 #define ENABLE_WORD0 0 #define ENABLE_WORD1 1 #define ENABLE_WORD2 2 #define PFM_CELL_GRP_CONTROL_REG_GRP0_BIT 30 #define PFM_CELL_GRP_CONTROL_REG_GRP1_BIT 28 #define PFM_CELL_BASE_WORD_UNIT_FIELD_BIT 24 #define PFM_CELL_WORD_UNIT_FIELD_WIDTH 2 #define PFM_CELL_MAX_WORD_NUMBER 3 #define PFM_CELL_COUNTER_CONTROL_GRP1 0x80000000U #define PFM_CELL_DEFAULT_TRIGGER_EVENT_UNIT 0x00555500U #define PFM_CELL_PM_CONTROL_16BIT_CNTR_MASK 0x01E00000U #define PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_PROBLEM 0x00080000U #define PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_SUPERVISOR 0x00000000U #define PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_HYPERVISOR 0x00040000U #define PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_ALL 0x000C0000U #define PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_MASK 0x000C0000U #define ONLY_WORD(x) \ ((((x) == WORD_0_ONLY)||((x) == WORD_2_ONLY)) ? 
(x) : 0) struct pfm_cell_signal_group_desc { unsigned int signal_type; unsigned int word_type; unsigned long long word; unsigned long long freq; unsigned int subunit; }; #define swap_int(num1, num2) do { \ int tmp = num1; \ num1 = num2; \ num2 = tmp; \ } while(0) static int pfm_cell_detect(void) { int ret; char buffer[128]; ret = __pfm_getcpuinfo_attr("cpu", buffer, sizeof(buffer)); if (ret == -1) { return PFMLIB_ERR_NOTSUPP; } if (strcmp(buffer, "Cell Broadband Engine, altivec supported")) { return PFMLIB_ERR_NOTSUPP; } return PFMLIB_SUCCESS; } static int get_pmx_offset(int pmx_num, unsigned int *pmx_ctrl_bits) { /* pmx_num==0 -> not specified * pmx_num==1 -> pm0 * : * pmx_num==8 -> pm7 */ int i = 0; int offset; if ((pmx_num >= PMX_MIN_NUM) && (pmx_num <= PMX_MAX_NUM)) { /* offset is specified */ offset = (pmx_num - 1); if ((~*pmx_ctrl_bits >> offset) & 0x1) { *pmx_ctrl_bits |= (0x1 << offset); return offset; } else { /* offset is used */ return PFMLIB_ERR_INVAL; } } else if (pmx_num == 0){ /* offset is not specified */ while (((*pmx_ctrl_bits >> i) & 0x1) && (i < PMX_MAX_NUM)) { i++; } *pmx_ctrl_bits |= (0x1 << i); return i; } /* pmx_num is invalid */ return PFMLIB_ERR_INVAL; } static unsigned long long search_enable_word(int word) { unsigned long long count = 0; while ((~word) & 0x1) { count++; word >>= 1; } return count; } static int get_count_bit(unsigned int type) { int count = 0; while(type) { if (type & 1) { count++; } type >>= 1; } return count; } static int get_debug_bus_word(struct pfm_cell_signal_group_desc *group0, struct pfm_cell_signal_group_desc *group1) { unsigned int word_type0, word_type1; /* search enable word */ word_type0 = group0->word_type; word_type1 = group1->word_type; if (group1->signal_type == NONE_SIGNAL) { group0->word = search_enable_word(word_type0); goto found; } /* swap */ if ((get_count_bit(word_type0) > get_count_bit(word_type1)) || (group0->freq == PFM_CELL_PME_FREQ_SPU)) { swap_int(group0->signal_type, group1->signal_type); 
swap_int(group0->freq, group1->freq); swap_int(group0->word_type, group1->word_type); swap_int(group0->subunit, group1->subunit); swap_int(word_type0, word_type1); } if ((ONLY_WORD(word_type0) != 0) && (word_type0 == word_type1)) { return PFMLIB_ERR_INVAL; } if (ONLY_WORD(word_type0)) { group0->word = search_enable_word(ONLY_WORD(word_type0)); word_type1 &= ~(1UL << (group0->word)); group1->word = search_enable_word(word_type1); } else if (ONLY_WORD(word_type1)) { group1->word = search_enable_word(ONLY_WORD(word_type1)); word_type0 &= ~(1UL << (group1->word)); group0->word = search_enable_word(word_type0); } else { group0->word = ENABLE_WORD0; if (word_type1 == WORD_0_AND_1) { group1->word = ENABLE_WORD1; } else if(word_type1 == WORD_0_AND_2) { group1->word = ENABLE_WORD2; } else { return PFMLIB_ERR_INVAL; } } found: return PFMLIB_SUCCESS; } static unsigned int get_signal_type(unsigned long long event_code) { return (event_code & 0x00000000FFFFFFFFULL) / 100; } static unsigned int get_signal_bit(unsigned long long event_code) { return (event_code & 0x00000000FFFFFFFFULL) % 100; } static int is_spe_signal_group(unsigned int signal_type) { if (41 <= signal_type && signal_type <= 56) { return 1; } else { return 0; } } static int check_signal_type(pfmlib_input_param_t *inp, pfmlib_cell_input_param_t *mod_in, struct pfm_cell_signal_group_desc *group0, struct pfm_cell_signal_group_desc *group1) { pfmlib_event_t *e; unsigned int event_cnt; int signal_cnt = 0; int i; int cycles_signal_cnt = 0; unsigned int signal_type, subunit; e = inp->pfp_events; event_cnt = inp->pfp_event_count; for(i = 0; i < event_cnt; i++) { signal_type = get_signal_type(cell_pe[e[i].event].pme_code); if ((signal_type == SIGNAL_SPU_TRIGGER) || (signal_type == SIGNAL_SPU_EVENT)) { continue; } if (signal_type == SIGNAL_TYPE_CYCLES) { cycles_signal_cnt = 1; continue; } subunit = 0; if (is_spe_signal_group(signal_type)) { subunit = mod_in->pfp_cell_counters[i].spe_subunit; } switch(signal_cnt) { case 0: 
group0->signal_type = signal_type; group0->word_type = cell_pe[e[i].event].pme_enable_word; group0->freq = cell_pe[e[i].event].pme_freq; group0->subunit = subunit; signal_cnt++; break; case 1: if ((group0->signal_type != signal_type) || (is_spe_signal_group(signal_type) && group0->subunit != subunit)) { group1->signal_type = signal_type; group1->word_type = cell_pe[e[i].event].pme_enable_word; group1->freq = cell_pe[e[i].event].pme_freq; group1->subunit = subunit; signal_cnt++; } break; case 2: if ((group0->signal_type != signal_type) && (group1->signal_type != signal_type)) { DPRINT("signal count is invalid\n"); return PFMLIB_ERR_INVAL; } break; default: DPRINT("signal count is invalid\n"); return PFMLIB_ERR_INVAL; } } return (signal_cnt + cycles_signal_cnt); } /* * The assignment between the privilege level options * and the ppu-count-mode field in the pm_control register. * * option ppu count mode(pm_control) * --------------------------------- * -u(-3) 0b10 : Problem mode * -k(-0) 0b00 : Supervisor mode * -1 0b00 : Supervisor mode * -2 0b01 : Hypervisor mode * two options 0b11 : Any mode * * Note : Hypervisor-mode and Any-mode don't work on PS3. 
* */ static unsigned int get_ppu_count_mode(unsigned int plm) { unsigned int ppu_count_mode = 0; switch (plm) { case PFM_PLM0: case PFM_PLM1: ppu_count_mode = PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_SUPERVISOR; break; case PFM_PLM2: ppu_count_mode = PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_HYPERVISOR; break; case PFM_PLM3: ppu_count_mode = PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_PROBLEM; break; default : ppu_count_mode = PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_ALL; break; } return ppu_count_mode; } static int pfm_cell_dispatch_counters(pfmlib_input_param_t *inp, pfmlib_cell_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfmlib_event_t *e; pfmlib_reg_t *pc, *pd; unsigned int event_cnt; unsigned int signal_cnt = 0, pmcs_cnt = 0; unsigned int signal_type; unsigned long long signal_bit; struct pfm_cell_signal_group_desc group[2]; int pmx_offset = 0; int i, ret; int input_control, polarity, count_cycle, count_enable; unsigned long long subunit; int shift0, shift1; unsigned int pmx_ctrl_bits; int max_event_cnt = PFM_CELL_32BIT_CNTR_EVENT_MAX; count_enable = 1; group[0].signal_type = group[1].signal_type = NONE_SIGNAL; group[0].word = group[1].word = 0L; group[0].freq = group[1].freq = 0L; group[0].subunit = group[1].subunit = 0; group[0].word_type = group[1].word_type = WORD_NONE; event_cnt = inp->pfp_event_count; e = inp->pfp_events; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; /* check event_cnt */ if (mod_in->control & PFM_CELL_PM_CONTROL_16BIT_CNTR_MASK) max_event_cnt = PFM_CELL_16BIT_CNTR_EVENT_MAX; if (event_cnt < PFM_CELL_EVENT_MIN) return PFMLIB_ERR_NOTFOUND; if (event_cnt > max_event_cnt) return PFMLIB_ERR_TOOMANY; /* check signal type */ signal_cnt = check_signal_type(inp, mod_in, &group[0], &group[1]); if (signal_cnt == PFMLIB_ERR_INVAL) return PFMLIB_ERR_NOASSIGN; /* decide debug_bus word */ if (signal_cnt != 0 && group[0].signal_type != NONE_SIGNAL) { ret = get_debug_bus_word(&group[0], &group[1]); if (ret != PFMLIB_SUCCESS) return PFMLIB_ERR_NOASSIGN; } /* common register 
setting */ pc[pmcs_cnt].reg_num = REG_GROUP_CONTROL; if (signal_cnt == 1) { pc[pmcs_cnt].reg_value = group[0].word << PFM_CELL_GRP_CONTROL_REG_GRP0_BIT; } else if (signal_cnt == 2) { pc[pmcs_cnt].reg_value = (group[0].word << PFM_CELL_GRP_CONTROL_REG_GRP0_BIT) | (group[1].word << PFM_CELL_GRP_CONTROL_REG_GRP1_BIT); } pmcs_cnt++; pc[pmcs_cnt].reg_num = REG_DEBUG_BUS_CONTROL; if (signal_cnt == 1) { shift0 = PFM_CELL_BASE_WORD_UNIT_FIELD_BIT + ((PFM_CELL_MAX_WORD_NUMBER - group[0].word) * PFM_CELL_WORD_UNIT_FIELD_WIDTH); pc[pmcs_cnt].reg_value = group[0].freq << shift0; } else if (signal_cnt == 2) { shift0 = PFM_CELL_BASE_WORD_UNIT_FIELD_BIT + ((PFM_CELL_MAX_WORD_NUMBER - group[0].word) * PFM_CELL_WORD_UNIT_FIELD_WIDTH); shift1 = PFM_CELL_BASE_WORD_UNIT_FIELD_BIT + ((PFM_CELL_MAX_WORD_NUMBER - group[1].word) * PFM_CELL_WORD_UNIT_FIELD_WIDTH); pc[pmcs_cnt].reg_value = (group[0].freq << shift0) | (group[1].freq << shift1); } pc[pmcs_cnt].reg_value |= PFM_CELL_DEFAULT_TRIGGER_EVENT_UNIT; pmcs_cnt++; pc[pmcs_cnt].reg_num = REG_TRACE_ADDRESS; pc[pmcs_cnt].reg_value = 0; pmcs_cnt++; pc[pmcs_cnt].reg_num = REG_EXT_TRACE_TIMER; pc[pmcs_cnt].reg_value = 0; pmcs_cnt++; pc[pmcs_cnt].reg_num = REG_PM_STATUS; pc[pmcs_cnt].reg_value = 0; pmcs_cnt++; pc[pmcs_cnt].reg_num = REG_PM_CONTROL; pc[pmcs_cnt].reg_value = (mod_in->control & ~PFM_CELL_PM_CONTROL_PPU_CNTR_MODE_MASK) | get_ppu_count_mode(inp->pfp_dfl_plm); pmcs_cnt++; pc[pmcs_cnt].reg_num = REG_PM_INTERVAL; pc[pmcs_cnt].reg_value = mod_in->interval; pmcs_cnt++; pc[pmcs_cnt].reg_num = REG_PM_START_STOP; pc[pmcs_cnt].reg_value = mod_in->triggers; pmcs_cnt++; pmx_ctrl_bits = 0; /* pmX register setting */ for(i = 0; i < event_cnt; i++) { /* PMX_CONTROL */ pmx_offset = get_pmx_offset(mod_in->pfp_cell_counters[i].pmX_control_num, &pmx_ctrl_bits); if (pmx_offset == PFMLIB_ERR_INVAL) { DPRINT("pmX already used\n"); return PFMLIB_ERR_INVAL; } signal_type = get_signal_type(cell_pe[e[i].event].pme_code); if (signal_type == 
SIGNAL_TYPE_CYCLES) { pc[pmcs_cnt].reg_value = PM_COUNTER_CTRL_CYLES; pc[pmcs_cnt].reg_num = REG_PM0_CONTROL + pmx_offset; pmcs_cnt++; pc[pmcs_cnt].reg_value = cell_pe[e[i].event].pme_code; pc[pmcs_cnt].reg_num = REG_PM0_EVENT + pmx_offset; pmcs_cnt++; pd[i].reg_num = pmx_offset; pd[i].reg_value = 0; continue; } switch(cell_pe[e[i].event].pme_type) { case COUNT_TYPE_BOTH_TYPE: case COUNT_TYPE_CUMULATIVE_LEN: case COUNT_TYPE_MULTI_CYCLE: case COUNT_TYPE_SINGLE_CYCLE: count_cycle = 1; break; case COUNT_TYPE_OCCURRENCE: count_cycle = 0; break; default: return PFMLIB_ERR_INVAL; } signal_bit = get_signal_bit(cell_pe[e[i].event].pme_code); polarity = mod_in->pfp_cell_counters[i].polarity; input_control = mod_in->pfp_cell_counters[i].input_control; subunit = 0; if (is_spe_signal_group(signal_type)) { subunit = mod_in->pfp_cell_counters[i].spe_subunit; } pc[pmcs_cnt].reg_value = ( (signal_bit << (31 - 5)) | (input_control << (31 - 6)) | (polarity << (31 - 7)) | (count_cycle << (31 - 8)) | (count_enable << (31 - 9)) ); pc[pmcs_cnt].reg_num = REG_PM0_CONTROL + pmx_offset; if (signal_type == group[1].signal_type && subunit == group[1].subunit) { pc[pmcs_cnt].reg_value |= PFM_CELL_COUNTER_CONTROL_GRP1; } pmcs_cnt++; /* PMX_EVENT */ pc[pmcs_cnt].reg_num = REG_PM0_EVENT + pmx_offset; /* debug bus word setting */ if (signal_type == group[0].signal_type && subunit == group[0].subunit) { pc[pmcs_cnt].reg_value = (cell_pe[e[i].event].pme_code | (group[0].word << 48) | (subunit << 32)); } else if (signal_type == group[1].signal_type && subunit == group[1].subunit) { pc[pmcs_cnt].reg_value = (cell_pe[e[i].event].pme_code | (group[1].word << 48) | (subunit << 32)); } else if ((signal_type == SIGNAL_SPU_TRIGGER) || (signal_type == SIGNAL_SPU_EVENT)) { pc[pmcs_cnt].reg_value = cell_pe[e[i].event].pme_code | (subunit << 32); } else { return PFMLIB_ERR_INVAL; } pmcs_cnt++; /* pmd setting */ pd[i].reg_num = pmx_offset; pd[i].reg_value = 0; } outp->pfp_pmc_count = pmcs_cnt; 
outp->pfp_pmd_count = event_cnt; return PFMLIB_SUCCESS; } static int pfm_cell_dispatch_events(pfmlib_input_param_t *inp, void *model_in, pfmlib_output_param_t *outp, void *model_out) { pfmlib_cell_input_param_t *mod_in = (pfmlib_cell_input_param_t *)model_in; pfmlib_cell_input_param_t default_model_in; int i; if (model_in) { mod_in = (pfmlib_cell_input_param_t *)model_in; } else { mod_in = &default_model_in; mod_in->control = 0x80000000; mod_in->interval = 0; mod_in->triggers = 0; for (i = 0; i < PMU_CELL_NUM_COUNTERS; i++) { mod_in->pfp_cell_counters[i].pmX_control_num = 0; mod_in->pfp_cell_counters[i].spe_subunit = 0; mod_in->pfp_cell_counters[i].polarity = 1; mod_in->pfp_cell_counters[i].input_control = 0; mod_in->pfp_cell_counters[i].cnt_mask = 0; mod_in->pfp_cell_counters[i].flags = 0; } } return pfm_cell_dispatch_counters(inp, mod_in, outp); } static int pfm_cell_get_event_code(unsigned int i, unsigned int cnt, int *code) { // if (cnt != PFMLIB_CNT_FIRST && cnt > 2) { if (cnt != PFMLIB_CNT_FIRST && cnt > cell_support.num_cnt) { return PFMLIB_ERR_INVAL; } *code = cell_pe[i].pme_code; return PFMLIB_SUCCESS; } static void pfm_cell_get_event_counters(unsigned int j, pfmlib_regmask_t *counters) { unsigned int i; memset(counters, 0, sizeof(*counters)); for(i=0; i < PMU_CELL_NUM_COUNTERS; i++) { pfm_regmask_set(counters, i); } } static void pfm_cell_get_impl_pmcs(pfmlib_regmask_t *impl_pmcs) { unsigned int i; memset(impl_pmcs, 0, sizeof(*impl_pmcs)); for(i=0; i < PFM_CELL_NUM_PMCS; i++) { pfm_regmask_set(impl_pmcs, i); } } static void pfm_cell_get_impl_pmds(pfmlib_regmask_t *impl_pmds) { unsigned int i; memset(impl_pmds, 0, sizeof(*impl_pmds)); for(i=0; i < PMU_CELL_NUM_PERFCTR; i++) { pfm_regmask_set(impl_pmds, i); } } static void pfm_cell_get_impl_counters(pfmlib_regmask_t *impl_counters) { unsigned int i; for(i=0; i < PMU_CELL_NUM_COUNTERS; i++) { pfm_regmask_set(impl_counters, i); } } static char* pfm_cell_get_event_name(unsigned int i) { return 
cell_pe[i].pme_name; } static int pfm_cell_get_event_desc(unsigned int ev, char **str) { char *s; s = cell_pe[ev].pme_desc; if (s) { *str = strdup(s); } else { *str = NULL; } return PFMLIB_SUCCESS; } static int pfm_cell_get_cycle_event(pfmlib_event_t *e) { int i; for (i = 0; i < PME_CELL_EVENT_COUNT; i++) { if (!strcmp(cell_pe[i].pme_name, "CYCLES")) { e->event = i; return PFMLIB_SUCCESS; } } return PFMLIB_ERR_NOTFOUND; } int pfm_cell_spe_event(unsigned int event_index) { if (event_index >= PME_CELL_EVENT_COUNT) return 0; return is_spe_signal_group(get_signal_type(cell_pe[event_index].pme_code)); } pfm_pmu_support_t cell_support={ .pmu_name = "CELL", .pmu_type = PFMLIB_CELL_PMU, .pme_count = PME_CELL_EVENT_COUNT, .pmc_count = PFM_CELL_NUM_PMCS, .pmd_count = PMU_CELL_NUM_PERFCTR, .num_cnt = PMU_CELL_NUM_COUNTERS, .get_event_code = pfm_cell_get_event_code, .get_event_name = pfm_cell_get_event_name, .get_event_counters = pfm_cell_get_event_counters, .dispatch_events = pfm_cell_dispatch_events, .pmu_detect = pfm_cell_detect, .get_impl_pmcs = pfm_cell_get_impl_pmcs, .get_impl_pmds = pfm_cell_get_impl_pmds, .get_impl_counters = pfm_cell_get_impl_counters, .get_event_desc = pfm_cell_get_event_desc, .get_cycle_event = pfm_cell_get_cycle_event }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_x86.c0000600003276200002170000006362412247131124020361 0ustar ralphundrgrad/* pfmlib_intel_x86.c : common code for Intel X86 processors * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this 
permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file implements the common code for all Intel X86 processors. */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" const pfmlib_attr_desc_t intel_x86_mods[]={ PFM_ATTR_B("k", "monitor at priv level 0"), /* monitor priv level 0 */ PFM_ATTR_B("u", "monitor at priv level 1, 2, 3"), /* monitor priv level 1, 2, 3 */ PFM_ATTR_B("e", "edge level (may require counter-mask >= 1)"), /* edge */ PFM_ATTR_B("i", "invert"), /* invert */ PFM_ATTR_I("c", "counter-mask in range [0-255]"), /* counter-mask */ PFM_ATTR_B("t", "measure any thread"), /* monitor on both threads */ PFM_ATTR_I("ldlat", "load latency threshold (cycles, [3-65535])"), /* load latency threshold */ PFM_ATTR_B("intx", "monitor only inside transactional memory region"), PFM_ATTR_B("intxcp", "do not count occurrences inside aborted transactional memory region"), PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */ }; pfm_intel_x86_config_t pfm_intel_x86_cfg; /* * .byte 0x53 == push ebx. it's universal for 32 and 64 bit * .byte 0x5b == pop ebx. * Some gcc's (4.1.2 on Core2) object to pairing push/pop and ebx in 64 bit mode. * Using the opcode directly avoids this problem. 
*/ static inline void cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c, unsigned int *d) { __asm__ __volatile__ (".byte 0x53\n\tcpuid\n\tmovl %%ebx, %%esi\n\t.byte 0x5b" : "=a" (*a), "=S" (*b), "=c" (*c), "=d" (*d) : "a" (op)); } static void pfm_intel_x86_display_reg(void *this, pfmlib_event_desc_t *e) { const intel_x86_entry_t *pe = this_pe(this); pfm_intel_x86_reg_t reg; int i; reg.val = e->codes[0]; /* * handle generic counters */ __pfm_vbprintf("[0x%"PRIx64" event_sel=0x%x umask=0x%x os=%d usr=%d " "en=%d int=%d inv=%d edge=%d cnt_mask=%d", reg.val, reg.sel_event_select, reg.sel_unit_mask, reg.sel_os, reg.sel_usr, reg.sel_en, reg.sel_int, reg.sel_inv, reg.sel_edge, reg.sel_cnt_mask); if (pe[e->event].modmsk & _INTEL_X86_ATTR_T) __pfm_vbprintf(" any=%d", reg.sel_anythr); __pfm_vbprintf("]", e->fstr); for (i = 1 ; i < e->count; i++) __pfm_vbprintf(" [0x%"PRIx64"]", e->codes[i]); __pfm_vbprintf(" %s\n", e->fstr); } /* * number of HW modifiers */ static int intel_x86_num_mods(void *this, int idx) { const intel_x86_entry_t *pe = this_pe(this); unsigned int mask; mask = pe[idx].modmsk; return pfmlib_popcnt(mask); } int intel_x86_attr2mod(void *this, int pidx, int attr_idx) { const intel_x86_entry_t *pe = this_pe(this); size_t x; int n, numasks; numasks = intel_x86_num_umasks(this, pidx); n = attr_idx - numasks; pfmlib_for_each_bit(x, pe[pidx].modmsk) { if (n == 0) break; n--; } return x; } /* * detect processor model using cpuid() * based on documentation * http://www.intel.com/Assets/PDF/appnote/241618.pdf */ int pfm_intel_x86_detect(void) { unsigned int a, b, c, d; char buffer[64]; if (pfm_intel_x86_cfg.family) return PFM_SUCCESS; cpuid(0, &a, &b, &c, &d); strncpy(&buffer[0], (char *)(&b), 4); strncpy(&buffer[4], (char *)(&d), 4); strncpy(&buffer[8], (char *)(&c), 4); buffer[12] = '\0'; /* must be Intel */ if (strcmp(buffer, "GenuineIntel")) return PFM_ERR_NOTSUPP; cpuid(1, &a, &b, &c, &d); pfm_intel_x86_cfg.family = (a >> 8) & 0xf; // bits 
11 - 8 pfm_intel_x86_cfg.model = (a >> 4) & 0xf; // Bits 7 - 4 pfm_intel_x86_cfg.stepping = a & 0xf; // Bits 0 - 3 /* extended family */ if (pfm_intel_x86_cfg.family == 0xf) pfm_intel_x86_cfg.family += (a >> 20) & 0xff; /* extended model */ if (pfm_intel_x86_cfg.family >= 0x6) pfm_intel_x86_cfg.model += ((a >> 16) & 0xf) << 4; return PFM_SUCCESS; } int pfm_intel_x86_add_defaults(void *this, pfmlib_event_desc_t *e, unsigned int msk, uint64_t *umask, unsigned int max_grpid) { const intel_x86_entry_t *pe = this_pe(this); const intel_x86_entry_t *ent; unsigned int i; int j, k, added, skip; int idx; k = e->nattrs; ent = pe+e->event; for(i=0; msk; msk >>=1, i++) { if (!(msk & 0x1)) continue; added = skip = 0; /* * must scan list of possible attributes * (not all possible attributes) */ for (j = 0; j < e->npattrs; j++) { if (e->pattrs[j].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[j].type != PFM_ATTR_UMASK) continue; idx = e->pattrs[j].idx; if (ent->umasks[idx].grpid != i) continue; if (max_grpid != INTEL_X86_MAX_GRPID && i > max_grpid) { skip = 1; continue; } /* umask is default for group */ if (intel_x86_uflag(this, e->event, idx, INTEL_X86_DFL)) { DPRINT("added default %s for group %d j=%d idx=%d\n", ent->umasks[idx].uname, i, j, idx); /* * default could be an alias, but * ucode must reflect actual code */ *umask |= ent->umasks[idx].ucode >> 8; e->attrs[k].id = j; /* pattrs index */ e->attrs[k].ival = 0; k++; added++; if (intel_x86_eflag(this, e->event, INTEL_X86_GRP_EXCL)) goto done; if (intel_x86_uflag(this, e->event, idx, INTEL_X86_EXCL_GRP_GT)) { if (max_grpid != INTEL_X86_MAX_GRPID) { DPRINT("two max_grpid, old=%d new=%d\n", max_grpid, ent->umasks[idx].grpid); return PFM_ERR_UMASK; } max_grpid = ent->umasks[idx].grpid; } } } if (!added && !skip) { DPRINT("no default found for event %s unit mask group %d (max_grpid=%d)\n", ent->name, i, max_grpid); return PFM_ERR_UMASK; } } DPRINT("max_grpid=%d nattrs=%d k=%d\n", max_grpid, e->nattrs, k); done: e->nattrs = 
k; return PFM_SUCCESS; } static int intel_x86_check_pebs(void *this, pfmlib_event_desc_t *e) { const intel_x86_entry_t *pe = this_pe(this); pfm_event_attr_info_t *a; int numasks = 0, pebs = 0; int i; #if 1 if (1) // !intel_x86_requesting_pebs(e)) return PFM_SUCCESS; #endif /* * if event has no umask and is PEBS, then we are okay */ if (!pe[e->event].numasks && intel_x86_eflag(this, e->event, INTEL_X86_PEBS)) return PFM_SUCCESS; /* * if the event sets PEBS, then it means at least one umask * supports PEBS, so we need to check */ for (i = 0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { /* count number of umasks */ numasks++; /* and those that support PEBS */ if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_PEBS)) pebs++; } } /* * pass if user requested only PEBS umasks */ return pebs != numasks ? PFM_ERR_FEATCOMB : PFM_SUCCESS; } static int intel_x86_check_max_grpid(void *this, pfmlib_event_desc_t *e, int max_grpid) { const intel_x86_entry_t *pe; pfm_event_attr_info_t *a; int i, grpid; DPRINT("check: max_grpid=%d\n", max_grpid); pe = this_pe(this); for (i = 0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { grpid = pe[e->event].umasks[a->idx].grpid; if (grpid > max_grpid) return PFM_ERR_FEATCOMB; } } return PFM_SUCCESS; } static int pfm_intel_x86_encode_gen(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; pfm_event_attr_info_t *a; const intel_x86_entry_t *pe; pfm_intel_x86_reg_t reg; unsigned int grpmsk, ugrpmsk = 0; uint64_t umask1, umask2, ucode, last_ucode = ~0ULL; unsigned int modhw = 0; unsigned int plmmsk = 0; int umodmsk = 0, modmsk_r = 0; int k, ret, id; unsigned int max_grpid = INTEL_X86_MAX_GRPID; unsigned int last_grpid = INTEL_X86_MAX_GRPID; unsigned int grpid; int ldlat = 0, ldlat_um = 0; int grpcounts[INTEL_X86_NUM_GRP]; int ncombo[INTEL_X86_NUM_GRP]; memset(grpcounts, 0, sizeof(grpcounts)); 
memset(ncombo, 0, sizeof(ncombo)); pe = this_pe(this); e->fstr[0] = '\0'; /* * preset certain fields from event code * including modifiers */ reg.val = pe[e->event].code; grpmsk = (1 << pe[e->event].ngrp)-1; /* take into account hardcoded umask */ umask1 = (reg.val >> 8) & 0xff; umask2 = 0; modmsk_r = pe[e->event].modmsk_req; for (k = 0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { grpid = pe[e->event].umasks[a->idx].grpid; /* * certain event groups are meant to be * exclusive, i.e., only unit masks of one group * can be used */ if (last_grpid != INTEL_X86_MAX_GRPID && grpid != last_grpid && intel_x86_eflag(this, e->event, INTEL_X86_GRP_EXCL)) { DPRINT("exclusive unit mask group error\n"); return PFM_ERR_FEATCOMB; } /* * selecting certain umasks in a group may exclude any umasks * from any groups with a higher index * * enforcement requires looking at the grpid of all the umasks */ if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_EXCL_GRP_GT)) max_grpid = grpid; /* * upper layer has removed duplicates * so if we come here more than once, it is for two * distinct umasks * * NCOMBO=no combination of unit masks within the same * umask group */ ++grpcounts[grpid]; /* mark that we have a umask with NCOMBO in this group */ if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_NCOMBO)) ncombo[grpid] = 1; if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_LDLAT)) ldlat_um = 1; /* * if more than one umask in this group but one is marked * with ncombo, then fail. 
It is okay to combine umask within * a group as long as none is tagged with NCOMBO */ if (grpcounts[grpid] > 1 && ncombo[grpid]) { DPRINT("umask %s does not support unit mask combination within group %d\n", pe[e->event].umasks[a->idx].uname, grpid); return PFM_ERR_FEATCOMB; } last_grpid = grpid; ucode = pe[e->event].umasks[a->idx].ucode; modhw |= pe[e->event].umasks[a->idx].modhw; umask2 |= ucode >> 8; ugrpmsk |= 1 << pe[e->event].umasks[a->idx].grpid; modmsk_r |= pe[e->event].umasks[a->idx].umodmsk_req; if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_CODE_OVERRIDE)) { if (last_ucode != ~0ULL && (ucode & 0xff) != last_ucode) { DPRINT("cannot override event with two different codes for %s\n", pe[e->event].name); return PFM_ERR_FEATCOMB; } last_ucode = ucode & 0xff; reg.sel_event_select = last_ucode; } } else if (a->type == PFM_ATTR_RAW_UMASK) { /* there can only be one RAW_UMASK per event */ /* sanity check */ if (a->idx & ~0xff) { DPRINT("raw umask is 8-bit wide\n"); return PFM_ERR_ATTR; } /* override umask */ umask2 = a->idx & 0xff; ugrpmsk = grpmsk; } else { uint64_t ival = e->attrs[k].ival; switch(a->idx) { case INTEL_X86_ATTR_I: /* invert */ if (modhw & _INTEL_X86_ATTR_I) return PFM_ERR_ATTR_SET; reg.sel_inv = !!ival; umodmsk |= _INTEL_X86_ATTR_I; break; case INTEL_X86_ATTR_E: /* edge */ if (modhw & _INTEL_X86_ATTR_E) return PFM_ERR_ATTR_SET; reg.sel_edge = !!ival; umodmsk |= _INTEL_X86_ATTR_E; break; case INTEL_X86_ATTR_C: /* counter-mask */ if (modhw & _INTEL_X86_ATTR_C) return PFM_ERR_ATTR_SET; if (ival > 255) return PFM_ERR_ATTR_VAL; reg.sel_cnt_mask = ival; umodmsk |= _INTEL_X86_ATTR_C; break; case INTEL_X86_ATTR_U: /* USR */ if (modhw & _INTEL_X86_ATTR_U) return PFM_ERR_ATTR_SET; reg.sel_usr = !!ival; plmmsk |= _INTEL_X86_ATTR_U; umodmsk |= _INTEL_X86_ATTR_U; break; case INTEL_X86_ATTR_K: /* OS */ if (modhw & _INTEL_X86_ATTR_K) return PFM_ERR_ATTR_SET; reg.sel_os = !!ival; plmmsk |= _INTEL_X86_ATTR_K; umodmsk |= _INTEL_X86_ATTR_K; break; case 
INTEL_X86_ATTR_T: /* anythread (v3 and above) */ if (modhw & _INTEL_X86_ATTR_T) return PFM_ERR_ATTR_SET; reg.sel_anythr = !!ival; umodmsk |= _INTEL_X86_ATTR_T; break; case INTEL_X86_ATTR_LDLAT: /* load latency */ if (ival < 3 || ival > 65535) return PFM_ERR_ATTR_VAL; ldlat = ival; break; case INTEL_X86_ATTR_INTX: /* in_tx */ if (modhw & _INTEL_X86_ATTR_INTX) return PFM_ERR_ATTR_SET; reg.sel_intx = !!ival; umodmsk |= _INTEL_X86_ATTR_INTX; break; case INTEL_X86_ATTR_INTXCP: /* in_tx_cp */ if (modhw & _INTEL_X86_ATTR_INTXCP) return PFM_ERR_ATTR_SET; reg.sel_intxcp = !!ival; umodmsk |= _INTEL_X86_ATTR_INTXCP; break; } } } /* * handle case where no priv level mask was passed. * then we use the dfl_plm */ if (!(plmmsk & (_INTEL_X86_ATTR_K|_INTEL_X86_ATTR_U))) { if ((e->dfl_plm & PFM_PLM0) && (pmu->supported_plm & PFM_PLM0)) reg.sel_os = 1; if ((e->dfl_plm & PFM_PLM3) && (pmu->supported_plm & PFM_PLM3)) reg.sel_usr = 1; } /* * check that there is at least one unit mask in each unit * mask group */ if ((ugrpmsk != grpmsk && !intel_x86_eflag(this, e->event, INTEL_X86_GRP_EXCL)) || ugrpmsk == 0) { ugrpmsk ^= grpmsk; ret = pfm_intel_x86_add_defaults(this, e, ugrpmsk, &umask2, max_grpid); if (ret != PFM_SUCCESS) return ret; } ret = intel_x86_check_pebs(this, e); if (ret != PFM_SUCCESS) return ret; /* * check no umask violates the max_grpid constraint */ if (max_grpid != INTEL_X86_MAX_GRPID) { ret = intel_x86_check_max_grpid(this, e, max_grpid); if (ret != PFM_SUCCESS) { DPRINT("event %s: umask from grp > %d\n", pe[e->event].name, max_grpid); return ret; } } if (modmsk_r && (umodmsk ^ modmsk_r)) { DPRINT("required modifiers missing: 0x%x\n", modmsk_r); return PFM_ERR_ATTR; } /* * reorder all the attributes such that the fstr appears always * the same regardless of how the attributes were submitted. 
*/ evt_strcat(e->fstr, "%s", pe[e->event].name); pfmlib_sort_attr(e); for(k=0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) evt_strcat(e->fstr, ":%s", pe[e->event].umasks[a->idx].uname); else if (a->type == PFM_ATTR_RAW_UMASK) evt_strcat(e->fstr, ":0x%x", a->idx); } if (intel_x86_eflag(this, e->event, INTEL_X86_NHM_OFFCORE)) { e->codes[1] = umask2; e->count = 2; umask2 = 0; } else { e->count = 1; } if (ldlat && !ldlat_um) { DPRINT("passed ldlat= but not using ldlat umask\n"); return PFM_ERR_ATTR; } /* * force a default ldlat (will not appear in display_reg) */ if (ldlat_um && !ldlat) { DPRINT("missing ldlat= for umask, forcing to default %d cycles\n", INTEL_X86_LDLAT_DEFAULT); ldlat = INTEL_X86_LDLAT_DEFAULT; } if (ldlat && ldlat_um) { e->codes[1] = ldlat; e->count = 2; } /* take into account hardcoded modifiers, so use or on reg.val */ reg.val |= (umask1 | umask2) << 8; reg.sel_en = 1; /* force enable bit to 1 */ reg.sel_int = 1; /* force APIC int to 1 */ e->codes[0] = reg.val; /* * on recent processors (except Atom), edge requires cmask >=1 */ if ((pmu->flags & INTEL_X86_PMU_FL_ECMASK) && reg.sel_edge && !reg.sel_cnt_mask) { DPRINT("edge requires cmask >= 1\n"); return PFM_ERR_ATTR; } /* * decode ALL modifiers */ for (k = 0; k < e->npattrs; k++) { if (e->pattrs[k].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[k].type == PFM_ATTR_UMASK) continue; id = e->pattrs[k].idx; switch(id) { case INTEL_X86_ATTR_U: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_usr); break; case INTEL_X86_ATTR_K: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_os); break; case INTEL_X86_ATTR_E: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_edge); break; case INTEL_X86_ATTR_I: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_inv); break; case INTEL_X86_ATTR_C: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_cnt_mask); break; case 
INTEL_X86_ATTR_T: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_anythr); break; case INTEL_X86_ATTR_LDLAT: evt_strcat(e->fstr, ":%s=%d", intel_x86_mods[id].name, ldlat); break; case INTEL_X86_ATTR_INTX: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_intx); break; case INTEL_X86_ATTR_INTXCP: evt_strcat(e->fstr, ":%s=%lu", intel_x86_mods[id].name, reg.sel_intxcp); break; } } return PFM_SUCCESS; } int pfm_intel_x86_get_encoding(void *this, pfmlib_event_desc_t *e) { int ret; ret = pfm_intel_x86_encode_gen(this, e); if (ret != PFM_SUCCESS) return ret; pfm_intel_x86_display_reg(this, e); return PFM_SUCCESS; } int pfm_intel_x86_get_event_first(void *this) { pfmlib_pmu_t *p = this; return p->pme_count ? 0 : -1; } int pfm_intel_x86_get_event_next(void *this, int idx) { pfmlib_pmu_t *p = this; if (idx >= (p->pme_count-1)) return -1; return idx+1; } int pfm_intel_x86_event_is_valid(void *this, int pidx) { pfmlib_pmu_t *p = this; return pidx >= 0 && pidx < p->pme_count; } int pfm_intel_x86_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const intel_x86_entry_t *pe = this_pe(this); int ndfl[INTEL_X86_NUM_GRP]; int i, j, error = 0; unsigned int u, v; int npebs; if (!pmu->atdesc) { fprintf(fp, "pmu: %s missing attr_desc\n", pmu->name); error++; } if (!pmu->supported_plm && pmu->type == PFM_PMU_TYPE_CORE) { fprintf(fp, "pmu: %s supported_plm not set\n", pmu->name); error++; } for(i=0; i < pmu->pme_count; i++) { if (!pe[i].name) { fprintf(fp, "pmu: %s event%d: :: no name (prev event was %s)\n", pmu->name, i, i > 0 ? 
pe[i-1].name : "??"); error++; } if (!pe[i].desc) { fprintf(fp, "pmu: %s event%d: %s :: no description\n", pmu->name, i, pe[i].name); error++; } if (!pe[i].cntmsk) { fprintf(fp, "pmu: %s event%d: %s :: cntmsk=0\n", pmu->name, i, pe[i].name); error++; } if (pe[i].numasks && pe[i].ngrp == 0) { fprintf(fp, "pmu: %s event%d: %s :: ngrp cannot be zero\n", pmu->name, i, pe[i].name); error++; } if (pe[i].numasks && pe[i].umasks == NULL) { fprintf(fp, "pmu: %s event%d: %s :: numasks but no umasks\n", pmu->name, i, pe[i].name); error++; } if (pe[i].numasks == 0 && pe[i].umasks) { fprintf(fp, "pmu: %s event%d: %s :: numasks=0 but umasks defined\n", pmu->name, i, pe[i].name); error++; } if (pe[i].numasks == 0 && pe[i].ngrp) { fprintf(fp, "pmu: %s event%d: %s :: ngrp must be zero\n", pmu->name, i, pe[i].name); error++; } if (pe[i].ngrp >= INTEL_X86_NUM_GRP) { fprintf(fp, "pmu: %s event%d: %s :: ngrp too big (max=%d)\n", pmu->name, i, pe[i].name, INTEL_X86_NUM_GRP); error++; } for (j=i+1; j < (int)pmu->pme_count; j++) { if (pe[i].code == pe[j].code && !(pe[j].equiv || pe[i].equiv) && pe[j].cntmsk == pe[i].cntmsk) { fprintf(fp, "pmu: %s events %s and %s have the same code 0x%x\n", pmu->name, pe[i].name, pe[j].name, pe[i].code); error++; } } for(j=0; j < INTEL_X86_NUM_GRP; j++) ndfl[j] = 0; for(j=0, npebs = 0; j < (int)pe[i].numasks; j++) { if (!pe[i].umasks[j].uname) { fprintf(fp, "pmu: %s event%d: %s umask%d :: no name\n", pmu->name, i, pe[i].name, j); error++; } if (pe[i].umasks[j].modhw && (pe[i].umasks[j].modhw | pe[i].modmsk) != pe[i].modmsk) { fprintf(fp, "pmu: %s event%d: %s umask%d: %s :: modhw not subset of modmsk\n", pmu->name, i, pe[i].name, j, pe[i].umasks[j].uname); error++; } if (!pe[i].umasks[j].udesc) { fprintf(fp, "pmu: %s event%d: umask%d: %s :: no description\n", pmu->name, i, j, pe[i].umasks[j].uname); error++; } if (pe[i].ngrp && pe[i].umasks[j].grpid >= pe[i].ngrp) { fprintf(fp, "pmu: %s event%d: %s umask%d: %s :: invalid grpid %d (must be < %d)\n", 
pmu->name, i, pe[i].name, j, pe[i].umasks[j].uname, pe[i].umasks[j].grpid, pe[i].ngrp); error++; } if (pe[i].umasks[j].uflags & INTEL_X86_DFL) ndfl[pe[i].umasks[j].grpid]++; if (pe[i].umasks[j].uflags & INTEL_X86_PEBS) npebs++; } if (npebs && !intel_x86_eflag(this, i, INTEL_X86_PEBS)) { fprintf(fp, "pmu: %s event%d: %s, pebs umasks but event pebs flag not set\n", pmu->name, i, pe[i].name); error++; } if (intel_x86_eflag(this, i, INTEL_X86_PEBS) && pe[i].numasks && npebs == 0) { fprintf(fp, "pmu: %s event%d: %s, pebs event flag but not umask has pebs flag\n", pmu->name, i, pe[i].name); error++; } /* if only one umask, then ought to be default */ if (pe[i].numasks == 1 && !(pe[i].umasks[0].uflags & INTEL_X86_DFL)) { fprintf(fp, "pmu: %s event%d: %s, only one umask but no default\n", pmu->name, i, pe[i].name); error++; } if (pe[i].numasks) { unsigned int *dfl_model = malloc(sizeof(*dfl_model) * pe[i].numasks); if (!dfl_model) goto skip_dfl; for(u=0; u < pe[i].ngrp; u++) { int l = 0, m; for (v = 0; v < pe[i].numasks; v++) { if (pe[i].umasks[v].grpid != u) continue; if (pe[i].umasks[v].uflags & INTEL_X86_DFL) { for (m = 0; m < l; m++) { if (dfl_model[m] == pe[i].umasks[v].umodel || dfl_model[m] == 0) { fprintf(fp, "pmu: %s event%d: %s grpid %d has 2 default umasks\n", pmu->name, i, pe[i].name, u); error++; } } if (m == l) dfl_model[l++] = pe[i].umasks[v].umodel; } } } free(dfl_model); } skip_dfl: if (pe[i].flags & INTEL_X86_NCOMBO) { fprintf(fp, "pmu: %s event%d: %s :: NCOMBO is unit mask only flag\n", pmu->name, i, pe[i].name); error++; } for(u=0; u < pe[i].numasks; u++) { if (pe[i].umasks[u].uequiv) continue; if (pe[i].umasks[u].uflags & INTEL_X86_NCOMBO) continue; for(v=j+1; v < pe[i].numasks; v++) { if (pe[i].umasks[v].uequiv) continue; if (pe[i].umasks[v].uflags & INTEL_X86_NCOMBO) continue; if (pe[i].umasks[v].grpid != pe[i].umasks[u].grpid) continue; if ((pe[i].umasks[u].ucode & pe[i].umasks[v].ucode) && pe[i].umasks[u].umodel == pe[i].umasks[v].umodel) { 
fprintf(fp, "pmu: %s event%d: %s :: umask %s and %s have overlapping code bits\n", pmu->name, i, pe[i].name, pe[i].umasks[u].uname, pe[i].umasks[v].uname); error++; } } } } return error ? PFM_ERR_INVAL : PFM_SUCCESS; } int pfm_intel_x86_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info) { const intel_x86_entry_t *pe = this_pe(this); const pfmlib_attr_desc_t *atdesc = this_atdesc(this); int numasks, idx; numasks = intel_x86_num_umasks(this, pidx); if (attr_idx < numasks) { idx = intel_x86_attr2umask(this, pidx, attr_idx); info->name = pe[pidx].umasks[idx].uname; info->desc = pe[pidx].umasks[idx].udesc; info->equiv= pe[pidx].umasks[idx].uequiv; info->code = pe[pidx].umasks[idx].ucode; if (!intel_x86_uflag(this, pidx, idx, INTEL_X86_CODE_OVERRIDE)) info->code >>= 8; info->type = PFM_ATTR_UMASK; info->is_dfl = intel_x86_uflag(this, pidx, idx, INTEL_X86_DFL); info->is_precise = intel_x86_uflag(this, pidx, idx, INTEL_X86_PEBS); } else { idx = intel_x86_attr2mod(this, pidx, attr_idx); info->name = atdesc[idx].name; info->desc = atdesc[idx].desc; info->type = atdesc[idx].type; info->equiv= NULL; info->code = idx; info->is_dfl = 0; info->is_precise = 0; } info->ctrl = PFM_ATTR_CTRL_PMU; info->idx = idx; /* namespace specific index */ info->dfl_val64 = 0; return PFM_SUCCESS; } int pfm_intel_x86_get_event_info(void *this, int idx, pfm_event_info_t *info) { const intel_x86_entry_t *pe = this_pe(this); pfmlib_pmu_t *pmu = this; info->name = pe[idx].name; info->desc = pe[idx].desc; info->code = pe[idx].code; info->equiv = pe[idx].equiv; info->idx = idx; /* private index */ info->pmu = pmu->pmu; /* * no umask: event supports PEBS * with umasks: at least one umask supports PEBS */ info->is_precise = intel_x86_eflag(this, idx, INTEL_X86_PEBS); info->nattrs = intel_x86_num_umasks(this, idx); info->nattrs += intel_x86_num_mods(this, idx); return PFM_SUCCESS; } int pfm_intel_x86_valid_pebs(pfmlib_event_desc_t *e) { pfm_event_attr_info_t *a; int i, 
npebs = 0, numasks = 0; /* first check at the event level */ if (intel_x86_eflag(e->pmu, e->event, INTEL_X86_PEBS)) return PFM_SUCCESS; /* * next check the umasks * * we do not assume we are called after * pfm_intel_x86_get_event_encoding(), therefore * we check the unit masks again. * They must all be PEBS-capable. */ for(i=0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU || a->type != PFM_ATTR_UMASK) continue; numasks++; if (intel_x86_uflag(e->pmu, e->event, a->idx, INTEL_X86_PEBS)) npebs++; } return npebs == numasks ? PFM_SUCCESS : PFM_ERR_FEATCOMB; } unsigned int pfm_intel_x86_get_event_nattrs(void *this, int pidx) { unsigned int nattrs; nattrs = intel_x86_num_umasks(this, pidx); nattrs += intel_x86_num_mods(this, pidx); return nattrs; } int pfm_intel_x86_can_auto_encode(void *this, int pidx, int uidx) { int numasks; if (intel_x86_eflag(this, pidx, INTEL_X86_NO_AUTOENCODE)) return 0; numasks = intel_x86_num_umasks(this, pidx); if (uidx >= numasks) return 0; return !intel_x86_uflag(this, pidx, uidx, INTEL_X86_NO_AUTOENCODE); } papi-5.3.0/src/libpfm4/lib/pfmlib_arm_priv.h0000600003276200002170000000652112247131124020356 0ustar ralphundrgrad/* * Copyright (c) 2010 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #ifndef __PFMLIB_ARM_PRIV_H__ #define __PFMLIB_ARM_PRIV_H__ /* * This file contains the definitions used for ARM processors */ /* * event description */ typedef struct { const char *name; /* event name */ const char *desc; /* event description */ unsigned int code; /* event code */ unsigned int modmsk; /* modifiers bitmask */ } arm_entry_t; typedef union pfm_arm_reg { unsigned int val; /* complete register value */ struct { unsigned int sel:8; unsigned int reserved1:19; unsigned int excl_hyp:1; unsigned int reserved2:2; unsigned int excl_pl1:1; unsigned int excl_usr:1; } evtsel; } pfm_arm_reg_t; typedef struct { int implementer; int architecture; int part; } pfm_arm_config_t; extern pfm_arm_config_t pfm_arm_cfg; extern int pfm_arm_detect(void *this); extern int pfm_arm_get_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_arm_get_event_first(void *this); extern int pfm_arm_get_event_next(void *this, int idx); extern int pfm_arm_event_is_valid(void *this, int pidx); extern int pfm_arm_validate_table(void *this, FILE *fp); extern int pfm_arm_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info); extern int pfm_arm_get_event_info(void *this, int idx, pfm_event_info_t *info); extern unsigned int pfm_arm_get_event_nattrs(void *this, int pidx); extern void pfm_arm_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); extern int 
pfm_arm_get_perf_encoding(void *this, pfmlib_event_desc_t *e); #define ARM_ATTR_K 0 /* pl1 priv level */ #define ARM_ATTR_U 1 /* user priv level */ #define ARM_ATTR_HV 2 /* hypervisor priv level */ #define _ARM_ATTR_K (1 << ARM_ATTR_K) #define _ARM_ATTR_U (1 << ARM_ATTR_U) #define _ARM_ATTR_HV (1 << ARM_ATTR_HV) #define ARM_ATTR_PLM_ALL (_ARM_ATTR_K|_ARM_ATTR_U|_ARM_ATTR_HV) #define ARMV7_A15_ATTRS (_ARM_ATTR_K|_ARM_ATTR_U|_ARM_ATTR_HV) #define ARMV7_A15_PLM (PFM_PLM0|PFM_PLM3|PFM_PLMH) static inline int arm_has_plm(void *this, pfmlib_event_desc_t *e) { const arm_entry_t *pe = this_pe(this); return pe[e->event].modmsk & ARM_ATTR_PLM_ALL; } #endif /* __PFMLIB_ARM_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_sicortex.c0000600003276200002170000005051712247131124020376 0ustar ralphundrgrad/* * pfmlib_sicortex.c : support for the generic MIPS64 PMU family * * Contributed by Philip Mucci based on code from * Copyright (c) 2005-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include /* public headers */ #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_sicortex_priv.h" /* architecture private */ #include "sicortex/ice9a/ice9a_all_spec_pme.h" #include "sicortex/ice9b/ice9b_all_spec_pme.h" #include "sicortex/ice9/ice9_scb_spec_sw.h" /* let's define some handy shortcuts! */ #define sel_event_mask perfsel.sel_event_mask #define sel_exl perfsel.sel_exl #define sel_os perfsel.sel_os #define sel_usr perfsel.sel_usr #define sel_sup perfsel.sel_sup #define sel_int perfsel.sel_int static pme_sicortex_entry_t *sicortex_pe = NULL; // CHANGE FOR ICET #define core_counters 2 #define MAX_ICE9_PMCS 2+4+256 #define MAX_ICE9_PMDS 2+4+256 static int compute_ice9_counters(int type) { int i; int bound = 0; pme_gen_mips64_entry_t *gen_mips64_pe = NULL; sicortex_support.pmd_count = 0; sicortex_support.pmc_count = 0; for (i=0;i 2) { /* Account for 4 sampling PMD registers */ sicortex_support.num_cnt = sicortex_support.pmd_count - 4; sicortex_support.pme_count = bound; } else { sicortex_support.pme_count = 0; /* Count up CPU only events */ for (i=0;i> (cntr*8)) & 0xff; pc[j].reg_addr = cntr*2; pc[j].reg_value = reg.val; pc[j].reg_num = cntr; __pfm_vbprintf("[CP0_25_%u(pmc%u)=0x%"PRIx64" event_mask=0x%x usr=%d os=%d sup=%d exl=%d int=1] %s\n", pc[j].reg_addr, pc[j].reg_num, pc[j].reg_value, reg.sel_event_mask, reg.sel_usr, reg.sel_os, reg.sel_sup, reg.sel_exl, sicortex_pe[e[j].event].pme_name); pd[j].reg_num = cntr; pd[j].reg_addr = cntr*2 + 1; __pfm_vbprintf("[CP0_25_%u(pmd%u)]\n", pc[j].reg_addr, pc[j].reg_num); } /* SCB event */ else { pmc_sicortex_scb_reg_t scbreg; int 
k; scbreg.val = 0; scbreg.sicortex_ScbPerfBucket_reg.event = sicortex_pe[e[j].event].pme_code >> 16; for (k=0;kflags & PFMLIB_SICORTEX_INPUT_SCB_INTERVAL)) { two.sicortex_ScbPerfCtl_reg.Interval = mod_in->pfp_sicortex_scb_global.Interval; } else { two.sicortex_ScbPerfCtl_reg.Interval = 6; /* 2048 cycles */ } if (mod_in && (mod_in->flags & PFMLIB_SICORTEX_INPUT_SCB_NOINC)) { two.sicortex_ScbPerfCtl_reg.NoInc = mod_in->pfp_sicortex_scb_global.NoInc; } else { two.sicortex_ScbPerfCtl_reg.NoInc = 0; } two.sicortex_ScbPerfCtl_reg.IntBit = 31; /* Interrupt on last bit */ two.sicortex_ScbPerfCtl_reg.MagicEvent = 0; two.sicortex_ScbPerfCtl_reg.AddrAssert = 1; __pfm_vbprintf("[Scb%s(pmc%u)=0x%"PRIx64" Interval=0x%x IntBit=0x%x NoInc=%d AddrAssert=%d MagicEvent=0x%x]\n","PerfCtl", pc[num].reg_num, two.val, two.sicortex_ScbPerfCtl_reg.Interval, two.sicortex_ScbPerfCtl_reg.IntBit, two.sicortex_ScbPerfCtl_reg.NoInc, two.sicortex_ScbPerfCtl_reg.AddrAssert, two.sicortex_ScbPerfCtl_reg.MagicEvent); pc[num].reg_value = two.val; /*ScbPerfHist */ pc[++num].reg_num = 3; pc[num].reg_addr = 3; three.val = 0; if (mod_in && (mod_in->flags & PFMLIB_SICORTEX_INPUT_SCB_HISTGTE)) three.sicortex_ScbPerfHist_reg.HistGte = mod_in->pfp_sicortex_scb_global.HistGte; else three.sicortex_ScbPerfHist_reg.HistGte = 1; __pfm_vbprintf("[Scb%s(pmc%u)=0x%"PRIx64" HistGte=0x%x]\n","PerfHist", pc[num].reg_num, three.val, three.sicortex_ScbPerfHist_reg.HistGte); pc[num].reg_value = three.val; /*ScbPerfBuckNum */ pc[++num].reg_num = 4; pc[num].reg_addr = 4; four.val = 0; if (mod_in && (mod_in->flags & PFMLIB_SICORTEX_INPUT_SCB_BUCKET)) four.sicortex_ScbPerfBuckNum_reg.Bucket = mod_in->pfp_sicortex_scb_global.Bucket; else four.sicortex_ScbPerfBuckNum_reg.Bucket = 0; __pfm_vbprintf("[Scb%s(pmc%u)=0x%"PRIx64" Bucket=0x%x]\n","PerfBuckNum", pc[num].reg_num, four.val, four.sicortex_ScbPerfBuckNum_reg.Bucket); pc[num].reg_value = four.val; /*ScbPerfEna */ pc[++num].reg_num = 5; pc[num].reg_addr = 5; five.val = 0; 
five.sicortex_ScbPerfEna_reg.ena = 1; __pfm_vbprintf("[Scb%s(pmc%u)=0x%"PRIx64" ena=%d]\n","PerfEna", pc[num].reg_num, five.val, five.sicortex_ScbPerfEna_reg.ena); pc[num].reg_value = five.val; ++num; return(num); } /* * Automatically dispatch events to corresponding counters following constraints. * Upon return the pfarg_regt structure is ready to be submitted to kernel */ static int pfm_sicortex_dispatch_counters(pfmlib_input_param_t *inp, pfmlib_sicortex_input_param_t *mod_in, pfmlib_output_param_t *outp) { /* pfmlib_sicortex_input_param_t *param = mod_in; */ pfmlib_event_t *e = inp->pfp_events; pfmlib_reg_t *pc, *pd; unsigned int i, j, cnt = inp->pfp_event_count; unsigned int used = 0; extern pfm_pmu_support_t sicortex_support; unsigned int cntr, avail; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; /* Degree N rank based allocation */ if (cnt > sicortex_support.pmc_count) return PFMLIB_ERR_TOOMANY; if (PFMLIB_DEBUG()) { for (j=0; j < cnt; j++) { DPRINT("ev[%d]=%s, counters=0x%x\n", j, sicortex_pe[e[j].event].pme_name,sicortex_pe[e[j].event].pme_counters); } } /* Do rank based allocation, counters that live on 1 reg before counters that live on 2 regs etc. 
*/ /* CPU counters first */ for (i=1;i<=core_counters;i++) { for (j=0; j < cnt;j++) { /* CPU counters first */ if ((sicortex_pe[e[j].event].pme_counters & ((1<pfp_dfl_plm,pc,pd,cntr,j,mod_in); used |= (1 << cntr); DPRINT("Rank %d: Used counters 0x%x\n",i, used); } } } /* SCB counters can live anywhere */ used = 0; for (j=0; j < cnt;j++) { unsigned int cntr; /* CPU counters first */ if (sicortex_pe[e[j].event].pme_counters & (1<pfp_dfl_plm,pc,pd,cntr,j,mod_in); used++; DPRINT("SCB(%d): Used counters %d\n",j,used); } } if (used) { outp->pfp_pmc_count = stuff_sicortex_scb_control_regs(pc,pd,cnt,mod_in); outp->pfp_pmd_count = cnt; return PFMLIB_SUCCESS; } /* number of evtsel registers programmed */ outp->pfp_pmc_count = cnt; outp->pfp_pmd_count = cnt; return PFMLIB_SUCCESS; } static int pfm_sicortex_dispatch_events(pfmlib_input_param_t *inp, void *model_in, pfmlib_output_param_t *outp, void *model_out) { pfmlib_sicortex_input_param_t *mod_sicortex_in = (pfmlib_sicortex_input_param_t *)model_in; return pfm_sicortex_dispatch_counters(inp, mod_sicortex_in, outp); } static int pfm_sicortex_get_event_code(unsigned int i, unsigned int cnt, int *code) { extern pfm_pmu_support_t sicortex_support; /* check validity of counter index */ if (cnt != PFMLIB_CNT_FIRST) { if (cnt < 0 || cnt >= sicortex_support.pmc_count) return PFMLIB_ERR_INVAL; } else { cnt = ffs(sicortex_pe[i].pme_counters)-1; if (cnt == -1) return(PFMLIB_ERR_INVAL); } /* if cnt == 1, shift right by 0, if cnt == 2, shift right by 8 */ /* Works on both 5K and 20K */ unsigned int tmp = sicortex_pe[i].pme_counters; /* CPU event */ if (tmp & ((1<> (cnt*8)); else return PFMLIB_ERR_INVAL; } /* SCB event */ else { if ((cnt < 6) || (cnt >= sicortex_support.pmc_count)) return PFMLIB_ERR_INVAL; *code = 0xffff & (sicortex_pe[i].pme_code >> 16); } return PFMLIB_SUCCESS; } /* * This function is accessible directly to the user */ int pfm_sicortex_get_event_umask(unsigned int i, unsigned long *umask) { extern pfm_pmu_support_t 
sicortex_support; if (i >= sicortex_support.pme_count || umask == NULL) return PFMLIB_ERR_INVAL; *umask = 0; //evt_umask(i); return PFMLIB_SUCCESS; } static void pfm_sicortex_get_event_counters(unsigned int j, pfmlib_regmask_t *counters) { extern pfm_pmu_support_t sicortex_support; unsigned int tmp; memset(counters, 0, sizeof(*counters)); tmp = sicortex_pe[j].pme_counters; /* CPU counter */ if (tmp & ((1< core_counters) { /* counting pmds are not contiguous on ICE9*/ for(i=6; i < sicortex_support.pmd_count; i++) pfm_regmask_set(impl_counters, i); } } static void pfm_sicortex_get_hw_counter_width(unsigned int *width) { *width = PMU_GEN_MIPS64_COUNTER_WIDTH; } static char * pfm_sicortex_get_event_name(unsigned int i) { return sicortex_pe[i].pme_name; } static int pfm_sicortex_get_event_description(unsigned int ev, char **str) { char *s; s = sicortex_pe[ev].pme_desc; if (s) { *str = strdup(s); } else { *str = NULL; } return PFMLIB_SUCCESS; } static int pfm_sicortex_get_cycle_event(pfmlib_event_t *e) { return pfm_find_full_event("CPU_CYCLES",e); } static int pfm_sicortex_get_inst_retired(pfmlib_event_t *e) { return pfm_find_full_event("CPU_INSEXEC",e); } /* SiCortex specific functions */ /* CPU counter */ int pfm_sicortex_is_cpu(unsigned int i) { if (i < sicortex_support.pme_count) { unsigned int tmp = sicortex_pe[i].pme_counters; return !(tmp & (1< * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_ha_events.h" static void display_ha(void *this, pfmlib_event_desc_t *e, void *val) { const intel_x86_entry_t *pe = this_pe(this); pfm_snbep_unc_reg_t *reg = val; pfm_snbep_unc_reg_t f; __pfm_vbprintf("[UNC_HA=0x%"PRIx64" event=0x%x umask=0x%x en=%d " "inv=%d edge=%d thres=%d] %s\n", reg->val, reg->com.unc_event, reg->com.unc_umask, reg->com.unc_en, reg->com.unc_inv, reg->com.unc_edge, reg->com.unc_thres, pe[e->event].name); if (e->count == 1) return; f.val = e->codes[1]; __pfm_vbprintf("[UNC_HA_ADDR=0x%"PRIx64" lo_addr=0x%x hi_addr=0x%x]\n", f.val, f.ha_addr.lo_addr, f.ha_addr.hi_addr); f.val = e->codes[2]; __pfm_vbprintf("[UNC_HA_OPC=0x%"PRIx64" opc=0x%x]\n", f.val, f.ha_opc.opc); } pfmlib_pmu_t intel_snbep_unc_ha_support = { .desc = "Intel Sandy Bridge-EP HA uncore", .name = "snbep_unc_ha", .perf_name = "uncore_ha", .pmu = PFM_PMU_INTEL_SNBEP_UNC_HA, .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_h_pe), .type = PFM_PMU_TYPE_UNCORE, .num_cntrs = 4, .num_fixed_cntrs = 0, .max_encoding = 3, /* address matchers */ .pe = intel_snbep_unc_h_pe, .atdesc = snbep_unc_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | PFMLIB_PMU_FL_NO_SMPL, .pmu_detect = pfm_intel_snbep_unc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding), 
.get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .display_reg = display_ha, }; papi-5.3.0/src/libpfm4/lib/pfmlib_arm_armv6.c0000600003276200002170000000476212247131124020431 0ustar ralphundrgrad/* * pfmlib_arm_armv6.c : support for ARMv6 chips * * Copyright (c) 2013 by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_arm_priv.h" #include "events/arm_1176_events.h" /* event tables */ static int pfm_arm_detect_1176(void *this) { int ret; ret = pfm_arm_detect(this); if (ret != PFM_SUCCESS) return PFM_ERR_NOTSUPP; if ((pfm_arm_cfg.implementer == 0x41) && /* ARM */ (pfm_arm_cfg.part==0xb76)) { /* 1176 */ return PFM_SUCCESS; } return PFM_ERR_NOTSUPP; } /* ARM1176 support */ pfmlib_pmu_t arm_1176_support={ .desc = "ARM1176", .name = "arm_1176", .pmu = PFM_PMU_ARM_1176, .pme_count = LIBPFM_ARRAY_SIZE(arm_1176_pe), .type = PFM_PMU_TYPE_CORE, .pe = arm_1176_pe, .pmu_detect = pfm_arm_detect_1176, .max_encoding = 1, .num_cntrs = 2, .get_event_encoding[PFM_OS_NONE] = pfm_arm_get_encoding, PFMLIB_ENCODE_PERF(pfm_arm_get_perf_encoding), .get_event_first = pfm_arm_get_event_first, .get_event_next = pfm_arm_get_event_next, .event_is_valid = pfm_arm_event_is_valid, .validate_table = pfm_arm_validate_table, .get_event_info = pfm_arm_get_event_info, .get_event_attr_info = pfm_arm_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_arm_perf_validate_pattrs), .get_event_nattrs = pfm_arm_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/events/0000700003276200002170000000000012247131124016333 5ustar ralphundrgradpapi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_pcu_events.h0000600003276200002170000002230612247131123024112 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above 
copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * PMU: snbep_unc_pcu (Intel SandyBridge-EP PCU uncore) */ static const intel_x86_umask_t snbep_unc_p_power_state_occupancy[]={ { .uname = "CORES_C0", .udesc = "Counts number of cores in C0", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CORES_C3", .udesc = "Counts number of cores in C3", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CORES_C6", .udesc = "Counts number of cores in C6", .ucode = 0xc000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_p_occupancy_counters[]={ { .uname = "C0", .udesc = "Counts number of cores in C0", .ucode = 0x0100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "C3", .udesc = "Counts number of cores in C3", .ucode = 0x0200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "C6", .udesc = "Counts number of cores in C6", .ucode = 0x0300, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snbep_unc_p_pe[]={ { .name = "UNC_P_CLOCKTICKS", .desc = "PCU Uncore clockticks", .modmsk = SNBEP_UNC_PCU_ATTRS, .cntmsk = 0xf, .code = 0x00, }, { .name = "UNC_P_CORE0_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", .code = 0x3 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_CORE1_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", 
.code = 0x4 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_CORE2_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", .code = 0x5 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_CORE3_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", .code = 0x6 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_CORE4_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", .code = 0x7 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_CORE5_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", .code = 0x8 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_CORE6_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", .code = 0x9 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_CORE7_TRANSITION_CYCLES", .desc = "Core C State Transition Cycles", .code = 0xa | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE0", .desc = "Core C State Demotions", .code = 0x1e, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE1", .desc = "Core C State Demotions", .code = 0x1f, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE2", .desc = "Core C State Demotions", .code = 0x20, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE3", .desc = "Core C State Demotions", .code = 0x21, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE4", .desc = "Core C State Demotions", .code = 0x22, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE5", .desc = "Core C State Demotions", .code = 0x23, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE6", .desc = "Core C State 
Demotions", .code = 0x24, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_DEMOTIONS_CORE7", .desc = "Core C State Demotions", .code = 0x25, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_FREQ_BAND0_CYCLES", .desc = "Frequency Residency", .code = 0xb, .cntmsk = 0xf, .flags = INTEL_X86_NO_AUTOENCODE, .modmsk = SNBEP_UNC_PCU_BAND_ATTRS, .modmsk_req = _SNBEP_UNC_ATTR_FF, }, { .name = "UNC_P_FREQ_BAND1_CYCLES", .desc = "Frequency Residency", .code = 0xc, .cntmsk = 0xf, .flags = INTEL_X86_NO_AUTOENCODE, .modmsk = SNBEP_UNC_PCU_BAND_ATTRS, .modmsk_req = _SNBEP_UNC_ATTR_FF, }, { .name = "UNC_P_FREQ_BAND2_CYCLES", .desc = "Frequency Residency", .code = 0xd, .cntmsk = 0xf, .flags = INTEL_X86_NO_AUTOENCODE, .modmsk = SNBEP_UNC_PCU_BAND_ATTRS, .modmsk_req = _SNBEP_UNC_ATTR_FF, }, { .name = "UNC_P_FREQ_BAND3_CYCLES", .desc = "Frequency Residency", .code = 0xe, .cntmsk = 0xf, .flags = INTEL_X86_NO_AUTOENCODE, .modmsk = SNBEP_UNC_PCU_BAND_ATTRS, .modmsk_req = _SNBEP_UNC_ATTR_FF, }, { .name = "UNC_P_FREQ_MAX_CURRENT_CYCLES", .desc = "Current Strongest Upper Limit Cycles", .code = 0x7, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_FREQ_MAX_LIMIT_THERMAL_CYCLES", .desc = "Thermal Strongest Upper Limit Cycles", .code = 0x4, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_FREQ_MAX_OS_CYCLES", .desc = "OS Strongest Upper Limit Cycles", .code = 0x6, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_FREQ_MAX_POWER_CYCLES", .desc = "Power Strongest Upper Limit Cycles", .code = 0x5, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_FREQ_MIN_IO_P_CYCLES", .desc = "IO P Limit Strongest Lower Limit Cycles", .code = 0x1 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_PCU_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_p_occupancy_counters), .umasks = snbep_unc_p_occupancy_counters }, { .name = "UNC_P_FREQ_MIN_PERF_P_CYCLES", .desc = "Perf P Limit Strongest Lower 
Limit Cycles", .code = 0x2 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_PCU_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_p_occupancy_counters), .umasks = snbep_unc_p_occupancy_counters }, { .name = "UNC_P_FREQ_TRANS_CYCLES", .desc = "Cycles spent changing Frequency", .code = 0x0 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_PCU_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_p_occupancy_counters), .umasks = snbep_unc_p_occupancy_counters }, { .name = "UNC_P_MEMORY_PHASE_SHEDDING_CYCLES", .desc = "Memory Phase Shedding Cycles", .code = 0x2f, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_POWER_STATE_OCCUPANCY", .desc = "Number of cores in C0", .code = 0x80, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_PCU_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_p_power_state_occupancy), .umasks = snbep_unc_p_power_state_occupancy }, { .name = "UNC_P_PROCHOT_EXTERNAL_CYCLES", .desc = "External Prochot", .code = 0xa, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_PROCHOT_INTERNAL_CYCLES", .desc = "Internal Prochot", .code = 0x9, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_TOTAL_TRANSITION_CYCLES", .desc = "Total Core C State Transition Cycles", .code = 0xb | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_VOLT_TRANS_CYCLES_CHANGE", .desc = "Cycles Changing Voltage", .code = 0x3, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_VOLT_TRANS_CYCLES_DECREASE", .desc = "Cycles Decreasing Voltage", .code = 0x2, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_VOLT_TRANS_CYCLES_INCREASE", .desc = "Cycles Increasing Voltage", .code = 0x1, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, { .name = "UNC_P_VR_HOT_CYCLES", .desc = "VR Hot", .code = 0x32, .cntmsk = 0xf, .modmsk = SNBEP_UNC_PCU_ATTRS, }, }; 
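/*
 * Illustrative sketch, not part of libpfm4. Several PCU entries in the table
 * above write their event code as "N | (1ULL << 21)" and mark it "sel_ext".
 * The helpers below show one way such a value could be composed and taken
 * apart again. The names pcu_make_code(), pcu_event_select() and
 * pcu_has_sel_ext() are hypothetical, and the 8-bit width used for the
 * event-select field is an assumption; the table itself only shows the
 * bit-21 position.
 */

```c
#include <stdint.h>

#define PCU_SEL_EXT_SHIFT 21        /* bit position used by the table entries */
#define PCU_EVENT_MASK    0xffULL   /* assumed width of the event-select field */

/* Compose a code from an event-select value and an optional sel_ext bit. */
static inline uint64_t pcu_make_code(unsigned int event, int sel_ext)
{
	uint64_t code = (uint64_t)event & PCU_EVENT_MASK;

	if (sel_ext)
		code |= 1ULL << PCU_SEL_EXT_SHIFT;
	return code;
}

/* Extract the (assumed 8-bit) event-select field from a code. */
static inline unsigned int pcu_event_select(uint64_t code)
{
	return (unsigned int)(code & PCU_EVENT_MASK);
}

/* Report whether bit 21 (sel_ext) is set in a code. */
static inline int pcu_has_sel_ext(uint64_t code)
{
	return (int)((code >> PCU_SEL_EXT_SHIFT) & 1);
}
```

/*
 * For example, pcu_make_code(0x3, 1) gives 0x200003, the same value as the
 * "0x3 | (1ULL << 21)" written for UNC_P_CORE0_TRANSITION_CYCLES, while an
 * entry such as UNC_P_POWER_STATE_OCCUPANCY (code 0x80) has no sel_ext bit.
 */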
papi-5.3.0/src/libpfm4/lib/events/power6_events.h0000600003276200002170000044150112247131124021321 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __POWER6_EVENTS_H__ #define __POWER6_EVENTS_H__ /* * File: power6_events.h * CVS: * Author: Corey Ashford * cjashfor@us.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2009. All Rights Reserved. * Contributed by Corey Ashford * * Note: This code was automatically generated and should not be modified by * hand. * */ #define POWER6_PME_PM_LSU_REJECT_STQ_FULL 0 #define POWER6_PME_PM_DPU_HELD_FXU_MULTI 1 #define POWER6_PME_PM_VMX1_STALL 2 #define POWER6_PME_PM_PMC2_SAVED 3 #define POWER6_PME_PM_L2SB_IC_INV 4 #define POWER6_PME_PM_IERAT_MISS_64K 5 #define POWER6_PME_PM_THRD_PRIO_DIFF_3or4_CYC 6 #define POWER6_PME_PM_LD_REF_L1_BOTH 7 #define POWER6_PME_PM_FPU1_FCONV 8 #define POWER6_PME_PM_IBUF_FULL_COUNT 9 #define POWER6_PME_PM_MRK_LSU_DERAT_MISS 10 #define POWER6_PME_PM_MRK_ST_CMPL 11 #define POWER6_PME_PM_L2_CASTOUT_MOD 12 #define POWER6_PME_PM_FPU1_ST_FOLDED 13 #define POWER6_PME_PM_MRK_INST_TIMEO 14 #define POWER6_PME_PM_DPU_WT 15 #define POWER6_PME_PM_DPU_HELD_RESTART 16 #define POWER6_PME_PM_IERAT_MISS 17 #define POWER6_PME_PM_FPU_SINGLE 18 #define POWER6_PME_PM_MRK_PTEG_FROM_LMEM 19 #define POWER6_PME_PM_HV_COUNT 20 #define POWER6_PME_PM_L2SA_ST_HIT 21 #define POWER6_PME_PM_L2_LD_MISS_INST 22 #define POWER6_PME_PM_EXT_INT 23 #define POWER6_PME_PM_LSU1_LDF 24 #define POWER6_PME_PM_FAB_CMD_ISSUED 25 #define POWER6_PME_PM_PTEG_FROM_L21 26 #define POWER6_PME_PM_L2SA_MISS 27 #define POWER6_PME_PM_PTEG_FROM_RL2L3_MOD 28 #define POWER6_PME_PM_DPU_WT_COUNT 29 #define POWER6_PME_PM_MRK_PTEG_FROM_L25_MOD 30 #define POWER6_PME_PM_LD_HIT_L2 31 #define POWER6_PME_PM_PTEG_FROM_DL2L3_SHR 32 #define POWER6_PME_PM_MEM_DP_RQ_GLOB_LOC 33 #define POWER6_PME_PM_L3SA_MISS 34 #define POWER6_PME_PM_NO_ITAG_COUNT 35 #define POWER6_PME_PM_DSLB_MISS 36 #define 
POWER6_PME_PM_LSU_FLUSH_ALIGN 37 #define POWER6_PME_PM_DPU_HELD_FPU_CR 38 #define POWER6_PME_PM_PTEG_FROM_L2MISS 39 #define POWER6_PME_PM_MRK_DATA_FROM_DMEM 40 #define POWER6_PME_PM_PTEG_FROM_LMEM 41 #define POWER6_PME_PM_MRK_DERAT_REF_64K 42 #define POWER6_PME_PM_L2SA_LD_REQ_INST 43 #define POWER6_PME_PM_MRK_DERAT_MISS_16M 44 #define POWER6_PME_PM_DATA_FROM_DL2L3_MOD 45 #define POWER6_PME_PM_FPU0_FXMULT 46 #define POWER6_PME_PM_L3SB_MISS 47 #define POWER6_PME_PM_STCX_CANCEL 48 #define POWER6_PME_PM_L2SA_LD_MISS_DATA 49 #define POWER6_PME_PM_IC_INV_L2 50 #define POWER6_PME_PM_DPU_HELD 51 #define POWER6_PME_PM_PMC1_OVERFLOW 52 #define POWER6_PME_PM_THRD_PRIO_6_CYC 53 #define POWER6_PME_PM_MRK_PTEG_FROM_L3MISS 54 #define POWER6_PME_PM_MRK_LSU0_REJECT_UST 55 #define POWER6_PME_PM_MRK_INST_DISP 56 #define POWER6_PME_PM_LARX 57 #define POWER6_PME_PM_INST_CMPL 58 #define POWER6_PME_PM_FXU_IDLE 59 #define POWER6_PME_PM_MRK_DATA_FROM_DL2L3_MOD 60 #define POWER6_PME_PM_L2_LD_REQ_DATA 61 #define POWER6_PME_PM_LSU_DERAT_MISS_CYC 62 #define POWER6_PME_PM_DPU_HELD_POWER_COUNT 63 #define POWER6_PME_PM_INST_FROM_RL2L3_MOD 64 #define POWER6_PME_PM_DATA_FROM_DMEM_CYC 65 #define POWER6_PME_PM_DATA_FROM_DMEM 66 #define POWER6_PME_PM_LSU_REJECT_PARTIAL_SECTOR 67 #define POWER6_PME_PM_LSU_REJECT_DERAT_MPRED 68 #define POWER6_PME_PM_LSU1_REJECT_ULD 69 #define POWER6_PME_PM_DATA_FROM_L3_CYC 70 #define POWER6_PME_PM_FXU1_BUSY_FXU0_IDLE 71 #define POWER6_PME_PM_INST_FROM_MEM_DP 72 #define POWER6_PME_PM_LSU_FLUSH_DSI 73 #define POWER6_PME_PM_MRK_DERAT_REF_16G 74 #define POWER6_PME_PM_LSU_LDF_BOTH 75 #define POWER6_PME_PM_FPU1_1FLOP 76 #define POWER6_PME_PM_DATA_FROM_RMEM_CYC 77 #define POWER6_PME_PM_INST_PTEG_SECONDARY 78 #define POWER6_PME_PM_L1_ICACHE_MISS 79 #define POWER6_PME_PM_INST_DISP_LLA 80 #define POWER6_PME_PM_THRD_BOTH_RUN_CYC 81 #define POWER6_PME_PM_LSU_ST_CHAINED 82 #define POWER6_PME_PM_FPU1_FXDIV 83 #define POWER6_PME_PM_FREQ_UP 84 #define POWER6_PME_PM_FAB_RETRY_SYS_PUMP 
85 #define POWER6_PME_PM_DATA_FROM_LMEM 86 #define POWER6_PME_PM_PMC3_OVERFLOW 87 #define POWER6_PME_PM_LSU0_REJECT_SET_MPRED 88 #define POWER6_PME_PM_LSU0_REJECT_DERAT_MPRED 89 #define POWER6_PME_PM_LSU1_REJECT_STQ_FULL 90 #define POWER6_PME_PM_MRK_BR_MPRED 91 #define POWER6_PME_PM_L2SA_ST_MISS 92 #define POWER6_PME_PM_LSU0_REJECT_EXTERN 93 #define POWER6_PME_PM_MRK_BR_TAKEN 94 #define POWER6_PME_PM_ISLB_MISS 95 #define POWER6_PME_PM_CYC 96 #define POWER6_PME_PM_FPU_FXDIV 97 #define POWER6_PME_PM_DPU_HELD_LLA_END 98 #define POWER6_PME_PM_MEM0_DP_CL_WR_LOC 99 #define POWER6_PME_PM_MRK_LSU_REJECT_ULD 100 #define POWER6_PME_PM_1PLUS_PPC_CMPL 101 #define POWER6_PME_PM_PTEG_FROM_DMEM 102 #define POWER6_PME_PM_DPU_WT_BR_MPRED_COUNT 103 #define POWER6_PME_PM_GCT_FULL_CYC 104 #define POWER6_PME_PM_INST_FROM_L25_SHR 105 #define POWER6_PME_PM_MRK_DERAT_MISS_4K 106 #define POWER6_PME_PM_DC_PREF_STREAM_ALLOC 107 #define POWER6_PME_PM_FPU1_FIN 108 #define POWER6_PME_PM_BR_MPRED_TA 109 #define POWER6_PME_PM_DPU_HELD_POWER 110 #define POWER6_PME_PM_RUN_INST_CMPL 111 #define POWER6_PME_PM_GCT_EMPTY_CYC 112 #define POWER6_PME_PM_LLA_COUNT 113 #define POWER6_PME_PM_LSU0_REJECT_NO_SCRATCH 114 #define POWER6_PME_PM_DPU_WT_IC_MISS 115 #define POWER6_PME_PM_DATA_FROM_L3MISS 116 #define POWER6_PME_PM_FPU_FPSCR 117 #define POWER6_PME_PM_VMX1_INST_ISSUED 118 #define POWER6_PME_PM_FLUSH 119 #define POWER6_PME_PM_ST_HIT_L2 120 #define POWER6_PME_PM_SYNC_CYC 121 #define POWER6_PME_PM_FAB_SYS_PUMP 122 #define POWER6_PME_PM_IC_PREF_REQ 123 #define POWER6_PME_PM_MEM0_DP_RQ_GLOB_LOC 124 #define POWER6_PME_PM_FPU_ISSUE_0 125 #define POWER6_PME_PM_THRD_PRIO_2_CYC 126 #define POWER6_PME_PM_VMX_SIMPLE_ISSUED 127 #define POWER6_PME_PM_MRK_FPU1_FIN 128 #define POWER6_PME_PM_DPU_HELD_CW 129 #define POWER6_PME_PM_L3SA_REF 130 #define POWER6_PME_PM_STCX 131 #define POWER6_PME_PM_L2SB_MISS 132 #define POWER6_PME_PM_LSU0_REJECT 133 #define POWER6_PME_PM_TB_BIT_TRANS 134 #define POWER6_PME_PM_THERMAL_MAX 
135 #define POWER6_PME_PM_FPU0_STF 136 #define POWER6_PME_PM_FPU1_FMA 137 #define POWER6_PME_PM_LSU1_REJECT_LHS 138 #define POWER6_PME_PM_DPU_HELD_INT 139 #define POWER6_PME_PM_THRD_LLA_BOTH_CYC 140 #define POWER6_PME_PM_DPU_HELD_THERMAL_COUNT 141 #define POWER6_PME_PM_PMC4_REWIND 142 #define POWER6_PME_PM_DERAT_REF_16M 143 #define POWER6_PME_PM_FPU0_FCONV 144 #define POWER6_PME_PM_L2SA_LD_REQ_DATA 145 #define POWER6_PME_PM_DATA_FROM_MEM_DP 146 #define POWER6_PME_PM_MRK_VMX_FLOAT_ISSUED 147 #define POWER6_PME_PM_MRK_PTEG_FROM_L2MISS 148 #define POWER6_PME_PM_THRD_PRIO_DIFF_1or2_CYC 149 #define POWER6_PME_PM_VMX0_STALL 150 #define POWER6_PME_PM_IC_DEMAND_L2_BHT_REDIRECT 151 #define POWER6_PME_PM_LSU_DERAT_MISS 152 #define POWER6_PME_PM_FPU0_SINGLE 153 #define POWER6_PME_PM_FPU_ISSUE_STEERING 154 #define POWER6_PME_PM_THRD_PRIO_1_CYC 155 #define POWER6_PME_PM_VMX_COMPLEX_ISSUED 156 #define POWER6_PME_PM_FPU_ISSUE_ST_FOLDED 157 #define POWER6_PME_PM_DFU_FIN 158 #define POWER6_PME_PM_BR_PRED_CCACHE 159 #define POWER6_PME_PM_MRK_ST_CMPL_INT 160 #define POWER6_PME_PM_FAB_MMIO 161 #define POWER6_PME_PM_MRK_VMX_SIMPLE_ISSUED 162 #define POWER6_PME_PM_FPU_STF 163 #define POWER6_PME_PM_MEM1_DP_CL_WR_GLOB 164 #define POWER6_PME_PM_MRK_DATA_FROM_L3MISS 165 #define POWER6_PME_PM_GCT_NOSLOT_CYC 166 #define POWER6_PME_PM_L2_ST_REQ_DATA 167 #define POWER6_PME_PM_INST_TABLEWALK_COUNT 168 #define POWER6_PME_PM_PTEG_FROM_L35_SHR 169 #define POWER6_PME_PM_DPU_HELD_ISYNC 170 #define POWER6_PME_PM_MRK_DATA_FROM_L25_SHR 171 #define POWER6_PME_PM_L3SA_HIT 172 #define POWER6_PME_PM_DERAT_MISS_16G 173 #define POWER6_PME_PM_DATA_PTEG_2ND_HALF 174 #define POWER6_PME_PM_L2SA_ST_REQ 175 #define POWER6_PME_PM_INST_FROM_LMEM 176 #define POWER6_PME_PM_IC_DEMAND_L2_BR_REDIRECT 177 #define POWER6_PME_PM_PTEG_FROM_L2 178 #define POWER6_PME_PM_DATA_PTEG_1ST_HALF 179 #define POWER6_PME_PM_BR_MPRED_COUNT 180 #define POWER6_PME_PM_IERAT_MISS_4K 181 #define POWER6_PME_PM_THRD_BOTH_RUN_COUNT 182 #define 
POWER6_PME_PM_LSU_REJECT_ULD 183 #define POWER6_PME_PM_DATA_FROM_DL2L3_MOD_CYC 184 #define POWER6_PME_PM_MRK_PTEG_FROM_RL2L3_MOD 185 #define POWER6_PME_PM_FPU0_FLOP 186 #define POWER6_PME_PM_FPU0_FEST 187 #define POWER6_PME_PM_MRK_LSU0_REJECT_LHS 188 #define POWER6_PME_PM_VMX_RESULT_SAT_1 189 #define POWER6_PME_PM_NO_ITAG_CYC 190 #define POWER6_PME_PM_LSU1_REJECT_NO_SCRATCH 191 #define POWER6_PME_PM_0INST_FETCH 192 #define POWER6_PME_PM_DPU_WT_BR_MPRED 193 #define POWER6_PME_PM_L1_PREF 194 #define POWER6_PME_PM_VMX_FLOAT_MULTICYCLE 195 #define POWER6_PME_PM_DATA_FROM_L25_SHR_CYC 196 #define POWER6_PME_PM_DATA_FROM_L3 197 #define POWER6_PME_PM_PMC2_OVERFLOW 198 #define POWER6_PME_PM_VMX0_LD_WRBACK 199 #define POWER6_PME_PM_FPU0_DENORM 200 #define POWER6_PME_PM_INST_FETCH_CYC 201 #define POWER6_PME_PM_LSU_LDF 202 #define POWER6_PME_PM_LSU_REJECT_L2_CORR 203 #define POWER6_PME_PM_DERAT_REF_64K 204 #define POWER6_PME_PM_THRD_PRIO_3_CYC 205 #define POWER6_PME_PM_FPU_FMA 206 #define POWER6_PME_PM_INST_FROM_L35_MOD 207 #define POWER6_PME_PM_DFU_CONV 208 #define POWER6_PME_PM_INST_FROM_L25_MOD 209 #define POWER6_PME_PM_PTEG_FROM_L35_MOD 210 #define POWER6_PME_PM_MRK_VMX_ST_ISSUED 211 #define POWER6_PME_PM_VMX_FLOAT_ISSUED 212 #define POWER6_PME_PM_LSU0_REJECT_L2_CORR 213 #define POWER6_PME_PM_THRD_L2MISS 214 #define POWER6_PME_PM_FPU_FCONV 215 #define POWER6_PME_PM_FPU_FXMULT 216 #define POWER6_PME_PM_FPU1_FRSP 217 #define POWER6_PME_PM_MRK_DERAT_REF_16M 218 #define POWER6_PME_PM_L2SB_CASTOUT_SHR 219 #define POWER6_PME_PM_THRD_ONE_RUN_COUNT 220 #define POWER6_PME_PM_INST_FROM_RMEM 221 #define POWER6_PME_PM_LSU_BOTH_BUS 222 #define POWER6_PME_PM_FPU1_FSQRT_FDIV 223 #define POWER6_PME_PM_L2_LD_REQ_INST 224 #define POWER6_PME_PM_MRK_PTEG_FROM_L35_SHR 225 #define POWER6_PME_PM_BR_PRED_CR 226 #define POWER6_PME_PM_MRK_LSU0_REJECT_ULD 227 #define POWER6_PME_PM_LSU_REJECT 228 #define POWER6_PME_PM_LSU_REJECT_LHS_BOTH 229 #define POWER6_PME_PM_GXO_ADDR_CYC_BUSY 230 #define 
POWER6_PME_PM_LSU_SRQ_EMPTY_COUNT 231 #define POWER6_PME_PM_PTEG_FROM_L3 232 #define POWER6_PME_PM_VMX0_LD_ISSUED 233 #define POWER6_PME_PM_FXU_PIPELINED_MULT_DIV 234 #define POWER6_PME_PM_FPU1_STF 235 #define POWER6_PME_PM_DFU_ADD 236 #define POWER6_PME_PM_MEM_DP_CL_WR_GLOB 237 #define POWER6_PME_PM_MRK_LSU1_REJECT_ULD 238 #define POWER6_PME_PM_ITLB_REF 239 #define POWER6_PME_PM_LSU0_REJECT_L2MISS 240 #define POWER6_PME_PM_DATA_FROM_L35_SHR 241 #define POWER6_PME_PM_MRK_DATA_FROM_RL2L3_MOD 242 #define POWER6_PME_PM_FPU0_FPSCR 243 #define POWER6_PME_PM_DATA_FROM_L2 244 #define POWER6_PME_PM_DPU_HELD_XER 245 #define POWER6_PME_PM_FAB_NODE_PUMP 246 #define POWER6_PME_PM_VMX_RESULT_SAT_0_1 247 #define POWER6_PME_PM_LD_REF_L1 248 #define POWER6_PME_PM_TLB_REF 249 #define POWER6_PME_PM_DC_PREF_OUT_OF_STREAMS 250 #define POWER6_PME_PM_FLUSH_FPU 251 #define POWER6_PME_PM_MEM1_DP_CL_WR_LOC 252 #define POWER6_PME_PM_L2SB_LD_HIT 253 #define POWER6_PME_PM_FAB_DCLAIM 254 #define POWER6_PME_PM_MEM_DP_CL_WR_LOC 255 #define POWER6_PME_PM_BR_MPRED_CR 256 #define POWER6_PME_PM_LSU_REJECT_EXTERN 257 #define POWER6_PME_PM_DATA_FROM_RL2L3_MOD 258 #define POWER6_PME_PM_DPU_HELD_RU_WQ 259 #define POWER6_PME_PM_LD_MISS_L1 260 #define POWER6_PME_PM_DC_INV_L2 261 #define POWER6_PME_PM_MRK_PTEG_FROM_RMEM 262 #define POWER6_PME_PM_FPU_FIN 263 #define POWER6_PME_PM_FXU0_FIN 264 #define POWER6_PME_PM_DPU_HELD_FPQ 265 #define POWER6_PME_PM_GX_DMA_READ 266 #define POWER6_PME_PM_LSU1_REJECT_PARTIAL_SECTOR 267 #define POWER6_PME_PM_0INST_FETCH_COUNT 268 #define POWER6_PME_PM_PMC5_OVERFLOW 269 #define POWER6_PME_PM_L2SB_LD_REQ 270 #define POWER6_PME_PM_THRD_PRIO_DIFF_0_CYC 271 #define POWER6_PME_PM_DATA_FROM_RMEM 272 #define POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_BOTH_CYC 273 #define POWER6_PME_PM_ST_REF_L1_BOTH 274 #define POWER6_PME_PM_VMX_PERMUTE_ISSUED 275 #define POWER6_PME_PM_BR_TAKEN 276 #define POWER6_PME_PM_FAB_DMA 277 #define POWER6_PME_PM_GCT_EMPTY_COUNT 278 #define POWER6_PME_PM_FPU1_SINGLE 
279 #define POWER6_PME_PM_L2SA_CASTOUT_SHR 280 #define POWER6_PME_PM_L3SB_REF 281 #define POWER6_PME_PM_FPU0_FRSP 282 #define POWER6_PME_PM_PMC4_SAVED 283 #define POWER6_PME_PM_L2SA_DC_INV 284 #define POWER6_PME_PM_GXI_ADDR_CYC_BUSY 285 #define POWER6_PME_PM_FPU0_FMA 286 #define POWER6_PME_PM_SLB_MISS 287 #define POWER6_PME_PM_MRK_ST_GPS 288 #define POWER6_PME_PM_DERAT_REF_4K 289 #define POWER6_PME_PM_L2_CASTOUT_SHR 290 #define POWER6_PME_PM_DPU_HELD_STCX_CR 291 #define POWER6_PME_PM_FPU0_ST_FOLDED 292 #define POWER6_PME_PM_MRK_DATA_FROM_L21 293 #define POWER6_PME_PM_THRD_PRIO_DIFF_minus3or4_CYC 294 #define POWER6_PME_PM_DATA_FROM_L35_MOD 295 #define POWER6_PME_PM_DATA_FROM_DL2L3_SHR 296 #define POWER6_PME_PM_GXI_DATA_CYC_BUSY 297 #define POWER6_PME_PM_LSU_REJECT_STEAL 298 #define POWER6_PME_PM_ST_FIN 299 #define POWER6_PME_PM_DPU_HELD_CR_LOGICAL 300 #define POWER6_PME_PM_THRD_SEL_T0 301 #define POWER6_PME_PM_PTEG_RELOAD_VALID 302 #define POWER6_PME_PM_L2_PREF_ST 303 #define POWER6_PME_PM_MRK_STCX_FAIL 304 #define POWER6_PME_PM_LSU0_REJECT_LHS 305 #define POWER6_PME_PM_DFU_EXP_EQ 306 #define POWER6_PME_PM_DPU_HELD_FP_FX_MULT 307 #define POWER6_PME_PM_L2_LD_MISS_DATA 308 #define POWER6_PME_PM_DATA_FROM_L35_MOD_CYC 309 #define POWER6_PME_PM_FLUSH_FXU 310 #define POWER6_PME_PM_FPU_ISSUE_1 311 #define POWER6_PME_PM_DATA_FROM_LMEM_CYC 312 #define POWER6_PME_PM_DPU_HELD_LSU_SOPS 313 #define POWER6_PME_PM_INST_PTEG_2ND_HALF 314 #define POWER6_PME_PM_THRESH_TIMEO 315 #define POWER6_PME_PM_LSU_REJECT_UST_BOTH 316 #define POWER6_PME_PM_LSU_REJECT_FAST 317 #define POWER6_PME_PM_DPU_HELD_THRD_PRIO 318 #define POWER6_PME_PM_L2_PREF_LD 319 #define POWER6_PME_PM_FPU_FEST 320 #define POWER6_PME_PM_MRK_DATA_FROM_RMEM 321 #define POWER6_PME_PM_LD_MISS_L1_CYC 322 #define POWER6_PME_PM_DERAT_MISS_4K 323 #define POWER6_PME_PM_DPU_HELD_COMPLETION 324 #define POWER6_PME_PM_FPU_ISSUE_STALL_ST 325 #define POWER6_PME_PM_L2SB_DC_INV 326 #define POWER6_PME_PM_PTEG_FROM_L25_SHR 327 #define 
POWER6_PME_PM_PTEG_FROM_DL2L3_MOD 328 #define POWER6_PME_PM_FAB_CMD_RETRIED 329 #define POWER6_PME_PM_BR_PRED_LSTACK 330 #define POWER6_PME_PM_GXO_DATA_CYC_BUSY 331 #define POWER6_PME_PM_DFU_SUBNORM 332 #define POWER6_PME_PM_FPU_ISSUE_OOO 333 #define POWER6_PME_PM_LSU_REJECT_ULD_BOTH 334 #define POWER6_PME_PM_L2SB_ST_MISS 335 #define POWER6_PME_PM_DATA_FROM_L25_MOD_CYC 336 #define POWER6_PME_PM_INST_PTEG_1ST_HALF 337 #define POWER6_PME_PM_DERAT_MISS_16M 338 #define POWER6_PME_PM_GX_DMA_WRITE 339 #define POWER6_PME_PM_MRK_PTEG_FROM_DL2L3_MOD 340 #define POWER6_PME_PM_MEM1_DP_RQ_GLOB_LOC 341 #define POWER6_PME_PM_L2SB_LD_REQ_DATA 342 #define POWER6_PME_PM_L2SA_LD_MISS_INST 343 #define POWER6_PME_PM_MRK_LSU0_REJECT_L2MISS 344 #define POWER6_PME_PM_MRK_IFU_FIN 345 #define POWER6_PME_PM_INST_FROM_L3 346 #define POWER6_PME_PM_FXU1_FIN 347 #define POWER6_PME_PM_THRD_PRIO_4_CYC 348 #define POWER6_PME_PM_MRK_DATA_FROM_L35_MOD 349 #define POWER6_PME_PM_LSU_REJECT_SET_MPRED 350 #define POWER6_PME_PM_MRK_DERAT_MISS_16G 351 #define POWER6_PME_PM_FPU0_FXDIV 352 #define POWER6_PME_PM_MRK_LSU1_REJECT_UST 353 #define POWER6_PME_PM_FPU_ISSUE_DIV_SQRT_OVERLAP 354 #define POWER6_PME_PM_INST_FROM_L35_SHR 355 #define POWER6_PME_PM_MRK_LSU_REJECT_LHS 356 #define POWER6_PME_PM_LSU_LMQ_FULL_CYC 357 #define POWER6_PME_PM_SYNC_COUNT 358 #define POWER6_PME_PM_MEM0_DP_RQ_LOC_GLOB 359 #define POWER6_PME_PM_L2SA_CASTOUT_MOD 360 #define POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_BOTH_COUNT 361 #define POWER6_PME_PM_PTEG_FROM_MEM_DP 362 #define POWER6_PME_PM_LSU_REJECT_SLOW 363 #define POWER6_PME_PM_PTEG_FROM_L25_MOD 364 #define POWER6_PME_PM_THRD_PRIO_7_CYC 365 #define POWER6_PME_PM_MRK_PTEG_FROM_RL2L3_SHR 366 #define POWER6_PME_PM_ST_REQ_L2 367 #define POWER6_PME_PM_ST_REF_L1 368 #define POWER6_PME_PM_FPU_ISSUE_STALL_THRD 369 #define POWER6_PME_PM_RUN_COUNT 370 #define POWER6_PME_PM_RUN_CYC 371 #define POWER6_PME_PM_PTEG_FROM_RMEM 372 #define POWER6_PME_PM_LSU0_LDF 373 #define POWER6_PME_PM_ST_MISS_L1 374 
#define POWER6_PME_PM_INST_FROM_DL2L3_SHR 375 #define POWER6_PME_PM_L2SA_IC_INV 376 #define POWER6_PME_PM_THRD_ONE_RUN_CYC 377 #define POWER6_PME_PM_L2SB_LD_REQ_INST 378 #define POWER6_PME_PM_MRK_DATA_FROM_L25_MOD 379 #define POWER6_PME_PM_DPU_HELD_XTHRD 380 #define POWER6_PME_PM_L2SB_ST_REQ 381 #define POWER6_PME_PM_INST_FROM_L21 382 #define POWER6_PME_PM_INST_FROM_L3MISS 383 #define POWER6_PME_PM_L3SB_HIT 384 #define POWER6_PME_PM_EE_OFF_EXT_INT 385 #define POWER6_PME_PM_INST_FROM_DL2L3_MOD 386 #define POWER6_PME_PM_PMC6_OVERFLOW 387 #define POWER6_PME_PM_FPU_FLOP 388 #define POWER6_PME_PM_FXU_BUSY 389 #define POWER6_PME_PM_FPU1_FLOP 390 #define POWER6_PME_PM_IC_RELOAD_SHR 391 #define POWER6_PME_PM_INST_TABLEWALK_CYC 392 #define POWER6_PME_PM_DATA_FROM_RL2L3_MOD_CYC 393 #define POWER6_PME_PM_THRD_PRIO_DIFF_5or6_CYC 394 #define POWER6_PME_PM_IBUF_FULL_CYC 395 #define POWER6_PME_PM_L2SA_LD_REQ 396 #define POWER6_PME_PM_VMX1_LD_WRBACK 397 #define POWER6_PME_PM_MRK_FPU_FIN 398 #define POWER6_PME_PM_THRD_PRIO_5_CYC 399 #define POWER6_PME_PM_DFU_BACK2BACK 400 #define POWER6_PME_PM_MRK_DATA_FROM_LMEM 401 #define POWER6_PME_PM_LSU_REJECT_LHS 402 #define POWER6_PME_PM_DPU_HELD_SPR 403 #define POWER6_PME_PM_FREQ_DOWN 404 #define POWER6_PME_PM_DFU_ENC_BCD_DPD 405 #define POWER6_PME_PM_DPU_HELD_GPR 406 #define POWER6_PME_PM_LSU0_NCST 407 #define POWER6_PME_PM_MRK_INST_ISSUED 408 #define POWER6_PME_PM_INST_FROM_RL2L3_SHR 409 #define POWER6_PME_PM_FPU_DENORM 410 #define POWER6_PME_PM_PTEG_FROM_L3MISS 411 #define POWER6_PME_PM_RUN_PURR 412 #define POWER6_PME_PM_MRK_VMX0_LD_WRBACK 413 #define POWER6_PME_PM_L2_MISS 414 #define POWER6_PME_PM_MRK_DATA_FROM_L3 415 #define POWER6_PME_PM_MRK_LSU1_REJECT_LHS 416 #define POWER6_PME_PM_L2SB_LD_MISS_INST 417 #define POWER6_PME_PM_PTEG_FROM_RL2L3_SHR 418 #define POWER6_PME_PM_MRK_DERAT_MISS_64K 419 #define POWER6_PME_PM_LWSYNC 420 #define POWER6_PME_PM_FPU1_FXMULT 421 #define POWER6_PME_PM_MEM0_DP_CL_WR_GLOB 422 #define 
POWER6_PME_PM_LSU0_REJECT_PARTIAL_SECTOR 423 #define POWER6_PME_PM_INST_IMC_MATCH_CMPL 424 #define POWER6_PME_PM_DPU_HELD_THERMAL 425 #define POWER6_PME_PM_FPU_FRSP 426 #define POWER6_PME_PM_MRK_INST_FIN 427 #define POWER6_PME_PM_MRK_PTEG_FROM_DL2L3_SHR 428 #define POWER6_PME_PM_MRK_DTLB_REF 429 #define POWER6_PME_PM_MRK_PTEG_FROM_L25_SHR 430 #define POWER6_PME_PM_DPU_HELD_LSU 431 #define POWER6_PME_PM_FPU_FSQRT_FDIV 432 #define POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_COUNT 433 #define POWER6_PME_PM_DATA_PTEG_SECONDARY 434 #define POWER6_PME_PM_FPU1_FEST 435 #define POWER6_PME_PM_L2SA_LD_HIT 436 #define POWER6_PME_PM_DATA_FROM_MEM_DP_CYC 437 #define POWER6_PME_PM_BR_MPRED_CCACHE 438 #define POWER6_PME_PM_DPU_HELD_COUNT 439 #define POWER6_PME_PM_LSU1_REJECT_SET_MPRED 440 #define POWER6_PME_PM_FPU_ISSUE_2 441 #define POWER6_PME_PM_LSU1_REJECT_L2_CORR 442 #define POWER6_PME_PM_MRK_PTEG_FROM_DMEM 443 #define POWER6_PME_PM_MEM1_DP_RQ_LOC_GLOB 444 #define POWER6_PME_PM_THRD_PRIO_DIFF_minus1or2_CYC 445 #define POWER6_PME_PM_THRD_PRIO_0_CYC 446 #define POWER6_PME_PM_FXU0_BUSY_FXU1_IDLE 447 #define POWER6_PME_PM_LSU1_REJECT_DERAT_MPRED 448 #define POWER6_PME_PM_MRK_VMX1_LD_WRBACK 449 #define POWER6_PME_PM_DATA_FROM_RL2L3_SHR_CYC 450 #define POWER6_PME_PM_IERAT_MISS_16M 451 #define POWER6_PME_PM_MRK_DATA_FROM_MEM_DP 452 #define POWER6_PME_PM_LARX_L1HIT 453 #define POWER6_PME_PM_L2_ST_MISS_DATA 454 #define POWER6_PME_PM_FPU_ST_FOLDED 455 #define POWER6_PME_PM_MRK_DATA_FROM_L35_SHR 456 #define POWER6_PME_PM_DPU_HELD_MULT_GPR 457 #define POWER6_PME_PM_FPU0_1FLOP 458 #define POWER6_PME_PM_IERAT_MISS_16G 459 #define POWER6_PME_PM_IC_PREF_WRITE 460 #define POWER6_PME_PM_THRD_PRIO_DIFF_minus5or6_CYC 461 #define POWER6_PME_PM_FPU0_FIN 462 #define POWER6_PME_PM_DATA_FROM_L2_CYC 463 #define POWER6_PME_PM_DERAT_REF_16G 464 #define POWER6_PME_PM_BR_PRED 465 #define POWER6_PME_PM_VMX1_LD_ISSUED 466 #define POWER6_PME_PM_L2SB_CASTOUT_MOD 467 #define POWER6_PME_PM_INST_FROM_DMEM 468 #define 
POWER6_PME_PM_DATA_FROM_L35_SHR_CYC 469 #define POWER6_PME_PM_LSU0_NCLD 470 #define POWER6_PME_PM_FAB_RETRY_NODE_PUMP 471 #define POWER6_PME_PM_VMX0_INST_ISSUED 472 #define POWER6_PME_PM_DATA_FROM_L25_MOD 473 #define POWER6_PME_PM_DPU_HELD_ITLB_ISLB 474 #define POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC 475 #define POWER6_PME_PM_THRD_CONC_RUN_INST 476 #define POWER6_PME_PM_MRK_PTEG_FROM_L2 477 #define POWER6_PME_PM_PURR 478 #define POWER6_PME_PM_DERAT_MISS_64K 479 #define POWER6_PME_PM_PMC2_REWIND 480 #define POWER6_PME_PM_INST_FROM_L2 481 #define POWER6_PME_PM_INST_DISP 482 #define POWER6_PME_PM_DATA_FROM_L25_SHR 483 #define POWER6_PME_PM_L1_DCACHE_RELOAD_VALID 484 #define POWER6_PME_PM_LSU1_REJECT_UST 485 #define POWER6_PME_PM_FAB_ADDR_COLLISION 486 #define POWER6_PME_PM_MRK_FXU_FIN 487 #define POWER6_PME_PM_LSU0_REJECT_UST 488 #define POWER6_PME_PM_PMC4_OVERFLOW 489 #define POWER6_PME_PM_MRK_PTEG_FROM_L3 490 #define POWER6_PME_PM_INST_FROM_L2MISS 491 #define POWER6_PME_PM_L2SB_ST_HIT 492 #define POWER6_PME_PM_DPU_WT_IC_MISS_COUNT 493 #define POWER6_PME_PM_MRK_DATA_FROM_DL2L3_SHR 494 #define POWER6_PME_PM_MRK_PTEG_FROM_L35_MOD 495 #define POWER6_PME_PM_FPU1_FPSCR 496 #define POWER6_PME_PM_LSU_REJECT_UST 497 #define POWER6_PME_PM_LSU0_DERAT_MISS 498 #define POWER6_PME_PM_MRK_PTEG_FROM_MEM_DP 499 #define POWER6_PME_PM_MRK_DATA_FROM_L2 500 #define POWER6_PME_PM_FPU0_FSQRT_FDIV 501 #define POWER6_PME_PM_DPU_HELD_FXU_SOPS 502 #define POWER6_PME_PM_MRK_FPU0_FIN 503 #define POWER6_PME_PM_L2SB_LD_MISS_DATA 504 #define POWER6_PME_PM_LSU_SRQ_EMPTY_CYC 505 #define POWER6_PME_PM_1PLUS_PPC_DISP 506 #define POWER6_PME_PM_VMX_ST_ISSUED 507 #define POWER6_PME_PM_DATA_FROM_L2MISS 508 #define POWER6_PME_PM_LSU0_REJECT_ULD 509 #define POWER6_PME_PM_SUSPENDED 510 #define POWER6_PME_PM_DFU_ADD_SHIFTED_BOTH 511 #define POWER6_PME_PM_LSU_REJECT_NO_SCRATCH 512 #define POWER6_PME_PM_STCX_FAIL 513 #define POWER6_PME_PM_FPU1_DENORM 514 #define POWER6_PME_PM_GCT_NOSLOT_COUNT 515 #define 
POWER6_PME_PM_DATA_FROM_DL2L3_SHR_CYC 516 #define POWER6_PME_PM_DATA_FROM_L21 517 #define POWER6_PME_PM_FPU_1FLOP 518 #define POWER6_PME_PM_LSU1_REJECT 519 #define POWER6_PME_PM_IC_REQ 520 #define POWER6_PME_PM_MRK_DFU_FIN 521 #define POWER6_PME_PM_NOT_LLA_CYC 522 #define POWER6_PME_PM_INST_FROM_L1 523 #define POWER6_PME_PM_MRK_VMX_COMPLEX_ISSUED 524 #define POWER6_PME_PM_BRU_FIN 525 #define POWER6_PME_PM_LSU1_REJECT_EXTERN 526 #define POWER6_PME_PM_DATA_FROM_L21_CYC 527 #define POWER6_PME_PM_GXI_CYC_BUSY 528 #define POWER6_PME_PM_MRK_LD_MISS_L1 529 #define POWER6_PME_PM_L1_WRITE_CYC 530 #define POWER6_PME_PM_LLA_CYC 531 #define POWER6_PME_PM_MRK_DATA_FROM_L2MISS 532 #define POWER6_PME_PM_GCT_FULL_COUNT 533 #define POWER6_PME_PM_MEM_DP_RQ_LOC_GLOB 534 #define POWER6_PME_PM_DATA_FROM_RL2L3_SHR 535 #define POWER6_PME_PM_MRK_LSU_REJECT_UST 536 #define POWER6_PME_PM_MRK_VMX_PERMUTE_ISSUED 537 #define POWER6_PME_PM_MRK_PTEG_FROM_L21 538 #define POWER6_PME_PM_THRD_GRP_CMPL_BOTH_CYC 539 #define POWER6_PME_PM_BR_MPRED 540 #define POWER6_PME_PM_LD_REQ_L2 541 #define POWER6_PME_PM_FLUSH_ASYNC 542 #define POWER6_PME_PM_HV_CYC 543 #define POWER6_PME_PM_LSU1_DERAT_MISS 544 #define POWER6_PME_PM_DPU_HELD_SMT 545 #define POWER6_PME_PM_MRK_LSU_FIN 546 #define POWER6_PME_PM_MRK_DATA_FROM_RL2L3_SHR 547 #define POWER6_PME_PM_LSU0_REJECT_STQ_FULL 548 #define POWER6_PME_PM_MRK_DERAT_REF_4K 549 #define POWER6_PME_PM_FPU_ISSUE_STALL_FPR 550 #define POWER6_PME_PM_IFU_FIN 551 #define POWER6_PME_PM_GXO_CYC_BUSY 552 static const pme_power_entry_t power6_pe[] = { [ POWER6_PME_PM_LSU_REJECT_STQ_FULL ] = { .pme_name = "PM_LSU_REJECT_STQ_FULL", .pme_code = 0x1a0030, .pme_short_desc = "LSU reject due to store queue full", .pme_long_desc = "LSU reject due to store queue full", }, [ POWER6_PME_PM_DPU_HELD_FXU_MULTI ] = { .pme_name = "PM_DPU_HELD_FXU_MULTI", .pme_code = 0x210a6, .pme_short_desc = "DISP unit held due to FXU multicycle", .pme_long_desc = "DISP unit held due to FXU multicycle", }, [ 
POWER6_PME_PM_VMX1_STALL ] = { .pme_name = "PM_VMX1_STALL", .pme_code = 0xb008c, .pme_short_desc = "VMX1 stall", .pme_long_desc = "VMX1 stall", }, [ POWER6_PME_PM_PMC2_SAVED ] = { .pme_name = "PM_PMC2_SAVED", .pme_code = 0x100022, .pme_short_desc = "PMC2 rewind value saved", .pme_long_desc = "PMC2 rewind value saved", }, [ POWER6_PME_PM_L2SB_IC_INV ] = { .pme_name = "PM_L2SB_IC_INV", .pme_code = 0x5068c, .pme_short_desc = "L2 slice B I cache invalidate", .pme_long_desc = "L2 slice B I cache invalidate", }, [ POWER6_PME_PM_IERAT_MISS_64K ] = { .pme_name = "PM_IERAT_MISS_64K", .pme_code = 0x392076, .pme_short_desc = "IERAT misses for 64K page", .pme_long_desc = "IERAT misses for 64K page", }, [ POWER6_PME_PM_THRD_PRIO_DIFF_3or4_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_3or4_CYC", .pme_code = 0x323040, .pme_short_desc = "Cycles thread priority difference is 3 or 4", .pme_long_desc = "Cycles thread priority difference is 3 or 4", }, [ POWER6_PME_PM_LD_REF_L1_BOTH ] = { .pme_name = "PM_LD_REF_L1_BOTH", .pme_code = 0x180036, .pme_short_desc = "Both units L1 D cache load reference", .pme_long_desc = "Both units L1 D cache load reference", }, [ POWER6_PME_PM_FPU1_FCONV ] = { .pme_name = "PM_FPU1_FCONV", .pme_code = 0xd10a8, .pme_short_desc = "FPU1 executed FCONV instruction", .pme_long_desc = "FPU1 executed FCONV instruction", }, [ POWER6_PME_PM_IBUF_FULL_COUNT ] = { .pme_name = "PM_IBUF_FULL_COUNT", .pme_code = 0x40085, .pme_short_desc = "Periods instruction buffer full", .pme_long_desc = "Periods instruction buffer full", }, [ POWER6_PME_PM_MRK_LSU_DERAT_MISS ] = { .pme_name = "PM_MRK_LSU_DERAT_MISS", .pme_code = 0x400012, .pme_short_desc = "Marked DERAT miss", .pme_long_desc = "Marked DERAT miss", }, [ POWER6_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x100006, .pme_short_desc = "Marked store instruction completed", .pme_long_desc = "A sampled store has completed (data home)", }, [ POWER6_PME_PM_L2_CASTOUT_MOD ] = { .pme_name = 
"PM_L2_CASTOUT_MOD", .pme_code = 0x150630, .pme_short_desc = "L2 castouts - Modified (M, Mu, Me)", .pme_long_desc = "L2 castouts - Modified (M, Mu, Me)", }, [ POWER6_PME_PM_FPU1_ST_FOLDED ] = { .pme_name = "PM_FPU1_ST_FOLDED", .pme_code = 0xd10ac, .pme_short_desc = "FPU1 folded store", .pme_long_desc = "FPU1 folded store", }, [ POWER6_PME_PM_MRK_INST_TIMEO ] = { .pme_name = "PM_MRK_INST_TIMEO", .pme_code = 0x40003e, .pme_short_desc = "Marked Instruction finish timeout ", .pme_long_desc = "Marked Instruction finish timeout ", }, [ POWER6_PME_PM_DPU_WT ] = { .pme_name = "PM_DPU_WT", .pme_code = 0x300004, .pme_short_desc = "Cycles DISP unit is stalled waiting for instructions", .pme_long_desc = "Cycles DISP unit is stalled waiting for instructions", }, [ POWER6_PME_PM_DPU_HELD_RESTART ] = { .pme_name = "PM_DPU_HELD_RESTART", .pme_code = 0x30086, .pme_short_desc = "DISP unit held after restart coming", .pme_long_desc = "DISP unit held after restart coming", }, [ POWER6_PME_PM_IERAT_MISS ] = { .pme_name = "PM_IERAT_MISS", .pme_code = 0x420ce, .pme_short_desc = "IERAT miss count", .pme_long_desc = "IERAT miss count", }, [ POWER6_PME_PM_FPU_SINGLE ] = { .pme_name = "PM_FPU_SINGLE", .pme_code = 0x4c1030, .pme_short_desc = "FPU executed single precision instruction", .pme_long_desc = "FPU is executing single precision instruction. 
Combined Unit 0 + Unit 1", }, [ POWER6_PME_PM_MRK_PTEG_FROM_LMEM ] = { .pme_name = "PM_MRK_PTEG_FROM_LMEM", .pme_code = 0x412042, .pme_short_desc = "Marked PTEG loaded from local memory", .pme_long_desc = "Marked PTEG loaded from local memory", }, [ POWER6_PME_PM_HV_COUNT ] = { .pme_name = "PM_HV_COUNT", .pme_code = 0x200017, .pme_short_desc = "Hypervisor Periods", .pme_long_desc = "Periods when the processor is executing in Hypervisor (MSR[HV] = 1 and MSR[PR]=0)", }, [ POWER6_PME_PM_L2SA_ST_HIT ] = { .pme_name = "PM_L2SA_ST_HIT", .pme_code = 0x50786, .pme_short_desc = "L2 slice A store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. This event is provided on each of the three L2 slices A,B, and C.", }, [ POWER6_PME_PM_L2_LD_MISS_INST ] = { .pme_name = "PM_L2_LD_MISS_INST", .pme_code = 0x250530, .pme_short_desc = "L2 instruction load misses", .pme_long_desc = "L2 instruction load misses", }, [ POWER6_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x2000f8, .pme_short_desc = "External interrupts", .pme_long_desc = "An external interrupt occurred", }, [ POWER6_PME_PM_LSU1_LDF ] = { .pme_name = "PM_LSU1_LDF", .pme_code = 0x8008c, .pme_short_desc = "LSU1 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 1", }, [ POWER6_PME_PM_FAB_CMD_ISSUED ] = { .pme_name = "PM_FAB_CMD_ISSUED", .pme_code = 0x150130, .pme_short_desc = "Fabric command issued", .pme_long_desc = "Fabric command issued", }, [ POWER6_PME_PM_PTEG_FROM_L21 ] = { .pme_name = "PM_PTEG_FROM_L21", .pme_code = 0x213048, .pme_short_desc = "PTEG loaded from private L2 other core", .pme_long_desc = "PTEG loaded from private L2 other core", }, [ POWER6_PME_PM_L2SA_MISS ] = { .pme_name = "PM_L2SA_MISS", .pme_code = 0x50584, .pme_short_desc = "L2 slice A misses", .pme_long_desc = "L2 slice A misses", }, [ POWER6_PME_PM_PTEG_FROM_RL2L3_MOD ] = { .pme_name = "PM_PTEG_FROM_RL2L3_MOD", .pme_code = 0x11304c, 
.pme_short_desc = "PTEG loaded from remote L2 or L3 modified", .pme_long_desc = "PTEG loaded from remote L2 or L3 modified", }, [ POWER6_PME_PM_DPU_WT_COUNT ] = { .pme_name = "PM_DPU_WT_COUNT", .pme_code = 0x300005, .pme_short_desc = "Periods DISP unit is stalled waiting for instructions", .pme_long_desc = "Periods DISP unit is stalled waiting for instructions", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L25_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_L25_MOD", .pme_code = 0x312046, .pme_short_desc = "Marked PTEG loaded from L2.5 modified", .pme_long_desc = "Marked PTEG loaded from L2.5 modified", }, [ POWER6_PME_PM_LD_HIT_L2 ] = { .pme_name = "PM_LD_HIT_L2", .pme_code = 0x250730, .pme_short_desc = "L2 D cache load hits", .pme_long_desc = "L2 D cache load hits", }, [ POWER6_PME_PM_PTEG_FROM_DL2L3_SHR ] = { .pme_name = "PM_PTEG_FROM_DL2L3_SHR", .pme_code = 0x31304c, .pme_short_desc = "PTEG loaded from distant L2 or L3 shared", .pme_long_desc = "PTEG loaded from distant L2 or L3 shared", }, [ POWER6_PME_PM_MEM_DP_RQ_GLOB_LOC ] = { .pme_name = "PM_MEM_DP_RQ_GLOB_LOC", .pme_code = 0x150230, .pme_short_desc = "Memory read queue marking cache line double pump state from global to local", .pme_long_desc = "Memory read queue marking cache line double pump state from global to local", }, [ POWER6_PME_PM_L3SA_MISS ] = { .pme_name = "PM_L3SA_MISS", .pme_code = 0x50084, .pme_short_desc = "L3 slice A misses", .pme_long_desc = "L3 slice A misses", }, [ POWER6_PME_PM_NO_ITAG_COUNT ] = { .pme_name = "PM_NO_ITAG_COUNT", .pme_code = 0x40089, .pme_short_desc = "Periods no ITAG available", .pme_long_desc = "Periods no ITAG available", }, [ POWER6_PME_PM_DSLB_MISS ] = { .pme_name = "PM_DSLB_MISS", .pme_code = 0x830e8, .pme_short_desc = "Data SLB misses", .pme_long_desc = "A SLB miss for a data request occurred. 
SLB misses trap to the operating system to resolve", }, [ POWER6_PME_PM_LSU_FLUSH_ALIGN ] = { .pme_name = "PM_LSU_FLUSH_ALIGN", .pme_code = 0x220cc, .pme_short_desc = "Flush caused by alignment exception", .pme_long_desc = "Flush caused by alignment exception", }, [ POWER6_PME_PM_DPU_HELD_FPU_CR ] = { .pme_name = "PM_DPU_HELD_FPU_CR", .pme_code = 0x210a0, .pme_short_desc = "DISP unit held due to FPU updating CR", .pme_long_desc = "DISP unit held due to FPU updating CR", }, [ POWER6_PME_PM_PTEG_FROM_L2MISS ] = { .pme_name = "PM_PTEG_FROM_L2MISS", .pme_code = 0x113028, .pme_short_desc = "PTEG loaded from L2 miss", .pme_long_desc = "PTEG loaded from L2 miss", }, [ POWER6_PME_PM_MRK_DATA_FROM_DMEM ] = { .pme_name = "PM_MRK_DATA_FROM_DMEM", .pme_code = 0x20304a, .pme_short_desc = "Marked data loaded from distant memory", .pme_long_desc = "Marked data loaded from distant memory", }, [ POWER6_PME_PM_PTEG_FROM_LMEM ] = { .pme_name = "PM_PTEG_FROM_LMEM", .pme_code = 0x41304a, .pme_short_desc = "PTEG loaded from local memory", .pme_long_desc = "PTEG loaded from local memory", }, [ POWER6_PME_PM_MRK_DERAT_REF_64K ] = { .pme_name = "PM_MRK_DERAT_REF_64K", .pme_code = 0x182044, .pme_short_desc = "Marked DERAT reference for 64K page", .pme_long_desc = "Marked DERAT reference for 64K page", }, [ POWER6_PME_PM_L2SA_LD_REQ_INST ] = { .pme_name = "PM_L2SA_LD_REQ_INST", .pme_code = 0x50580, .pme_short_desc = "L2 slice A instruction load requests", .pme_long_desc = "L2 slice A instruction load requests", }, [ POWER6_PME_PM_MRK_DERAT_MISS_16M ] = { .pme_name = "PM_MRK_DERAT_MISS_16M", .pme_code = 0x392044, .pme_short_desc = "Marked DERAT misses for 16M page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 16M page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_DATA_FROM_DL2L3_MOD ] = { .pme_name = "PM_DATA_FROM_DL2L3_MOD", .pme_code = 0x40005c, .pme_short_desc = "Data loaded from distant L2 or L3 modified", .pme_long_desc = "Data loaded from distant
L2 or L3 modified", }, [ POWER6_PME_PM_FPU0_FXMULT ] = { .pme_name = "PM_FPU0_FXMULT", .pme_code = 0xd0086, .pme_short_desc = "FPU0 executed fixed point multiplication", .pme_long_desc = "FPU0 executed fixed point multiplication", }, [ POWER6_PME_PM_L3SB_MISS ] = { .pme_name = "PM_L3SB_MISS", .pme_code = 0x5008c, .pme_short_desc = "L3 slice B misses", .pme_long_desc = "L3 slice B misses", }, [ POWER6_PME_PM_STCX_CANCEL ] = { .pme_name = "PM_STCX_CANCEL", .pme_code = 0x830ec, .pme_short_desc = "stcx cancel by core", .pme_long_desc = "stcx cancel by core", }, [ POWER6_PME_PM_L2SA_LD_MISS_DATA ] = { .pme_name = "PM_L2SA_LD_MISS_DATA", .pme_code = 0x50482, .pme_short_desc = "L2 slice A data load misses", .pme_long_desc = "L2 slice A data load misses", }, [ POWER6_PME_PM_IC_INV_L2 ] = { .pme_name = "PM_IC_INV_L2", .pme_code = 0x250632, .pme_short_desc = "L1 I cache entries invalidated from L2", .pme_long_desc = "L1 I cache entries invalidated from L2", }, [ POWER6_PME_PM_DPU_HELD ] = { .pme_name = "PM_DPU_HELD", .pme_code = 0x200004, .pme_short_desc = "DISP unit held", .pme_long_desc = "DISP unit held", }, [ POWER6_PME_PM_PMC1_OVERFLOW ] = { .pme_name = "PM_PMC1_OVERFLOW", .pme_code = 0x200014, .pme_short_desc = "PMC1 Overflow", .pme_long_desc = "PMC1 Overflow", }, [ POWER6_PME_PM_THRD_PRIO_6_CYC ] = { .pme_name = "PM_THRD_PRIO_6_CYC", .pme_code = 0x222046, .pme_short_desc = "Cycles thread running at priority level 6", .pme_long_desc = "Cycles thread running at priority level 6", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L3MISS ] = { .pme_name = "PM_MRK_PTEG_FROM_L3MISS", .pme_code = 0x312054, .pme_short_desc = "Marked PTEG loaded from L3 miss", .pme_long_desc = "Marked PTEG loaded from L3 miss", }, [ POWER6_PME_PM_MRK_LSU0_REJECT_UST ] = { .pme_name = "PM_MRK_LSU0_REJECT_UST", .pme_code = 0x930e2, .pme_short_desc = "LSU0 marked unaligned store reject", .pme_long_desc = "LSU0 marked unaligned store reject", }, [ POWER6_PME_PM_MRK_INST_DISP ] = { .pme_name = "PM_MRK_INST_DISP", 
.pme_code = 0x10001a, .pme_short_desc = "Marked instruction dispatched", .pme_long_desc = "Marked instruction dispatched", }, [ POWER6_PME_PM_LARX ] = { .pme_name = "PM_LARX", .pme_code = 0x830ea, .pme_short_desc = "Larx executed", .pme_long_desc = "Larx executed", }, [ POWER6_PME_PM_INST_CMPL ] = { .pme_name = "PM_INST_CMPL", .pme_code = 0x2, .pme_short_desc = "Instructions completed", .pme_long_desc = "Number of PPC instructions completed. ", }, [ POWER6_PME_PM_FXU_IDLE ] = { .pme_name = "PM_FXU_IDLE", .pme_code = 0x100050, .pme_short_desc = "FXU idle", .pme_long_desc = "FXU0 and FXU1 are both idle", }, [ POWER6_PME_PM_MRK_DATA_FROM_DL2L3_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_DL2L3_MOD", .pme_code = 0x40304c, .pme_short_desc = "Marked data loaded from distant L2 or L3 modified", .pme_long_desc = "Marked data loaded from distant L2 or L3 modified", }, [ POWER6_PME_PM_L2_LD_REQ_DATA ] = { .pme_name = "PM_L2_LD_REQ_DATA", .pme_code = 0x150430, .pme_short_desc = "L2 data load requests", .pme_long_desc = "L2 data load requests", }, [ POWER6_PME_PM_LSU_DERAT_MISS_CYC ] = { .pme_name = "PM_LSU_DERAT_MISS_CYC", .pme_code = 0x1000fc, .pme_short_desc = "DERAT miss latency", .pme_long_desc = "DERAT miss latency", }, [ POWER6_PME_PM_DPU_HELD_POWER_COUNT ] = { .pme_name = "PM_DPU_HELD_POWER_COUNT", .pme_code = 0x20003d, .pme_short_desc = "Periods DISP unit held due to Power Management", .pme_long_desc = "Periods DISP unit held due to Power Management", }, [ POWER6_PME_PM_INST_FROM_RL2L3_MOD ] = { .pme_name = "PM_INST_FROM_RL2L3_MOD", .pme_code = 0x142044, .pme_short_desc = "Instruction fetched from remote L2 or L3 modified", .pme_long_desc = "Instruction fetched from remote L2 or L3 modified", }, [ POWER6_PME_PM_DATA_FROM_DMEM_CYC ] = { .pme_name = "PM_DATA_FROM_DMEM_CYC", .pme_code = 0x20002e, .pme_short_desc = "Load latency from distant memory", .pme_long_desc = "Load latency from distant memory", }, [ POWER6_PME_PM_DATA_FROM_DMEM ] = { .pme_name = "PM_DATA_FROM_DMEM", 
.pme_code = 0x20005e, .pme_short_desc = "Data loaded from distant memory", .pme_long_desc = "Data loaded from distant memory", }, [ POWER6_PME_PM_LSU_REJECT_PARTIAL_SECTOR ] = { .pme_name = "PM_LSU_REJECT_PARTIAL_SECTOR", .pme_code = 0x1a0032, .pme_short_desc = "LSU reject due to partial sector valid", .pme_long_desc = "LSU reject due to partial sector valid", }, [ POWER6_PME_PM_LSU_REJECT_DERAT_MPRED ] = { .pme_name = "PM_LSU_REJECT_DERAT_MPRED", .pme_code = 0x2a0030, .pme_short_desc = "LSU reject due to mispredicted DERAT", .pme_long_desc = "LSU reject due to mispredicted DERAT", }, [ POWER6_PME_PM_LSU1_REJECT_ULD ] = { .pme_name = "PM_LSU1_REJECT_ULD", .pme_code = 0x90088, .pme_short_desc = "LSU1 unaligned load reject", .pme_long_desc = "LSU1 unaligned load reject", }, [ POWER6_PME_PM_DATA_FROM_L3_CYC ] = { .pme_name = "PM_DATA_FROM_L3_CYC", .pme_code = 0x200022, .pme_short_desc = "Load latency from L3", .pme_long_desc = "Load latency from L3", }, [ POWER6_PME_PM_FXU1_BUSY_FXU0_IDLE ] = { .pme_name = "PM_FXU1_BUSY_FXU0_IDLE", .pme_code = 0x400050, .pme_short_desc = "FXU1 busy FXU0 idle", .pme_long_desc = "FXU0 was idle while FXU1 was busy", }, [ POWER6_PME_PM_INST_FROM_MEM_DP ] = { .pme_name = "PM_INST_FROM_MEM_DP", .pme_code = 0x142042, .pme_short_desc = "Instruction fetched from double pump memory", .pme_long_desc = "Instruction fetched from double pump memory", }, [ POWER6_PME_PM_LSU_FLUSH_DSI ] = { .pme_name = "PM_LSU_FLUSH_DSI", .pme_code = 0x220ce, .pme_short_desc = "Flush caused by DSI", .pme_long_desc = "Flush caused by DSI", }, [ POWER6_PME_PM_MRK_DERAT_REF_16G ] = { .pme_name = "PM_MRK_DERAT_REF_16G", .pme_code = 0x482044, .pme_short_desc = "Marked DERAT reference for 16G page", .pme_long_desc = "Marked DERAT reference for 16G page", }, [ POWER6_PME_PM_LSU_LDF_BOTH ] = { .pme_name = "PM_LSU_LDF_BOTH", .pme_code = 0x180038, .pme_short_desc = "Both LSU units executed Floating Point load instruction", .pme_long_desc = "Both LSU units executed Floating 
Point load instruction", }, [ POWER6_PME_PM_FPU1_1FLOP ] = { .pme_name = "PM_FPU1_1FLOP", .pme_code = 0xc0088, .pme_short_desc = "FPU1 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo", }, [ POWER6_PME_PM_DATA_FROM_RMEM_CYC ] = { .pme_name = "PM_DATA_FROM_RMEM_CYC", .pme_code = 0x40002c, .pme_short_desc = "Load latency from remote memory", .pme_long_desc = "Load latency from remote memory", }, [ POWER6_PME_PM_INST_PTEG_SECONDARY ] = { .pme_name = "PM_INST_PTEG_SECONDARY", .pme_code = 0x910ac, .pme_short_desc = "Instruction table walk matched in secondary PTEG", .pme_long_desc = "Instruction table walk matched in secondary PTEG", }, [ POWER6_PME_PM_L1_ICACHE_MISS ] = { .pme_name = "PM_L1_ICACHE_MISS", .pme_code = 0x100056, .pme_short_desc = "L1 I cache miss count", .pme_long_desc = "L1 I cache miss count", }, [ POWER6_PME_PM_INST_DISP_LLA ] = { .pme_name = "PM_INST_DISP_LLA", .pme_code = 0x310a2, .pme_short_desc = "Instruction dispatched under load look ahead", .pme_long_desc = "Instruction dispatched under load look ahead", }, [ POWER6_PME_PM_THRD_BOTH_RUN_CYC ] = { .pme_name = "PM_THRD_BOTH_RUN_CYC", .pme_code = 0x400004, .pme_short_desc = "Both threads in run cycles", .pme_long_desc = "Both threads in run cycles", }, [ POWER6_PME_PM_LSU_ST_CHAINED ] = { .pme_name = "PM_LSU_ST_CHAINED", .pme_code = 0x820ce, .pme_short_desc = "number of chained stores", .pme_long_desc = "number of chained stores", }, [ POWER6_PME_PM_FPU1_FXDIV ] = { .pme_name = "PM_FPU1_FXDIV", .pme_code = 0xc10a8, .pme_short_desc = "FPU1 executed fixed point division", .pme_long_desc = "FPU1 executed fixed point division", }, [ POWER6_PME_PM_FREQ_UP ] = { .pme_name = "PM_FREQ_UP", .pme_code = 0x40003c, .pme_short_desc = "Frequency is being slewed up due 
to Power Management", .pme_long_desc = "Frequency is being slewed up due to Power Management", }, [ POWER6_PME_PM_FAB_RETRY_SYS_PUMP ] = { .pme_name = "PM_FAB_RETRY_SYS_PUMP", .pme_code = 0x50182, .pme_short_desc = "Retry of a system pump, locally mastered ", .pme_long_desc = "Retry of a system pump, locally mastered ", }, [ POWER6_PME_PM_DATA_FROM_LMEM ] = { .pme_name = "PM_DATA_FROM_LMEM", .pme_code = 0x40005e, .pme_short_desc = "Data loaded from local memory", .pme_long_desc = "Data loaded from local memory", }, [ POWER6_PME_PM_PMC3_OVERFLOW ] = { .pme_name = "PM_PMC3_OVERFLOW", .pme_code = 0x400014, .pme_short_desc = "PMC3 Overflow", .pme_long_desc = "PMC3 Overflow", }, [ POWER6_PME_PM_LSU0_REJECT_SET_MPRED ] = { .pme_name = "PM_LSU0_REJECT_SET_MPRED", .pme_code = 0xa0084, .pme_short_desc = "LSU0 reject due to mispredicted set", .pme_long_desc = "LSU0 reject due to mispredicted set", }, [ POWER6_PME_PM_LSU0_REJECT_DERAT_MPRED ] = { .pme_name = "PM_LSU0_REJECT_DERAT_MPRED", .pme_code = 0xa0082, .pme_short_desc = "LSU0 reject due to mispredicted DERAT", .pme_long_desc = "LSU0 reject due to mispredicted DERAT", }, [ POWER6_PME_PM_LSU1_REJECT_STQ_FULL ] = { .pme_name = "PM_LSU1_REJECT_STQ_FULL", .pme_code = 0xa0088, .pme_short_desc = "LSU1 reject due to store queue full", .pme_long_desc = "LSU1 reject due to store queue full", }, [ POWER6_PME_PM_MRK_BR_MPRED ] = { .pme_name = "PM_MRK_BR_MPRED", .pme_code = 0x300052, .pme_short_desc = "Marked branch mispredicted", .pme_long_desc = "Marked branch mispredicted", }, [ POWER6_PME_PM_L2SA_ST_MISS ] = { .pme_name = "PM_L2SA_ST_MISS", .pme_code = 0x50486, .pme_short_desc = "L2 slice A store misses", .pme_long_desc = "L2 slice A store misses", }, [ POWER6_PME_PM_LSU0_REJECT_EXTERN ] = { .pme_name = "PM_LSU0_REJECT_EXTERN", .pme_code = 0xa10a4, .pme_short_desc = "LSU0 external reject request ", .pme_long_desc = "LSU0 external reject request ", }, [ POWER6_PME_PM_MRK_BR_TAKEN ] = { .pme_name = "PM_MRK_BR_TAKEN", .pme_code = 
0x100052, .pme_short_desc = "Marked branch taken", .pme_long_desc = "Marked branch taken", }, [ POWER6_PME_PM_ISLB_MISS ] = { .pme_name = "PM_ISLB_MISS", .pme_code = 0x830e0, .pme_short_desc = "Instruction SLB misses", .pme_long_desc = "A SLB miss for an instruction fetch has occurred", }, [ POWER6_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0x1e, .pme_short_desc = "Processor cycles", .pme_long_desc = "Processor cycles", }, [ POWER6_PME_PM_FPU_FXDIV ] = { .pme_name = "PM_FPU_FXDIV", .pme_code = 0x1c1034, .pme_short_desc = "FPU executed fixed point division", .pme_long_desc = "FPU executed fixed point division", }, [ POWER6_PME_PM_DPU_HELD_LLA_END ] = { .pme_name = "PM_DPU_HELD_LLA_END", .pme_code = 0x30084, .pme_short_desc = "DISP unit held due to load look ahead ended", .pme_long_desc = "DISP unit held due to load look ahead ended", }, [ POWER6_PME_PM_MEM0_DP_CL_WR_LOC ] = { .pme_name = "PM_MEM0_DP_CL_WR_LOC", .pme_code = 0x50286, .pme_short_desc = "cacheline write setting dp to local side 0", .pme_long_desc = "cacheline write setting dp to local side 0", }, [ POWER6_PME_PM_MRK_LSU_REJECT_ULD ] = { .pme_name = "PM_MRK_LSU_REJECT_ULD", .pme_code = 0x193034, .pme_short_desc = "Marked unaligned load reject", .pme_long_desc = "Marked unaligned load reject", }, [ POWER6_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x100004, .pme_short_desc = "One or more PPC instruction completed", .pme_long_desc = "A group containing at least one PPC instruction completed. 
For microcoded instructions that span multiple groups, this will only occur once.", }, [ POWER6_PME_PM_PTEG_FROM_DMEM ] = { .pme_name = "PM_PTEG_FROM_DMEM", .pme_code = 0x21304a, .pme_short_desc = "PTEG loaded from distant memory", .pme_long_desc = "PTEG loaded from distant memory", }, [ POWER6_PME_PM_DPU_WT_BR_MPRED_COUNT ] = { .pme_name = "PM_DPU_WT_BR_MPRED_COUNT", .pme_code = 0x40000d, .pme_short_desc = "Periods DISP unit is stalled due to branch misprediction", .pme_long_desc = "Periods DISP unit is stalled due to branch misprediction", }, [ POWER6_PME_PM_GCT_FULL_CYC ] = { .pme_name = "PM_GCT_FULL_CYC", .pme_code = 0x40086, .pme_short_desc = "Cycles GCT full", .pme_long_desc = "The ISU sends a signal indicating the gct is full. ", }, [ POWER6_PME_PM_INST_FROM_L25_SHR ] = { .pme_name = "PM_INST_FROM_L25_SHR", .pme_code = 0x442046, .pme_short_desc = "Instruction fetched from L2.5 shared", .pme_long_desc = "Instruction fetched from L2.5 shared", }, [ POWER6_PME_PM_MRK_DERAT_MISS_4K ] = { .pme_name = "PM_MRK_DERAT_MISS_4K", .pme_code = 0x292044, .pme_short_desc = "Marked DERAT misses for 4K page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 4K page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_DC_PREF_STREAM_ALLOC", .pme_code = 0x810a2, .pme_short_desc = "D cache new prefetch stream allocated", .pme_long_desc = "A new Prefetch Stream was allocated", }, [ POWER6_PME_PM_FPU1_FIN ] = { .pme_name = "PM_FPU1_FIN", .pme_code = 0xd0088, .pme_short_desc = "FPU1 produced a result", .pme_long_desc = "fp1 finished, produced a result. This only indicates finish, not completion. ", }, [ POWER6_PME_PM_BR_MPRED_TA ] = { .pme_name = "PM_BR_MPRED_TA", .pme_code = 0x410ac, .pme_short_desc = "Branch mispredictions due to target address", .pme_long_desc = "branch miss predict due to a target address prediction. 
This signal will be asserted each time the branch execution unit detects an incorrect target address prediction. This signal will be asserted after a valid branch execution unit issue and will cause a branch mispredict flush unless a flush is detected from an older instruction.", }, [ POWER6_PME_PM_DPU_HELD_POWER ] = { .pme_name = "PM_DPU_HELD_POWER", .pme_code = 0x20003c, .pme_short_desc = "DISP unit held due to Power Management", .pme_long_desc = "DISP unit held due to Power Management", }, [ POWER6_PME_PM_RUN_INST_CMPL ] = { .pme_name = "PM_RUN_INST_CMPL", .pme_code = 0x500009, .pme_short_desc = "Run instructions completed", .pme_long_desc = "Number of run instructions completed. ", }, [ POWER6_PME_PM_GCT_EMPTY_CYC ] = { .pme_name = "PM_GCT_EMPTY_CYC", .pme_code = 0x1000f8, .pme_short_desc = "Cycles GCT empty", .pme_long_desc = "The Global Completion Table is completely empty", }, [ POWER6_PME_PM_LLA_COUNT ] = { .pme_name = "PM_LLA_COUNT", .pme_code = 0xc01f, .pme_short_desc = "Transitions into Load Look Ahead mode", .pme_long_desc = "Transitions into Load Look Ahead mode", }, [ POWER6_PME_PM_LSU0_REJECT_NO_SCRATCH ] = { .pme_name = "PM_LSU0_REJECT_NO_SCRATCH", .pme_code = 0xa10a2, .pme_short_desc = "LSU0 reject due to scratch register not available", .pme_long_desc = "LSU0 reject due to scratch register not available", }, [ POWER6_PME_PM_DPU_WT_IC_MISS ] = { .pme_name = "PM_DPU_WT_IC_MISS", .pme_code = 0x20000c, .pme_short_desc = "Cycles DISP unit is stalled due to I cache miss", .pme_long_desc = "Cycles DISP unit is stalled due to I cache miss", }, [ POWER6_PME_PM_DATA_FROM_L3MISS ] = { .pme_name = "PM_DATA_FROM_L3MISS", .pme_code = 0x3000fe, .pme_short_desc = "Data loaded from private L3 miss", .pme_long_desc = "Data loaded from private L3 miss", }, [ POWER6_PME_PM_FPU_FPSCR ] = { .pme_name = "PM_FPU_FPSCR", .pme_code = 0x2d0032, .pme_short_desc = "FPU executed FPSCR instruction", .pme_long_desc = "FPU executed FPSCR instruction", }, [ 
POWER6_PME_PM_VMX1_INST_ISSUED ] = { .pme_name = "PM_VMX1_INST_ISSUED", .pme_code = 0x60088, .pme_short_desc = "VMX1 instruction issued", .pme_long_desc = "VMX1 instruction issued", }, [ POWER6_PME_PM_FLUSH ] = { .pme_name = "PM_FLUSH", .pme_code = 0x100010, .pme_short_desc = "Flushes", .pme_long_desc = "Flushes", }, [ POWER6_PME_PM_ST_HIT_L2 ] = { .pme_name = "PM_ST_HIT_L2", .pme_code = 0x150732, .pme_short_desc = "L2 D cache store hits", .pme_long_desc = "L2 D cache store hits", }, [ POWER6_PME_PM_SYNC_CYC ] = { .pme_name = "PM_SYNC_CYC", .pme_code = 0x920cc, .pme_short_desc = "Sync duration", .pme_long_desc = "Sync duration", }, [ POWER6_PME_PM_FAB_SYS_PUMP ] = { .pme_name = "PM_FAB_SYS_PUMP", .pme_code = 0x50180, .pme_short_desc = "System pump operation, locally mastered", .pme_long_desc = "System pump operation, locally mastered", }, [ POWER6_PME_PM_IC_PREF_REQ ] = { .pme_name = "PM_IC_PREF_REQ", .pme_code = 0x4008c, .pme_short_desc = "Instruction prefetch requests", .pme_long_desc = "Asserted when a non-canceled prefetch is made to the cache interface unit (CIU).", }, [ POWER6_PME_PM_MEM0_DP_RQ_GLOB_LOC ] = { .pme_name = "PM_MEM0_DP_RQ_GLOB_LOC", .pme_code = 0x50280, .pme_short_desc = "Memory read queue marking cache line double pump state from global to local side 0", .pme_long_desc = "Memory read queue marking cache line double pump state from global to local side 0", }, [ POWER6_PME_PM_FPU_ISSUE_0 ] = { .pme_name = "PM_FPU_ISSUE_0", .pme_code = 0x320c6, .pme_short_desc = "FPU issue 0 per cycle", .pme_long_desc = "FPU issue 0 per cycle", }, [ POWER6_PME_PM_THRD_PRIO_2_CYC ] = { .pme_name = "PM_THRD_PRIO_2_CYC", .pme_code = 0x322040, .pme_short_desc = "Cycles thread running at priority level 2", .pme_long_desc = "Cycles thread running at priority level 2", }, [ POWER6_PME_PM_VMX_SIMPLE_ISSUED ] = { .pme_name = "PM_VMX_SIMPLE_ISSUED", .pme_code = 0x70082, .pme_short_desc = "VMX instruction issued to simple", .pme_long_desc = "VMX instruction issued to 
simple", }, [ POWER6_PME_PM_MRK_FPU1_FIN ] = { .pme_name = "PM_MRK_FPU1_FIN", .pme_code = 0xd008a, .pme_short_desc = "Marked instruction FPU1 processing finished", .pme_long_desc = "Marked instruction FPU1 processing finished", }, [ POWER6_PME_PM_DPU_HELD_CW ] = { .pme_name = "PM_DPU_HELD_CW", .pme_code = 0x20084, .pme_short_desc = "DISP unit held due to cache writes ", .pme_long_desc = "DISP unit held due to cache writes ", }, [ POWER6_PME_PM_L3SA_REF ] = { .pme_name = "PM_L3SA_REF", .pme_code = 0x50080, .pme_short_desc = "L3 slice A references", .pme_long_desc = "L3 slice A references", }, [ POWER6_PME_PM_STCX ] = { .pme_name = "PM_STCX", .pme_code = 0x830e6, .pme_short_desc = "STCX executed", .pme_long_desc = "STCX executed", }, [ POWER6_PME_PM_L2SB_MISS ] = { .pme_name = "PM_L2SB_MISS", .pme_code = 0x5058c, .pme_short_desc = "L2 slice B misses", .pme_long_desc = "L2 slice B misses", }, [ POWER6_PME_PM_LSU0_REJECT ] = { .pme_name = "PM_LSU0_REJECT", .pme_code = 0xa10a6, .pme_short_desc = "LSU0 reject", .pme_long_desc = "LSU0 reject", }, [ POWER6_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x100026, .pme_short_desc = "Time Base bit transition", .pme_long_desc = "When the selected time base bit (as specified in MMCR0[TBSEL])transitions from 0 to 1 ", }, [ POWER6_PME_PM_THERMAL_MAX ] = { .pme_name = "PM_THERMAL_MAX", .pme_code = 0x30002a, .pme_short_desc = "Processor in thermal MAX", .pme_long_desc = "Processor in thermal MAX", }, [ POWER6_PME_PM_FPU0_STF ] = { .pme_name = "PM_FPU0_STF", .pme_code = 0xc10a4, .pme_short_desc = "FPU0 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a store instruction.", }, [ POWER6_PME_PM_FPU1_FMA ] = { .pme_name = "PM_FPU1_FMA", .pme_code = 0xc008a, .pme_short_desc = "FPU1 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing multiply-add kind of instruction. 
This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER6_PME_PM_LSU1_REJECT_LHS ] = { .pme_name = "PM_LSU1_REJECT_LHS", .pme_code = 0x9008e, .pme_short_desc = "LSU1 load hit store reject", .pme_long_desc = "LSU1 load hit store reject", }, [ POWER6_PME_PM_DPU_HELD_INT ] = { .pme_name = "PM_DPU_HELD_INT", .pme_code = 0x310a8, .pme_short_desc = "DISP unit held due to exception", .pme_long_desc = "DISP unit held due to exception", }, [ POWER6_PME_PM_THRD_LLA_BOTH_CYC ] = { .pme_name = "PM_THRD_LLA_BOTH_CYC", .pme_code = 0x400008, .pme_short_desc = "Both threads in Load Look Ahead", .pme_long_desc = "Both threads in Load Look Ahead", }, [ POWER6_PME_PM_DPU_HELD_THERMAL_COUNT ] = { .pme_name = "PM_DPU_HELD_THERMAL_COUNT", .pme_code = 0x10002b, .pme_short_desc = "Periods DISP unit held due to thermal condition", .pme_long_desc = "Periods DISP unit held due to thermal condition", }, [ POWER6_PME_PM_PMC4_REWIND ] = { .pme_name = "PM_PMC4_REWIND", .pme_code = 0x100020, .pme_short_desc = "PMC4 rewind event", .pme_long_desc = "PMC4 rewind event", }, [ POWER6_PME_PM_DERAT_REF_16M ] = { .pme_name = "PM_DERAT_REF_16M", .pme_code = 0x382070, .pme_short_desc = "DERAT reference for 16M page", .pme_long_desc = "DERAT reference for 16M page", }, [ POWER6_PME_PM_FPU0_FCONV ] = { .pme_name = "PM_FPU0_FCONV", .pme_code = 0xd10a0, .pme_short_desc = "FPU0 executed FCONV instruction", .pme_long_desc = "FPU0 executed FCONV instruction", }, [ POWER6_PME_PM_L2SA_LD_REQ_DATA ] = { .pme_name = "PM_L2SA_LD_REQ_DATA", .pme_code = 0x50480, .pme_short_desc = "L2 slice A data load requests", .pme_long_desc = "L2 slice A data load requests", }, [ POWER6_PME_PM_DATA_FROM_MEM_DP ] = { .pme_name = "PM_DATA_FROM_MEM_DP", .pme_code = 0x10005e, .pme_short_desc = "Data loaded from double pump memory", .pme_long_desc = "Data loaded from double pump memory", }, [ POWER6_PME_PM_MRK_VMX_FLOAT_ISSUED ] = { .pme_name = "PM_MRK_VMX_FLOAT_ISSUED", .pme_code = 0x70088, 
.pme_short_desc = "Marked VMX instruction issued to float", .pme_long_desc = "Marked VMX instruction issued to float", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L2MISS ] = { .pme_name = "PM_MRK_PTEG_FROM_L2MISS", .pme_code = 0x412054, .pme_short_desc = "Marked PTEG loaded from L2 miss", .pme_long_desc = "Marked PTEG loaded from L2 miss", }, [ POWER6_PME_PM_THRD_PRIO_DIFF_1or2_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_1or2_CYC", .pme_code = 0x223040, .pme_short_desc = "Cycles thread priority difference is 1 or 2", .pme_long_desc = "Cycles thread priority difference is 1 or 2", }, [ POWER6_PME_PM_VMX0_STALL ] = { .pme_name = "PM_VMX0_STALL", .pme_code = 0xb0084, .pme_short_desc = "VMX0 stall", .pme_long_desc = "VMX0 stall", }, [ POWER6_PME_PM_IC_DEMAND_L2_BHT_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BHT_REDIRECT", .pme_code = 0x420ca, .pme_short_desc = "L2 I cache demand request due to BHT redirect", .pme_long_desc = "L2 I cache demand request due to BHT redirect", }, [ POWER6_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x20000e, .pme_short_desc = "DERAT misses", .pme_long_desc = "Total DERAT Misses (Unit 0 + Unit 1). Requests that miss the Derat are rejected and retried until the request hits in the Erat. 
This may result in multiple erat misses for the same instruction.", }, [ POWER6_PME_PM_FPU0_SINGLE ] = { .pme_name = "PM_FPU0_SINGLE", .pme_code = 0xc10a6, .pme_short_desc = "FPU0 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing single precision instruction.", }, [ POWER6_PME_PM_FPU_ISSUE_STEERING ] = { .pme_name = "PM_FPU_ISSUE_STEERING", .pme_code = 0x320c4, .pme_short_desc = "FPU issue steering", .pme_long_desc = "FPU issue steering", }, [ POWER6_PME_PM_THRD_PRIO_1_CYC ] = { .pme_name = "PM_THRD_PRIO_1_CYC", .pme_code = 0x222040, .pme_short_desc = "Cycles thread running at priority level 1", .pme_long_desc = "Cycles thread running at priority level 1", }, [ POWER6_PME_PM_VMX_COMPLEX_ISSUED ] = { .pme_name = "PM_VMX_COMPLEX_ISSUED", .pme_code = 0x70084, .pme_short_desc = "VMX instruction issued to complex", .pme_long_desc = "VMX instruction issued to complex", }, [ POWER6_PME_PM_FPU_ISSUE_ST_FOLDED ] = { .pme_name = "PM_FPU_ISSUE_ST_FOLDED", .pme_code = 0x320c2, .pme_short_desc = "FPU issue a folded store", .pme_long_desc = "FPU issue a folded store", }, [ POWER6_PME_PM_DFU_FIN ] = { .pme_name = "PM_DFU_FIN", .pme_code = 0xe0080, .pme_short_desc = "DFU instruction finish", .pme_long_desc = "DFU instruction finish", }, [ POWER6_PME_PM_BR_PRED_CCACHE ] = { .pme_name = "PM_BR_PRED_CCACHE", .pme_code = 0x410a4, .pme_short_desc = "Branch count cache prediction", .pme_long_desc = "Branch count cache prediction", }, [ POWER6_PME_PM_MRK_ST_CMPL_INT ] = { .pme_name = "PM_MRK_ST_CMPL_INT", .pme_code = 0x300006, .pme_short_desc = "Marked store completed with intervention", .pme_long_desc = "A marked store previously sent to the memory subsystem completed (data home) after requiring intervention", }, [ POWER6_PME_PM_FAB_MMIO ] = { .pme_name = "PM_FAB_MMIO", .pme_code = 0x50186, .pme_short_desc = "MMIO operation, locally mastered", .pme_long_desc = "MMIO operation, locally mastered", }, [ 
POWER6_PME_PM_MRK_VMX_SIMPLE_ISSUED ] = { .pme_name = "PM_MRK_VMX_SIMPLE_ISSUED", .pme_code = 0x7008a, .pme_short_desc = "Marked VMX instruction issued to simple", .pme_long_desc = "Marked VMX instruction issued to simple", }, [ POWER6_PME_PM_FPU_STF ] = { .pme_name = "PM_FPU_STF", .pme_code = 0x3c1030, .pme_short_desc = "FPU executed store instruction", .pme_long_desc = "FPU is executing a store instruction. Combined Unit 0 + Unit 1", }, [ POWER6_PME_PM_MEM1_DP_CL_WR_GLOB ] = { .pme_name = "PM_MEM1_DP_CL_WR_GLOB", .pme_code = 0x5028c, .pme_short_desc = "cacheline write setting dp to global side 1", .pme_long_desc = "cacheline write setting dp to global side 1", }, [ POWER6_PME_PM_MRK_DATA_FROM_L3MISS ] = { .pme_name = "PM_MRK_DATA_FROM_L3MISS", .pme_code = 0x303028, .pme_short_desc = "Marked data loaded from L3 miss", .pme_long_desc = "Marked data loaded from L3 miss", }, [ POWER6_PME_PM_GCT_NOSLOT_CYC ] = { .pme_name = "PM_GCT_NOSLOT_CYC", .pme_code = 0x100008, .pme_short_desc = "Cycles no GCT slot allocated", .pme_long_desc = "Cycles this thread does not have any slots allocated in the GCT.", }, [ POWER6_PME_PM_L2_ST_REQ_DATA ] = { .pme_name = "PM_L2_ST_REQ_DATA", .pme_code = 0x250432, .pme_short_desc = "L2 data store requests", .pme_long_desc = "L2 data store requests", }, [ POWER6_PME_PM_INST_TABLEWALK_COUNT ] = { .pme_name = "PM_INST_TABLEWALK_COUNT", .pme_code = 0x920cb, .pme_short_desc = "Periods doing instruction tablewalks", .pme_long_desc = "Periods doing instruction tablewalks", }, [ POWER6_PME_PM_PTEG_FROM_L35_SHR ] = { .pme_name = "PM_PTEG_FROM_L35_SHR", .pme_code = 0x21304e, .pme_short_desc = "PTEG loaded from L3.5 shared", .pme_long_desc = "PTEG loaded from L3.5 shared", }, [ POWER6_PME_PM_DPU_HELD_ISYNC ] = { .pme_name = "PM_DPU_HELD_ISYNC", .pme_code = 0x2008a, .pme_short_desc = "DISP unit held due to ISYNC ", .pme_long_desc = "DISP unit held due to ISYNC ", }, [ POWER6_PME_PM_MRK_DATA_FROM_L25_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR", 
.pme_code = 0x40304e, .pme_short_desc = "Marked data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a marked demand load", }, [ POWER6_PME_PM_L3SA_HIT ] = { .pme_name = "PM_L3SA_HIT", .pme_code = 0x50082, .pme_short_desc = "L3 slice A hits", .pme_long_desc = "L3 slice A hits", }, [ POWER6_PME_PM_DERAT_MISS_16G ] = { .pme_name = "PM_DERAT_MISS_16G", .pme_code = 0x492070, .pme_short_desc = "DERAT misses for 16G page", .pme_long_desc = "A data request (load or store) missed the ERAT for 16G page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_DATA_PTEG_2ND_HALF ] = { .pme_name = "PM_DATA_PTEG_2ND_HALF", .pme_code = 0x910a2, .pme_short_desc = "Data table walk matched in second half primary PTEG", .pme_long_desc = "Data table walk matched in second half primary PTEG", }, [ POWER6_PME_PM_L2SA_ST_REQ ] = { .pme_name = "PM_L2SA_ST_REQ", .pme_code = 0x50484, .pme_short_desc = "L2 slice A store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. 
The event is provided on each of the three slices A,B,and C.", }, [ POWER6_PME_PM_INST_FROM_LMEM ] = { .pme_name = "PM_INST_FROM_LMEM", .pme_code = 0x442042, .pme_short_desc = "Instruction fetched from local memory", .pme_long_desc = "Instruction fetched from local memory", }, [ POWER6_PME_PM_IC_DEMAND_L2_BR_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BR_REDIRECT", .pme_code = 0x420cc, .pme_short_desc = "L2 I cache demand request due to branch redirect", .pme_long_desc = "L2 I cache demand request due to branch redirect", }, [ POWER6_PME_PM_PTEG_FROM_L2 ] = { .pme_name = "PM_PTEG_FROM_L2", .pme_code = 0x113048, .pme_short_desc = "PTEG loaded from L2", .pme_long_desc = "PTEG loaded from L2", }, [ POWER6_PME_PM_DATA_PTEG_1ST_HALF ] = { .pme_name = "PM_DATA_PTEG_1ST_HALF", .pme_code = 0x910a0, .pme_short_desc = "Data table walk matched in first half primary PTEG", .pme_long_desc = "Data table walk matched in first half primary PTEG", }, [ POWER6_PME_PM_BR_MPRED_COUNT ] = { .pme_name = "PM_BR_MPRED_COUNT", .pme_code = 0x410aa, .pme_short_desc = "Branch misprediction due to count prediction", .pme_long_desc = "Branch misprediction due to count prediction", }, [ POWER6_PME_PM_IERAT_MISS_4K ] = { .pme_name = "PM_IERAT_MISS_4K", .pme_code = 0x492076, .pme_short_desc = "IERAT misses for 4K page", .pme_long_desc = "IERAT misses for 4K page", }, [ POWER6_PME_PM_THRD_BOTH_RUN_COUNT ] = { .pme_name = "PM_THRD_BOTH_RUN_COUNT", .pme_code = 0x400005, .pme_short_desc = "Periods both threads in run cycles", .pme_long_desc = "Periods both threads in run cycles", }, [ POWER6_PME_PM_LSU_REJECT_ULD ] = { .pme_name = "PM_LSU_REJECT_ULD", .pme_code = 0x190030, .pme_short_desc = "Unaligned load reject", .pme_long_desc = "Unaligned load reject", }, [ POWER6_PME_PM_DATA_FROM_DL2L3_MOD_CYC ] = { .pme_name = "PM_DATA_FROM_DL2L3_MOD_CYC", .pme_code = 0x40002a, .pme_short_desc = "Load latency from distant L2 or L3 modified", .pme_long_desc = "Load latency from distant L2 or L3 modified", }, [ 
POWER6_PME_PM_MRK_PTEG_FROM_RL2L3_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_RL2L3_MOD", .pme_code = 0x112044, .pme_short_desc = "Marked PTEG loaded from remote L2 or L3 modified", .pme_long_desc = "Marked PTEG loaded from remote L2 or L3 modified", }, [ POWER6_PME_PM_FPU0_FLOP ] = { .pme_name = "PM_FPU0_FLOP", .pme_code = 0xc0086, .pme_short_desc = "FPU0 executed 1FLOP, FMA, FSQRT or FDIV instruction", .pme_long_desc = "FPU0 executed 1FLOP, FMA, FSQRT or FDIV instruction", }, [ POWER6_PME_PM_FPU0_FEST ] = { .pme_name = "PM_FPU0_FEST", .pme_code = 0xd10a6, .pme_short_desc = "FPU0 executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. ", }, [ POWER6_PME_PM_MRK_LSU0_REJECT_LHS ] = { .pme_name = "PM_MRK_LSU0_REJECT_LHS", .pme_code = 0x930e6, .pme_short_desc = "LSU0 marked load hit store reject", .pme_long_desc = "LSU0 marked load hit store reject", }, [ POWER6_PME_PM_VMX_RESULT_SAT_1 ] = { .pme_name = "PM_VMX_RESULT_SAT_1", .pme_code = 0xb0086, .pme_short_desc = "VMX valid result with sat=1", .pme_long_desc = "VMX valid result with sat=1", }, [ POWER6_PME_PM_NO_ITAG_CYC ] = { .pme_name = "PM_NO_ITAG_CYC", .pme_code = 0x40088, .pme_short_desc = "Cycles no ITAG available", .pme_long_desc = "Cycles no ITAG available", }, [ POWER6_PME_PM_LSU1_REJECT_NO_SCRATCH ] = { .pme_name = "PM_LSU1_REJECT_NO_SCRATCH", .pme_code = 0xa10aa, .pme_short_desc = "LSU1 reject due to scratch register not available", .pme_long_desc = "LSU1 reject due to scratch register not available", }, [ POWER6_PME_PM_0INST_FETCH ] = { .pme_name = "PM_0INST_FETCH", .pme_code = 0x40080, .pme_short_desc = "No instructions fetched", .pme_long_desc = "No instructions were fetched this cycle (due to IFU hold, redirect, or icache miss)", }, [ POWER6_PME_PM_DPU_WT_BR_MPRED ] = { .pme_name = "PM_DPU_WT_BR_MPRED", .pme_code = 0x40000c, .pme_short_desc = "Cycles DISP unit is 
stalled due to branch misprediction", .pme_long_desc = "Cycles DISP unit is stalled due to branch misprediction", }, [ POWER6_PME_PM_L1_PREF ] = { .pme_name = "PM_L1_PREF", .pme_code = 0x810a4, .pme_short_desc = "L1 cache data prefetches", .pme_long_desc = "A request to prefetch data into the L1 was made", }, [ POWER6_PME_PM_VMX_FLOAT_MULTICYCLE ] = { .pme_name = "PM_VMX_FLOAT_MULTICYCLE", .pme_code = 0xb0082, .pme_short_desc = "VMX multi-cycle floating point instruction issued", .pme_long_desc = "VMX multi-cycle floating point instruction issued", }, [ POWER6_PME_PM_DATA_FROM_L25_SHR_CYC ] = { .pme_name = "PM_DATA_FROM_L25_SHR_CYC", .pme_code = 0x200024, .pme_short_desc = "Load latency from L2.5 shared", .pme_long_desc = "Load latency from L2.5 shared", }, [ POWER6_PME_PM_DATA_FROM_L3 ] = { .pme_name = "PM_DATA_FROM_L3", .pme_code = 0x300058, .pme_short_desc = "Data loaded from L3", .pme_long_desc = "DL1 was reloaded from the local L3 due to a demand load", }, [ POWER6_PME_PM_PMC2_OVERFLOW ] = { .pme_name = "PM_PMC2_OVERFLOW", .pme_code = 0x300014, .pme_short_desc = "PMC2 Overflow", .pme_long_desc = "PMC2 Overflow", }, [ POWER6_PME_PM_VMX0_LD_WRBACK ] = { .pme_name = "PM_VMX0_LD_WRBACK", .pme_code = 0x60084, .pme_short_desc = "VMX0 load writeback valid", .pme_long_desc = "VMX0 load writeback valid", }, [ POWER6_PME_PM_FPU0_DENORM ] = { .pme_name = "PM_FPU0_DENORM", .pme_code = 0xc10a2, .pme_short_desc = "FPU0 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ POWER6_PME_PM_INST_FETCH_CYC ] = { .pme_name = "PM_INST_FETCH_CYC", .pme_code = 0x420c8, .pme_short_desc = "Cycles at least 1 instruction fetched", .pme_long_desc = "Asserted each cycle when the IFU sends at least one instruction to the IDU. 
", }, [ POWER6_PME_PM_LSU_LDF ] = { .pme_name = "PM_LSU_LDF", .pme_code = 0x280032, .pme_short_desc = "LSU executed Floating Point load instruction", .pme_long_desc = "LSU executed Floating Point load instruction", }, [ POWER6_PME_PM_LSU_REJECT_L2_CORR ] = { .pme_name = "PM_LSU_REJECT_L2_CORR", .pme_code = 0x1a1034, .pme_short_desc = "LSU reject due to L2 correctable error", .pme_long_desc = "LSU reject due to L2 correctable error", }, [ POWER6_PME_PM_DERAT_REF_64K ] = { .pme_name = "PM_DERAT_REF_64K", .pme_code = 0x282070, .pme_short_desc = "DERAT reference for 64K page", .pme_long_desc = "DERAT reference for 64K page", }, [ POWER6_PME_PM_THRD_PRIO_3_CYC ] = { .pme_name = "PM_THRD_PRIO_3_CYC", .pme_code = 0x422040, .pme_short_desc = "Cycles thread running at priority level 3", .pme_long_desc = "Cycles thread running at priority level 3", }, [ POWER6_PME_PM_FPU_FMA ] = { .pme_name = "PM_FPU_FMA", .pme_code = 0x2c0030, .pme_short_desc = "FPU executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs. 
Combined Unit 0 + Unit 1", }, [ POWER6_PME_PM_INST_FROM_L35_MOD ] = { .pme_name = "PM_INST_FROM_L35_MOD", .pme_code = 0x142046, .pme_short_desc = "Instruction fetched from L3.5 modified", .pme_long_desc = "Instruction fetched from L3.5 modified", }, [ POWER6_PME_PM_DFU_CONV ] = { .pme_name = "PM_DFU_CONV", .pme_code = 0xe008e, .pme_short_desc = "DFU convert from fixed op", .pme_long_desc = "DFU convert from fixed op", }, [ POWER6_PME_PM_INST_FROM_L25_MOD ] = { .pme_name = "PM_INST_FROM_L25_MOD", .pme_code = 0x342046, .pme_short_desc = "Instruction fetched from L2.5 modified", .pme_long_desc = "Instruction fetched from L2.5 modified", }, [ POWER6_PME_PM_PTEG_FROM_L35_MOD ] = { .pme_name = "PM_PTEG_FROM_L35_MOD", .pme_code = 0x11304e, .pme_short_desc = "PTEG loaded from L3.5 modified", .pme_long_desc = "PTEG loaded from L3.5 modified", }, [ POWER6_PME_PM_MRK_VMX_ST_ISSUED ] = { .pme_name = "PM_MRK_VMX_ST_ISSUED", .pme_code = 0xb0088, .pme_short_desc = "Marked VMX store issued", .pme_long_desc = "Marked VMX store issued", }, [ POWER6_PME_PM_VMX_FLOAT_ISSUED ] = { .pme_name = "PM_VMX_FLOAT_ISSUED", .pme_code = 0x70080, .pme_short_desc = "VMX instruction issued to float", .pme_long_desc = "VMX instruction issued to float", }, [ POWER6_PME_PM_LSU0_REJECT_L2_CORR ] = { .pme_name = "PM_LSU0_REJECT_L2_CORR", .pme_code = 0xa10a0, .pme_short_desc = "LSU0 reject due to L2 correctable error", .pme_long_desc = "LSU0 reject due to L2 correctable error", }, [ POWER6_PME_PM_THRD_L2MISS ] = { .pme_name = "PM_THRD_L2MISS", .pme_code = 0x310a0, .pme_short_desc = "Thread in L2 miss", .pme_long_desc = "Thread in L2 miss", }, [ POWER6_PME_PM_FPU_FCONV ] = { .pme_name = "PM_FPU_FCONV", .pme_code = 0x1d1034, .pme_short_desc = "FPU executed FCONV instruction", .pme_long_desc = "FPU executed FCONV instruction", }, [ POWER6_PME_PM_FPU_FXMULT ] = { .pme_name = "PM_FPU_FXMULT", .pme_code = 0x1d0032, .pme_short_desc = "FPU executed fixed point multiplication", .pme_long_desc = "FPU executed 
fixed point multiplication", }, [ POWER6_PME_PM_FPU1_FRSP ] = { .pme_name = "PM_FPU1_FRSP", .pme_code = 0xd10aa, .pme_short_desc = "FPU1 executed FRSP instruction", .pme_long_desc = "FPU1 executed FRSP instruction", }, [ POWER6_PME_PM_MRK_DERAT_REF_16M ] = { .pme_name = "PM_MRK_DERAT_REF_16M", .pme_code = 0x382044, .pme_short_desc = "Marked DERAT reference for 16M page", .pme_long_desc = "Marked DERAT reference for 16M page", }, [ POWER6_PME_PM_L2SB_CASTOUT_SHR ] = { .pme_name = "PM_L2SB_CASTOUT_SHR", .pme_code = 0x5068a, .pme_short_desc = "L2 slice B castouts - Shared", .pme_long_desc = "L2 slice B castouts - Shared", }, [ POWER6_PME_PM_THRD_ONE_RUN_COUNT ] = { .pme_name = "PM_THRD_ONE_RUN_COUNT", .pme_code = 0x1000fb, .pme_short_desc = "Periods one of the threads in run cycles", .pme_long_desc = "Periods one of the threads in run cycles", }, [ POWER6_PME_PM_INST_FROM_RMEM ] = { .pme_name = "PM_INST_FROM_RMEM", .pme_code = 0x342042, .pme_short_desc = "Instruction fetched from remote memory", .pme_long_desc = "Instruction fetched from remote memory", }, [ POWER6_PME_PM_LSU_BOTH_BUS ] = { .pme_name = "PM_LSU_BOTH_BUS", .pme_code = 0x810aa, .pme_short_desc = "Both data return buses busy simultaneously", .pme_long_desc = "Both data return buses busy simultaneously", }, [ POWER6_PME_PM_FPU1_FSQRT_FDIV ] = { .pme_name = "PM_FPU1_FSQRT_FDIV", .pme_code = 0xc008c, .pme_short_desc = "FPU1 executed FSQRT or FDIV instruction", .pme_long_desc = "FPU1 executed FSQRT or FDIV instruction", }, [ POWER6_PME_PM_L2_LD_REQ_INST ] = { .pme_name = "PM_L2_LD_REQ_INST", .pme_code = 0x150530, .pme_short_desc = "L2 instruction load requests", .pme_long_desc = "L2 instruction load requests", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L35_SHR ] = { .pme_name = "PM_MRK_PTEG_FROM_L35_SHR", .pme_code = 0x212046, .pme_short_desc = "Marked PTEG loaded from L3.5 shared", .pme_long_desc = "Marked PTEG loaded from L3.5 shared", }, [ POWER6_PME_PM_BR_PRED_CR ] = { .pme_name = "PM_BR_PRED_CR", .pme_code = 
0x410a2, .pme_short_desc = "A conditional branch was predicted, CR prediction", .pme_long_desc = "A conditional branch was predicted, CR prediction", }, [ POWER6_PME_PM_MRK_LSU0_REJECT_ULD ] = { .pme_name = "PM_MRK_LSU0_REJECT_ULD", .pme_code = 0x930e0, .pme_short_desc = "LSU0 marked unaligned load reject", .pme_long_desc = "LSU0 marked unaligned load reject", }, [ POWER6_PME_PM_LSU_REJECT ] = { .pme_name = "PM_LSU_REJECT", .pme_code = 0x4a1030, .pme_short_desc = "LSU reject", .pme_long_desc = "LSU reject", }, [ POWER6_PME_PM_LSU_REJECT_LHS_BOTH ] = { .pme_name = "PM_LSU_REJECT_LHS_BOTH", .pme_code = 0x290038, .pme_short_desc = "Load hit store reject both units", .pme_long_desc = "Load hit store reject both units", }, [ POWER6_PME_PM_GXO_ADDR_CYC_BUSY ] = { .pme_name = "PM_GXO_ADDR_CYC_BUSY", .pme_code = 0x50382, .pme_short_desc = "Outbound GX address utilization (# of cycles address out is valid)", .pme_long_desc = "Outbound GX address utilization (# of cycles address out is valid)", }, [ POWER6_PME_PM_LSU_SRQ_EMPTY_COUNT ] = { .pme_name = "PM_LSU_SRQ_EMPTY_COUNT", .pme_code = 0x40001d, .pme_short_desc = "Periods SRQ empty", .pme_long_desc = "The Store Request Queue is empty", }, [ POWER6_PME_PM_PTEG_FROM_L3 ] = { .pme_name = "PM_PTEG_FROM_L3", .pme_code = 0x313048, .pme_short_desc = "PTEG loaded from L3", .pme_long_desc = "PTEG loaded from L3", }, [ POWER6_PME_PM_VMX0_LD_ISSUED ] = { .pme_name = "PM_VMX0_LD_ISSUED", .pme_code = 0x60082, .pme_short_desc = "VMX0 load issued", .pme_long_desc = "VMX0 load issued", }, [ POWER6_PME_PM_FXU_PIPELINED_MULT_DIV ] = { .pme_name = "PM_FXU_PIPELINED_MULT_DIV", .pme_code = 0x210ae, .pme_short_desc = "Fixed point multiply/divide pipelined", .pme_long_desc = "Fixed point multiply/divide pipelined", }, [ POWER6_PME_PM_FPU1_STF ] = { .pme_name = "PM_FPU1_STF", .pme_code = 0xc10ac, .pme_short_desc = "FPU1 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing a store instruction.", }, [ 
POWER6_PME_PM_DFU_ADD ] = { .pme_name = "PM_DFU_ADD", .pme_code = 0xe008c, .pme_short_desc = "DFU add type instruction", .pme_long_desc = "DFU add type instruction", }, [ POWER6_PME_PM_MEM_DP_CL_WR_GLOB ] = { .pme_name = "PM_MEM_DP_CL_WR_GLOB", .pme_code = 0x250232, .pme_short_desc = "cache line write setting double pump state to global", .pme_long_desc = "cache line write setting double pump state to global", }, [ POWER6_PME_PM_MRK_LSU1_REJECT_ULD ] = { .pme_name = "PM_MRK_LSU1_REJECT_ULD", .pme_code = 0x930e8, .pme_short_desc = "LSU1 marked unaligned load reject", .pme_long_desc = "LSU1 marked unaligned load reject", }, [ POWER6_PME_PM_ITLB_REF ] = { .pme_name = "PM_ITLB_REF", .pme_code = 0x920c2, .pme_short_desc = "Instruction TLB reference", .pme_long_desc = "Instruction TLB reference", }, [ POWER6_PME_PM_LSU0_REJECT_L2MISS ] = { .pme_name = "PM_LSU0_REJECT_L2MISS", .pme_code = 0x90084, .pme_short_desc = "LSU0 L2 miss reject", .pme_long_desc = "LSU0 L2 miss reject", }, [ POWER6_PME_PM_DATA_FROM_L35_SHR ] = { .pme_name = "PM_DATA_FROM_L35_SHR", .pme_code = 0x20005a, .pme_short_desc = "Data loaded from L3.5 shared", .pme_long_desc = "Data loaded from L3.5 shared", }, [ POWER6_PME_PM_MRK_DATA_FROM_RL2L3_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_RL2L3_MOD", .pme_code = 0x10304c, .pme_short_desc = "Marked data loaded from remote L2 or L3 modified", .pme_long_desc = "Marked data loaded from remote L2 or L3 modified", }, [ POWER6_PME_PM_FPU0_FPSCR ] = { .pme_name = "PM_FPU0_FPSCR", .pme_code = 0xd0084, .pme_short_desc = "FPU0 executed FPSCR instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing fpscr move related instruction. This could be mtfsfi*, mtfsb0*, mtfsb1*. 
mffs*, mtfsf*, mcrsf* where XYZ* means XYZ, XYZs, XYZ., XYZs", }, [ POWER6_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x100058, .pme_short_desc = "Data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a demand load", }, [ POWER6_PME_PM_DPU_HELD_XER ] = { .pme_name = "PM_DPU_HELD_XER", .pme_code = 0x20088, .pme_short_desc = "DISP unit held due to XER dependency", .pme_long_desc = "DISP unit held due to XER dependency", }, [ POWER6_PME_PM_FAB_NODE_PUMP ] = { .pme_name = "PM_FAB_NODE_PUMP", .pme_code = 0x50188, .pme_short_desc = "Node pump operation, locally mastered", .pme_long_desc = "Node pump operation, locally mastered", }, [ POWER6_PME_PM_VMX_RESULT_SAT_0_1 ] = { .pme_name = "PM_VMX_RESULT_SAT_0_1", .pme_code = 0xb008e, .pme_short_desc = "VMX valid result with sat bit is set (0->1)", .pme_long_desc = "VMX valid result with sat bit is set (0->1)", }, [ POWER6_PME_PM_LD_REF_L1 ] = { .pme_name = "PM_LD_REF_L1", .pme_code = 0x80082, .pme_short_desc = "L1 D cache load references", .pme_long_desc = "Total DL1 Load references", }, [ POWER6_PME_PM_TLB_REF ] = { .pme_name = "PM_TLB_REF", .pme_code = 0x920c8, .pme_short_desc = "TLB reference", .pme_long_desc = "TLB reference", }, [ POWER6_PME_PM_DC_PREF_OUT_OF_STREAMS ] = { .pme_name = "PM_DC_PREF_OUT_OF_STREAMS", .pme_code = 0x810a0, .pme_short_desc = "D cache out of streams", .pme_long_desc = "out of streams", }, [ POWER6_PME_PM_FLUSH_FPU ] = { .pme_name = "PM_FLUSH_FPU", .pme_code = 0x230ec, .pme_short_desc = "Flush caused by FPU exception", .pme_long_desc = "Flush caused by FPU exception", }, [ POWER6_PME_PM_MEM1_DP_CL_WR_LOC ] = { .pme_name = "PM_MEM1_DP_CL_WR_LOC", .pme_code = 0x5028e, .pme_short_desc = "cacheline write setting dp to local side 1", .pme_long_desc = "cacheline write setting dp to local side 1", }, [ POWER6_PME_PM_L2SB_LD_HIT ] = { .pme_name = "PM_L2SB_LD_HIT", .pme_code = 0x5078a, .pme_short_desc = "L2 slice B load hits", .pme_long_desc = "L2 
slice B load hits", }, [ POWER6_PME_PM_FAB_DCLAIM ] = { .pme_name = "PM_FAB_DCLAIM", .pme_code = 0x50184, .pme_short_desc = "Dclaim operation, locally mastered", .pme_long_desc = "Dclaim operation, locally mastered", }, [ POWER6_PME_PM_MEM_DP_CL_WR_LOC ] = { .pme_name = "PM_MEM_DP_CL_WR_LOC", .pme_code = 0x150232, .pme_short_desc = "cache line write setting double pump state to local", .pme_long_desc = "cache line write setting double pump state to local", }, [ POWER6_PME_PM_BR_MPRED_CR ] = { .pme_name = "PM_BR_MPRED_CR", .pme_code = 0x410a8, .pme_short_desc = "Branch mispredictions due to CR bit setting", .pme_long_desc = "This signal is asserted when the branch execution unit detects a branch mispredict because the CR value is opposite of the predicted value. This signal is asserted after a branch issue event and will result in a branch redirect flush if not overridden by a flush of an older instruction.", }, [ POWER6_PME_PM_LSU_REJECT_EXTERN ] = { .pme_name = "PM_LSU_REJECT_EXTERN", .pme_code = 0x3a1030, .pme_short_desc = "LSU external reject request", .pme_long_desc = "LSU external reject request", }, [ POWER6_PME_PM_DATA_FROM_RL2L3_MOD ] = { .pme_name = "PM_DATA_FROM_RL2L3_MOD", .pme_code = 0x10005c, .pme_short_desc = "Data loaded from remote L2 or L3 modified", .pme_long_desc = "Data loaded from remote L2 or L3 modified", }, [ POWER6_PME_PM_DPU_HELD_RU_WQ ] = { .pme_name = "PM_DPU_HELD_RU_WQ", .pme_code = 0x2008e, .pme_short_desc = "DISP unit held due to RU FXU write queue full", .pme_long_desc = "DISP unit held due to RU FXU write queue full", }, [ POWER6_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x80080, .pme_short_desc = "L1 D cache load misses", .pme_long_desc = "Total DL1 Load references that miss the DL1", }, [ POWER6_PME_PM_DC_INV_L2 ] = { .pme_name = "PM_DC_INV_L2", .pme_code = 0x150632, .pme_short_desc = "L1 D cache entries invalidated from L2", .pme_long_desc = "A dcache invalidate was received from the L2 because a line in 
L2 was castout.", }, [ POWER6_PME_PM_MRK_PTEG_FROM_RMEM ] = { .pme_name = "PM_MRK_PTEG_FROM_RMEM", .pme_code = 0x312042, .pme_short_desc = "Marked PTEG loaded from remote memory", .pme_long_desc = "Marked PTEG loaded from remote memory", }, [ POWER6_PME_PM_FPU_FIN ] = { .pme_name = "PM_FPU_FIN", .pme_code = 0x1d0030, .pme_short_desc = "FPU produced a result", .pme_long_desc = "FPU finished, produced a result. This only indicates finish, not completion. Combined Unit 0 + Unit 1", }, [ POWER6_PME_PM_FXU0_FIN ] = { .pme_name = "PM_FXU0_FIN", .pme_code = 0x300016, .pme_short_desc = "FXU0 produced a result", .pme_long_desc = "The Fixed Point unit 0 finished an instruction and produced a result", }, [ POWER6_PME_PM_DPU_HELD_FPQ ] = { .pme_name = "PM_DPU_HELD_FPQ", .pme_code = 0x20086, .pme_short_desc = "DISP unit held due to FPU issue queue full", .pme_long_desc = "DISP unit held due to FPU issue queue full", }, [ POWER6_PME_PM_GX_DMA_READ ] = { .pme_name = "PM_GX_DMA_READ", .pme_code = 0x5038c, .pme_short_desc = "DMA Read Request", .pme_long_desc = "DMA Read Request", }, [ POWER6_PME_PM_LSU1_REJECT_PARTIAL_SECTOR ] = { .pme_name = "PM_LSU1_REJECT_PARTIAL_SECTOR", .pme_code = 0xa008e, .pme_short_desc = "LSU1 reject due to partial sector valid", .pme_long_desc = "LSU1 reject due to partial sector valid", }, [ POWER6_PME_PM_0INST_FETCH_COUNT ] = { .pme_name = "PM_0INST_FETCH_COUNT", .pme_code = 0x40081, .pme_short_desc = "Periods with no instructions fetched", .pme_long_desc = "No instructions were fetched this period (due to IFU hold, redirect, or icache miss)", }, [ POWER6_PME_PM_PMC5_OVERFLOW ] = { .pme_name = "PM_PMC5_OVERFLOW", .pme_code = 0x100024, .pme_short_desc = "PMC5 Overflow", .pme_long_desc = "PMC5 Overflow", }, [ POWER6_PME_PM_L2SB_LD_REQ ] = { .pme_name = "PM_L2SB_LD_REQ", .pme_code = 0x50788, .pme_short_desc = "L2 slice B load requests", .pme_long_desc = "L2 slice B load requests", }, [ POWER6_PME_PM_THRD_PRIO_DIFF_0_CYC ] = { .pme_name = 
"PM_THRD_PRIO_DIFF_0_CYC", .pme_code = 0x123040, .pme_short_desc = "Cycles no thread priority difference", .pme_long_desc = "Cycles no thread priority difference", }, [ POWER6_PME_PM_DATA_FROM_RMEM ] = { .pme_name = "PM_DATA_FROM_RMEM", .pme_code = 0x30005e, .pme_short_desc = "Data loaded from remote memory", .pme_long_desc = "Data loaded from remote memory", }, [ POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_BOTH_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_BOTH_CYC", .pme_code = 0x30001c, .pme_short_desc = "Cycles both threads LMQ and SRQ empty", .pme_long_desc = "Cycles both threads LMQ and SRQ empty", }, [ POWER6_PME_PM_ST_REF_L1_BOTH ] = { .pme_name = "PM_ST_REF_L1_BOTH", .pme_code = 0x280038, .pme_short_desc = "Both units L1 D cache store reference", .pme_long_desc = "Both units L1 D cache store reference", }, [ POWER6_PME_PM_VMX_PERMUTE_ISSUED ] = { .pme_name = "PM_VMX_PERMUTE_ISSUED", .pme_code = 0x70086, .pme_short_desc = "VMX instruction issued to permute", .pme_long_desc = "VMX instruction issued to permute", }, [ POWER6_PME_PM_BR_TAKEN ] = { .pme_name = "PM_BR_TAKEN", .pme_code = 0x200052, .pme_short_desc = "Branches taken", .pme_long_desc = "Branches taken", }, [ POWER6_PME_PM_FAB_DMA ] = { .pme_name = "PM_FAB_DMA", .pme_code = 0x5018c, .pme_short_desc = "DMA operation, locally mastered", .pme_long_desc = "DMA operation, locally mastered", }, [ POWER6_PME_PM_GCT_EMPTY_COUNT ] = { .pme_name = "PM_GCT_EMPTY_COUNT", .pme_code = 0x200009, .pme_short_desc = "Periods GCT empty", .pme_long_desc = "The Global Completion Table is completely empty.", }, [ POWER6_PME_PM_FPU1_SINGLE ] = { .pme_name = "PM_FPU1_SINGLE", .pme_code = 0xc10ae, .pme_short_desc = "FPU1 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing single precision instruction.", }, [ POWER6_PME_PM_L2SA_CASTOUT_SHR ] = { .pme_name = "PM_L2SA_CASTOUT_SHR", .pme_code = 0x50682, .pme_short_desc = "L2 slice A castouts - Shared", .pme_long_desc = "L2 slice 
A castouts - Shared", }, [ POWER6_PME_PM_L3SB_REF ] = { .pme_name = "PM_L3SB_REF", .pme_code = 0x50088, .pme_short_desc = "L3 slice B references", .pme_long_desc = "L3 slice B references", }, [ POWER6_PME_PM_FPU0_FRSP ] = { .pme_name = "PM_FPU0_FRSP", .pme_code = 0xd10a2, .pme_short_desc = "FPU0 executed FRSP instruction", .pme_long_desc = "FPU0 executed FRSP instruction", }, [ POWER6_PME_PM_PMC4_SAVED ] = { .pme_name = "PM_PMC4_SAVED", .pme_code = 0x300022, .pme_short_desc = "PMC4 rewind value saved", .pme_long_desc = "PMC4 rewind value saved", }, [ POWER6_PME_PM_L2SA_DC_INV ] = { .pme_name = "PM_L2SA_DC_INV", .pme_code = 0x50686, .pme_short_desc = "L2 slice A D cache invalidate", .pme_long_desc = "L2 slice A D cache invalidate", }, [ POWER6_PME_PM_GXI_ADDR_CYC_BUSY ] = { .pme_name = "PM_GXI_ADDR_CYC_BUSY", .pme_code = 0x50388, .pme_short_desc = "Inbound GX address utilization (# of cycles address in is valid)", .pme_long_desc = "Inbound GX address utilization (# of cycles address in is valid)", }, [ POWER6_PME_PM_FPU0_FMA ] = { .pme_name = "PM_FPU0_FMA", .pme_code = 0xc0082, .pme_short_desc = "FPU0 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing multiply-add kind of instruction. 
This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER6_PME_PM_SLB_MISS ] = { .pme_name = "PM_SLB_MISS", .pme_code = 0x183034, .pme_short_desc = "SLB misses", .pme_long_desc = "SLB misses", }, [ POWER6_PME_PM_MRK_ST_GPS ] = { .pme_name = "PM_MRK_ST_GPS", .pme_code = 0x200006, .pme_short_desc = "Marked store sent to GPS", .pme_long_desc = "A sampled store has been sent to the memory subsystem", }, [ POWER6_PME_PM_DERAT_REF_4K ] = { .pme_name = "PM_DERAT_REF_4K", .pme_code = 0x182070, .pme_short_desc = "DERAT reference for 4K page", .pme_long_desc = "DERAT reference for 4K page", }, [ POWER6_PME_PM_L2_CASTOUT_SHR ] = { .pme_name = "PM_L2_CASTOUT_SHR", .pme_code = 0x250630, .pme_short_desc = "L2 castouts - Shared (T, Te, Si, S)", .pme_long_desc = "L2 castouts - Shared (T, Te, Si, S)", }, [ POWER6_PME_PM_DPU_HELD_STCX_CR ] = { .pme_name = "PM_DPU_HELD_STCX_CR", .pme_code = 0x2008c, .pme_short_desc = "DISP unit held due to STCX updating CR ", .pme_long_desc = "DISP unit held due to STCX updating CR ", }, [ POWER6_PME_PM_FPU0_ST_FOLDED ] = { .pme_name = "PM_FPU0_ST_FOLDED", .pme_code = 0xd10a4, .pme_short_desc = "FPU0 folded store", .pme_long_desc = "FPU0 folded store", }, [ POWER6_PME_PM_MRK_DATA_FROM_L21 ] = { .pme_name = "PM_MRK_DATA_FROM_L21", .pme_code = 0x203048, .pme_short_desc = "Marked data loaded from private L2 other core", .pme_long_desc = "Marked data loaded from private L2 other core", }, [ POWER6_PME_PM_THRD_PRIO_DIFF_minus3or4_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus3or4_CYC", .pme_code = 0x323046, .pme_short_desc = "Cycles thread priority difference is -3 or -4", .pme_long_desc = "Cycles thread priority difference is -3 or -4", }, [ POWER6_PME_PM_DATA_FROM_L35_MOD ] = { .pme_name = "PM_DATA_FROM_L35_MOD", .pme_code = 0x10005a, .pme_short_desc = "Data loaded from L3.5 modified", .pme_long_desc = "Data loaded from L3.5 modified", }, [ POWER6_PME_PM_DATA_FROM_DL2L3_SHR ] = { .pme_name = 
"PM_DATA_FROM_DL2L3_SHR", .pme_code = 0x30005c, .pme_short_desc = "Data loaded from distant L2 or L3 shared", .pme_long_desc = "Data loaded from distant L2 or L3 shared", }, [ POWER6_PME_PM_GXI_DATA_CYC_BUSY ] = { .pme_name = "PM_GXI_DATA_CYC_BUSY", .pme_code = 0x5038a, .pme_short_desc = "Inbound GX Data utilization (# of cycle data in is valid)", .pme_long_desc = "Inbound GX Data utilization (# of cycle data in is valid)", }, [ POWER6_PME_PM_LSU_REJECT_STEAL ] = { .pme_name = "PM_LSU_REJECT_STEAL", .pme_code = 0x9008c, .pme_short_desc = "LSU reject due to steal", .pme_long_desc = "LSU reject due to steal", }, [ POWER6_PME_PM_ST_FIN ] = { .pme_name = "PM_ST_FIN", .pme_code = 0x100054, .pme_short_desc = "Store instructions finished", .pme_long_desc = "Store instructions finished", }, [ POWER6_PME_PM_DPU_HELD_CR_LOGICAL ] = { .pme_name = "PM_DPU_HELD_CR_LOGICAL", .pme_code = 0x3008e, .pme_short_desc = "DISP unit held due to CR, LR or CTR updated by CR logical, MTCRF, MTLR or MTCTR", .pme_long_desc = "DISP unit held due to CR, LR or CTR updated by CR logical, MTCRF, MTLR or MTCTR", }, [ POWER6_PME_PM_THRD_SEL_T0 ] = { .pme_name = "PM_THRD_SEL_T0", .pme_code = 0x310a6, .pme_short_desc = "Decode selected thread 0", .pme_long_desc = "Decode selected thread 0", }, [ POWER6_PME_PM_PTEG_RELOAD_VALID ] = { .pme_name = "PM_PTEG_RELOAD_VALID", .pme_code = 0x130e8, .pme_short_desc = "TLB reload valid", .pme_long_desc = "TLB reload valid", }, [ POWER6_PME_PM_L2_PREF_ST ] = { .pme_name = "PM_L2_PREF_ST", .pme_code = 0x810a8, .pme_short_desc = "L2 cache prefetches", .pme_long_desc = "L2 cache prefetches", }, [ POWER6_PME_PM_MRK_STCX_FAIL ] = { .pme_name = "PM_MRK_STCX_FAIL", .pme_code = 0x830e4, .pme_short_desc = "Marked STCX failed", .pme_long_desc = "A marked stcx (stwcx or stdcx) failed", }, [ POWER6_PME_PM_LSU0_REJECT_LHS ] = { .pme_name = "PM_LSU0_REJECT_LHS", .pme_code = 0x90086, .pme_short_desc = "LSU0 load hit store reject", .pme_long_desc = "LSU0 load hit store reject", 
}, [ POWER6_PME_PM_DFU_EXP_EQ ] = { .pme_name = "PM_DFU_EXP_EQ", .pme_code = 0xe0084, .pme_short_desc = "DFU operand exponents are equal for add type", .pme_long_desc = "DFU operand exponents are equal for add type", }, [ POWER6_PME_PM_DPU_HELD_FP_FX_MULT ] = { .pme_name = "PM_DPU_HELD_FP_FX_MULT", .pme_code = 0x210a8, .pme_short_desc = "DISP unit held due to non-fixed multiply/divide after fixed multiply/divide", .pme_long_desc = "DISP unit held due to non-fixed multiply/divide after fixed multiply/divide", }, [ POWER6_PME_PM_L2_LD_MISS_DATA ] = { .pme_name = "PM_L2_LD_MISS_DATA", .pme_code = 0x250430, .pme_short_desc = "L2 data load misses", .pme_long_desc = "L2 data load misses", }, [ POWER6_PME_PM_DATA_FROM_L35_MOD_CYC ] = { .pme_name = "PM_DATA_FROM_L35_MOD_CYC", .pme_code = 0x400026, .pme_short_desc = "Load latency from L3.5 modified", .pme_long_desc = "Load latency from L3.5 modified", }, [ POWER6_PME_PM_FLUSH_FXU ] = { .pme_name = "PM_FLUSH_FXU", .pme_code = 0x230ea, .pme_short_desc = "Flush caused by FXU exception", .pme_long_desc = "Flush caused by FXU exception", }, [ POWER6_PME_PM_FPU_ISSUE_1 ] = { .pme_name = "PM_FPU_ISSUE_1", .pme_code = 0x320c8, .pme_short_desc = "FPU issue 1 per cycle", .pme_long_desc = "FPU issue 1 per cycle", }, [ POWER6_PME_PM_DATA_FROM_LMEM_CYC ] = { .pme_name = "PM_DATA_FROM_LMEM_CYC", .pme_code = 0x20002c, .pme_short_desc = "Load latency from local memory", .pme_long_desc = "Load latency from local memory", }, [ POWER6_PME_PM_DPU_HELD_LSU_SOPS ] = { .pme_name = "PM_DPU_HELD_LSU_SOPS", .pme_code = 0x30080, .pme_short_desc = "DISP unit held due to LSU slow ops (sync, tlbie, stcx)", .pme_long_desc = "DISP unit held due to LSU slow ops (sync, tlbie, stcx)", }, [ POWER6_PME_PM_INST_PTEG_2ND_HALF ] = { .pme_name = "PM_INST_PTEG_2ND_HALF", .pme_code = 0x910aa, .pme_short_desc = "Instruction table walk matched in second half primary PTEG", .pme_long_desc = "Instruction table walk matched in second half primary PTEG", }, [ 
POWER6_PME_PM_THRESH_TIMEO ] = { .pme_name = "PM_THRESH_TIMEO", .pme_code = 0x300018, .pme_short_desc = "Threshold timeout", .pme_long_desc = "The threshold timer expired", }, [ POWER6_PME_PM_LSU_REJECT_UST_BOTH ] = { .pme_name = "PM_LSU_REJECT_UST_BOTH", .pme_code = 0x190036, .pme_short_desc = "Unaligned store reject both units", .pme_long_desc = "Unaligned store reject both units", }, [ POWER6_PME_PM_LSU_REJECT_FAST ] = { .pme_name = "PM_LSU_REJECT_FAST", .pme_code = 0x30003e, .pme_short_desc = "LSU fast reject", .pme_long_desc = "LSU fast reject", }, [ POWER6_PME_PM_DPU_HELD_THRD_PRIO ] = { .pme_name = "PM_DPU_HELD_THRD_PRIO", .pme_code = 0x3008a, .pme_short_desc = "DISP unit held due to lower priority thread", .pme_long_desc = "DISP unit held due to lower priority thread", }, [ POWER6_PME_PM_L2_PREF_LD ] = { .pme_name = "PM_L2_PREF_LD", .pme_code = 0x810a6, .pme_short_desc = "L2 cache prefetches", .pme_long_desc = "L2 cache prefetches", }, [ POWER6_PME_PM_FPU_FEST ] = { .pme_name = "PM_FPU_FEST", .pme_code = 0x4d1030, .pme_short_desc = "FPU executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. 
Combined Unit 0 + Unit 1.", }, [ POWER6_PME_PM_MRK_DATA_FROM_RMEM ] = { .pme_name = "PM_MRK_DATA_FROM_RMEM", .pme_code = 0x30304a, .pme_short_desc = "Marked data loaded from remote memory", .pme_long_desc = "Marked data loaded from remote memory", }, [ POWER6_PME_PM_LD_MISS_L1_CYC ] = { .pme_name = "PM_LD_MISS_L1_CYC", .pme_code = 0x10000c, .pme_short_desc = "L1 data load miss cycles", .pme_long_desc = "L1 data load miss cycles", }, [ POWER6_PME_PM_DERAT_MISS_4K ] = { .pme_name = "PM_DERAT_MISS_4K", .pme_code = 0x192070, .pme_short_desc = "DERAT misses for 4K page", .pme_long_desc = "A data request (load or store) missed the ERAT for 4K page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_DPU_HELD_COMPLETION ] = { .pme_name = "PM_DPU_HELD_COMPLETION", .pme_code = 0x210ac, .pme_short_desc = "DISP unit held due to completion holding dispatch ", .pme_long_desc = "DISP unit held due to completion holding dispatch ", }, [ POWER6_PME_PM_FPU_ISSUE_STALL_ST ] = { .pme_name = "PM_FPU_ISSUE_STALL_ST", .pme_code = 0x320ce, .pme_short_desc = "FPU issue stalled due to store", .pme_long_desc = "FPU issue stalled due to store", }, [ POWER6_PME_PM_L2SB_DC_INV ] = { .pme_name = "PM_L2SB_DC_INV", .pme_code = 0x5068e, .pme_short_desc = "L2 slice B D cache invalidate", .pme_long_desc = "L2 slice B D cache invalidate", }, [ POWER6_PME_PM_PTEG_FROM_L25_SHR ] = { .pme_name = "PM_PTEG_FROM_L25_SHR", .pme_code = 0x41304e, .pme_short_desc = "PTEG loaded from L2.5 shared", .pme_long_desc = "PTEG loaded from L2.5 shared", }, [ POWER6_PME_PM_PTEG_FROM_DL2L3_MOD ] = { .pme_name = "PM_PTEG_FROM_DL2L3_MOD", .pme_code = 0x41304c, .pme_short_desc = "PTEG loaded from distant L2 or L3 modified", .pme_long_desc = "PTEG loaded from distant L2 or L3 modified", }, [ POWER6_PME_PM_FAB_CMD_RETRIED ] = { .pme_name = "PM_FAB_CMD_RETRIED", .pme_code = 0x250130, .pme_short_desc = "Fabric command retried", .pme_long_desc = "Fabric command retried", }, [ POWER6_PME_PM_BR_PRED_LSTACK ] = { .pme_name = 
"PM_BR_PRED_LSTACK", .pme_code = 0x410a6, .pme_short_desc = "A conditional branch was predicted, link stack", .pme_long_desc = "A conditional branch was predicted, link stack", }, [ POWER6_PME_PM_GXO_DATA_CYC_BUSY ] = { .pme_name = "PM_GXO_DATA_CYC_BUSY", .pme_code = 0x50384, .pme_short_desc = "Outbound GX Data utilization (# of cycles data out is valid)", .pme_long_desc = "Outbound GX Data utilization (# of cycles data out is valid)", }, [ POWER6_PME_PM_DFU_SUBNORM ] = { .pme_name = "PM_DFU_SUBNORM", .pme_code = 0xe0086, .pme_short_desc = "DFU result is a subnormal", .pme_long_desc = "DFU result is a subnormal", }, [ POWER6_PME_PM_FPU_ISSUE_OOO ] = { .pme_name = "PM_FPU_ISSUE_OOO", .pme_code = 0x320c0, .pme_short_desc = "FPU issue out-of-order", .pme_long_desc = "FPU issue out-of-order", }, [ POWER6_PME_PM_LSU_REJECT_ULD_BOTH ] = { .pme_name = "PM_LSU_REJECT_ULD_BOTH", .pme_code = 0x290036, .pme_short_desc = "Unaligned load reject both units", .pme_long_desc = "Unaligned load reject both units", }, [ POWER6_PME_PM_L2SB_ST_MISS ] = { .pme_name = "PM_L2SB_ST_MISS", .pme_code = 0x5048e, .pme_short_desc = "L2 slice B store misses", .pme_long_desc = "L2 slice B store misses", }, [ POWER6_PME_PM_DATA_FROM_L25_MOD_CYC ] = { .pme_name = "PM_DATA_FROM_L25_MOD_CYC", .pme_code = 0x400024, .pme_short_desc = "Load latency from L2.5 modified", .pme_long_desc = "Load latency from L2.5 modified", }, [ POWER6_PME_PM_INST_PTEG_1ST_HALF ] = { .pme_name = "PM_INST_PTEG_1ST_HALF", .pme_code = 0x910a8, .pme_short_desc = "Instruction table walk matched in first half primary PTEG", .pme_long_desc = "Instruction table walk matched in first half primary PTEG", }, [ POWER6_PME_PM_DERAT_MISS_16M ] = { .pme_name = "PM_DERAT_MISS_16M", .pme_code = 0x392070, .pme_short_desc = "DERAT misses for 16M page", .pme_long_desc = "A data request (load or store) missed the ERAT for 16M page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_GX_DMA_WRITE ] = { .pme_name = "PM_GX_DMA_WRITE", .pme_code = 
0x5038e, .pme_short_desc = "All DMA Write Requests (including dma wrt lgcy)", .pme_long_desc = "All DMA Write Requests (including dma wrt lgcy)", }, [ POWER6_PME_PM_MRK_PTEG_FROM_DL2L3_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_DL2L3_MOD", .pme_code = 0x412044, .pme_short_desc = "Marked PTEG loaded from distant L2 or L3 modified", .pme_long_desc = "Marked PTEG loaded from distant L2 or L3 modified", }, [ POWER6_PME_PM_MEM1_DP_RQ_GLOB_LOC ] = { .pme_name = "PM_MEM1_DP_RQ_GLOB_LOC", .pme_code = 0x50288, .pme_short_desc = "Memory read queue marking cache line double pump state from global to local side 1", .pme_long_desc = "Memory read queue marking cache line double pump state from global to local side 1", }, [ POWER6_PME_PM_L2SB_LD_REQ_DATA ] = { .pme_name = "PM_L2SB_LD_REQ_DATA", .pme_code = 0x50488, .pme_short_desc = "L2 slice B data load requests", .pme_long_desc = "L2 slice B data load requests", }, [ POWER6_PME_PM_L2SA_LD_MISS_INST ] = { .pme_name = "PM_L2SA_LD_MISS_INST", .pme_code = 0x50582, .pme_short_desc = "L2 slice A instruction load misses", .pme_long_desc = "L2 slice A instruction load misses", }, [ POWER6_PME_PM_MRK_LSU0_REJECT_L2MISS ] = { .pme_name = "PM_MRK_LSU0_REJECT_L2MISS", .pme_code = 0x930e4, .pme_short_desc = "LSU0 marked L2 miss reject", .pme_long_desc = "LSU0 marked L2 miss reject", }, [ POWER6_PME_PM_MRK_IFU_FIN ] = { .pme_name = "PM_MRK_IFU_FIN", .pme_code = 0x20000a, .pme_short_desc = "Marked instruction IFU processing finished", .pme_long_desc = "Marked instruction IFU processing finished", }, [ POWER6_PME_PM_INST_FROM_L3 ] = { .pme_name = "PM_INST_FROM_L3", .pme_code = 0x342040, .pme_short_desc = "Instruction fetched from L3", .pme_long_desc = "An instruction fetch group was fetched from L3. 
Fetch Groups can contain up to 8 instructions", }, [ POWER6_PME_PM_FXU1_FIN ] = { .pme_name = "PM_FXU1_FIN", .pme_code = 0x400016, .pme_short_desc = "FXU1 produced a result", .pme_long_desc = "The Fixed Point unit 1 finished an instruction and produced a result", }, [ POWER6_PME_PM_THRD_PRIO_4_CYC ] = { .pme_name = "PM_THRD_PRIO_4_CYC", .pme_code = 0x422046, .pme_short_desc = "Cycles thread running at priority level 4", .pme_long_desc = "Cycles thread running at priority level 4", }, [ POWER6_PME_PM_MRK_DATA_FROM_L35_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L35_MOD", .pme_code = 0x10304e, .pme_short_desc = "Marked data loaded from L3.5 modified", .pme_long_desc = "Marked data loaded from L3.5 modified", }, [ POWER6_PME_PM_LSU_REJECT_SET_MPRED ] = { .pme_name = "PM_LSU_REJECT_SET_MPRED", .pme_code = 0x2a0032, .pme_short_desc = "LSU reject due to mispredicted set", .pme_long_desc = "LSU reject due to mispredicted set", }, [ POWER6_PME_PM_MRK_DERAT_MISS_16G ] = { .pme_name = "PM_MRK_DERAT_MISS_16G", .pme_code = 0x492044, .pme_short_desc = "Marked DERAT misses for 16G page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 16G page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_FPU0_FXDIV ] = { .pme_name = "PM_FPU0_FXDIV", .pme_code = 0xc10a0, .pme_short_desc = "FPU0 executed fixed point division", .pme_long_desc = "FPU0 executed fixed point division", }, [ POWER6_PME_PM_MRK_LSU1_REJECT_UST ] = { .pme_name = "PM_MRK_LSU1_REJECT_UST", .pme_code = 0x930ea, .pme_short_desc = "LSU1 marked unaligned store reject", .pme_long_desc = "LSU1 marked unaligned store reject", }, [ POWER6_PME_PM_FPU_ISSUE_DIV_SQRT_OVERLAP ] = { .pme_name = "PM_FPU_ISSUE_DIV_SQRT_OVERLAP", .pme_code = 0x320cc, .pme_short_desc = "FPU divide/sqrt overlapped with other divide/sqrt", .pme_long_desc = "FPU divide/sqrt overlapped with other divide/sqrt", }, [ POWER6_PME_PM_INST_FROM_L35_SHR ] = { .pme_name = "PM_INST_FROM_L35_SHR", .pme_code = 0x242046, .pme_short_desc = 
"Instruction fetched from L3.5 shared", .pme_long_desc = "Instruction fetched from L3.5 shared", }, [ POWER6_PME_PM_MRK_LSU_REJECT_LHS ] = { .pme_name = "PM_MRK_LSU_REJECT_LHS", .pme_code = 0x493030, .pme_short_desc = "Marked load hit store reject", .pme_long_desc = "Marked load hit store reject", }, [ POWER6_PME_PM_LSU_LMQ_FULL_CYC ] = { .pme_name = "PM_LSU_LMQ_FULL_CYC", .pme_code = 0x810ac, .pme_short_desc = "Cycles LMQ full", .pme_long_desc = "The LMQ was full", }, [ POWER6_PME_PM_SYNC_COUNT ] = { .pme_name = "PM_SYNC_COUNT", .pme_code = 0x920cd, .pme_short_desc = "SYNC instructions completed", .pme_long_desc = "SYNC instructions completed", }, [ POWER6_PME_PM_MEM0_DP_RQ_LOC_GLOB ] = { .pme_name = "PM_MEM0_DP_RQ_LOC_GLOB", .pme_code = 0x50282, .pme_short_desc = "Memory read queue marking cache line double pump state from local to global side 0", .pme_long_desc = "Memory read queue marking cache line double pump state from local to global side 0", }, [ POWER6_PME_PM_L2SA_CASTOUT_MOD ] = { .pme_name = "PM_L2SA_CASTOUT_MOD", .pme_code = 0x50680, .pme_short_desc = "L2 slice A castouts - Modified", .pme_long_desc = "L2 slice A castouts - Modified", }, [ POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_BOTH_COUNT ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_BOTH_COUNT", .pme_code = 0x30001d, .pme_short_desc = "Periods both threads LMQ and SRQ empty", .pme_long_desc = "Periods both threads LMQ and SRQ empty", }, [ POWER6_PME_PM_PTEG_FROM_MEM_DP ] = { .pme_name = "PM_PTEG_FROM_MEM_DP", .pme_code = 0x11304a, .pme_short_desc = "PTEG loaded from double pump memory", .pme_long_desc = "PTEG loaded from double pump memory", }, [ POWER6_PME_PM_LSU_REJECT_SLOW ] = { .pme_name = "PM_LSU_REJECT_SLOW", .pme_code = 0x20003e, .pme_short_desc = "LSU slow reject", .pme_long_desc = "LSU slow reject", }, [ POWER6_PME_PM_PTEG_FROM_L25_MOD ] = { .pme_name = "PM_PTEG_FROM_L25_MOD", .pme_code = 0x31304e, .pme_short_desc = "PTEG loaded from L2.5 modified", .pme_long_desc = "PTEG loaded from L2.5 modified", }, [ 
POWER6_PME_PM_THRD_PRIO_7_CYC ] = { .pme_name = "PM_THRD_PRIO_7_CYC", .pme_code = 0x122046, .pme_short_desc = "Cycles thread running at priority level 7", .pme_long_desc = "Cycles thread running at priority level 7", }, [ POWER6_PME_PM_MRK_PTEG_FROM_RL2L3_SHR ] = { .pme_name = "PM_MRK_PTEG_FROM_RL2L3_SHR", .pme_code = 0x212044, .pme_short_desc = "Marked PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "Marked PTEG loaded from remote L2 or L3 shared", }, [ POWER6_PME_PM_ST_REQ_L2 ] = { .pme_name = "PM_ST_REQ_L2", .pme_code = 0x250732, .pme_short_desc = "L2 store requests", .pme_long_desc = "L2 store requests", }, [ POWER6_PME_PM_ST_REF_L1 ] = { .pme_name = "PM_ST_REF_L1", .pme_code = 0x80086, .pme_short_desc = "L1 D cache store references", .pme_long_desc = "Total DL1 Store references", }, [ POWER6_PME_PM_FPU_ISSUE_STALL_THRD ] = { .pme_name = "PM_FPU_ISSUE_STALL_THRD", .pme_code = 0x330e0, .pme_short_desc = "FPU issue stalled due to thread resource conflict", .pme_long_desc = "FPU issue stalled due to thread resource conflict", }, [ POWER6_PME_PM_RUN_COUNT ] = { .pme_name = "PM_RUN_COUNT", .pme_code = 0x10000b, .pme_short_desc = "Run Periods", .pme_long_desc = "Processor Periods gated by the run latch", }, [ POWER6_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x10000a, .pme_short_desc = "Run cycles", .pme_long_desc = "Processor Cycles gated by the run latch", }, [ POWER6_PME_PM_PTEG_FROM_RMEM ] = { .pme_name = "PM_PTEG_FROM_RMEM", .pme_code = 0x31304a, .pme_short_desc = "PTEG loaded from remote memory", .pme_long_desc = "PTEG loaded from remote memory", }, [ POWER6_PME_PM_LSU0_LDF ] = { .pme_name = "PM_LSU0_LDF", .pme_code = 0x80084, .pme_short_desc = "LSU0 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 0", }, [ POWER6_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0x80088, .pme_short_desc = "L1 D cache store misses", .pme_long_desc = "A store missed the 
dcache", }, [ POWER6_PME_PM_INST_FROM_DL2L3_SHR ] = { .pme_name = "PM_INST_FROM_DL2L3_SHR", .pme_code = 0x342044, .pme_short_desc = "Instruction fetched from distant L2 or L3 shared", .pme_long_desc = "Instruction fetched from distant L2 or L3 shared", }, [ POWER6_PME_PM_L2SA_IC_INV ] = { .pme_name = "PM_L2SA_IC_INV", .pme_code = 0x50684, .pme_short_desc = "L2 slice A I cache invalidate", .pme_long_desc = "L2 slice A I cache invalidate", }, [ POWER6_PME_PM_THRD_ONE_RUN_CYC ] = { .pme_name = "PM_THRD_ONE_RUN_CYC", .pme_code = 0x100016, .pme_short_desc = "One of the threads in run cycles", .pme_long_desc = "One of the threads in run cycles", }, [ POWER6_PME_PM_L2SB_LD_REQ_INST ] = { .pme_name = "PM_L2SB_LD_REQ_INST", .pme_code = 0x50588, .pme_short_desc = "L2 slice B instruction load requests", .pme_long_desc = "L2 slice B instruction load requests", }, [ POWER6_PME_PM_MRK_DATA_FROM_L25_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD", .pme_code = 0x30304e, .pme_short_desc = "Marked data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a marked demand load", }, [ POWER6_PME_PM_DPU_HELD_XTHRD ] = { .pme_name = "PM_DPU_HELD_XTHRD", .pme_code = 0x30082, .pme_short_desc = "DISP unit held due to cross thread resource conflicts", .pme_long_desc = "DISP unit held due to cross thread resource conflicts", }, [ POWER6_PME_PM_L2SB_ST_REQ ] = { .pme_name = "PM_L2SB_ST_REQ", .pme_code = 0x5048c, .pme_short_desc = "L2 slice B store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. 
The event is provided on each of the three slices A,B,and C.", }, [ POWER6_PME_PM_INST_FROM_L21 ] = { .pme_name = "PM_INST_FROM_L21", .pme_code = 0x242040, .pme_short_desc = "Instruction fetched from private L2 other core", .pme_long_desc = "Instruction fetched from private L2 other core", }, [ POWER6_PME_PM_INST_FROM_L3MISS ] = { .pme_name = "PM_INST_FROM_L3MISS", .pme_code = 0x342054, .pme_short_desc = "Instruction fetched missed L3", .pme_long_desc = "Instruction fetched missed L3", }, [ POWER6_PME_PM_L3SB_HIT ] = { .pme_name = "PM_L3SB_HIT", .pme_code = 0x5008a, .pme_short_desc = "L3 slice B hits", .pme_long_desc = "L3 slice B hits", }, [ POWER6_PME_PM_EE_OFF_EXT_INT ] = { .pme_name = "PM_EE_OFF_EXT_INT", .pme_code = 0x230ee, .pme_short_desc = "Cycles MSR(EE) bit off and external interrupt pending", .pme_long_desc = "Cycles MSR(EE) bit off and external interrupt pending", }, [ POWER6_PME_PM_INST_FROM_DL2L3_MOD ] = { .pme_name = "PM_INST_FROM_DL2L3_MOD", .pme_code = 0x442044, .pme_short_desc = "Instruction fetched from distant L2 or L3 modified", .pme_long_desc = "Instruction fetched from distant L2 or L3 modified", }, [ POWER6_PME_PM_PMC6_OVERFLOW ] = { .pme_name = "PM_PMC6_OVERFLOW", .pme_code = 0x300024, .pme_short_desc = "PMC6 Overflow", .pme_long_desc = "PMC6 Overflow", }, [ POWER6_PME_PM_FPU_FLOP ] = { .pme_name = "PM_FPU_FLOP", .pme_code = 0x1c0032, .pme_short_desc = "FPU executed 1FLOP, FMA, FSQRT or FDIV instruction", .pme_long_desc = "FPU executed 1FLOP, FMA, FSQRT or FDIV instruction", }, [ POWER6_PME_PM_FXU_BUSY ] = { .pme_name = "PM_FXU_BUSY", .pme_code = 0x200050, .pme_short_desc = "FXU busy", .pme_long_desc = "FXU0 and FXU1 are both busy", }, [ POWER6_PME_PM_FPU1_FLOP ] = { .pme_name = "PM_FPU1_FLOP", .pme_code = 0xc008e, .pme_short_desc = "FPU1 executed 1FLOP, FMA, FSQRT or FDIV instruction", .pme_long_desc = "FPU1 executed 1FLOP, FMA, FSQRT or FDIV instruction", }, [ POWER6_PME_PM_IC_RELOAD_SHR ] = { .pme_name = "PM_IC_RELOAD_SHR", .pme_code = 
0x4008e, .pme_short_desc = "I cache line reloading to be shared by threads", .pme_long_desc = "I cache line reloading to be shared by threads", }, [ POWER6_PME_PM_INST_TABLEWALK_CYC ] = { .pme_name = "PM_INST_TABLEWALK_CYC", .pme_code = 0x920ca, .pme_short_desc = "Cycles doing instruction tablewalks", .pme_long_desc = "Cycles doing instruction tablewalks", }, [ POWER6_PME_PM_DATA_FROM_RL2L3_MOD_CYC ] = { .pme_name = "PM_DATA_FROM_RL2L3_MOD_CYC", .pme_code = 0x400028, .pme_short_desc = "Load latency from remote L2 or L3 modified", .pme_long_desc = "Load latency from remote L2 or L3 modified", }, [ POWER6_PME_PM_THRD_PRIO_DIFF_5or6_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_5or6_CYC", .pme_code = 0x423040, .pme_short_desc = "Cycles thread priority difference is 5 or 6", .pme_long_desc = "Cycles thread priority difference is 5 or 6", }, [ POWER6_PME_PM_IBUF_FULL_CYC ] = { .pme_name = "PM_IBUF_FULL_CYC", .pme_code = 0x40084, .pme_short_desc = "Cycles instruction buffer full", .pme_long_desc = "Cycles instruction buffer full", }, [ POWER6_PME_PM_L2SA_LD_REQ ] = { .pme_name = "PM_L2SA_LD_REQ", .pme_code = 0x50780, .pme_short_desc = "L2 slice A load requests ", .pme_long_desc = "L2 slice A load requests ", }, [ POWER6_PME_PM_VMX1_LD_WRBACK ] = { .pme_name = "PM_VMX1_LD_WRBACK", .pme_code = 0x6008c, .pme_short_desc = "VMX1 load writeback valid", .pme_long_desc = "VMX1 load writeback valid", }, [ POWER6_PME_PM_MRK_FPU_FIN ] = { .pme_name = "PM_MRK_FPU_FIN", .pme_code = 0x2d0030, .pme_short_desc = "Marked instruction FPU processing finished", .pme_long_desc = "One of the Floating Point Units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ POWER6_PME_PM_THRD_PRIO_5_CYC ] = { .pme_name = "PM_THRD_PRIO_5_CYC", .pme_code = 0x322046, .pme_short_desc = "Cycles thread running at priority level 5", .pme_long_desc = "Cycles thread running at priority level 5", }, [ POWER6_PME_PM_DFU_BACK2BACK ] = { .pme_name = "PM_DFU_BACK2BACK", .pme_code = 0xe0082, .pme_short_desc = "DFU back to back operations executed", .pme_long_desc = "DFU back to back operations executed", }, [ POWER6_PME_PM_MRK_DATA_FROM_LMEM ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM", .pme_code = 0x40304a, .pme_short_desc = "Marked data loaded from local memory", .pme_long_desc = "Marked data loaded from local memory", }, [ POWER6_PME_PM_LSU_REJECT_LHS ] = { .pme_name = "PM_LSU_REJECT_LHS", .pme_code = 0x190032, .pme_short_desc = "Load hit store reject", .pme_long_desc = "Load hit store reject", }, [ POWER6_PME_PM_DPU_HELD_SPR ] = { .pme_name = "PM_DPU_HELD_SPR", .pme_code = 0x3008c, .pme_short_desc = "DISP unit held due to MTSPR/MFSPR", .pme_long_desc = "DISP unit held due to MTSPR/MFSPR", }, [ POWER6_PME_PM_FREQ_DOWN ] = { .pme_name = "PM_FREQ_DOWN", .pme_code = 0x30003c, .pme_short_desc = "Frequency is being slewed down due to Power Management", .pme_long_desc = "Frequency is being slewed down due to Power Management", }, [ POWER6_PME_PM_DFU_ENC_BCD_DPD ] = { .pme_name = "PM_DFU_ENC_BCD_DPD", .pme_code = 0xe008a, .pme_short_desc = "DFU Encode BCD to DPD", .pme_long_desc = "DFU Encode BCD to DPD", }, [ POWER6_PME_PM_DPU_HELD_GPR ] = { .pme_name = "PM_DPU_HELD_GPR", .pme_code = 0x20080, .pme_short_desc = "DISP unit held due to GPR dependencies", .pme_long_desc = "DISP unit held due to GPR dependencies", }, [ POWER6_PME_PM_LSU0_NCST ] = { .pme_name = "PM_LSU0_NCST", .pme_code = 0x820cc, .pme_short_desc = "LSU0 non-cacheable stores", .pme_long_desc = "LSU0 non-cacheable stores", }, [ POWER6_PME_PM_MRK_INST_ISSUED ] = { .pme_name = "PM_MRK_INST_ISSUED", .pme_code = 0x10001c, .pme_short_desc = 
"Marked instruction issued", .pme_long_desc = "Marked instruction issued", }, [ POWER6_PME_PM_INST_FROM_RL2L3_SHR ] = { .pme_name = "PM_INST_FROM_RL2L3_SHR", .pme_code = 0x242044, .pme_short_desc = "Instruction fetched from remote L2 or L3 shared", .pme_long_desc = "Instruction fetched from remote L2 or L3 shared", }, [ POWER6_PME_PM_FPU_DENORM ] = { .pme_name = "PM_FPU_DENORM", .pme_code = 0x2c1034, .pme_short_desc = "FPU received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized. Combined Unit 0 + Unit 1", }, [ POWER6_PME_PM_PTEG_FROM_L3MISS ] = { .pme_name = "PM_PTEG_FROM_L3MISS", .pme_code = 0x313028, .pme_short_desc = "PTEG loaded from L3 miss", .pme_long_desc = "PTEG loaded from L3 miss", }, [ POWER6_PME_PM_RUN_PURR ] = { .pme_name = "PM_RUN_PURR", .pme_code = 0x4000f4, .pme_short_desc = "Run PURR Event", .pme_long_desc = "Run PURR Event", }, [ POWER6_PME_PM_MRK_VMX0_LD_WRBACK ] = { .pme_name = "PM_MRK_VMX0_LD_WRBACK", .pme_code = 0x60086, .pme_short_desc = "Marked VMX0 load writeback valid", .pme_long_desc = "Marked VMX0 load writeback valid", }, [ POWER6_PME_PM_L2_MISS ] = { .pme_name = "PM_L2_MISS", .pme_code = 0x250532, .pme_short_desc = "L2 cache misses", .pme_long_desc = "L2 cache misses", }, [ POWER6_PME_PM_MRK_DATA_FROM_L3 ] = { .pme_name = "PM_MRK_DATA_FROM_L3", .pme_code = 0x303048, .pme_short_desc = "Marked data loaded from L3", .pme_long_desc = "DL1 was reloaded from the local L3 due to a marked demand load", }, [ POWER6_PME_PM_MRK_LSU1_REJECT_LHS ] = { .pme_name = "PM_MRK_LSU1_REJECT_LHS", .pme_code = 0x930ee, .pme_short_desc = "LSU1 marked load hit store reject", .pme_long_desc = "LSU1 marked load hit store reject", }, [ POWER6_PME_PM_L2SB_LD_MISS_INST ] = { .pme_name = "PM_L2SB_LD_MISS_INST", .pme_code = 0x5058a, .pme_short_desc = "L2 slice B instruction load misses", .pme_long_desc = "L2 slice B instruction load misses", }, [ POWER6_PME_PM_PTEG_FROM_RL2L3_SHR ] = { .pme_name = 
"PM_PTEG_FROM_RL2L3_SHR", .pme_code = 0x21304c, .pme_short_desc = "PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "PTEG loaded from remote L2 or L3 shared", }, [ POWER6_PME_PM_MRK_DERAT_MISS_64K ] = { .pme_name = "PM_MRK_DERAT_MISS_64K", .pme_code = 0x192044, .pme_short_desc = "Marked DERAT misses for 64K page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 64K page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_LWSYNC ] = { .pme_name = "PM_LWSYNC", .pme_code = 0x810ae, .pme_short_desc = "Isync instruction completed", .pme_long_desc = "Isync instruction completed", }, [ POWER6_PME_PM_FPU1_FXMULT ] = { .pme_name = "PM_FPU1_FXMULT", .pme_code = 0xd008e, .pme_short_desc = "FPU1 executed fixed point multiplication", .pme_long_desc = "FPU1 executed fixed point multiplication", }, [ POWER6_PME_PM_MEM0_DP_CL_WR_GLOB ] = { .pme_name = "PM_MEM0_DP_CL_WR_GLOB", .pme_code = 0x50284, .pme_short_desc = "cacheline write setting dp to global side 0", .pme_long_desc = "cacheline write setting dp to global side 0", }, [ POWER6_PME_PM_LSU0_REJECT_PARTIAL_SECTOR ] = { .pme_name = "PM_LSU0_REJECT_PARTIAL_SECTOR", .pme_code = 0xa0086, .pme_short_desc = "LSU0 reject due to partial sector valid", .pme_long_desc = "LSU0 reject due to partial sector valid", }, [ POWER6_PME_PM_INST_IMC_MATCH_CMPL ] = { .pme_name = "PM_INST_IMC_MATCH_CMPL", .pme_code = 0x1000f0, .pme_short_desc = "IMC matched instructions completed", .pme_long_desc = "Number of instructions resulting from the marked instructions expansion that completed.", }, [ POWER6_PME_PM_DPU_HELD_THERMAL ] = { .pme_name = "PM_DPU_HELD_THERMAL", .pme_code = 0x10002a, .pme_short_desc = "DISP unit held due to thermal condition", .pme_long_desc = "DISP unit held due to thermal condition", }, [ POWER6_PME_PM_FPU_FRSP ] = { .pme_name = "PM_FPU_FRSP", .pme_code = 0x2d1034, .pme_short_desc = "FPU executed FRSP instruction", .pme_long_desc = "FPU executed FRSP instruction", }, [ 
POWER6_PME_PM_MRK_INST_FIN ] = { .pme_name = "PM_MRK_INST_FIN", .pme_code = 0x30000a, .pme_short_desc = "Marked instruction finished", .pme_long_desc = "One of the execution units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER6_PME_PM_MRK_PTEG_FROM_DL2L3_SHR ] = { .pme_name = "PM_MRK_PTEG_FROM_DL2L3_SHR", .pme_code = 0x312044, .pme_short_desc = "Marked PTEG loaded from distant L2 or L3 shared", .pme_long_desc = "Marked PTEG loaded from distant L2 or L3 shared", }, [ POWER6_PME_PM_MRK_DTLB_REF ] = { .pme_name = "PM_MRK_DTLB_REF", .pme_code = 0x920c0, .pme_short_desc = "Marked Data TLB reference", .pme_long_desc = "Marked Data TLB reference", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L25_SHR ] = { .pme_name = "PM_MRK_PTEG_FROM_L25_SHR", .pme_code = 0x412046, .pme_short_desc = "Marked PTEG loaded from L2.5 shared", .pme_long_desc = "Marked PTEG loaded from L2.5 shared", }, [ POWER6_PME_PM_DPU_HELD_LSU ] = { .pme_name = "PM_DPU_HELD_LSU", .pme_code = 0x210a2, .pme_short_desc = "DISP unit held due to LSU move or invalidate SLB and SR", .pme_long_desc = "DISP unit held due to LSU move or invalidate SLB and SR", }, [ POWER6_PME_PM_FPU_FSQRT_FDIV ] = { .pme_name = "PM_FPU_FSQRT_FDIV", .pme_code = 0x2c0032, .pme_short_desc = "FPU executed FSQRT or FDIV instruction", .pme_long_desc = "FPU executed FSQRT or FDIV instruction", }, [ POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_COUNT ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_COUNT", .pme_code = 0x20001d, .pme_short_desc = "Periods LMQ and SRQ empty", .pme_long_desc = "Periods when both the LMQ and SRQ are empty (LSU is idle)", }, [ POWER6_PME_PM_DATA_PTEG_SECONDARY ] = { .pme_name = "PM_DATA_PTEG_SECONDARY", .pme_code = 0x910a4, .pme_short_desc = "Data table walk matched in secondary PTEG", .pme_long_desc = "Data table walk matched in secondary PTEG", }, [ POWER6_PME_PM_FPU1_FEST ] = { .pme_name = "PM_FPU1_FEST", .pme_code = 0xd10ae, .pme_short_desc = "FPU1 executed FEST instruction", .pme_long_desc = 
"This signal is active for one cycle when fp1 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. ", }, [ POWER6_PME_PM_L2SA_LD_HIT ] = { .pme_name = "PM_L2SA_LD_HIT", .pme_code = 0x50782, .pme_short_desc = "L2 slice A load hits", .pme_long_desc = "L2 slice A load hits", }, [ POWER6_PME_PM_DATA_FROM_MEM_DP_CYC ] = { .pme_name = "PM_DATA_FROM_MEM_DP_CYC", .pme_code = 0x40002e, .pme_short_desc = "Load latency from double pump memory", .pme_long_desc = "Load latency from double pump memory", }, [ POWER6_PME_PM_BR_MPRED_CCACHE ] = { .pme_name = "PM_BR_MPRED_CCACHE", .pme_code = 0x410ae, .pme_short_desc = "Branch misprediction due to count cache prediction", .pme_long_desc = "Branch misprediction due to count cache prediction", }, [ POWER6_PME_PM_DPU_HELD_COUNT ] = { .pme_name = "PM_DPU_HELD_COUNT", .pme_code = 0x200005, .pme_short_desc = "Periods DISP unit held", .pme_long_desc = "Dispatch unit held", }, [ POWER6_PME_PM_LSU1_REJECT_SET_MPRED ] = { .pme_name = "PM_LSU1_REJECT_SET_MPRED", .pme_code = 0xa008c, .pme_short_desc = "LSU1 reject due to mispredicted set", .pme_long_desc = "LSU1 reject due to mispredicted set", }, [ POWER6_PME_PM_FPU_ISSUE_2 ] = { .pme_name = "PM_FPU_ISSUE_2", .pme_code = 0x320ca, .pme_short_desc = "FPU issue 2 per cycle", .pme_long_desc = "FPU issue 2 per cycle", }, [ POWER6_PME_PM_LSU1_REJECT_L2_CORR ] = { .pme_name = "PM_LSU1_REJECT_L2_CORR", .pme_code = 0xa10a8, .pme_short_desc = "LSU1 reject due to L2 correctable error", .pme_long_desc = "LSU1 reject due to L2 correctable error", }, [ POWER6_PME_PM_MRK_PTEG_FROM_DMEM ] = { .pme_name = "PM_MRK_PTEG_FROM_DMEM", .pme_code = 0x212042, .pme_short_desc = "Marked PTEG loaded from distant memory", .pme_long_desc = "Marked PTEG loaded from distant memory", }, [ POWER6_PME_PM_MEM1_DP_RQ_LOC_GLOB ] = { .pme_name = "PM_MEM1_DP_RQ_LOC_GLOB", .pme_code = 0x5028a, .pme_short_desc = "Memory read queue marking cache line double pump state from local to 
global side 1", .pme_long_desc = "Memory read queue marking cache line double pump state from local to global side 1", }, [ POWER6_PME_PM_THRD_PRIO_DIFF_minus1or2_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus1or2_CYC", .pme_code = 0x223046, .pme_short_desc = "Cycles thread priority difference is -1 or -2", .pme_long_desc = "Cycles thread priority difference is -1 or -2", }, [ POWER6_PME_PM_THRD_PRIO_0_CYC ] = { .pme_name = "PM_THRD_PRIO_0_CYC", .pme_code = 0x122040, .pme_short_desc = "Cycles thread running at priority level 0", .pme_long_desc = "Cycles thread running at priority level 0", }, [ POWER6_PME_PM_FXU0_BUSY_FXU1_IDLE ] = { .pme_name = "PM_FXU0_BUSY_FXU1_IDLE", .pme_code = 0x300050, .pme_short_desc = "FXU0 busy FXU1 idle", .pme_long_desc = "FXU0 is busy while FXU1 was idle", }, [ POWER6_PME_PM_LSU1_REJECT_DERAT_MPRED ] = { .pme_name = "PM_LSU1_REJECT_DERAT_MPRED", .pme_code = 0xa008a, .pme_short_desc = "LSU1 reject due to mispredicted DERAT", .pme_long_desc = "LSU1 reject due to mispredicted DERAT", }, [ POWER6_PME_PM_MRK_VMX1_LD_WRBACK ] = { .pme_name = "PM_MRK_VMX1_LD_WRBACK", .pme_code = 0x6008e, .pme_short_desc = "Marked VMX1 load writeback valid", .pme_long_desc = "Marked VMX1 load writeback valid", }, [ POWER6_PME_PM_DATA_FROM_RL2L3_SHR_CYC ] = { .pme_name = "PM_DATA_FROM_RL2L3_SHR_CYC", .pme_code = 0x200028, .pme_short_desc = "Load latency from remote L2 or L3 shared", .pme_long_desc = "Load latency from remote L2 or L3 shared", }, [ POWER6_PME_PM_IERAT_MISS_16M ] = { .pme_name = "PM_IERAT_MISS_16M", .pme_code = 0x292076, .pme_short_desc = "IERAT misses for 16M page", .pme_long_desc = "IERAT misses for 16M page", }, [ POWER6_PME_PM_MRK_DATA_FROM_MEM_DP ] = { .pme_name = "PM_MRK_DATA_FROM_MEM_DP", .pme_code = 0x10304a, .pme_short_desc = "Marked data loaded from double pump memory", .pme_long_desc = "Marked data loaded from double pump memory", }, [ POWER6_PME_PM_LARX_L1HIT ] = { .pme_name = "PM_LARX_L1HIT", .pme_code = 0x830e2, .pme_short_desc = 
"larx hits in L1", .pme_long_desc = "larx hits in L1", }, [ POWER6_PME_PM_L2_ST_MISS_DATA ] = { .pme_name = "PM_L2_ST_MISS_DATA", .pme_code = 0x150432, .pme_short_desc = "L2 data store misses", .pme_long_desc = "L2 data store misses", }, [ POWER6_PME_PM_FPU_ST_FOLDED ] = { .pme_name = "PM_FPU_ST_FOLDED", .pme_code = 0x3d1030, .pme_short_desc = "FPU folded store", .pme_long_desc = "FPU folded store", }, [ POWER6_PME_PM_MRK_DATA_FROM_L35_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L35_SHR", .pme_code = 0x20304e, .pme_short_desc = "Marked data loaded from L3.5 shared", .pme_long_desc = "Marked data loaded from L3.5 shared", }, [ POWER6_PME_PM_DPU_HELD_MULT_GPR ] = { .pme_name = "PM_DPU_HELD_MULT_GPR", .pme_code = 0x210aa, .pme_short_desc = "DISP unit held due to multiply/divide GPR dependencies", .pme_long_desc = "DISP unit held due to multiply/divide GPR dependencies", }, [ POWER6_PME_PM_FPU0_1FLOP ] = { .pme_name = "PM_FPU0_1FLOP", .pme_code = 0xc0080, .pme_short_desc = "FPU0 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. 
and XYZ** means XYZu, XYZo", }, [ POWER6_PME_PM_IERAT_MISS_16G ] = { .pme_name = "PM_IERAT_MISS_16G", .pme_code = 0x192076, .pme_short_desc = "IERAT misses for 16G page", .pme_long_desc = "IERAT misses for 16G page", }, [ POWER6_PME_PM_IC_PREF_WRITE ] = { .pme_name = "PM_IC_PREF_WRITE", .pme_code = 0x430e0, .pme_short_desc = "Instruction prefetch written into I cache", .pme_long_desc = "Instruction prefetch written into I cache", }, [ POWER6_PME_PM_THRD_PRIO_DIFF_minus5or6_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus5or6_CYC", .pme_code = 0x423046, .pme_short_desc = "Cycles thread priority difference is -5 or -6", .pme_long_desc = "Cycles thread priority difference is -5 or -6", }, [ POWER6_PME_PM_FPU0_FIN ] = { .pme_name = "PM_FPU0_FIN", .pme_code = 0xd0080, .pme_short_desc = "FPU0 produced a result", .pme_long_desc = "fp0 finished, produced a result This only indicates finish, not completion. ", }, [ POWER6_PME_PM_DATA_FROM_L2_CYC ] = { .pme_name = "PM_DATA_FROM_L2_CYC", .pme_code = 0x200020, .pme_short_desc = "Load latency from L2", .pme_long_desc = "Load latency from L2", }, [ POWER6_PME_PM_DERAT_REF_16G ] = { .pme_name = "PM_DERAT_REF_16G", .pme_code = 0x482070, .pme_short_desc = "DERAT reference for 16G page", .pme_long_desc = "DERAT reference for 16G page", }, [ POWER6_PME_PM_BR_PRED ] = { .pme_name = "PM_BR_PRED", .pme_code = 0x410a0, .pme_short_desc = "A conditional branch was predicted", .pme_long_desc = "A conditional branch was predicted", }, [ POWER6_PME_PM_VMX1_LD_ISSUED ] = { .pme_name = "PM_VMX1_LD_ISSUED", .pme_code = 0x6008a, .pme_short_desc = "VMX1 load issued", .pme_long_desc = "VMX1 load issued", }, [ POWER6_PME_PM_L2SB_CASTOUT_MOD ] = { .pme_name = "PM_L2SB_CASTOUT_MOD", .pme_code = 0x50688, .pme_short_desc = "L2 slice B castouts - Modified", .pme_long_desc = "L2 slice B castouts - Modified", }, [ POWER6_PME_PM_INST_FROM_DMEM ] = { .pme_name = "PM_INST_FROM_DMEM", .pme_code = 0x242042, .pme_short_desc = "Instruction fetched from distant 
memory", .pme_long_desc = "Instruction fetched from distant memory", }, [ POWER6_PME_PM_DATA_FROM_L35_SHR_CYC ] = { .pme_name = "PM_DATA_FROM_L35_SHR_CYC", .pme_code = 0x200026, .pme_short_desc = "Load latency from L3.5 shared", .pme_long_desc = "Load latency from L3.5 shared", }, [ POWER6_PME_PM_LSU0_NCLD ] = { .pme_name = "PM_LSU0_NCLD", .pme_code = 0x820ca, .pme_short_desc = "LSU0 non-cacheable loads", .pme_long_desc = "LSU0 non-cacheable loads", }, [ POWER6_PME_PM_FAB_RETRY_NODE_PUMP ] = { .pme_name = "PM_FAB_RETRY_NODE_PUMP", .pme_code = 0x5018a, .pme_short_desc = "Retry of a node pump, locally mastered", .pme_long_desc = "Retry of a node pump, locally mastered", }, [ POWER6_PME_PM_VMX0_INST_ISSUED ] = { .pme_name = "PM_VMX0_INST_ISSUED", .pme_code = 0x60080, .pme_short_desc = "VMX0 instruction issued", .pme_long_desc = "VMX0 instruction issued", }, [ POWER6_PME_PM_DATA_FROM_L25_MOD ] = { .pme_name = "PM_DATA_FROM_L25_MOD", .pme_code = 0x30005a, .pme_short_desc = "Data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a demand load", }, [ POWER6_PME_PM_DPU_HELD_ITLB_ISLB ] = { .pme_name = "PM_DPU_HELD_ITLB_ISLB", .pme_code = 0x210a4, .pme_short_desc = "DISP unit held due to SLB or TLB invalidates ", .pme_long_desc = "DISP unit held due to SLB or TLB invalidates ", }, [ POWER6_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_CYC", .pme_code = 0x20001c, .pme_short_desc = "Cycles LMQ and SRQ empty", .pme_long_desc = "Cycles when both the LMQ and SRQ are empty (LSU is idle)", }, [ POWER6_PME_PM_THRD_CONC_RUN_INST ] = { .pme_name = "PM_THRD_CONC_RUN_INST", .pme_code = 0x300026, .pme_short_desc = "Concurrent run instructions", .pme_long_desc = "Concurrent run instructions", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L2 ] = { .pme_name = "PM_MRK_PTEG_FROM_L2", .pme_code = 0x112040, .pme_short_desc = "Marked PTEG loaded from L2.5 modified", .pme_long_desc = "Marked PTEG loaded from 
L2.5 modified", }, [ POWER6_PME_PM_PURR ] = { .pme_name = "PM_PURR", .pme_code = 0x10000e, .pme_short_desc = "PURR Event", .pme_long_desc = "PURR Event", }, [ POWER6_PME_PM_DERAT_MISS_64K ] = { .pme_name = "PM_DERAT_MISS_64K", .pme_code = 0x292070, .pme_short_desc = "DERAT misses for 64K page", .pme_long_desc = "A data request (load or store) missed the ERAT for 64K page and resulted in an ERAT reload.", }, [ POWER6_PME_PM_PMC2_REWIND ] = { .pme_name = "PM_PMC2_REWIND", .pme_code = 0x300020, .pme_short_desc = "PMC2 rewind event", .pme_long_desc = "PMC2 rewind event", }, [ POWER6_PME_PM_INST_FROM_L2 ] = { .pme_name = "PM_INST_FROM_L2", .pme_code = 0x142040, .pme_short_desc = "Instructions fetched from L2", .pme_long_desc = "An instruction fetch group was fetched from L2. Fetch Groups can contain up to 8 instructions", }, [ POWER6_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x200012, .pme_short_desc = "Instructions dispatched", .pme_long_desc = "The ISU sends the number of instructions dispatched.", }, [ POWER6_PME_PM_DATA_FROM_L25_SHR ] = { .pme_name = "PM_DATA_FROM_L25_SHR", .pme_code = 0x40005a, .pme_short_desc = "Data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a demand load", }, [ POWER6_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0x3000f6, .pme_short_desc = "L1 reload data source valid", .pme_long_desc = "The data source information is valid", }, [ POWER6_PME_PM_LSU1_REJECT_UST ] = { .pme_name = "PM_LSU1_REJECT_UST", .pme_code = 0x9008a, .pme_short_desc = "LSU1 unaligned store reject", .pme_long_desc = "LSU1 unaligned store reject", }, [ POWER6_PME_PM_FAB_ADDR_COLLISION ] = { .pme_name = "PM_FAB_ADDR_COLLISION", .pme_code = 0x5018e, .pme_short_desc = "local node launch collision with off-node address ", .pme_long_desc = "local node launch collision with off-node address ", }, [ POWER6_PME_PM_MRK_FXU_FIN ] = { 
.pme_name = "PM_MRK_FXU_FIN", .pme_code = 0x20001a, .pme_short_desc = "Marked instruction FXU processing finished", .pme_long_desc = "The fixed point units (Unit 0 + Unit 1) finished a marked instruction. Instructions that finish may not necessarily complete.", }, [ POWER6_PME_PM_LSU0_REJECT_UST ] = { .pme_name = "PM_LSU0_REJECT_UST", .pme_code = 0x90082, .pme_short_desc = "LSU0 unaligned store reject", .pme_long_desc = "LSU0 unaligned store reject", }, [ POWER6_PME_PM_PMC4_OVERFLOW ] = { .pme_name = "PM_PMC4_OVERFLOW", .pme_code = 0x100014, .pme_short_desc = "PMC4 Overflow", .pme_long_desc = "PMC4 Overflow", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L3 ] = { .pme_name = "PM_MRK_PTEG_FROM_L3", .pme_code = 0x312040, .pme_short_desc = "Marked PTEG loaded from L3", .pme_long_desc = "Marked PTEG loaded from L3", }, [ POWER6_PME_PM_INST_FROM_L2MISS ] = { .pme_name = "PM_INST_FROM_L2MISS", .pme_code = 0x442054, .pme_short_desc = "Instructions fetched missed L2", .pme_long_desc = "An instruction fetch group was fetched from beyond L2.", }, [ POWER6_PME_PM_L2SB_ST_HIT ] = { .pme_name = "PM_L2SB_ST_HIT", .pme_code = 0x5078e, .pme_short_desc = "L2 slice B store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. 
This event is provided on each of the three L2 slices A,B, and C.", }, [ POWER6_PME_PM_DPU_WT_IC_MISS_COUNT ] = { .pme_name = "PM_DPU_WT_IC_MISS_COUNT", .pme_code = 0x20000d, .pme_short_desc = "Periods DISP unit is stalled due to I cache miss", .pme_long_desc = "Periods DISP unit is stalled due to I cache miss", }, [ POWER6_PME_PM_MRK_DATA_FROM_DL2L3_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_DL2L3_SHR", .pme_code = 0x30304c, .pme_short_desc = "Marked data loaded from distant L2 or L3 shared", .pme_long_desc = "Marked data loaded from distant L2 or L3 shared", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L35_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_L35_MOD", .pme_code = 0x112046, .pme_short_desc = "Marked PTEG loaded from L3.5 modified", .pme_long_desc = "Marked PTEG loaded from L3.5 modified", }, [ POWER6_PME_PM_FPU1_FPSCR ] = { .pme_name = "PM_FPU1_FPSCR", .pme_code = 0xd008c, .pme_short_desc = "FPU1 executed FPSCR instruction", .pme_long_desc = "FPU1 executed FPSCR instruction", }, [ POWER6_PME_PM_LSU_REJECT_UST ] = { .pme_name = "PM_LSU_REJECT_UST", .pme_code = 0x290030, .pme_short_desc = "Unaligned store reject", .pme_long_desc = "Unaligned store reject", }, [ POWER6_PME_PM_LSU0_DERAT_MISS ] = { .pme_name = "PM_LSU0_DERAT_MISS", .pme_code = 0x910a6, .pme_short_desc = "LSU0 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 0 missed the ERAT and resulted in an ERAT reload. 
Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ POWER6_PME_PM_MRK_PTEG_FROM_MEM_DP ] = { .pme_name = "PM_MRK_PTEG_FROM_MEM_DP", .pme_code = 0x112042, .pme_short_desc = "Marked PTEG loaded from double pump memory", .pme_long_desc = "Marked PTEG loaded from double pump memory", }, [ POWER6_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x103048, .pme_short_desc = "Marked data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a marked demand load", }, [ POWER6_PME_PM_FPU0_FSQRT_FDIV ] = { .pme_name = "PM_FPU0_FSQRT_FDIV", .pme_code = 0xc0084, .pme_short_desc = "FPU0 executed FSQRT or FDIV instruction", .pme_long_desc = "FPU0 executed FSQRT or FDIV instruction", }, [ POWER6_PME_PM_DPU_HELD_FXU_SOPS ] = { .pme_name = "PM_DPU_HELD_FXU_SOPS", .pme_code = 0x30088, .pme_short_desc = "DISP unit held due to FXU slow ops (mtmsr, scv, rfscv)", .pme_long_desc = "DISP unit held due to FXU slow ops (mtmsr, scv, rfscv)", }, [ POWER6_PME_PM_MRK_FPU0_FIN ] = { .pme_name = "PM_MRK_FPU0_FIN", .pme_code = 0xd0082, .pme_short_desc = "Marked instruction FPU0 processing finished", .pme_long_desc = "Marked instruction FPU0 processing finished", }, [ POWER6_PME_PM_L2SB_LD_MISS_DATA ] = { .pme_name = "PM_L2SB_LD_MISS_DATA", .pme_code = 0x5048a, .pme_short_desc = "L2 slice B data load misses", .pme_long_desc = "L2 slice B data load misses", }, [ POWER6_PME_PM_LSU_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_SRQ_EMPTY_CYC", .pme_code = 0x40001c, .pme_short_desc = "Cycles SRQ empty", .pme_long_desc = "The Store Request Queue is empty", }, [ POWER6_PME_PM_1PLUS_PPC_DISP ] = { .pme_name = "PM_1PLUS_PPC_DISP", .pme_code = 0x100012, .pme_short_desc = "Cycles at least one instruction dispatched", .pme_long_desc = "Cycles at least one instruction dispatched", }, [ POWER6_PME_PM_VMX_ST_ISSUED ] = { .pme_name = "PM_VMX_ST_ISSUED", .pme_code = 0xb0080, .pme_short_desc = "VMX store issued", 
.pme_long_desc = "VMX store issued", }, [ POWER6_PME_PM_DATA_FROM_L2MISS ] = { .pme_name = "PM_DATA_FROM_L2MISS", .pme_code = 0x2000fe, .pme_short_desc = "Data loaded missed L2", .pme_long_desc = "DL1 was reloaded from beyond L2.", }, [ POWER6_PME_PM_LSU0_REJECT_ULD ] = { .pme_name = "PM_LSU0_REJECT_ULD", .pme_code = 0x90080, .pme_short_desc = "LSU0 unaligned load reject", .pme_long_desc = "LSU0 unaligned load reject", }, [ POWER6_PME_PM_SUSPENDED ] = { .pme_name = "PM_SUSPENDED", .pme_code = 0x0, .pme_short_desc = "Suspended", .pme_long_desc = "Suspended", }, [ POWER6_PME_PM_DFU_ADD_SHIFTED_BOTH ] = { .pme_name = "PM_DFU_ADD_SHIFTED_BOTH", .pme_code = 0xe0088, .pme_short_desc = "DFU add type with both operands shifted", .pme_long_desc = "DFU add type with both operands shifted", }, [ POWER6_PME_PM_LSU_REJECT_NO_SCRATCH ] = { .pme_name = "PM_LSU_REJECT_NO_SCRATCH", .pme_code = 0x2a1034, .pme_short_desc = "LSU reject due to scratch register not available", .pme_long_desc = "LSU reject due to scratch register not available", }, [ POWER6_PME_PM_STCX_FAIL ] = { .pme_name = "PM_STCX_FAIL", .pme_code = 0x830ee, .pme_short_desc = "STCX failed", .pme_long_desc = "A stcx (stwcx or stdcx) failed", }, [ POWER6_PME_PM_FPU1_DENORM ] = { .pme_name = "PM_FPU1_DENORM", .pme_code = 0xc10aa, .pme_short_desc = "FPU1 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ POWER6_PME_PM_GCT_NOSLOT_COUNT ] = { .pme_name = "PM_GCT_NOSLOT_COUNT", .pme_code = 0x100009, .pme_short_desc = "Periods no GCT slot allocated", .pme_long_desc = "Periods this thread does not have any slots allocated in the GCT.", }, [ POWER6_PME_PM_DATA_FROM_DL2L3_SHR_CYC ] = { .pme_name = "PM_DATA_FROM_DL2L3_SHR_CYC", .pme_code = 0x20002a, .pme_short_desc = "Load latency from distant L2 or L3 shared", .pme_long_desc = "Load latency from distant L2 or L3 shared", }, [ POWER6_PME_PM_DATA_FROM_L21 ] = { .pme_name = "PM_DATA_FROM_L21", 
.pme_code = 0x200058, .pme_short_desc = "Data loaded from private L2 other core", .pme_long_desc = "Data loaded from private L2 other core", }, [ POWER6_PME_PM_FPU_1FLOP ] = { .pme_name = "PM_FPU_1FLOP", .pme_code = 0x1c0030, .pme_short_desc = "FPU executed one flop instruction ", .pme_long_desc = "This event counts the number of one flop instructions. These could be fadd*, fmul*, fsub*, fneg+, fabs+, fnabs+, fres+, frsqrte+, fcmp**, or fsel where XYZ* means XYZ, XYZs, XYZ., XYZs., XYZ+ means XYZ, XYZ., and XYZ** means XYZu, XYZo.", }, [ POWER6_PME_PM_LSU1_REJECT ] = { .pme_name = "PM_LSU1_REJECT", .pme_code = 0xa10ae, .pme_short_desc = "LSU1 reject", .pme_long_desc = "LSU1 reject", }, [ POWER6_PME_PM_IC_REQ ] = { .pme_name = "PM_IC_REQ", .pme_code = 0x4008a, .pme_short_desc = "I cache demand of prefetch request", .pme_long_desc = "I cache demand of prefetch request", }, [ POWER6_PME_PM_MRK_DFU_FIN ] = { .pme_name = "PM_MRK_DFU_FIN", .pme_code = 0x300008, .pme_short_desc = "DFU marked instruction finish", .pme_long_desc = "DFU marked instruction finish", }, [ POWER6_PME_PM_NOT_LLA_CYC ] = { .pme_name = "PM_NOT_LLA_CYC", .pme_code = 0x401e, .pme_short_desc = "Load Look Ahead not Active", .pme_long_desc = "Load Look Ahead not Active", }, [ POWER6_PME_PM_INST_FROM_L1 ] = { .pme_name = "PM_INST_FROM_L1", .pme_code = 0x40082, .pme_short_desc = "Instruction fetched from L1", .pme_long_desc = "An instruction fetch group was fetched from L1. 
Fetch Groups can contain up to 8 instructions", }, [ POWER6_PME_PM_MRK_VMX_COMPLEX_ISSUED ] = { .pme_name = "PM_MRK_VMX_COMPLEX_ISSUED", .pme_code = 0x7008c, .pme_short_desc = "Marked VMX instruction issued to complex", .pme_long_desc = "Marked VMX instruction issued to complex", }, [ POWER6_PME_PM_BRU_FIN ] = { .pme_name = "PM_BRU_FIN", .pme_code = 0x430e6, .pme_short_desc = "BRU produced a result", .pme_long_desc = "BRU produced a result", }, [ POWER6_PME_PM_LSU1_REJECT_EXTERN ] = { .pme_name = "PM_LSU1_REJECT_EXTERN", .pme_code = 0xa10ac, .pme_short_desc = "LSU1 external reject request ", .pme_long_desc = "LSU1 external reject request ", }, [ POWER6_PME_PM_DATA_FROM_L21_CYC ] = { .pme_name = "PM_DATA_FROM_L21_CYC", .pme_code = 0x400020, .pme_short_desc = "Load latency from private L2 other core", .pme_long_desc = "Load latency from private L2 other core", }, [ POWER6_PME_PM_GXI_CYC_BUSY ] = { .pme_name = "PM_GXI_CYC_BUSY", .pme_code = 0x50386, .pme_short_desc = "Inbound GX bus utilizations (# of cycles in use)", .pme_long_desc = "Inbound GX bus utilizations (# of cycles in use)", }, [ POWER6_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x200056, .pme_short_desc = "Marked L1 D cache load misses", .pme_long_desc = "Marked L1 D cache load misses", }, [ POWER6_PME_PM_L1_WRITE_CYC ] = { .pme_name = "PM_L1_WRITE_CYC", .pme_code = 0x430e2, .pme_short_desc = "Cycles writing to instruction L1", .pme_long_desc = "This signal is asserted each cycle a cache write is active.", }, [ POWER6_PME_PM_LLA_CYC ] = { .pme_name = "PM_LLA_CYC", .pme_code = 0xc01e, .pme_short_desc = "Load Look Ahead Active", .pme_long_desc = "Load Look Ahead Active", }, [ POWER6_PME_PM_MRK_DATA_FROM_L2MISS ] = { .pme_name = "PM_MRK_DATA_FROM_L2MISS", .pme_code = 0x103028, .pme_short_desc = "Marked data loaded missed L2", .pme_long_desc = "DL1 was reloaded from beyond L2 due to a marked demand load.", }, [ POWER6_PME_PM_GCT_FULL_COUNT ] = { .pme_name = "PM_GCT_FULL_COUNT", 
.pme_code = 0x40087, .pme_short_desc = "Periods GCT full", .pme_long_desc = "The ISU sends a signal indicating the gct is full.", }, [ POWER6_PME_PM_MEM_DP_RQ_LOC_GLOB ] = { .pme_name = "PM_MEM_DP_RQ_LOC_GLOB", .pme_code = 0x250230, .pme_short_desc = "Memory read queue marking cache line double pump state from local to global", .pme_long_desc = "Memory read queue marking cache line double pump state from local to global", }, [ POWER6_PME_PM_DATA_FROM_RL2L3_SHR ] = { .pme_name = "PM_DATA_FROM_RL2L3_SHR", .pme_code = 0x20005c, .pme_short_desc = "Data loaded from remote L2 or L3 shared", .pme_long_desc = "Data loaded from remote L2 or L3 shared", }, [ POWER6_PME_PM_MRK_LSU_REJECT_UST ] = { .pme_name = "PM_MRK_LSU_REJECT_UST", .pme_code = 0x293034, .pme_short_desc = "Marked unaligned store reject", .pme_long_desc = "Marked unaligned store reject", }, [ POWER6_PME_PM_MRK_VMX_PERMUTE_ISSUED ] = { .pme_name = "PM_MRK_VMX_PERMUTE_ISSUED", .pme_code = 0x7008e, .pme_short_desc = "Marked VMX instruction issued to permute", .pme_long_desc = "Marked VMX instruction issued to permute", }, [ POWER6_PME_PM_MRK_PTEG_FROM_L21 ] = { .pme_name = "PM_MRK_PTEG_FROM_L21", .pme_code = 0x212040, .pme_short_desc = "Marked PTEG loaded from private L2 other core", .pme_long_desc = "Marked PTEG loaded from private L2 other core", }, [ POWER6_PME_PM_THRD_GRP_CMPL_BOTH_CYC ] = { .pme_name = "PM_THRD_GRP_CMPL_BOTH_CYC", .pme_code = 0x200018, .pme_short_desc = "Cycles group completed by both threads", .pme_long_desc = "Cycles group completed by both threads", }, [ POWER6_PME_PM_BR_MPRED ] = { .pme_name = "PM_BR_MPRED", .pme_code = 0x400052, .pme_short_desc = "Branches incorrectly predicted", .pme_long_desc = "Branches incorrectly predicted", }, [ POWER6_PME_PM_LD_REQ_L2 ] = { .pme_name = "PM_LD_REQ_L2", .pme_code = 0x150730, .pme_short_desc = "L2 load requests ", .pme_long_desc = "L2 load requests ", }, [ POWER6_PME_PM_FLUSH_ASYNC ] = { .pme_name = "PM_FLUSH_ASYNC", .pme_code = 0x220ca, 
.pme_short_desc = "Flush caused by asynchronous exception", .pme_long_desc = "Flush caused by asynchronous exception", }, [ POWER6_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x200016, .pme_short_desc = "Hypervisor Cycles", .pme_long_desc = "Cycles when the processor is executing in Hypervisor (MSR[HV] = 1 and MSR[PR]=0)", }, [ POWER6_PME_PM_LSU1_DERAT_MISS ] = { .pme_name = "PM_LSU1_DERAT_MISS", .pme_code = 0x910ae, .pme_short_desc = "LSU1 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 1 missed the ERAT and resulted in an ERAT reload. Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ POWER6_PME_PM_DPU_HELD_SMT ] = { .pme_name = "PM_DPU_HELD_SMT", .pme_code = 0x20082, .pme_short_desc = "DISP unit held due to SMT conflicts ", .pme_long_desc = "DISP unit held due to SMT conflicts ", }, [ POWER6_PME_PM_MRK_LSU_FIN ] = { .pme_name = "PM_MRK_LSU_FIN", .pme_code = 0x40001a, .pme_short_desc = "Marked instruction LSU processing finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ POWER6_PME_PM_MRK_DATA_FROM_RL2L3_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_RL2L3_SHR", .pme_code = 0x20304c, .pme_short_desc = "Marked data loaded from remote L2 or L3 shared", .pme_long_desc = "Marked data loaded from remote L2 or L3 shared", }, [ POWER6_PME_PM_LSU0_REJECT_STQ_FULL ] = { .pme_name = "PM_LSU0_REJECT_STQ_FULL", .pme_code = 0xa0080, .pme_short_desc = "LSU0 reject due to store queue full", .pme_long_desc = "LSU0 reject due to store queue full", }, [ POWER6_PME_PM_MRK_DERAT_REF_4K ] = { .pme_name = "PM_MRK_DERAT_REF_4K", .pme_code = 0x282044, .pme_short_desc = "Marked DERAT reference for 4K page", .pme_long_desc = "Marked DERAT reference for 4K page", }, [ POWER6_PME_PM_FPU_ISSUE_STALL_FPR ] = { .pme_name = "PM_FPU_ISSUE_STALL_FPR", .pme_code = 0x330e2, .pme_short_desc = "FPU issue stalled due to FPR dependencies", .pme_long_desc = "FPU issue stalled due to FPR dependencies", }, [ POWER6_PME_PM_IFU_FIN ] = { .pme_name = "PM_IFU_FIN", .pme_code = 0x430e4, .pme_short_desc = "IFU finished an instruction", .pme_long_desc = "IFU finished an instruction", }, [ POWER6_PME_PM_GXO_CYC_BUSY ] = { .pme_name = "PM_GXO_CYC_BUSY", .pme_code = 0x50380, .pme_short_desc = "Outbound GX bus utilizations (# of cycles in use)", .pme_long_desc = "Outbound GX bus utilizations (# of cycles in use)", } }; #endif papi-5.3.0/src/libpfm4/lib/events/intel_netburst_events.h0000600003276200002170000011321412247131123023134 0ustar ralphundrgrad/* * Copyright (c) 2006 IBM Corp.
* Contributed by Kevin Corry * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * * This header contains arrays to describe the Event-Selection-Control * Registers (ESCRs), Counter-Configuration-Control Registers (CCCRs), * and countable events on Pentium4/Xeon/EM64T systems. * * For more details, see: * - IA-32 Intel Architecture Software Developer's Manual, * Volume 3B: System Programming Guide, Part 2 * (available at: http://www.intel.com/design/Pentium4/manuals/253669.htm) * - Chapter 18.10: Performance Monitoring Overview * - Chapter 18.13: Performance Monitoring - Pentium4 and Xeon Processors * - Chapter 18.14: Performance Monitoring and Hyper-Threading Technology * - Appendix A.1: Pentium4 and Xeon Processor Performance-Monitoring Events * * This header also contains an array to describe how the Perfmon PMCs map to * the ESCRs and CCCRs. */ #ifndef _NETBURST_EVENTS_H_ #define _NETBURST_EVENTS_H_ /** * netburst_events * * Array of events that can be counted on Pentium4. 
**/ static const netburst_entry_t netburst_events[] = { /* 0 */ {.name = "TC_deliver_mode", .desc = "The duration (in clock cycles) of the operating modes of " "the trace cache and decode engine in the processor package", .event_select = 0x1, .escr_select = 0x1, .allowed_escrs = { 9, 32 }, .perf_code = P4_EVENT_TC_DELIVER_MODE, .event_masks = { {.name = "DD", .desc = "Both logical CPUs in deliver mode", .bit = 0, }, {.name = "DB", .desc = "Logical CPU 0 in deliver mode and " "logical CPU 1 in build mode", .bit = 1, }, {.name = "DI", .desc = "Logical CPU 0 in deliver mode and logical CPU 1 " "either halted, under machine clear condition, or " "transitioning to a long microcode flow", .bit = 2, }, {.name = "BD", .desc = "Logical CPU 0 in build mode and " "logical CPU 1 is in deliver mode", .bit = 3, }, {.name = "BB", .desc = "Both logical CPUs in build mode", .bit = 4, }, {.name = "BI", .desc = "Logical CPU 0 in build mode and logical CPU 1 " "either halted, under machine clear condition, or " "transitioning to a long microcode flow", .bit = 5, }, {.name = "ID", .desc = "Logical CPU 0 either halted, under machine clear " "condition, or transitioning to a long microcode " "flow, and logical CPU 1 in deliver mode", .bit = 6, }, {.name = "IB", .desc = "Logical CPU 0 either halted, under machine clear " "condition, or transitioning to a long microcode " "flow, and logical CPU 1 in build mode", .bit = 7, }, }, }, /* 1 */ {.name = "BPU_fetch_request", .desc = "Instruction fetch requests by the Branch Prediction Unit", .event_select = 0x3, .escr_select = 0x0, .allowed_escrs = { 0, 23 }, .perf_code = P4_EVENT_BPU_FETCH_REQUEST, .event_masks = { {.name = "TCMISS", .desc = "Trace cache lookup miss", .bit = 0, .flags = NETBURST_FL_DFL, }, }, }, /* 2 */ {.name = "ITLB_reference", .desc = "Translations using the Instruction " "Translation Look-Aside Buffer", .event_select = 0x18, .escr_select = 0x3, .allowed_escrs = { 3, 26 }, .perf_code = P4_EVENT_ITLB_REFERENCE, .event_masks = 
{ {.name = "HIT", .desc = "ITLB hit", .bit = 0, }, {.name = "MISS", .desc = "ITLB miss", .bit = 1, }, {.name = "HIT_UC", .desc = "Uncacheable ITLB hit", .bit = 2, }, }, }, /* 3 */ {.name = "memory_cancel", .desc = "Canceling of various types of requests in the " "Data cache Address Control unit (DAC)", .event_select = 0x2, .escr_select = 0x5, .allowed_escrs = { 15, 38 }, .perf_code = P4_EVENT_MEMORY_CANCEL, .event_masks = { {.name = "ST_RB_FULL", .desc = "Replayed because no store request " "buffer is available", .bit = 2, }, {.name = "64K_CONF", .desc = "Conflicts due to 64K aliasing", .bit = 3, }, }, }, /* 4 */ {.name = "memory_complete", .desc = "Completions of a load split, store split, " "uncacheable (UC) split, or UC load", .event_select = 0x8, .escr_select = 0x2, .allowed_escrs = { 13, 36 }, .perf_code = P4_EVENT_MEMORY_COMPLETE, .event_masks = { {.name = "LSC", .desc = "Load split completed, excluding UC/WC loads", .bit = 0, }, {.name = "SSC", .desc = "Any split stores completed", .bit = 1, }, }, }, /* 5 */ {.name = "load_port_replay", .desc = "Replayed events at the load port", .event_select = 0x4, .escr_select = 0x2, .allowed_escrs = { 13, 36 }, .perf_code = P4_EVENT_LOAD_PORT_REPLAY, .event_masks = { {.name = "SPLIT_LD", .desc = "Split load", .bit = 1, .flags = NETBURST_FL_DFL, }, }, }, /* 6 */ {.name = "store_port_replay", .desc = "Replayed events at the store port", .event_select = 0x5, .escr_select = 0x2, .allowed_escrs = { 13, 36 }, .perf_code = P4_EVENT_STORE_PORT_REPLAY, .event_masks = { {.name = "SPLIT_ST", .desc = "Split store", .bit = 1, .flags = NETBURST_FL_DFL, }, }, }, /* 7 */ {.name = "MOB_load_replay", .desc = "Count of times the memory order buffer (MOB) " "caused a load operation to be replayed", .event_select = 0x3, .escr_select = 0x2, .allowed_escrs = { 2, 25 }, .perf_code = P4_EVENT_MOB_LOAD_REPLAY, .event_masks = { {.name = "NO_STA", .desc = "Replayed because of unknown store address", .bit = 1, }, {.name = "NO_STD", .desc = "Replayed 
because of unknown store data", .bit = 3, }, {.name = "PARTIAL_DATA", .desc = "Replayed because of partially overlapped data " "access between the load and store operations", .bit = 4, }, {.name = "UNALGN_ADDR", .desc = "Replayed because the lower 4 bits of the " "linear address do not match between the " "load and store operations", .bit = 5, }, }, }, /* 8 */ {.name = "page_walk_type", .desc = "Page walks that the page miss handler (PMH) performs", .event_select = 0x1, .escr_select = 0x4, .allowed_escrs = { 4, 27 }, .perf_code = P4_EVENT_PAGE_WALK_TYPE, .event_masks = { {.name = "DTMISS", .desc = "Page walk for a data TLB miss (load or store)", .bit = 0, }, {.name = "ITMISS", .desc = "Page walk for an instruction TLB miss", .bit = 1, }, }, }, /* 9 */ {.name = "BSQ_cache_reference", .desc = "Cache references (2nd or 3rd level caches) as seen by the " "bus unit. Read types include both load and RFO, and write " "types include writebacks and evictions", .event_select = 0xC, .escr_select = 0x7, .allowed_escrs = { 7, 30 }, .perf_code = P4_EVENT_BSQ_CACHE_REFERENCE, .event_masks = { {.name = "RD_2ndL_HITS", .desc = "Read 2nd level cache hit Shared", .bit = 0, }, {.name = "RD_2ndL_HITE", .desc = "Read 2nd level cache hit Exclusive", .bit = 1, }, {.name = "RD_2ndL_HITM", .desc = "Read 2nd level cache hit Modified", .bit = 2, }, {.name = "RD_3rdL_HITS", .desc = "Read 3rd level cache hit Shared", .bit = 3, }, {.name = "RD_3rdL_HITE", .desc = "Read 3rd level cache hit Exclusive", .bit = 4, }, {.name = "RD_3rdL_HITM", .desc = "Read 3rd level cache hit Modified", .bit = 5, }, {.name = "RD_2ndL_MISS", .desc = "Read 2nd level cache miss", .bit = 8, }, {.name = "RD_3rdL_MISS", .desc = "Read 3rd level cache miss", .bit = 9, }, {.name = "WR_2ndL_MISS", .desc = "A writeback lookup from DAC misses the 2nd " "level cache (unlikely to happen)", .bit = 10, }, }, }, /* 10 */ {.name = "IOQ_allocation", .desc = "Count of various types of transactions on the bus. 
A count " "is generated each time a transaction is allocated into the " "IOQ that matches the specified mask bits. An allocated entry " "can be a sector (64 bytes) or a chunk of 8 bytes. Requests " "are counted once per retry. All 'TYPE_BIT*' event-masks " "together are treated as a single 5-bit value", .event_select = 0x3, .escr_select = 0x6, .allowed_escrs = { 6, 29 }, .perf_code = P4_EVENT_IOQ_ALLOCATION, .event_masks = { {.name = "TYPE_BIT0", .desc = "Bus request type (bit 0)", .bit = 0, }, {.name = "TYPE_BIT1", .desc = "Bus request type (bit 1)", .bit = 1, }, {.name = "TYPE_BIT2", .desc = "Bus request type (bit 2)", .bit = 2, }, {.name = "TYPE_BIT3", .desc = "Bus request type (bit 3)", .bit = 3, }, {.name = "TYPE_BIT4", .desc = "Bus request type (bit 4)", .bit = 4, }, {.name = "ALL_READ", .desc = "Count read entries", .bit = 5, }, {.name = "ALL_WRITE", .desc = "Count write entries", .bit = 6, }, {.name = "MEM_UC", .desc = "Count UC memory access entries", .bit = 7, }, {.name = "MEM_WC", .desc = "Count WC memory access entries", .bit = 8, }, {.name = "MEM_WT", .desc = "Count write-through (WT) memory access entries", .bit = 9, }, {.name = "MEM_WP", .desc = "Count write-protected (WP) memory access entries", .bit = 10, }, {.name = "MEM_WB", .desc = "Count WB memory access entries", .bit = 11, }, {.name = "OWN", .desc = "Count all store requests driven by processor, as " "opposed to other processor or DMA", .bit = 13, }, {.name = "OTHER", .desc = "Count all requests driven by other " "processors or DMA", .bit = 14, }, {.name = "PREFETCH", .desc = "Include HW and SW prefetch requests in the count", .bit = 15, }, }, }, /* 11 */ {.name = "IOQ_active_entries", .desc = "Number of entries (clipped at 15) in the IOQ that are " "active. An allocated entry can be a sector (64 bytes) " "or a chunk of 8 bytes. This event must be programmed in " "conjunction with IOQ_allocation.
All 'TYPE_BIT*' event-masks " "together are treated as a single 5-bit value", .event_select = 0x1A, .escr_select = 0x6, .allowed_escrs = { 29, -1 }, .perf_code = P4_EVENT_IOQ_ACTIVE_ENTRIES, .event_masks = { {.name = "TYPE_BIT0", .desc = "Bus request type (bit 0)", .bit = 0, }, {.name = "TYPE_BIT1", .desc = "Bus request type (bit 1)", .bit = 1, }, {.name = "TYPE_BIT2", .desc = "Bus request type (bit 2)", .bit = 2, }, {.name = "TYPE_BIT3", .desc = "Bus request type (bit 3)", .bit = 3, }, {.name = "TYPE_BIT4", .desc = "Bus request type (bit 4)", .bit = 4, }, {.name = "ALL_READ", .desc = "Count read entries", .bit = 5, }, {.name = "ALL_WRITE", .desc = "Count write entries", .bit = 6, }, {.name = "MEM_UC", .desc = "Count UC memory access entries", .bit = 7, }, {.name = "MEM_WC", .desc = "Count WC memory access entries", .bit = 8, }, {.name = "MEM_WT", .desc = "Count write-through (WT) memory access entries", .bit = 9, }, {.name = "MEM_WP", .desc = "Count write-protected (WP) memory access entries", .bit = 10, }, {.name = "MEM_WB", .desc = "Count WB memory access entries", .bit = 11, }, {.name = "OWN", .desc = "Count all store requests driven by processor, as " "opposed to other processor or DMA", .bit = 13, }, {.name = "OTHER", .desc = "Count all requests driven by other " "processors or DMA", .bit = 14, }, {.name = "PREFETCH", .desc = "Include HW and SW prefetch requests in the count", .bit = 15, }, }, }, /* 12 */ {.name = "FSB_data_activity", .desc = "Count of DRDY or DBSY events that " "occur on the front side bus", .event_select = 0x17, .escr_select = 0x6, .allowed_escrs = { 6, 29 }, .perf_code = P4_EVENT_FSB_DATA_ACTIVITY, .event_masks = { {.name = "DRDY_DRV", .desc = "Count when this processor drives data onto the bus. " "Includes writes and implicit writebacks", .bit = 0, }, {.name = "DRDY_OWN", .desc = "Count when this processor reads data from the bus. " "Includes loads and some PIC transactions. Count " "DRDY events that we drive. 
Count DRDY events sampled " "that we own", .bit = 1, }, {.name = "DRDY_OTHER", .desc = "Count when data is on the bus but not being sampled " "by the processor. It may or may not be driven by " "this processor", .bit = 2, }, {.name = "DBSY_DRV", .desc = "Count when this processor reserves the bus for use " "in the next bus cycle in order to drive data", .bit = 3, }, {.name = "DBSY_OWN", .desc = "Count when some agent reserves the bus for use in " "the next bus cycle to drive data that this processor " "will sample", .bit = 4, }, {.name = "DBSY_OTHER", .desc = "Count when some agent reserves the bus for use in " "the next bus cycle to drive data that this processor " "will NOT sample. It may or may not be being driven " "by this processor", .bit = 5, }, }, }, /* 13 */ {.name = "BSQ_allocation", .desc = "Allocations in the Bus Sequence Unit (BSQ). The event mask " "bits consist of four sub-groups: request type, request " "length, memory type, and a sub-group consisting mostly of " "independent bits (5 through 10). 
Must specify a mask for " "each sub-group", .event_select = 0x5, .escr_select = 0x7, .allowed_escrs = { 7, -1 }, .perf_code = P4_EVENT_BSQ_ALLOCATION, .event_masks = { {.name = "REQ_TYPE0", .desc = "Along with REQ_TYPE1, request type encodings are: " "0 - Read (excludes read invalidate), 1 - Read " "invalidate, 2 - Write (other than writebacks), 3 - " "Writeback (evicted from cache)", .bit = 0, }, {.name = "REQ_TYPE1", .desc = "Along with REQ_TYPE0, request type encodings are: " "0 - Read (excludes read invalidate), 1 - Read " "invalidate, 2 - Write (other than writebacks), 3 - " "Writeback (evicted from cache)", .bit = 1, }, {.name = "REQ_LEN0", .desc = "Along with REQ_LEN1, request length encodings are: " "0 - zero chunks, 1 - one chunk, 3 - eight chunks", .bit = 2, }, {.name = "REQ_LEN1", .desc = "Along with REQ_LEN0, request length encodings are: " "0 - zero chunks, 1 - one chunk, 3 - eight chunks", .bit = 3, }, {.name = "REQ_IO_TYPE", .desc = "Request type is input or output", .bit = 5, }, {.name = "REQ_LOCK_TYPE", .desc = "Request type is bus lock", .bit = 6, }, {.name = "REQ_CACHE_TYPE", .desc = "Request type is cacheable", .bit = 7, }, {.name = "REQ_SPLIT_TYPE", .desc = "Request type is a bus 8-byte chunk split across " "an 8-byte boundary", .bit = 8, }, {.name = "REQ_DEM_TYPE", .desc = "0: Request type is HW.SW prefetch. 
" "1: Request type is a demand", .bit = 9, }, {.name = "REQ_ORD_TYPE", .desc = "Request is an ordered type", .bit = 10, }, {.name = "MEM_TYPE0", .desc = "Along with MEM_TYPE1 and MEM_TYPE2, " "memory type encodings are: 0 - UC, " "1 - USWC, 4- WT, 5 - WP, 6 - WB", .bit = 11, }, {.name = "MEM_TYPE1", .desc = "Along with MEM_TYPE0 and MEM_TYPE2, " "memory type encodings are: 0 - UC, " "1 - USWC, 4- WT, 5 - WP, 6 - WB", .bit = 12, }, {.name = "MEM_TYPE2", .desc = "Along with MEM_TYPE0 and MEM_TYPE1, " "memory type encodings are: 0 - UC, " "1 - USWC, 4- WT, 5 - WP, 6 - WB", .bit = 13, }, }, }, /* 14 */ {.name = "BSQ_active_entries", .desc = "Number of BSQ entries (clipped at 15) currently active " "(valid) which meet the subevent mask criteria during " "allocation in the BSQ. Active request entries are allocated " "on the BSQ until de-allocated. De-allocation of an entry " "does not necessarily imply the request is filled. This " "event must be programmed in conjunction with BSQ_allocation", .event_select = 0x6, .escr_select = 0x7, .allowed_escrs = { 30, -1 }, .perf_code = P4_EVENT_BSQ_ACTIVE_ENTRIES, .event_masks = { {.name = "REQ_TYPE0", .desc = "Along with REQ_TYPE1, request type encodings are: " "0 - Read (excludes read invalidate), 1 - Read " "invalidate, 2 - Write (other than writebacks), 3 - " "Writeback (evicted from cache)", .bit = 0, }, {.name = "REQ_TYPE1", .desc = "Along with REQ_TYPE0, request type encodings are: " "0 - Read (excludes read invalidate), 1 - Read " "invalidate, 2 - Write (other than writebacks), 3 - " "Writeback (evicted from cache)", .bit = 1, }, {.name = "REQ_LEN0", .desc = "Along with REQ_LEN1, request length encodings are: " "0 - zero chunks, 1 - one chunk, 3 - eight chunks", .bit = 2, }, {.name = "REQ_LEN1", .desc = "Along with REQ_LEN0, request length encodings are: " "0 - zero chunks, 1 - one chunk, 3 - eight chunks", .bit = 3, }, {.name = "REQ_IO_TYPE", .desc = "Request type is input or output", .bit = 5, }, {.name = "REQ_LOCK_TYPE", 
.desc = "Request type is bus lock", .bit = 6, }, {.name = "REQ_CACHE_TYPE", .desc = "Request type is cacheable", .bit = 7, }, {.name = "REQ_SPLIT_TYPE", .desc = "Request type is a bus 8-byte chunk split across " "an 8-byte boundary", .bit = 8, }, {.name = "REQ_DEM_TYPE", .desc = "0: Request type is HW.SW prefetch. " "1: Request type is a demand", .bit = 9, }, {.name = "REQ_ORD_TYPE", .desc = "Request is an ordered type", .bit = 10, }, {.name = "MEM_TYPE0", .desc = "Along with MEM_TYPE1 and MEM_TYPE2, " "memory type encodings are: 0 - UC, " "1 - USWC, 4- WT, 5 - WP, 6 - WB", .bit = 11, }, {.name = "MEM_TYPE1", .desc = "Along with MEM_TYPE0 and MEM_TYPE2, " "memory type encodings are: 0 - UC, " "1 - USWC, 4- WT, 5 - WP, 6 - WB", .bit = 12, }, {.name = "MEM_TYPE2", .desc = "Along with MEM_TYPE0 and MEM_TYPE1, " "memory type encodings are: 0 - UC, " "1 - USWC, 4- WT, 5 - WP, 6 - WB", .bit = 13, }, }, }, /* 15 */ {.name = "SSE_input_assist", .desc = "Number of times an assist is requested to handle problems " "with input operands for SSE/SSE2/SSE3 operations; most " "notably denormal source operands when the DAZ bit isn't set", .event_select = 0x34, .escr_select = 0x1, .allowed_escrs = { 12, 35 }, .perf_code = P4_EVENT_SSE_INPUT_ASSIST, .event_masks = { {.name = "ALL", .desc = "Count assists for SSE/SSE2/SSE3 uops", .bit = 15, .flags = NETBURST_FL_DFL, }, }, }, /* 16 */ {.name = "packed_SP_uop", .desc = "Number of packed single-precision uops", .event_select = 0x8, .escr_select = 0x1, .perf_code = P4_EVENT_PACKED_SP_UOP, .allowed_escrs = { 12, 35 }, .event_masks = { {.name = "ALL", .desc = "Count all uops operating on packed " "single-precision operands", .bit = 15, .flags = NETBURST_FL_DFL, }, {.name = "TAG0", .desc = "Tag this event with tag bit 0 " "for retirement counting with execution_event", .bit = 16, }, {.name = "TAG1", .desc = "Tag this event with tag bit 1 " "for retirement counting with execution_event", .bit = 17, }, {.name = "TAG2", .desc = "Tag this
event with tag bit 2 " "for retirement counting with execution_event", .bit = 18, }, {.name = "TAG3", .desc = "Tag this event with tag bit 3 " "for retirement counting with execution_event", .bit = 19, }, }, }, /* 17 */ {.name = "packed_DP_uop", .desc = "Number of packed double-precision uops", .event_select = 0xC, .escr_select = 0x1, .allowed_escrs = { 12, 35 }, .perf_code = P4_EVENT_PACKED_DP_UOP, .event_masks = { {.name = "ALL", .desc = "Count all uops operating on packed " "double-precision operands", .bit = 15, .flags = NETBURST_FL_DFL, }, {.name = "TAG0", .desc = "Tag this event with tag bit 0 " "for retirement counting with execution_event", .bit = 16, }, {.name = "TAG1", .desc = "Tag this event with tag bit 1 " "for retirement counting with execution_event", .bit = 17, }, {.name = "TAG2", .desc = "Tag this event with tag bit 2 " "for retirement counting with execution_event", .bit = 18, }, {.name = "TAG3", .desc = "Tag this event with tag bit 3 " "for retirement counting with execution_event", .bit = 19, }, }, }, /* 18 */ {.name = "scalar_SP_uop", .desc = "Number of scalar single-precision uops", .event_select = 0xA, .escr_select = 0x1, .allowed_escrs = { 12, 35 }, .perf_code = P4_EVENT_SCALAR_SP_UOP, .event_masks = { {.name = "ALL", .desc = "Count all uops operating on scalar " "single-precision operands", .bit = 15, .flags = NETBURST_FL_DFL, }, {.name = "TAG0", .desc = "Tag this event with tag bit 0 " "for retirement counting with execution_event", .bit = 16, }, {.name = "TAG1", .desc = "Tag this event with tag bit 1 " "for retirement counting with execution_event", .bit = 17, }, {.name = "TAG2", .desc = "Tag this event with tag bit 2 " "for retirement counting with execution_event", .bit = 18, }, {.name = "TAG3", .desc = "Tag this event with tag bit 3 " "for retirement counting with execution_event", .bit = 19, }, }, }, /* 19 */ {.name = "scalar_DP_uop", .desc = "Number of scalar double-precision uops", .event_select = 0xE, .escr_select = 0x1,
.allowed_escrs = { 12, 35 }, .perf_code = P4_EVENT_SCALAR_DP_UOP, .event_masks = { {.name = "ALL", .desc = "Count all uops operating on scalar " "double-precision operands", .bit = 15, .flags = NETBURST_FL_DFL, }, {.name = "TAG0", .desc = "Tag this event with tag bit 0 " "for retirement counting with execution_event", .bit = 16, }, {.name = "TAG1", .desc = "Tag this event with tag bit 1 " "for retirement counting with execution_event", .bit = 17, }, {.name = "TAG2", .desc = "Tag this event with tag bit 2 " "for retirement counting with execution_event", .bit = 18, }, {.name = "TAG3", .desc = "Tag this event with tag bit 3 " "for retirement counting with execution_event", .bit = 19, }, }, }, /* 20 */ {.name = "64bit_MMX_uop", .desc = "Number of MMX instructions which " "operate on 64-bit SIMD operands", .event_select = 0x2, .escr_select = 0x1, .allowed_escrs = { 12, 35 }, .perf_code = P4_EVENT_64BIT_MMX_UOP, .event_masks = { {.name = "ALL", .desc = "Count all uops operating on 64-bit SIMD integer " "operands in memory or MMX registers", .bit = 15, .flags = NETBURST_FL_DFL, }, {.name = "TAG0", .desc = "Tag this event with tag bit 0 " "for retirement counting with execution_event", .bit = 16, }, {.name = "TAG1", .desc = "Tag this event with tag bit 1 " "for retirement counting with execution_event", .bit = 17, }, {.name = "TAG2", .desc = "Tag this event with tag bit 2 " "for retirement counting with execution_event", .bit = 18, }, {.name = "TAG3", .desc = "Tag this event with tag bit 3 " "for retirement counting with execution_event", .bit = 19, }, }, }, /* 21 */ {.name = "128bit_MMX_uop", .desc = "Number of MMX instructions which " "operate on 128-bit SIMD operands", .event_select = 0x1A, .escr_select = 0x1, .allowed_escrs = { 12, 35 }, .perf_code = P4_EVENT_128BIT_MMX_UOP, .event_masks = { {.name = "ALL", .desc = "Count all uops operating on 128-bit SIMD integer " "operands in memory or MMX registers", .bit = 15, .flags = NETBURST_FL_DFL, }, {.name = "TAG0", .desc
= "Tag this event with tag bit 0 " "for retirement counting with execution_event", .bit = 16, }, {.name = "TAG1", .desc = "Tag this event with tag bit 1 " "for retirement counting with execution_event", .bit = 17, }, {.name = "TAG2", .desc = "Tag this event with tag bit 2 " "for retirement counting with execution_event", .bit = 18, }, {.name = "TAG3", .desc = "Tag this event with tag bit 3 " "for retirement counting with execution_event", .bit = 19, }, }, }, /* 22 */ {.name = "x87_FP_uop", .desc = "Number of x87 floating-point uops", .event_select = 0x4, .escr_select = 0x1, .allowed_escrs = { 12, 35 }, .perf_code = P4_EVENT_X87_FP_UOP, .event_masks = { {.name = "ALL", .desc = "Count all x87 FP uops", .bit = 15, .flags = NETBURST_FL_DFL, }, {.name = "TAG0", .desc = "Tag this event with tag bit 0 " "for retirement counting with execution_event", .bit = 16, }, {.name = "TAG1", .desc = "Tag this event with tag bit 1 " "for retirement counting with execution_event", .bit = 17, }, {.name = "TAG2", .desc = "Tag this event with tag bit 2 " "for retirement counting with execution_event", .bit = 18, }, {.name = "TAG3", .desc = "Tag this event with tag bit 3 " "for retirement counting with execution_event", .bit = 19, }, }, }, /* 23 */ {.name = "TC_misc", .desc = "Miscellaneous events detected by the TC. 
The counter will " "count twice for each occurrence", .event_select = 0x6, .escr_select = 0x1, .allowed_escrs = { 9, 32 }, .perf_code = P4_EVENT_TC_MISC, .event_masks = { {.name = "FLUSH", .desc = "Number of flushes", .bit = 4, .flags = NETBURST_FL_DFL, }, }, }, /* 24 */ {.name = "global_power_events", .desc = "Counts the time during which a processor is not stopped", .event_select = 0x13, .escr_select = 0x6, .allowed_escrs = { 6, 29 }, .perf_code = P4_EVENT_GLOBAL_POWER_EVENTS, .event_masks = { {.name = "RUNNING", .desc = "The processor is active (includes the " "handling of HLT STPCLK and throttling)", .bit = 0, .flags = NETBURST_FL_DFL, }, }, }, /* 25 */ {.name = "tc_ms_xfer", .desc = "Number of times that uop delivery changed from TC to MS ROM", .event_select = 0x5, .escr_select = 0x0, .allowed_escrs = { 8, 31 }, .perf_code = P4_EVENT_TC_MS_XFER, .event_masks = { {.name = "CISC", .desc = "A TC to MS transfer occurred", .bit = 0, .flags = NETBURST_FL_DFL, }, }, }, /* 26 */ {.name = "uop_queue_writes", .desc = "Number of valid uops written to the uop queue", .event_select = 0x9, .escr_select = 0x0, .allowed_escrs = { 8, 31 }, .perf_code = P4_EVENT_UOP_QUEUE_WRITES, .event_masks = { {.name = "FROM_TC_BUILD", .desc = "The uops being written are from TC build mode", .bit = 0, }, {.name = "FROM_TC_DELIVER", .desc = "The uops being written are from TC deliver mode", .bit = 1, }, {.name = "FROM_ROM", .desc = "The uops being written are from microcode ROM", .bit = 2, }, }, }, /* 27 */ {.name = "retired_mispred_branch_type", .desc = "Number of retiring mispredicted branches by type", .event_select = 0x5, .escr_select = 0x2, .allowed_escrs = { 10, 33 }, .perf_code = P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE, .event_masks = { {.name = "CONDITIONAL", .desc = "Conditional jumps", .bit = 1, }, {.name = "CALL", .desc = "Indirect call branches", .bit = 2, }, {.name = "RETURN", .desc = "Return branches", .bit = 3, }, {.name = "INDIRECT", .desc = "Returns, indirect calls, or indirect
jumps", .bit = 4, }, }, }, /* 28 */ {.name = "retired_branch_type", .desc = "Number of retiring branches by type", .event_select = 0x4, .escr_select = 0x2, .allowed_escrs = { 10, 33 }, .perf_code = P4_EVENT_RETIRED_BRANCH_TYPE, .event_masks = { {.name = "CONDITIONAL", .desc = "Conditional jumps", .bit = 1, }, {.name = "CALL", .desc = "Indirect call branches", .bit = 2, }, {.name = "RETURN", .desc = "Return branches", .bit = 3, }, {.name = "INDIRECT", .desc = "Returns, indirect calls, or indirect jumps", .bit = 4, }, }, }, /* 29 */ {.name = "resource_stall", .desc = "Occurrences of latency or stalls in the Allocator", .event_select = 0x1, .escr_select = 0x1, .allowed_escrs = { 17, 40 }, .perf_code = P4_EVENT_RESOURCE_STALL, .event_masks = { {.name = "SBFULL", .desc = "A stall due to lack of store buffers", .bit = 5, .flags = NETBURST_FL_DFL, }, }, }, /* 30 */ {.name = "WC_Buffer", .desc = "Number of Write Combining Buffer operations", .event_select = 0x5, .escr_select = 0x5, .allowed_escrs = { 15, 38 }, .perf_code = P4_EVENT_WC_BUFFER, .event_masks = { {.name = "WCB_EVICTS", .desc = "WC Buffer evictions of all causes", .bit = 0, }, {.name = "WCB_FULL_EVICT", .desc = "WC Buffer eviction; no WC buffer is available", .bit = 1, }, }, }, /* 31 */ {.name = "b2b_cycles", .desc = "Number of back-to-back bus cycles", .event_select = 0x16, .escr_select = 0x3, .allowed_escrs = { 6, 29 }, .perf_code = P4_EVENT_B2B_CYCLES, .event_masks = { {.name = "BIT1", .desc = "bit 1", .bit = 1, }, {.name = "BIT2", .desc = "bit 2", .bit = 2, }, {.name = "BIT3", .desc = "bit 3", .bit = 3, }, {.name = "BIT4", .desc = "bit 4", .bit = 4, }, {.name = "BIT5", .desc = "bit 5", .bit = 5, }, {.name = "BIT6", .desc = "bit 6", .bit = 6, }, }, }, /* 32 */ {.name = "bnr", .desc = "Number of bus-not-ready conditions", .event_select = 0x8, .escr_select = 0x3, .allowed_escrs = { 6, 29 }, .perf_code = P4_EVENT_BNR, .event_masks = { {.name = "BIT0", .desc = "bit 0", .bit = 0, }, {.name = "BIT1", .desc = "bit 
1", .bit = 1, }, {.name = "BIT2", .desc = "bit 2", .bit = 2, }, }, }, /* 33 */ {.name = "snoop", .desc = "Number of snoop hit modified bus traffic", .event_select = 0x6, .escr_select = 0x3, .allowed_escrs = { 6, 29 }, .perf_code = P4_EVENT_SNOOP, .event_masks = { {.name = "BIT2", .desc = "bit 2", .bit = 2, }, {.name = "BIT6", .desc = "bit 6", .bit = 6, }, {.name = "BIT7", .desc = "bit 7", .bit = 7, }, }, }, /* 34 */ {.name = "response", .desc = "Count of different types of responses", .event_select = 0x4, .escr_select = 0x3, .allowed_escrs = { 6, 29 }, .perf_code = P4_EVENT_RESPONSE, .event_masks = { {.name = "BIT1", .desc = "bit 1", .bit = 1, }, {.name = "BIT2", .desc = "bit 2", .bit = 2, }, {.name = "BIT8", .desc = "bit 8", .bit = 8, }, {.name = "BIT9", .desc = "bit 9", .bit = 9, }, }, }, /* 35 */ {.name = "front_end_event", .desc = "Number of retirements of tagged uops which are specified " "through the front-end tagging mechanism", .event_select = 0x8, .escr_select = 0x5, .allowed_escrs = { 21, 43 }, .perf_code = P4_EVENT_FRONT_END_EVENT, .event_masks = { {.name = "NBOGUS", .desc = "The marked uops are not bogus", .bit = 0, }, {.name = "BOGUS", .desc = "The marked uops are bogus", .bit = 1, }, }, }, /* 36 */ {.name = "execution_event", .desc = "Number of retirements of tagged uops which are specified " "through the execution tagging mechanism. 
The event-mask " "allows from one to four types of uops to be tagged", .event_select = 0xC, .escr_select = 0x5, .allowed_escrs = { 21, 43 }, .perf_code = P4_EVENT_EXECUTION_EVENT, .event_masks = { {.name = "NBOGUS0", .desc = "The marked uops are not bogus", .bit = 0, }, {.name = "NBOGUS1", .desc = "The marked uops are not bogus", .bit = 1, }, {.name = "NBOGUS2", .desc = "The marked uops are not bogus", .bit = 2, }, {.name = "NBOGUS3", .desc = "The marked uops are not bogus", .bit = 3, }, {.name = "BOGUS0", .desc = "The marked uops are bogus", .bit = 4, }, {.name = "BOGUS1", .desc = "The marked uops are bogus", .bit = 5, }, {.name = "BOGUS2", .desc = "The marked uops are bogus", .bit = 6, }, {.name = "BOGUS3", .desc = "The marked uops are bogus", .bit = 7, }, }, }, /* 37 */ {.name = "replay_event", .desc = "Number of retirements of tagged uops which are specified " "through the replay tagging mechanism", .event_select = 0x9, .escr_select = 0x5, .allowed_escrs = { 21, 43 }, .perf_code = P4_EVENT_REPLAY_EVENT, .event_masks = { {.name = "NBOGUS", .desc = "The marked uops are not bogus", .bit = 0, }, {.name = "BOGUS", .desc = "The marked uops are bogus", .bit = 1, }, {.name = "L1_LD_MISS", .desc = "Virtual mask for L1 cache load miss replays", .bit = 2, }, {.name = "L2_LD_MISS", .desc = "Virtual mask for L2 cache load miss replays", .bit = 3, }, {.name = "DTLB_LD_MISS", .desc = "Virtual mask for DTLB load miss replays", .bit = 4, }, {.name = "DTLB_ST_MISS", .desc = "Virtual mask for DTLB store miss replays", .bit = 5, }, {.name = "DTLB_ALL_MISS", .desc = "Virtual mask for all DTLB miss replays", .bit = 6, }, {.name = "BR_MSP", .desc = "Virtual mask for tagged mispredicted branch replays", .bit = 7, }, {.name = "MOB_LD_REPLAY", .desc = "Virtual mask for MOB load replays", .bit = 8, }, {.name = "SP_LD_RET", .desc = "Virtual mask for split load replays. Use with load_port_replay event", .bit = 9, }, {.name = "SP_ST_RET", .desc = "Virtual mask for split store replays. 
Use with store_port_replay event", .bit = 10, }, }, }, /* 38 */ {.name = "instr_retired", .desc = "Number of instructions retired during a clock cycle", .event_select = 0x2, .escr_select = 0x4, .allowed_escrs = { 20, 42 }, .perf_code = P4_EVENT_INSTR_RETIRED, .event_masks = { {.name = "NBOGUSNTAG", .desc = "Non-bogus instructions that are not tagged", .bit = 0, }, {.name = "NBOGUSTAG", .desc = "Non-bogus instructions that are tagged", .bit = 1, }, {.name = "BOGUSNTAG", .desc = "Bogus instructions that are not tagged", .bit = 2, }, {.name = "BOGUSTAG", .desc = "Bogus instructions that are tagged", .bit = 3, }, }, }, /* 39 */ {.name = "uops_retired", .desc = "Number of uops retired during a clock cycle", .event_select = 0x1, .escr_select = 0x4, .allowed_escrs = { 20, 42 }, .perf_code = P4_EVENT_UOPS_RETIRED, .event_masks = { {.name = "NBOGUS", .desc = "The marked uops are not bogus", .bit = 0, }, {.name = "BOGUS", .desc = "The marked uops are bogus", .bit = 1, }, }, }, /* 40 */ {.name = "uops_type", .desc = "This event is used in conjunction with the front-end " "mechanism to tag load and store uops", .event_select = 0x2, .escr_select = 0x2, .allowed_escrs = { 18, 41 }, .perf_code = P4_EVENT_UOP_TYPE, .event_masks = { {.name = "TAGLOADS", .desc = "The uop is a load operation", .bit = 1, }, {.name = "TAGSTORES", .desc = "The uop is a store operation", .bit = 2, }, }, }, /* 41 */ {.name = "branch_retired", .desc = "Number of retirements of a branch", .event_select = 0x6, .escr_select = 0x5, .allowed_escrs = { 21, 43 }, .perf_code = P4_EVENT_BRANCH_RETIRED, .event_masks = { {.name = "MMNP", .desc = "Branch not-taken predicted", .bit = 0, }, {.name = "MMNM", .desc = "Branch not-taken mispredicted", .bit = 1, }, {.name = "MMTP", .desc = "Branch taken predicted", .bit = 2, }, {.name = "MMTM", .desc = "Branch taken mispredicted", .bit = 3, }, }, }, /* 42 */ {.name = "mispred_branch_retired", .desc = "Number of retirements of mispredicted " "IA-32 branch instructions", 
.event_select = 0x3, .escr_select = 0x4, .allowed_escrs = { 20, 42 }, .perf_code = P4_EVENT_MISPRED_BRANCH_RETIRED, .event_masks = { {.name = "BOGUS", .desc = "The retired instruction is not bogus", .bit = 0, .flags = NETBURST_FL_DFL, }, }, }, /* 43 */ {.name = "x87_assist", .desc = "Number of retirements of x87 instructions that required " "special handling", .event_select = 0x3, .escr_select = 0x5, .allowed_escrs = { 21, 43 }, .perf_code = P4_EVENT_X87_ASSIST, .event_masks = { {.name = "FPSU", .desc = "Handle FP stack underflow", .bit = 0, }, {.name = "FPSO", .desc = "Handle FP stack overflow", .bit = 1, }, {.name = "POAO", .desc = "Handle x87 output overflow", .bit = 2, }, {.name = "POAU", .desc = "Handle x87 output underflow", .bit = 3, }, {.name = "PREA", .desc = "Handle x87 input assist", .bit = 4, }, }, }, /* 44 */ {.name = "machine_clear", .desc = "Number of occurrences when the entire " "pipeline of the machine is cleared", .event_select = 0x2, .escr_select = 0x5, .allowed_escrs = { 21, 43 }, .perf_code = P4_EVENT_MACHINE_CLEAR, .event_masks = { {.name = "CLEAR", .desc = "Counts for a portion of the many cycles while the " "machine is cleared for any cause. 
Use edge-" "triggering for this bit only to get a count of " "occurrences versus a duration", .bit = 0, }, {.name = "MOCLEAR", .desc = "Increments each time the machine is cleared due to " "memory ordering issues", .bit = 2, }, {.name = "SMCLEAR", .desc = "Increments each time the machine is cleared due to " "self-modifying code issues", .bit = 6, }, }, }, /* 45 */ {.name = "instr_completed", .desc = "Instructions that have completed and " "retired during a clock cycle (models 3, 4, 6 only)", .event_select = 0x7, .escr_select = 0x4, .allowed_escrs = { 21, 42 }, .perf_code = P4_EVENT_INSTR_COMPLETED, .event_masks = { {.name = "NBOGUS", .desc = "Non-bogus instructions", .bit = 0, }, {.name = "BOGUS", .desc = "Bogus instructions", .bit = 1, }, }, }, }; #define PME_REPLAY_EVENT 37 #define NETBURST_EVENT_COUNT (sizeof(netburst_events)/sizeof(netburst_entry_t)) #endif papi-5.3.0/src/libpfm4/lib/events/arm_cortex_a9_events.h0000600003276200002170000002035612247131123022633 0ustar ralphundrgrad/* * Copyright (c) 2010 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ /* * the various event names are the same as those given in the * file linux-2.6/arch/arm/kernel/perf_event.c */ /* * Cortex A9 r2p2 Event Table * based on Table 11-7 from the "Cortex A9 Technical Reference Manual" */ static const arm_entry_t arm_cortex_a9_pe []={ /* * ARMv7 events */ {.name = "PMNC_SW_INCR", .code = 0x00, .desc = "Incremented by writes to the Software Increment Register" }, {.name = "IFETCH_MISS", .code = 0x01, .desc = "Instruction fetches that cause lowest-level cache miss" }, {.name = "ITLB_MISS", .code = 0x02, .desc = "Instruction fetches that cause lowest-level TLB miss" }, {.name = "DCACHE_REFILL", .code = 0x03, .desc = "Data read or writes that cause lowest-level cache miss" }, {.name = "DCACHE_ACCESS", .code = 0x04, .desc = "Data read or writes that cause lowest-level cache access" }, {.name = "DTLB_REFILL", .code = 0x05, .desc = "Data read or writes that cause lowest-level TLB refill" }, {.name = "DREAD", .code = 0x06, .desc = "Data read architecturally executed" }, {.name = "DWRITE", .code = 0x07, .desc = "Data write architecturally executed" }, {.name = "EXC_TAKEN", .code = 0x09, .desc = "Counts each exception taken" }, {.name = "EXC_EXECUTED", .code = 0x0a, .desc = "Exception returns architecturally executed" }, {.name = "CID_WRITE", .code = 0x0b, .desc = "Instruction writes to Context ID Register, architecturally executed" }, {.name = "PC_WRITE", .code = 0x0c, .desc = "Software change of PC. 
Equivalent to branches" }, {.name = "PC_IMM_BRANCH", .code = 0x0d, .desc = "Immediate branches architecturally executed" }, {.name = "UNALIGNED_ACCESS", .code = 0x0f, .desc = "Unaligned accesses architecturally executed" }, {.name = "PC_BRANCH_MIS_PRED", .code = 0x10, .desc = "Branches mispredicted or not predicted" }, {.name = "CLOCK_CYCLES", .code = 0x11, .desc = "Clock cycles" }, {.name = "PC_BRANCH_MIS_USED", .code = 0x12, .desc = "Branches that could have been predicted" }, /* * Cortex A9 specific events */ {.name = "JAVA_HW_BYTECODE_EXEC", .code = 0x40, .desc = "Java bytecodes decoded, including speculative (approximate)" }, {.name = "JAVA_SW_BYTECODE_EXEC", .code = 0x41, .desc = "Software Java bytecodes decoded, including speculative (approximate)" }, {.name = "JAZELLE_BRANCH_EXEC", .code = 0x42, .desc = "Jazelle backward branches executed. Includes branches that are flushed because of previous load/store which abort late (approximate)" }, {.name = "COHERENT_LINE_MISS", .code = 0x50, .desc = "Coherent linefill misses which also miss on other processors" }, {.name = "COHERENT_LINE_HIT", .code = 0x51, .desc = "Coherent linefill requests that hit on another processor" }, {.name = "ICACHE_DEP_STALL_CYCLES", .code = 0x60, .desc = "Cycles processor is stalled waiting for instruction cache and the instruction cache is performing at least one linefill (approximate)" }, {.name = "DCACHE_DEP_STALL_CYCLES", .code = 0x61, .desc = "Cycles processor is stalled waiting for data cache" }, {.name = "TLB_MISS_DEP_STALL_CYCLES", .code = 0x62, .desc = "Cycles processor is stalled waiting for completion of TLB walk (approximate)" }, {.name = "STREX_EXECUTED_PASSED", .code = 0x63, .desc = "Number of STREX instructions executed and passed" }, {.name = "STREX_EXECUTED_FAILED", .code = 0x64, .desc = "Number of STREX instructions executed and failed" }, {.name = "DATA_EVICTION", .code = 0x65, .desc = "Data eviction requests due to linefill in data cache" }, {.name = 
"ISSUE_STAGE_NO_INST", .code = 0x66, .desc = "Cycles the issue stage does not dispatch any instructions" }, {.name = "ISSUE_STAGE_EMPTY", .code = 0x67, .desc = "Cycles where issue stage is empty" }, {.name = "INST_OUT_OF_RENAME_STAGE", .code = 0x68, .desc = "Number of instructions going through register renaming stage (approximate)" }, {.name = "PREDICTABLE_FUNCT_RETURNS", .code = 0x6e, .desc = "Number of predictable function returns whose condition codes do not fail (approximate)" }, {.name = "MAIN_UNIT_EXECUTED_INST", .code = 0x70, .desc = "Instructions executed in the main execution, multiply, ALU pipelines (approximate)" }, {.name = "SECOND_UNIT_EXECUTED_INST", .code = 0x71, .desc = "Instructions executed in the second execution pipeline" }, {.name = "LD_ST_UNIT_EXECUTED_INST", .code = 0x72, .desc = "Instructions executed in the Load/Store unit" }, {.name = "FP_EXECUTED_INST", .code = 0x73, .desc = "Floating point instructions going through register renaming stage" }, {.name = "NEON_EXECUTED_INST", .code = 0x74, .desc = "NEON instructions going through register renaming stage (approximate)" }, {.name = "PLD_FULL_DEP_STALL_CYCLES", .code = 0x80, .desc = "Cycles processor is stalled because PLD slots are full (approximate)" }, {.name = "DATA_WR_DEP_STALL_CYCLES", .code = 0x81, .desc = "Cycles processor is stalled due to writes to external memory (approximate)" }, {.name = "ITLB_MISS_DEP_STALL_CYCLES", .code = 0x82, .desc = "Cycles stalled due to main instruction TLB miss (approximate)" }, {.name = "DTLB_MISS_DEP_STALL_CYCLES", .code = 0x83, .desc = "Cycles stalled due to main data TLB miss (approximate)" }, {.name = "MICRO_ITLB_MISS_DEP_STALL_CYCLES", .code = 0x84, .desc = "Cycles stalled due to micro instruction TLB miss (approximate)" }, {.name = "MICRO_DTLB_MISS_DEP_STALL_CYCLES", .code = 0x85, .desc = "Cycles stalled due to micro data TLB miss (approximate)" }, {.name = "DMB_DEP_STALL_CYCLES", .code = 0x86, .desc = "Cycles stalled due to DMB memory barrier 
(approximate)" }, {.name = "INTGR_CLK_ENABLED_CYCLES", .code = 0x8a, .desc = "Cycles during which integer core clock is enabled (approximate)" }, {.name = "DATA_ENGINE_CLK_EN_CYCLES", .code = 0x8b, .desc = "Cycles during which Data Engine clock is enabled (approximate)" }, {.name = "ISB_INST", .code = 0x90, .desc = "Number of ISB instructions architecturally executed" }, {.name = "DSB_INST", .code = 0x91, .desc = "Number of DSB instructions architecturally executed" }, {.name = "DMB_INST", .code = 0x92, .desc = "Number of DMB instructions architecturally executed (approximate)" }, {.name = "EXT_INTERRUPTS", .code = 0x93, .desc = "Number of External interrupts (approximate)" }, {.name = "PLE_CACHE_LINE_RQST_COMPLETED", .code = 0xa0, .desc = "PLE cache line requests completed" }, {.name = "PLE_CACHE_LINE_RQST_SKIPPED", .code = 0xa1, .desc = "PLE cache line requests skipped" }, {.name = "PLE_FIFO_FLUSH", .code = 0xa2, .desc = "PLE FIFO flushes" }, {.name = "PLE_RQST_COMPLETED", .code = 0xa3, .desc = "PLE requests completed" }, {.name = "PLE_FIFO_OVERFLOW", .code = 0xa4, .desc = "PLE FIFO overflows" }, {.name = "PLE_RQST_PROG", .code = 0xa5, .desc = "PLE requests programmed" }, {.name = "CPU_CYCLES", .code = 0xff, .desc = "CPU cycles" }, }; #define ARM_CORTEX_A9_EVENT_COUNT (sizeof(arm_cortex_a9_pe)/sizeof(arm_entry_t)) papi-5.3.0/src/libpfm4/lib/events/intel_ivb_events.h0000600003276200002170000021651512247131123022056 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above 
copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: ivb (Intel Ivy Bridge) * PMU: ivb_ep (Intel Ivy Bridge EP) */ static const intel_x86_umask_t ivb_arith[]={ { .uname = "FPU_DIV_ACTIVE", .udesc = "Cycles that the divider is active, includes integer and floating point", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "FPU_DIV", .udesc = "Number of cycles the divider is activated, includes integer and floating point", .uequiv = "FPU_DIV_ACTIVE:c=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_br_inst_exec[]={ { .uname = "NONTAKEN_COND", .udesc = "All macro conditional non-taken branch instructions", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_COND", .udesc = "All macro conditional taken branch instructions", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_JUMP", .udesc = "All macro unconditional taken branch instructions, excluding calls and indirects", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All non-taken indirect branches that are not calls nor returns", .ucode = 0x4400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All 
taken indirect branches that are not calls nor returns", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_NEAR_RETURN", .udesc = "All taken indirect branches that have a return mnemonic", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_NEAR_CALL", .udesc = "All taken non-indirect calls", .ucode = 0x9000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_NEAR_CALL", .udesc = "All taken indirect calls, including both register and memory indirect", .ucode = 0xa000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "All near executed branch instructions (not necessarily retired)", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL_COND", .udesc = "All macro conditional branch instructions", .ucode = 0xc100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_COND", .udesc = "All macro conditional branch instructions", .ucode = 0xc100, .uequiv = "ALL_COND", .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All indirect branches that are not calls nor returns", .ucode = 0xc400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_DIRECT_NEAR_CALL", .udesc = "All non-indirect calls", .ucode = 0xd000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_br_inst_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "All taken and not taken macro branches including far branches (Precise Event)", .ucode = 0x0000, /* architectural encoding */ .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "COND", .udesc = "All taken and not taken macro conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "FAR_BRANCH", .udesc = "Number of far branch instructions retired (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "All macro direct and indirect near calls, does not count far calls (Precise Event)", .ucode = 
0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_RETURN", .udesc = "Number of near ret instructions retired (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_TAKEN", .udesc = "Number of near branch taken instructions retired (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NOT_TAKEN", .udesc = "All not taken macro branch instructions retired (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_br_misp_exec[]={ { .uname = "NONTAKEN_COND", .udesc = "All non-taken mispredicted macro conditional branch instructions", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_COND", .udesc = "All taken mispredicted macro conditional branch instructions", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All taken mispredicted indirect branches that are not calls nor returns", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_NEAR_RETURN", .udesc = "All taken mispredicted indirect branches that have a return mnemonic", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_NEAR_CALL", .udesc = "All taken mispredicted non-indirect calls", .ucode = 0x9000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_NEAR_CALL", .udesc = "All taken mispredicted indirect calls, including both register and memory indirect", .ucode = 0xa000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_COND", .udesc = "All mispredicted macro conditional branch instructions", .ucode = 0xc100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_RETURN_NEAR", .udesc = "All mispredicted indirect branches that have a return mnemonic", .ucode = 0xc800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_DIRECT_NEAR_CALL", .udesc = "All mispredicted non-indirect calls", .ucode = 0xd000, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"ANY_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All mispredicted indirect branches that are not calls nor returns", .ucode = 0xc400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "All mispredicted branch instructions", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_br_misp_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "All mispredicted macro branches (Precise Event)", .ucode = 0x0000, /* architectural encoding */ .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "COND", .udesc = "All mispredicted macro conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "CONDITIONAL", .udesc = "All mispredicted macro conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, .uequiv = "COND", }, { .uname = "NEAR_TAKEN", .udesc = "Number of branch instructions retired that were mispredicted and taken (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_lock_cycles[]={ { .uname = "SPLIT_LOCK_UC_LOCK_DURATION", .udesc = "Cycles in which the L1D and L2 are locked, due to a UC lock or split lock", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CACHE_LOCK_DURATION", .udesc = "Cycles in which the L1D is locked", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_cpl_cycles[]={ { .uname = "RING0", .udesc = "Unhalted core cycles the thread was in ring 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RING0_TRANS", .udesc = "Transitions from rings 1, 2, or 3 to ring 0", .uequiv = "RING0:c=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "RING123", .udesc = "Unhalted core cycles the thread was in rings 1, 2, or 3", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const 
intel_x86_umask_t ivb_cpu_clk_unhalted[]={ { .uname = "REF_P", .udesc = "Cycles when the core is unhalted (count at 100 Mhz)", .ucode = 0x100, .uequiv = "REF_XCLK", .uflags= INTEL_X86_NCOMBO, }, { .uname = "REF_XCLK", .udesc = "Cycles when the core is unhalted (count at 100 Mhz)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "THREAD_P", .udesc = "Cycles when thread is not halted", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_dsb2mite_switches[]={ { .uname = "COUNT", .udesc = "Number of DSB to MITE switches", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_dsb_fill[]={ { .uname = "EXCEED_DSB_LINES", .udesc = "DSB Fill encountered > 3 DSB lines", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_dtlb_load_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Demand load miss in all TLB levels which causes a page walk of any page size", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Demand load miss in all TLB levels which causes a page walk that completes for any page size", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles PMH is busy with a walk due to demand loads", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_LD_MISS_CAUSES_A_WALK", .udesc = "Demand load miss in all TLB levels which causes a page walk of any page size", .ucode = 0x8100, .uequiv = "MISS_CAUSES_A_WALK", .uflags = INTEL_X86_NCOMBO, }, { .uname = "DEMAND_LD_WALK_COMPLETED", .udesc = "Demand load miss in all TLB levels which causes a page walk that completes for any page size", .ucode = 0x8200, .uequiv = "WALK_COMPLETED", .uflags = INTEL_X86_NCOMBO, }, { .uname = "DEMAND_LD_WALK_DURATION", .udesc = "Cycles PMH is busy with a walk due to demand loads", .ucode = 0x8400, .uequiv = "WALK_DURATION", .uflags = INTEL_X86_NCOMBO, }, { .uname = 
"STLB_HIT", .udesc = "Number of load operations that missed L1TLB but hit L2TLB", .ucode = 0x45f, /* override event code */ .uflags= INTEL_X86_NCOMBO | INTEL_X86_CODE_OVERRIDE, }, { .uname = "LARGE_WALK_COMPLETED", .udesc = "Number of large page walks completed for demand loads", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_dtlb_store_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Miss in all TLB levels that causes a page walk of any page size (4K/2M/4M/1G)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CAUSES_A_WALK", .udesc = "Miss in all TLB levels that causes a page walk of any page size (4K/2M/4M/1G)", .ucode = 0x100, .uequiv = "MISS_CAUSES_A_WALK", .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "First level miss but second level hit; no page walk. Only relevant if multiple levels", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Miss in all TLB levels that causes a page walk that completes of any page size (4K/2M/4M/1G)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles PMH is busy with this walk", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_fp_assist[]={ { .uname = "ANY", .udesc = "Cycles with any input/output SSE or FP assists", .ucode = 0x1e00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "SIMD_INPUT", .udesc = "Number of SIMD FP assists due to input values", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SIMD_OUTPUT", .udesc = "Number of SIMD FP assists due to output values", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87_INPUT", .udesc = "Number of X87 assists due to input value", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87_OUTPUT", .udesc = "Number of X87 assists due to output value", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_icache[]={ { .uname = "MISSES", .udesc 
= "Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. Includes UC accesses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCH_STALL", .udesc = "Number of cycles where a code-fetch stalled due to L1 instruction cache miss or iTLB miss", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_idq[]={ { .uname = "EMPTY", .udesc = "Cycles IDQ is empty", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MITE_UOPS", .udesc = "Number of uops delivered to IDQ from MITE path", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DSB_UOPS", .udesc = "Number of uops delivered to IDQ from DSB path", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS", .udesc = "Number of uops delivered to IDQ when MS busy by DSB", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_MITE_UOPS", .udesc = "Number of uops delivered to IDQ when MS busy by MITE", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_UOPS", .udesc = "Number of uops delivered to IDQ from MS by either DSB or MITE", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MITE_UOPS_CYCLES", .udesc = "Cycles where uops are delivered to IDQ from MITE (MITE active)", .uequiv = "MITE_UOPS:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DSB_UOPS_CYCLES", .udesc = "Cycles where uops are delivered to IDQ from DSB (DSB active)", .ucode = 0x800 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ when MS busy by DSB", .uequiv = "MS_DSB_UOPS:c=1", .ucode = 0x1000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_MITE_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ when MS busy by MITE", .uequiv = "MS_MITE_UOPS:c=1", .ucode = 0x2000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"MS_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ from MS by either BSD or MITE", .uequiv = "MS_UOPS:c=1", .ucode = 0x3000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DSB_UOPS", .udesc = "Number of uops delivered from either DSB paths", .ucode = 0x1800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DSB_CYCLES", .udesc = "Cycles MITE/MS delivered anything", .ucode = 0x1800 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DSB_CYCLES_4_UOPS", .udesc = "Cycles MITE/MS delivered 4 uops", .ucode = 0x1800 | (0x4 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_MITE_UOPS", .udesc = "Number of uops delivered from either MITE paths", .ucode = 0x2400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_MITE_CYCLES", .udesc = "Cycles DSB/MS delivered anything", .ucode = 0x2400 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_UOPS", .udesc = "Number of uops delivered to IDQ from any path", .ucode = 0x3c00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS_OCCUR", .udesc = "Occurences of DSB MS going active", .uequiv = "MS_DSB_UOPS:c=1:e=1", .ucode = 0x1000 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_idq_uops_not_delivered[]={ { .uname = "CORE", .udesc = "Number of non-delivered uops to RAT (use cmask to qualify further)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_ild_stall[]={ { .uname = "LCP", .udesc = "Stall caused by changing prefix length of the instruction", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IQ_FULL", .udesc = "Stall cycles due to IQ full", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_inst_retired[]={ { .uname = "ANY_P", .udesc = "Number of instructions retired", .ucode = 
0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL", .udesc = "Precise instruction retired event to reduce effect of PEBS shadow IP distribution (Precise Event)", .ucntmsk = 0x2, .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "PREC_DIST", .udesc = "Precise instruction retired event to reduce effect of PEBS shadow IP distribution (Precise Event)", .ucntmsk = 0x2, .uequiv = "ALL", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_itlb[]={ { .uname = "ITLB_FLUSH", .udesc = "Number of ITLB flushes, includes 4k/2M/4M pages", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FLUSH", .udesc = "Number of ITLB flushes, includes 4k/2M/4M pages", .ucode = 0x100, .uequiv = "ITLB_FLUSH", .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l1d[]={ { .uname = "REPLACEMENT", .udesc = "Number of cache lines brought into the L1D cache", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_move_elimination[]={ { .uname = "INT_NOT_ELIMINATED", .udesc = "Number of integer Move Elimination candidate uops that were not eliminated", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SIMD_NOT_ELIMINATED", .udesc = "Number of SIMD Move Elimination candidate uops that were not eliminated", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INT_ELIMINATED", .udesc = "Number of integer Move Elimination candidate uops that were eliminated", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SIMD_ELIMINATED", .udesc = "Number of SIMD Move Elimination candidate uops that were eliminated", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l1d_pend_miss[]={ { .uname = "OCCURRENCES", .udesc = "Occurrences of L1D_PEND_MISS going active", .uequiv = "PENDING:e=1:c=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { 
.uname = "EDGE", .udesc = "Occurrences of L1D_PEND_MISS going active", .uequiv = "OCCURRENCES", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "PENDING", .udesc = "Number of L1D load misses outstanding every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PENDING_CYCLES", .udesc = "Cycles with L1D load misses outstanding", .uequiv = "PENDING:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l2_l1d_wb_rqsts[]={ { .uname = "HIT_E", .udesc = "Non rejected writebacks from L1D to L2 cache lines in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HIT_M", .udesc = "Non rejected writebacks from L1D to L2 cache lines in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISS", .udesc = "Not rejected writebacks that missed LLC", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL", .udesc = "Not rejected writebacks from L1D to L2 cache lines in any state", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l2_lines_in[]={ { .uname = "ANY", .udesc = "L2 cache lines filling (counting does not cover rejects)", .ucode = 0x700, .uequiv = "ALL", .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL", .udesc = "L2 cache lines filling (counting does not cover rejects)", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "E", .udesc = "L2 cache lines in E state (counting does not cover rejects)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "I", .udesc = "L2 cache lines in I state (counting does not cover rejects)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S", .udesc = "L2 cache lines in S state (counting does not cover rejects)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l2_lines_out[]={ { .uname = "DEMAND_CLEAN", .udesc = "L2 clean line evicted by a demand", .ucode = 
0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DIRTY", .udesc = "L2 dirty line evicted by a demand", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_CLEAN", .udesc = "L2 clean line evicted by a prefetch", .ucode = 0x400, .uequiv = "PF_CLEAN", .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_CLEAN", .udesc = "L2 clean line evicted by a prefetch", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_DIRTY", .udesc = "L2 dirty line evicted by an MLC Prefetch", .ucode = 0x800, .uequiv = "PF_DIRTY", .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_DIRTY", .udesc = "L2 dirty line evicted by an MLC Prefetch", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRTY_ANY", .udesc = "Any L2 dirty line evicted (does not cover rejects)", .ucode = 0xa00, .uequiv = "DIRTY_ALL", .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRTY_ALL", .udesc = "Any L2 dirty line evicted (does not cover rejects)", .ucode = 0xa00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l2_rqsts[]={ { .uname = "ALL_CODE_RD", .udesc = "Any code request to L2 cache", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_HIT", .udesc = "L2 cache hits when fetching instructions", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_MISS", .udesc = "L2 cache misses when fetching instructions", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_DATA_RD", .udesc = "Demand data read requests to L2 cache", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD_HIT", .udesc = "Demand data read requests that hit L2", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_PF", .udesc = "Any L2 HW prefetch request to L2 cache", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_HIT", .udesc = "Requests from the L2 hardware prefetchers that hit L2 cache", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_MISS", .udesc = "Requests from the L2 hardware prefetchers that miss 
L2 cache", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_RFO", .udesc = "Any RFO requests to L2 cache", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_HIT", .udesc = "Store RFO requests that hit L2 cache", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_MISS", .udesc = "RFO requests that miss L2 cache", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l2_store_lock_rqsts[]={ { .uname = "MISS", .udesc = "RFOs that miss cache (I state)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HIT_M", .udesc = "RFOs that hit cache lines in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL", .udesc = "RFOs that access cache lines in any state", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_l2_trans[]={ { .uname = "ALL", .udesc = "Transactions accessing the L2 pipe", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "CODE_RD", .udesc = "L2 cache accesses when fetching instructions", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_WB", .udesc = "L1D writebacks that access the L2 cache", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DMND_DATA_RD", .udesc = "Demand Data Read requests that access the L2 cache", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L2_FILL", .udesc = "L2 fill requests that access the L2 cache", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L2_WB", .udesc = "L2 writebacks that access the L2 cache", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_PREFETCH", .udesc = "L2 or L3 HW prefetches that access the L2 cache (including rejects)", .ucode = 0x800, .uequiv = "ALL_PF", .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_PF", .udesc = "L2 or L3 HW prefetches that access the L2 cache (including rejects)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO", .udesc = "RFO requests that access the L2 
cache", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_ld_blocks[]={ { .uname = "STORE_FORWARD", .udesc = "Loads blocked by overlapping with store buffer that cannot be forwarded", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NO_SR", .udesc = "Number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_ld_blocks_partial[]={ { .uname = "ADDRESS_ALIAS", .udesc = "False dependencies in MOB due to partial compare on address", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_load_hit_pre[]={ { .uname = "HW_PF", .udesc = "Non sw-prefetch load dispatches that hit the fill buffer allocated for HW prefetch", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SW_PF", .udesc = "Non sw-prefetch load dispatches that hit the fill buffer allocated for SW prefetch", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_l3_lat_cache[]={ { .uname = "MISS", .udesc = "Core-originated cacheable demand requests missed L3", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REFERENCE", .udesc = "Core-originated cacheable demand requests that refer to L3", .ucode = 0x4f00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_machine_clears[]={ { .uname = "MASKMOV", .udesc = "The number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MEMORY_ORDERING", .udesc = "Number of Memory Ordering Machine Clears detected", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SMC", .udesc = "Self-Modifying Code detected", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_mem_load_uops_llc_hit_retired[]={ { .uname = "XSNP_HIT", 
.udesc = "Load LLC Hit and a cross-core Snoop hits in on-pkg core cache (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_HITM", .udesc = "Load had HitM Response from a core on same socket (shared LLC) (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_MISS", .udesc = "Load LLC Hit and a cross-core Snoop missed in on-pkg core cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_NONE", .udesc = "Load hit in last-level (L3) cache with no snoop needed (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_mem_load_uops_llc_miss_retired[]={ { .uname = "LOCAL_DRAM", .udesc = "Number of retired load uops that missed L3 but were serviced by local RAM (Precise Event)", .ucode = 0x100, .umodel = PFM_PMU_INTEL_IVB, .uflags = INTEL_X86_DFL | INTEL_X86_PEBS, }, { .uname = "LOCAL_DRAM", .udesc = "Number of retired load uops that missed L3 but were serviced by local RAM, Snoop not needed, Snoop Miss, or Snoop Hit data not forwarded (Precise Event)", .ucode = 0x300, .umodel = PFM_PMU_INTEL_IVB_EP, .uflags = INTEL_X86_DFL | INTEL_X86_PEBS, }, { .uname = "REMOTE_DRAM", .udesc = "Number of retired load uops that missed L3 but were serviced by remote RAM, snoop not needed, snoop miss, snoop hit data not forwarded (Precise Event)", .ucode = 0xc00, .umodel = PFM_PMU_INTEL_IVB_EP, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "REMOTE_HITM", .udesc = "Number of retired load uops whose data source was remote HITM (Precise Event)", .ucode = 0x1000, .umodel = PFM_PMU_INTEL_IVB_EP, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "REMOTE_FWD", .udesc = "Load uops that miss in the L3 whose data source was forwarded from a remote cache (Precise Event)", .ucode = 0x2000, .umodel = PFM_PMU_INTEL_IVB_EP, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const 
intel_x86_umask_t ivb_mem_load_uops_retired[]={ { .uname = "HIT_LFB", .udesc = "A load missed L1D but hit the Fill Buffer (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L1_HIT", .udesc = "Load hit in nearest-level (L1D) cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_HIT", .udesc = "Load hit in mid-level (L2) cache (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_MISS", .udesc = "Load misses in mid-level (L2) cache (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_HIT", .udesc = "Load hit in last-level (L3) cache with no snoop needed (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_MISS", .udesc = "Load miss in last-level (L3) cache (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_mem_trans_retired[]={ { .uname = "LATENCY_ABOVE_THRESHOLD", .udesc = "Memory load instructions retired above programmed clocks, minimum threshold value is 3 (Precise Event and ldlat required)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_LDLAT, }, { .uname = "PRECISE_STORE", .udesc = "Capture where stores occur, must use with PEBS (Precise Event required)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_mem_uops_retired[]={ { .uname = "ALL_LOADS", .udesc = "Any retired loads (Precise Event)", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ANY_LOADS", .udesc = "Any retired loads (Precise Event)", .ucode = 0x8100, .uequiv = "ALL_LOADS", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ALL_STORES", .udesc = "Any retired stores (Precise Event)", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCK_LOADS", .udesc = "Locked retired loads 
(Precise Event)", .ucode = 0x2100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ANY_STORES", .udesc = "Any retired stores (Precise Event)", .ucode = 0x8200, .uequiv = "ALL_STORES", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_LOADS", .udesc = "Retired loads causing cacheline splits (Precise Event)", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_STORES", .udesc = "Retired stores causing cacheline splits (Precise Event)", .ucode = 0x4200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STLB_MISS_LOADS", .udesc = "STLB misses due to retired loads (Precise Event)", .ucode = 0x1100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STLB_MISS_STORES", .udesc = "STLB misses due to retired stores (Precise Event)", .ucode = 0x1200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_misalign_mem_ref[]={ { .uname = "LOADS", .udesc = "Speculative cache-line split load uops dispatched to the L1D", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STORES", .udesc = "Speculative cache-line split Store-address uops dispatched to L1D", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_offcore_requests[]={ { .uname = "ALL_DATA_RD", .udesc = "Demand and prefetch read requests sent to uncore", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_READ", .udesc = "Demand and prefetch read requests sent to uncore", .uequiv = "ALL_DATA_RD", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD", .udesc = "Offcore code read requests, including cacheable and uncacheable", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD", .udesc = "Demand Data Read requests sent to uncore", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Offcore Demand RFOs, includes regular RFO, Locks, ItoM", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static 
const intel_x86_umask_t ivb_offcore_requests_outstanding[]={ { .uname = "ALL_DATA_RD_CYCLES", .udesc = "Cycles with cacheable data read transactions in the superQ", .uequiv = "ALL_DATA_RD:c=1", .ucode = 0x800 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD_CYCLES", .udesc = "Cycles with demand code read transactions in the superQ", .uequiv = "DEMAND_CODE_RD:c=1", .ucode = 0x200 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD_CYCLES", .udesc = "Cycles with demand data read transactions in the superQ", .uequiv = "DEMAND_DATA_RD:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_RD", .udesc = "Cacheable data read transactions in the superQ every cycle", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD", .udesc = "Code read transactions in the superQ every cycle", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD", .udesc = "Demand data read transactions in the superQ every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Outstanding RFO (store) transactions in the superQ every cycle", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO_CYCLES", .udesc = "Cycles with outstanding RFO (store) transactions in the superQ", .uequiv = "DEMAND_RFO:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_other_assists[]={ { .uname = "AVX_TO_SSE", .udesc = "Number of transitions from AVX-256 to legacy SSE when penalty applicable", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_TO_AVX", .udesc = "Number of transitions from legacy SSE to AVX-256 when penalty applicable", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AVX_STORE", .udesc = "Number of assists associated with 256-bit AVX stores", .ucode = 0x0800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WB", 
.udesc = "Number of times the microcode assist is invoked by hardware upon uop writeback", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_resource_stalls[]={ { .uname = "ANY", .udesc = "Cycles stalled due to Resource Related reason", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "RS", .udesc = "Cycles stalled due to no eligible RS entry available", .ucode = 0x400, }, { .uname = "SB", .udesc = "Cycles stalled due to no store buffers available (not including draining from sync)", .ucode = 0x800, }, { .uname = "ROB", .udesc = "Cycles stalled due to re-order buffer full", .ucode = 0x1000, }, }; static const intel_x86_umask_t ivb_rob_misc_events[]={ { .uname = "LBR_INSERTS", .udesc = "Count each time a new LBR record is saved by HW", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_rs_events[]={ { .uname = "EMPTY_CYCLES", .udesc = "Cycles the RS is empty for this thread", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_tlb_access[]={ { .uname = "STLB_HIT", .udesc = "Number of load operations that missed L1TLB but hit L2TLB", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "LOAD_STLB_HIT", .udesc = "Number of load operations that missed L1TLB but hit L2TLB", .ucode = 0x400, .uequiv= "STLB_HIT", .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_tlb_flush[]={ { .uname = "DTLB_THREAD", .udesc = "Number of DTLB flushes of thread-specific entries", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_ANY", .udesc = "Number of STLB flushes", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_uops_executed[]={ { .uname = "CORE", .udesc = "Counts total number of uops executed from any thread per cycle", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "THREAD", .udesc = "Counts total number of uops 
executed per thread each cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_uops_dispatched_port[]={ { .uname = "PORT_0", .udesc = "Cycles in which a uop is dispatched on port 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_1", .udesc = "Cycles in which a uop is dispatched on port 1", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_2", .udesc = "Cycles in which a uop is dispatched on port 2", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_3", .udesc = "Cycles in which a uop is dispatched on port 3", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_4", .udesc = "Cycles in which a uop is dispatched on port 4", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_5", .udesc = "Cycles in which a uop is dispatched on port 5", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_uops_issued[]={ { .uname = "ANY", .udesc = "Number of uops issued by the RAT to the Reservation Station (RS)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "CORE_STALL_CYCLES", .udesc = "Cycles no uops issued on this core (by any thread)", .uequiv = "ANY:c=1:i=1:t=1", .ucode = 0x100 | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no uops issued by this thread", .uequiv = "ANY:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "FLAGS_MERGE", .udesc = "Number of flags-merge uops allocated. 
Such uops add delay", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SLOW_LEA", .udesc = "Number of slow LEA or similar uops allocated", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SINGLE_MUL", .udesc = "Number of multiply packed/scalar single precision uops allocated", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_uops_retired[]={ { .uname = "ALL", .udesc = "All uops that actually retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "All uops that actually retired (Precise Event)", .ucode = 0x100, .uequiv= "ALL", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "RETIRE_SLOTS", .udesc = "Number of retirement slots used (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no executable uop retired (Precise Event)", .uequiv = "ALL:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "TOTAL_CYCLES", .udesc = "Total cycles using precise uop retired event (Precise Event)", .uequiv = "ALL:c=16", .ucode = 0x100 | (0x10 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t ivb_offcore_response[]={ { .uname = "DMND_DATA_RD", .udesc = "Request: number of demand and DCU prefetch data reads of full and partial cachelines as well as demand data page table entry cacheline reads. Does not count L2 data read prefetches or instruction fetches", .ucode = 1ULL << (0 + 8), .grpid = 0, }, { .uname = "DMND_RFO", .udesc = "Request: number of demand and DCU prefetch reads for ownership (RFO) requests generated by a write to data cacheline. Does not count L2 RFO prefetches", .ucode = 1ULL << (1 + 8), .grpid = 0, }, { .uname = "DMND_IFETCH", .udesc = "Request: number of demand and DCU prefetch instruction cacheline reads. 
Does not count L2 code read prefetches", .ucode = 1ULL << (2 + 8), .grpid = 0, }, { .uname = "WB", .udesc = "Request: number of writebacks (modified to exclusive) transactions", .ucode = 1ULL << (3 + 8), .grpid = 0, }, { .uname = "PF_DATA_RD", .udesc = "Request: number of data cacheline reads generated by L2 prefetchers", .ucode = 1ULL << (4 + 8), .grpid = 0, }, { .uname = "PF_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetchers", .ucode = 1ULL << (5 + 8), .grpid = 0, }, { .uname = "PF_IFETCH", .udesc = "Request: number of code reads generated by L2 prefetchers", .ucode = 1ULL << (6 + 8), .grpid = 0, }, { .uname = "PF_LLC_DATA_RD", .udesc = "Request: number of L3 prefetcher requests to L2 for loads", .ucode = 1ULL << (7 + 8), .grpid = 0, }, { .uname = "PF_LLC_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetcher", .ucode = 1ULL << (8 + 8), .grpid = 0, }, { .uname = "PF_LLC_IFETCH", .udesc = "Request: number of L2 prefetcher requests to L3 for instruction fetches", .ucode = 1ULL << (9 + 8), .grpid = 0, }, { .uname = "BUS_LOCKS", .udesc = "Request: number of bus lock and split lock requests", .ucode = 1ULL << (10 + 8), .grpid = 0, }, { .uname = "STRM_ST", .udesc = "Request: number of streaming store requests", .ucode = 1ULL << (11 + 8), .grpid = 0, }, { .uname = "OTHER", .udesc = "Request: counts one of the following transaction types, including L3 invalidate, I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences, lock, unlock, split lock", .ucode = 1ULL << (15+8), .grpid = 0, }, { .uname = "ANY_IFETCH", .udesc = "Request: combination of PF_IFETCH | DMND_IFETCH | PF_LLC_IFETCH", .uequiv = "PF_IFETCH:DMND_IFETCH:PF_LLC_IFETCH", .ucode = 0x24400, .grpid = 0, }, { .uname = "ANY_REQUEST", .udesc = "Request: combination of all request umasks", .uequiv = "DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER", .ucode = 0x8fff00, .uflags= 
INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "ANY_DATA", .udesc = "Request: combination of DMND_DATA_RD | PF_DATA_RD | PF_LLC_DATA_RD", .uequiv = "DMND_DATA_RD:PF_DATA_RD:PF_LLC_DATA_RD", .ucode = 0x9100, .grpid = 0, }, { .uname = "ANY_RFO", .udesc = "Request: combination of DMND_RFO | PF_RFO | PF_LLC_RFO", .uequiv = "DMND_RFO:PF_RFO:PF_LLC_RFO", .ucode = 0x12200, .grpid = 0, }, { .uname = "ANY_RESPONSE", .udesc = "Response: count any response type", .ucode = 1ULL << (16+8), .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL | INTEL_X86_EXCL_GRP_GT, .grpid = 1, }, { .uname = "NO_SUPP", .udesc = "Supplier: counts number of times supplier information is not available", .ucode = 1ULL << (17+8), .grpid = 1, }, { .uname = "LLC_HITM", .udesc = "Supplier: counts L3 hits in M-state (initial lookup)", .ucode = 1ULL << (18+8), .grpid = 1, }, { .uname = "LLC_HITE", .udesc = "Supplier: counts L3 hits in E-state", .ucode = 1ULL << (19+8), .grpid = 1, }, { .uname = "LLC_HITS", .udesc = "Supplier: counts L3 hits in S-state", .ucode = 1ULL << (20+8), .grpid = 1, }, { .uname = "LLC_HITF", .udesc = "Supplier: counts L3 hits in F-state", .ucode = 1ULL << (21+8), .grpid = 1, }, { .uname = "LLC_MISS_LOCAL", .udesc = "Supplier: counts L3 misses to local DRAM", .ucode = 1ULL << (22+8), .grpid = 1, }, { .uname = "LLC_MISS_REMOTE", .udesc = "Supplier: counts L3 misses to remote DRAM", .ucode = 0xffULL << (23+8), .uequiv = "LLC_MISS_REMOTE_DRAM", .umodel = PFM_PMU_INTEL_IVB_EP, .grpid = 1, }, { .uname = "LLC_MISS_REMOTE_DRAM", .udesc = "Supplier: counts L3 misses to remote DRAM", .ucode = 0xffULL << (23+8), .umodel = PFM_PMU_INTEL_IVB_EP, .grpid = 1, }, { .uname = "LLC_HITMESF", .udesc = "Supplier: counts L3 hits in any state (M, E, S, F)", .ucode = 0xfULL << (18+8), .uequiv = "LLC_HITM:LLC_HITE:LLC_HITS:LLC_HITF", .grpid = 1, }, { .uname = "SNP_NONE", .udesc = "Snoop: counts number of times no snoop-related information is available", .ucode = 1ULL << (31+8), .grpid = 2, }, { .uname = 
"SNP_NOT_NEEDED", .udesc = "Snoop: counts the number of times no snoop was needed to satisfy the request", .ucode = 1ULL << (32+8), .grpid = 2, }, { .uname = "SNP_MISS", .udesc = "Snoop: counts number of times a snoop was needed and it missed all snooped caches", .ucode = 1ULL << (33+8), .grpid = 2, }, { .uname = "SNP_NO_FWD", .udesc = "Snoop: counts number of times a snoop was needed and it hit in at leas one snooped cache", .ucode = 1ULL << (34+8), .grpid = 2, }, { .uname = "SNP_FWD", .udesc = "Snoop: counts number of times a snoop was needed and data was forwarded from a remote socket", .ucode = 1ULL << (35+8), .grpid = 2, }, { .uname = "HITM", .udesc = "Snoop: counts number of times a snoop was needed and it hitM-ed in local or remote cache", .ucode = 1ULL << (36+8), .grpid = 2, }, { .uname = "NON_DRAM", .udesc = "Snoop: counts number of times target was a non-DRAM system address. This includes MMIO transactions", .ucode = 1ULL << (37+8), .grpid = 2, }, { .uname = "SNP_ANY", .udesc = "Snoop: any snoop reason", .ucode = 0x7fULL << (31+8), .uequiv = "SNP_NONE:SNP_NOT_NEEDED:SNP_MISS:SNP_NO_FWD:SNP_FWD:HITM:NON_DRAM", .uflags= INTEL_X86_DFL, .grpid = 2, }, }; static const intel_x86_umask_t ivb_baclears[]={ { .uname = "ANY", .udesc = "Counts the number of times the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t ivb_cycle_activity[]={ { .uname = "CYCLES_L2_PENDING", .udesc = "Cycles with pending L2 miss loads", .ucode = 0x0100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags = INTEL_X86_NCOMBO, .ucntmsk= 0xf, }, { .uname = "CYCLES_LDM_PENDING", .udesc = "Cycles with pending memory loads", .ucode = 0x0200 | (0x2 << INTEL_X86_CMASK_BIT), .uflags = INTEL_X86_NCOMBO, }, { .uname = "CYCLES_L1D_PENDING", .udesc = "Cycles with pending L1D load cache misses", .ucode = 0x0800 | 
(0x8 << INTEL_X86_CMASK_BIT), .ucntmsk= 0x4, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CYCLES_NO_EXECUTE", .udesc = "Cycles of dispatch stalls", .ucode = 0x0400 | (0x4 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_L2_PENDING", .udesc = "Execution stalls due to L2 pending loads", .ucode = 0x0500 | (0x5 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_L1D_PENDING", .udesc = "Execution stalls due to L1D pending loads", .ucode = 0x0c00 | (0xc << INTEL_X86_CMASK_BIT), .ucntmsk= 0x4, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_LDM_PENDING", .udesc = "Execution stalls due to memory loads", .ucode = 0x0600 | (0x6 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_fp_comp_ops_exe[]={ { .uname = "X87", .udesc = "Number of X87 uops executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_PACKED_DOUBLE", .udesc = "Number of SSE or AVX-128 double precision FP packed uops executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_SCALAR_SINGLE", .udesc = "Number of SSE or AVX-128 single precision FP scalar uops executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_PACKED_SINGLE", .udesc = "Number of SSE or AVX-128 single precision FP packed uops executed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_SCALAR_DOUBLE", .udesc = "Number of SSE or AVX-128 double precision FP scalar uops executed", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_simd_fp_256[]={ { .uname = "PACKED_SINGLE", .udesc = "Counts 256-bit packed single-precision", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_DOUBLE", .udesc = "Counts 256-bit packed double-precision", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t ivb_lsd[]={ { .uname = "UOPS", .udesc = "Number of uops delivered by the Loop Stream 
Detector (LSD)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_entry_t intel_ivb_pe[]={ { .name = "ARITH", .desc = "Counts arithmetic multiply operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x14, .numasks = LIBPFM_ARRAY_SIZE(ivb_arith), .ngrp = 1, .umasks = ivb_arith, }, { .name = "BACLEARS", .desc = "Branch resteered", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xe6, .numasks = LIBPFM_ARRAY_SIZE(ivb_baclears), .ngrp = 1, .umasks = ivb_baclears, }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x88, .numasks = LIBPFM_ARRAY_SIZE(ivb_br_inst_exec), .ngrp = 1, .umasks = ivb_br_inst_exec, }, { .name = "BR_INST_RETIRED", .desc = "Retired branch instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc4, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_br_inst_retired), .ngrp = 1, .umasks = ivb_br_inst_retired, }, { .name = "BR_MISP_EXEC", .desc = "Mispredicted branches executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x89, .numasks = LIBPFM_ARRAY_SIZE(ivb_br_misp_exec), .ngrp = 1, .umasks = ivb_br_misp_exec, }, { .name = "BR_MISP_RETIRED", .desc = "Mispredicted retired branches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc5, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_br_misp_retired), .ngrp = 1, .umasks = ivb_br_misp_retired, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Count branch instructions at retirement. Specifically, this event counts the retirement of the last micro-op of a branch instruction", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_INST_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc4, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Count mispredicted branch instructions at retirement. 
Specifically, this event counts at retirement of the last micro-op of a branch instruction in the architectural path of the execution and experienced misprediction in the branch prediction hardware", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_MISP_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc5, }, { .name = "LOCK_CYCLES", .desc = "Locked cycles in L1D and L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(ivb_lock_cycles), .ngrp = 1, .umasks = ivb_lock_cycles, }, { .name = "CPL_CYCLES", .desc = "Unhalted core cycles at a specific ring level", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5c, .numasks = LIBPFM_ARRAY_SIZE(ivb_cpl_cycles), .ngrp = 1, .umasks = ivb_cpl_cycles, }, { .name = "CPU_CLK_UNHALTED", .desc = "Cycles when processor is not in halted state", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(ivb_cpu_clk_unhalted), .ngrp = 1, .umasks = ivb_cpu_clk_unhalted, }, { .name = "DSB2MITE_SWITCHES", .desc = "Number of DSB to MITE switches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xab, .numasks = LIBPFM_ARRAY_SIZE(ivb_dsb2mite_switches), .ngrp = 1, .umasks = ivb_dsb2mite_switches, }, { .name = "DSB_FILL", .desc = "DSB fills", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xac, .numasks = LIBPFM_ARRAY_SIZE(ivb_dsb_fill), .ngrp = 1, .umasks = ivb_dsb_fill, }, { .name = "DTLB_LOAD_MISSES", .desc = "Data TLB load misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(ivb_dtlb_load_misses), .ngrp = 1, .umasks = ivb_dtlb_load_misses, }, { .name = "DTLB_STORE_MISSES", .desc = "Data TLB store misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x49, .numasks = LIBPFM_ARRAY_SIZE(ivb_dtlb_store_misses), .ngrp = 1, .umasks = ivb_dtlb_store_misses, }, { .name = "FP_ASSIST", .desc = "X87 Floating point assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xca, .numasks = LIBPFM_ARRAY_SIZE(ivb_fp_assist), .ngrp = 1, .umasks = 
ivb_fp_assist, }, { .name = "ICACHE", .desc = "Instruction Cache accesses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(ivb_icache), .ngrp = 1, .umasks = ivb_icache, }, { .name = "IDQ", .desc = "IDQ operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x79, .numasks = LIBPFM_ARRAY_SIZE(ivb_idq), .ngrp = 1, .umasks = ivb_idq, }, { .name = "IDQ_UOPS_NOT_DELIVERED", .desc = "Uops not delivered", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x9c, .numasks = LIBPFM_ARRAY_SIZE(ivb_idq_uops_not_delivered), .ngrp = 1, .umasks = ivb_idq_uops_not_delivered, }, { .name = "ILD_STALL", .desc = "Instruction Length Decoder stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x87, .numasks = LIBPFM_ARRAY_SIZE(ivb_ild_stall), .ngrp = 1, .umasks = ivb_ild_stall, }, { .name = "INST_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_inst_retired), .ngrp = 1, .umasks = ivb_inst_retired, }, { .name = "INSTRUCTION_RETIRED", .desc = "Number of instructions at retirement", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_V3_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "ITLB", .desc = "Instruction TLB", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xae, .numasks = LIBPFM_ARRAY_SIZE(ivb_itlb), .ngrp = 1, .umasks = ivb_itlb, }, { .name = "ITLB_MISSES", .desc = "Instruction TLB misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(ivb_dtlb_store_misses), .ngrp = 1, .umasks = ivb_dtlb_store_misses, /* identical to actual umasks list for this event */ }, { .name = "L1D", .desc = "L1D cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x51, .numasks = LIBPFM_ARRAY_SIZE(ivb_l1d), .ngrp = 1, .umasks = ivb_l1d, }, { 
.name = "MOVE_ELIMINATION", .desc = "Move Elimination", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x58, .numasks = LIBPFM_ARRAY_SIZE(ivb_move_elimination), .ngrp = 1, .umasks = ivb_move_elimination, }, { .name = "L1D_PEND_MISS", .desc = "L1D pending misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x4, .code = 0x48, .numasks = LIBPFM_ARRAY_SIZE(ivb_l1d_pend_miss), .ngrp = 1, .umasks = ivb_l1d_pend_miss, }, { .name = "L2_L1D_WB_RQSTS", .desc = "Writeback requests from L1D to L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(ivb_l2_l1d_wb_rqsts), .ngrp = 1, .umasks = ivb_l2_l1d_wb_rqsts, }, { .name = "L2_LINES_IN", .desc = "L2 lines allocated", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf1, .numasks = LIBPFM_ARRAY_SIZE(ivb_l2_lines_in), .ngrp = 1, .umasks = ivb_l2_lines_in, }, { .name = "L2_LINES_OUT", .desc = "L2 lines evicted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf2, .numasks = LIBPFM_ARRAY_SIZE(ivb_l2_lines_out), .ngrp = 1, .umasks = ivb_l2_lines_out, }, { .name = "L2_RQSTS", .desc = "L2 requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(ivb_l2_rqsts), .ngrp = 1, .umasks = ivb_l2_rqsts, }, { .name = "L2_STORE_LOCK_RQSTS", .desc = "L2 store lock requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(ivb_l2_store_lock_rqsts), .ngrp = 1, .umasks = ivb_l2_store_lock_rqsts, }, { .name = "L2_TRANS", .desc = "L2 transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf0, .numasks = LIBPFM_ARRAY_SIZE(ivb_l2_trans), .ngrp = 1, .umasks = ivb_l2_trans, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for L3_LAT_CACHE:MISS", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:MISS", .cntmsk = 0xff, .code = 0x412e, }, { .name = "LLC_MISSES", .desc = "Alias for LAST_LEVEL_CACHE_MISSES", .modmsk = INTEL_V3_ATTRS, .equiv = "LAST_LEVEL_CACHE_MISSES", .cntmsk = 0xff, .code = 0x412e, }, { .name = 
"LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for L3_LAT_CACHE:REFERENCE", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:REFERENCE", .cntmsk = 0xff, .code = 0x4f2e, }, { .name = "LLC_REFERENCES", .desc = "Alias for LAST_LEVEL_CACHE_REFERENCES", .modmsk = INTEL_V3_ATTRS, .equiv = "LAST_LEVEL_CACHE_REFERENCES", .cntmsk = 0xff, .code = 0x4f2e, }, { .name = "LD_BLOCKS", .desc = "Blocking loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(ivb_ld_blocks), .ngrp = 1, .umasks = ivb_ld_blocks, }, { .name = "LD_BLOCKS_PARTIAL", .desc = "Partial load blocks", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(ivb_ld_blocks_partial), .ngrp = 1, .umasks = ivb_ld_blocks_partial, }, { .name = "LOAD_HIT_PRE", .desc = "Load dispatches that hit fill buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x4c, .numasks = LIBPFM_ARRAY_SIZE(ivb_load_hit_pre), .ngrp = 1, .umasks = ivb_load_hit_pre, }, { .name = "L3_LAT_CACHE", .desc = "Core-originated cacheable demand requests to L3", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(ivb_l3_lat_cache), .ngrp = 1, .umasks = ivb_l3_lat_cache, }, { .name = "LONGEST_LAT_CACHE", .desc = "Core-originated cacheable demand requests to L3", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(ivb_l3_lat_cache), .ngrp = 1, .equiv = "L3_LAT_CACHE", .umasks = ivb_l3_lat_cache, }, { .name = "MACHINE_CLEARS", .desc = "Machine clear asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc3, .numasks = LIBPFM_ARRAY_SIZE(ivb_machine_clears), .ngrp = 1, .umasks = ivb_machine_clears, }, { .name = "MEM_LOAD_UOPS_LLC_HIT_RETIRED", .desc = "L3 hit loads uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_mem_load_uops_llc_hit_retired), .ngrp = 1, .umasks = ivb_mem_load_uops_llc_hit_retired, }, { .name = 
"MEM_LOAD_LLC_HIT_RETIRED", .desc = "L3 hit loads uops retired (deprecated, use MEM_LOAD_UOPS_LLC_HIT_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd2, .equiv = "MEM_LOAD_UOPS_LLC_HIT_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_mem_load_uops_llc_hit_retired), .ngrp = 1, .umasks = ivb_mem_load_uops_llc_hit_retired, }, { .name = "MEM_LOAD_UOPS_LLC_MISS_RETIRED", .desc = "Load uops retired that missed the LLC", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd3, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_mem_load_uops_llc_miss_retired), .ngrp = 1, .umasks = ivb_mem_load_uops_llc_miss_retired, }, { .name = "MEM_LOAD_UOPS_RETIRED", .desc = "Memory loads uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd1, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_mem_load_uops_retired), .ngrp = 1, .umasks = ivb_mem_load_uops_retired, }, { .name = "MEM_LOAD_RETIRED", .desc = "Memory loads uops retired (deprecated, use MEM_LOAD_UOPS_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd1, .equiv = "MEM_LOAD_UOPS_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_mem_load_uops_retired), .ngrp = 1, .umasks = ivb_mem_load_uops_retired, }, { .name = "MEM_TRANS_RETIRED", .desc = "Memory transactions retired", .modmsk = INTEL_V3_ATTRS | _INTEL_X86_ATTR_LDLAT, .cntmsk = 0x8, .code = 0xcd, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_mem_trans_retired), .ngrp = 1, .umasks = ivb_mem_trans_retired, }, { .name = "MEM_UOPS_RETIRED", .desc = "Memory uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_mem_uops_retired), .ngrp = 1, .umasks = ivb_mem_uops_retired, }, { .name = "MEM_UOP_RETIRED", .desc = "Memory uops retired (deprecated, use MEM_UOPS_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd0, .equiv = "MEM_UOPS_RETIRED", .flags= INTEL_X86_PEBS, .numasks = 
LIBPFM_ARRAY_SIZE(ivb_mem_uops_retired), .ngrp = 1, .umasks = ivb_mem_uops_retired, }, { .name = "MISALIGN_MEM_REF", .desc = "Misaligned memory references", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(ivb_misalign_mem_ref), .ngrp = 1, .umasks = ivb_misalign_mem_ref, }, { .name = "OFFCORE_REQUESTS", .desc = "Offcore requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb0, .numasks = LIBPFM_ARRAY_SIZE(ivb_offcore_requests), .ngrp = 1, .umasks = ivb_offcore_requests, }, { .name = "OFFCORE_REQUESTS_OUTSTANDING", .desc = "Outstanding offcore requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(ivb_offcore_requests_outstanding), .ngrp = 1, .umasks = ivb_offcore_requests_outstanding, }, { .name = "OTHER_ASSISTS", .desc = "Count hardware assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc1, .numasks = LIBPFM_ARRAY_SIZE(ivb_other_assists), .ngrp = 1, .umasks = ivb_other_assists, }, { .name = "RESOURCE_STALLS", .desc = "Resource related stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xa2, .numasks = LIBPFM_ARRAY_SIZE(ivb_resource_stalls), .ngrp = 1, .umasks = ivb_resource_stalls, }, { .name = "CYCLE_ACTIVITY", .desc = "Stalled cycles", .modmsk = INTEL_V3_ATTRS & ~_INTEL_X86_ATTR_C, .cntmsk = 0xff, .code = 0xa3, .numasks = LIBPFM_ARRAY_SIZE(ivb_cycle_activity), .ngrp = 1, .umasks = ivb_cycle_activity, }, { .name = "ROB_MISC_EVENTS", .desc = "Reorder buffer events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(ivb_rob_misc_events), .ngrp = 1, .umasks = ivb_rob_misc_events, }, { .name = "RS_EVENTS", .desc = "Reservation station events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5e, .numasks = LIBPFM_ARRAY_SIZE(ivb_rs_events), .ngrp = 1, .umasks = ivb_rs_events, }, { .name = "DTLB_LOAD_ACCESS", .desc = "TLB access", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5f, .numasks = 
LIBPFM_ARRAY_SIZE(ivb_tlb_access), .ngrp = 1, .umasks = ivb_tlb_access, }, { .name = "TLB_ACCESS", .desc = "TLB access", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5f, .numasks = LIBPFM_ARRAY_SIZE(ivb_tlb_access), .ngrp = 1, .equiv = "DTLB_LOAD_ACCESS", .umasks = ivb_tlb_access, }, { .name = "TLB_FLUSH", .desc = "TLB flushes", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xbd, .numasks = LIBPFM_ARRAY_SIZE(ivb_tlb_flush), .ngrp = 1, .umasks = ivb_tlb_flush, }, { .name = "UNHALTED_CORE_CYCLES", .desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x20000000full, .code = 0x3c, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED3_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "UOPS_EXECUTED", .desc = "Uops executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb1, .numasks = LIBPFM_ARRAY_SIZE(ivb_uops_executed), .ngrp = 1, .umasks = ivb_uops_executed, }, { .name = "UOPS_DISPATCHED_PORT", .desc = "Uops dispatched to specific ports", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xa1, .numasks = LIBPFM_ARRAY_SIZE(ivb_uops_dispatched_port), .ngrp = 1, .umasks = ivb_uops_dispatched_port, }, { .name = "UOPS_ISSUED", .desc = "Uops issued", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xe, .numasks = LIBPFM_ARRAY_SIZE(ivb_uops_issued), .ngrp = 1, .umasks = ivb_uops_issued, }, { .name = "UOPS_RETIRED", .desc = "Uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(ivb_uops_retired), .ngrp = 1, .umasks = ivb_uops_retired, }, { .name = "FP_COMP_OPS_EXE", .desc = "Counts number of floating point events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x10, .numasks = LIBPFM_ARRAY_SIZE(ivb_fp_comp_ops_exe), .ngrp = 1, .umasks = ivb_fp_comp_ops_exe, }, { .name = "SIMD_FP_256", .desc = 
"Counts 256-bit packed floating point instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x11, .numasks = LIBPFM_ARRAY_SIZE(ivb_simd_fp_256), .ngrp = 1, .umasks = ivb_simd_fp_256, }, { .name = "LSD", .desc = "Loop stream detector", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xa8, .numasks = LIBPFM_ARRAY_SIZE(ivb_lsd), .ngrp = 1, .umasks = ivb_lsd, }, { .name = "OFFCORE_RESPONSE_0", .desc = "Offcore response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1b7, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(ivb_offcore_response), .ngrp = 3, .umasks = ivb_offcore_response, }, { .name = "OFFCORE_RESPONSE_1", .desc = "Offcore response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1bb, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(ivb_offcore_response), .ngrp = 3, .umasks = ivb_offcore_response, /* identical to actual umasks list for this event */ }, }; papi-5.3.0/src/libpfm4/lib/events/itanium_events.h0000600003276200002170000006674112247131123021555 0ustar ralphundrgrad/* * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux/ia64. */ /* * This file is generated automatically * !! DO NOT CHANGE !! */ /* * Events table for the Itanium PMU family */ static pme_ita_entry_t itanium_pe []={ #define PME_ITA_ALAT_INST_CHKA_LDC_ALL 0 { "ALAT_INST_CHKA_LDC_ALL", {0x30036} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_ALAT_INST_CHKA_LDC_FP 1 { "ALAT_INST_CHKA_LDC_FP", {0x10036} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_ALAT_INST_CHKA_LDC_INT 2 { "ALAT_INST_CHKA_LDC_INT", {0x20036} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_ALAT_INST_FAILED_CHKA_LDC_ALL 3 { "ALAT_INST_FAILED_CHKA_LDC_ALL", {0x30037} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_ALAT_INST_FAILED_CHKA_LDC_FP 4 { "ALAT_INST_FAILED_CHKA_LDC_FP", {0x10037} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_ALAT_INST_FAILED_CHKA_LDC_INT 5 { "ALAT_INST_FAILED_CHKA_LDC_INT", {0x20037} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_ALAT_REPLACEMENT_ALL 6 { "ALAT_REPLACEMENT_ALL", {0x30038} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_ALAT_REPLACEMENT_FP 7 { "ALAT_REPLACEMENT_FP", {0x10038} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_ALAT_REPLACEMENT_INT 8 { "ALAT_REPLACEMENT_INT", {0x20038} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_ALL_STOPS_DISPERSED 9 { "ALL_STOPS_DISPERSED", {0x2f} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_BRANCH_EVENT 10 { "BRANCH_EVENT", {0x811} , 0xf0, 1, {0xffff0003}, NULL}, #define 
PME_ITA_BRANCH_MULTIWAY_ALL_PATHS_ALL_PREDICTIONS 11 { "BRANCH_MULTIWAY_ALL_PATHS_ALL_PREDICTIONS", {0xe} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_ALL_PATHS_CORRECT_PREDICTIONS 12 { "BRANCH_MULTIWAY_ALL_PATHS_CORRECT_PREDICTIONS", {0x1000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_ALL_PATHS_WRONG_PATH 13 { "BRANCH_MULTIWAY_ALL_PATHS_WRONG_PATH", {0x2000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_ALL_PATHS_WRONG_TARGET 14 { "BRANCH_MULTIWAY_ALL_PATHS_WRONG_TARGET", {0x3000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_NOT_TAKEN_ALL_PREDICTIONS 15 { "BRANCH_MULTIWAY_NOT_TAKEN_ALL_PREDICTIONS", {0x8000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_NOT_TAKEN_CORRECT_PREDICTIONS 16 { "BRANCH_MULTIWAY_NOT_TAKEN_CORRECT_PREDICTIONS", {0x9000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_NOT_TAKEN_WRONG_PATH 17 { "BRANCH_MULTIWAY_NOT_TAKEN_WRONG_PATH", {0xa000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_NOT_TAKEN_WRONG_TARGET 18 { "BRANCH_MULTIWAY_NOT_TAKEN_WRONG_TARGET", {0xb000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_TAKEN_ALL_PREDICTIONS 19 { "BRANCH_MULTIWAY_TAKEN_ALL_PREDICTIONS", {0xc000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_TAKEN_CORRECT_PREDICTIONS 20 { "BRANCH_MULTIWAY_TAKEN_CORRECT_PREDICTIONS", {0xd000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_TAKEN_WRONG_PATH 21 { "BRANCH_MULTIWAY_TAKEN_WRONG_PATH", {0xe000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_MULTIWAY_TAKEN_WRONG_TARGET 22 { "BRANCH_MULTIWAY_TAKEN_WRONG_TARGET", {0xf000e} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_NOT_TAKEN 23 { "BRANCH_NOT_TAKEN", {0x8000d} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_1ST_STAGE_NT_OUTCOMES_CORRECTLY_PREDICTED 24 { "BRANCH_PATH_1ST_STAGE_NT_OUTCOMES_CORRECTLY_PREDICTED", {0x6000f} , 0xf0, 1, {0xffff0003}, 
NULL}, #define PME_ITA_BRANCH_PATH_1ST_STAGE_NT_OUTCOMES_INCORRECTLY_PREDICTED 25 { "BRANCH_PATH_1ST_STAGE_NT_OUTCOMES_INCORRECTLY_PREDICTED", {0x4000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_1ST_STAGE_TK_OUTCOMES_CORRECTLY_PREDICTED 26 { "BRANCH_PATH_1ST_STAGE_TK_OUTCOMES_CORRECTLY_PREDICTED", {0x7000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_1ST_STAGE_TK_OUTCOMES_INCORRECTLY_PREDICTED 27 { "BRANCH_PATH_1ST_STAGE_TK_OUTCOMES_INCORRECTLY_PREDICTED", {0x5000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_2ND_STAGE_NT_OUTCOMES_CORRECTLY_PREDICTED 28 { "BRANCH_PATH_2ND_STAGE_NT_OUTCOMES_CORRECTLY_PREDICTED", {0xa000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_2ND_STAGE_NT_OUTCOMES_INCORRECTLY_PREDICTED 29 { "BRANCH_PATH_2ND_STAGE_NT_OUTCOMES_INCORRECTLY_PREDICTED", {0x8000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_2ND_STAGE_TK_OUTCOMES_CORRECTLY_PREDICTED 30 { "BRANCH_PATH_2ND_STAGE_TK_OUTCOMES_CORRECTLY_PREDICTED", {0xb000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_2ND_STAGE_TK_OUTCOMES_INCORRECTLY_PREDICTED 31 { "BRANCH_PATH_2ND_STAGE_TK_OUTCOMES_INCORRECTLY_PREDICTED", {0x9000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_3RD_STAGE_NT_OUTCOMES_CORRECTLY_PREDICTED 32 { "BRANCH_PATH_3RD_STAGE_NT_OUTCOMES_CORRECTLY_PREDICTED", {0xe000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_3RD_STAGE_NT_OUTCOMES_INCORRECTLY_PREDICTED 33 { "BRANCH_PATH_3RD_STAGE_NT_OUTCOMES_INCORRECTLY_PREDICTED", {0xc000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_3RD_STAGE_TK_OUTCOMES_CORRECTLY_PREDICTED 34 { "BRANCH_PATH_3RD_STAGE_TK_OUTCOMES_CORRECTLY_PREDICTED", {0xf000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_3RD_STAGE_TK_OUTCOMES_INCORRECTLY_PREDICTED 35 { "BRANCH_PATH_3RD_STAGE_TK_OUTCOMES_INCORRECTLY_PREDICTED", {0xd000f} , 0xf0, 1, {0xffff0003}, NULL}, #define 
PME_ITA_BRANCH_PATH_ALL_NT_OUTCOMES_CORRECTLY_PREDICTED 36 { "BRANCH_PATH_ALL_NT_OUTCOMES_CORRECTLY_PREDICTED", {0x2000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_ALL_NT_OUTCOMES_INCORRECTLY_PREDICTED 37 { "BRANCH_PATH_ALL_NT_OUTCOMES_INCORRECTLY_PREDICTED", {0xf} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_ALL_TK_OUTCOMES_CORRECTLY_PREDICTED 38 { "BRANCH_PATH_ALL_TK_OUTCOMES_CORRECTLY_PREDICTED", {0x3000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PATH_ALL_TK_OUTCOMES_INCORRECTLY_PREDICTED 39 { "BRANCH_PATH_ALL_TK_OUTCOMES_INCORRECTLY_PREDICTED", {0x1000f} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_1ST_STAGE_ALL_PREDICTIONS 40 { "BRANCH_PREDICTOR_1ST_STAGE_ALL_PREDICTIONS", {0x40010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_1ST_STAGE_CORRECT_PREDICTIONS 41 { "BRANCH_PREDICTOR_1ST_STAGE_CORRECT_PREDICTIONS", {0x50010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_1ST_STAGE_WRONG_PATH 42 { "BRANCH_PREDICTOR_1ST_STAGE_WRONG_PATH", {0x60010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_1ST_STAGE_WRONG_TARGET 43 { "BRANCH_PREDICTOR_1ST_STAGE_WRONG_TARGET", {0x70010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_2ND_STAGE_ALL_PREDICTIONS 44 { "BRANCH_PREDICTOR_2ND_STAGE_ALL_PREDICTIONS", {0x80010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_2ND_STAGE_CORRECT_PREDICTIONS 45 { "BRANCH_PREDICTOR_2ND_STAGE_CORRECT_PREDICTIONS", {0x90010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_2ND_STAGE_WRONG_PATH 46 { "BRANCH_PREDICTOR_2ND_STAGE_WRONG_PATH", {0xa0010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_2ND_STAGE_WRONG_TARGET 47 { "BRANCH_PREDICTOR_2ND_STAGE_WRONG_TARGET", {0xb0010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_3RD_STAGE_ALL_PREDICTIONS 48 { "BRANCH_PREDICTOR_3RD_STAGE_ALL_PREDICTIONS", {0xc0010} , 0xf0, 1, {0xffff0003}, NULL}, 
#define PME_ITA_BRANCH_PREDICTOR_3RD_STAGE_CORRECT_PREDICTIONS 49 { "BRANCH_PREDICTOR_3RD_STAGE_CORRECT_PREDICTIONS", {0xd0010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_3RD_STAGE_WRONG_PATH 50 { "BRANCH_PREDICTOR_3RD_STAGE_WRONG_PATH", {0xe0010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_3RD_STAGE_WRONG_TARGET 51 { "BRANCH_PREDICTOR_3RD_STAGE_WRONG_TARGET", {0xf0010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_ALL_ALL_PREDICTIONS 52 { "BRANCH_PREDICTOR_ALL_ALL_PREDICTIONS", {0x10} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_ALL_CORRECT_PREDICTIONS 53 { "BRANCH_PREDICTOR_ALL_CORRECT_PREDICTIONS", {0x10010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_ALL_WRONG_PATH 54 { "BRANCH_PREDICTOR_ALL_WRONG_PATH", {0x20010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_PREDICTOR_ALL_WRONG_TARGET 55 { "BRANCH_PREDICTOR_ALL_WRONG_TARGET", {0x30010} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_TAKEN_SLOT_0 56 { "BRANCH_TAKEN_SLOT_0", {0x1000d} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_TAKEN_SLOT_1 57 { "BRANCH_TAKEN_SLOT_1", {0x2000d} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BRANCH_TAKEN_SLOT_2 58 { "BRANCH_TAKEN_SLOT_2", {0x4000d} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_BUS_ALL_ANY 59 { "BUS_ALL_ANY", {0x10047} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_ALL_IO 60 { "BUS_ALL_IO", {0x40047} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_ALL_SELF 61 { "BUS_ALL_SELF", {0x20047} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_BRQ_LIVE_REQ_HI 62 { "BUS_BRQ_LIVE_REQ_HI", {0x5c} , 0xf0, 2, {0xffff0000}, NULL}, #define PME_ITA_BUS_BRQ_LIVE_REQ_LO 63 { "BUS_BRQ_LIVE_REQ_LO", {0x5b} , 0xf0, 2, {0xffff0000}, NULL}, #define PME_ITA_BUS_BRQ_REQ_INSERTED 64 { "BUS_BRQ_REQ_INSERTED", {0x5d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_BURST_ANY 65 { "BUS_BURST_ANY", {0x10049} , 0xf0, 1, {0xffff0000}, NULL}, #define 
PME_ITA_BUS_BURST_IO 66 { "BUS_BURST_IO", {0x40049} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_BURST_SELF 67 { "BUS_BURST_SELF", {0x20049} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_HITM 68 { "BUS_HITM", {0x44} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_IO_ANY 69 { "BUS_IO_ANY", {0x10050} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_IOQ_LIVE_REQ_HI 70 { "BUS_IOQ_LIVE_REQ_HI", {0x58} , 0xf0, 3, {0xffff0000}, NULL}, #define PME_ITA_BUS_IOQ_LIVE_REQ_LO 71 { "BUS_IOQ_LIVE_REQ_LO", {0x57} , 0xf0, 3, {0xffff0000}, NULL}, #define PME_ITA_BUS_IO_SELF 72 { "BUS_IO_SELF", {0x20050} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_LOCK_ANY 73 { "BUS_LOCK_ANY", {0x10053} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_LOCK_CYCLES_ANY 74 { "BUS_LOCK_CYCLES_ANY", {0x10054} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_LOCK_CYCLES_SELF 75 { "BUS_LOCK_CYCLES_SELF", {0x20054} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_LOCK_SELF 76 { "BUS_LOCK_SELF", {0x20053} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_MEMORY_ANY 77 { "BUS_MEMORY_ANY", {0x1004a} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_MEMORY_IO 78 { "BUS_MEMORY_IO", {0x4004a} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_MEMORY_SELF 79 { "BUS_MEMORY_SELF", {0x2004a} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_PARTIAL_ANY 80 { "BUS_PARTIAL_ANY", {0x10048} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_PARTIAL_IO 81 { "BUS_PARTIAL_IO", {0x40048} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_PARTIAL_SELF 82 { "BUS_PARTIAL_SELF", {0x20048} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_ALL_ANY 83 { "BUS_RD_ALL_ANY", {0x1004b} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_ALL_IO 84 { "BUS_RD_ALL_IO", {0x4004b} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_ALL_SELF 85 { "BUS_RD_ALL_SELF", {0x2004b} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_DATA_ANY 86 { "BUS_RD_DATA_ANY", {0x1004c} , 0xf0, 1, 
{0xffff0000}, NULL}, #define PME_ITA_BUS_RD_DATA_IO 87 { "BUS_RD_DATA_IO", {0x4004c} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_DATA_SELF 88 { "BUS_RD_DATA_SELF", {0x2004c} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_HIT 89 { "BUS_RD_HIT", {0x40} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_HITM 90 { "BUS_RD_HITM", {0x41} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_ANY 91 { "BUS_RD_INVAL_ANY", {0x1004e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_BST_ANY 92 { "BUS_RD_INVAL_BST_ANY", {0x1004f} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_BST_HITM 93 { "BUS_RD_INVAL_BST_HITM", {0x43} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_BST_IO 94 { "BUS_RD_INVAL_BST_IO", {0x4004f} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_BST_SELF 95 { "BUS_RD_INVAL_BST_SELF", {0x2004f} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_HITM 96 { "BUS_RD_INVAL_HITM", {0x42} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_IO 97 { "BUS_RD_INVAL_IO", {0x4004e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_INVAL_SELF 98 { "BUS_RD_INVAL_SELF", {0x2004e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_IO_ANY 99 { "BUS_RD_IO_ANY", {0x10051} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_IO_SELF 100 { "BUS_RD_IO_SELF", {0x20051} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_PRTL_ANY 101 { "BUS_RD_PRTL_ANY", {0x1004d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_PRTL_IO 102 { "BUS_RD_PRTL_IO", {0x4004d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_RD_PRTL_SELF 103 { "BUS_RD_PRTL_SELF", {0x2004d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_SNOOPQ_REQ 104 { "BUS_SNOOPQ_REQ", {0x56} , 0x30, 3, {0xffff0000}, NULL}, #define PME_ITA_BUS_SNOOPS_ANY 105 { "BUS_SNOOPS_ANY", {0x10046} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_SNOOPS_HITM_ANY 106 { "BUS_SNOOPS_HITM_ANY", {0x10045} , 0xf0, 1, {0xffff0000}, 
NULL}, #define PME_ITA_BUS_SNOOP_STALL_CYCLES_ANY 107 { "BUS_SNOOP_STALL_CYCLES_ANY", {0x10055} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_SNOOP_STALL_CYCLES_SELF 108 { "BUS_SNOOP_STALL_CYCLES_SELF", {0x20055} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_WR_WB_ANY 109 { "BUS_WR_WB_ANY", {0x10052} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_WR_WB_IO 110 { "BUS_WR_WB_IO", {0x40052} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_BUS_WR_WB_SELF 111 { "BUS_WR_WB_SELF", {0x20052} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_CPU_CPL_CHANGES 112 { "CPU_CPL_CHANGES", {0x34} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_CPU_CYCLES 113 { "CPU_CYCLES", {0x12} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_DATA_ACCESS_CYCLE 114 { "DATA_ACCESS_CYCLE", {0x3} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT1024 115 { "DATA_EAR_CACHE_LAT1024", {0x90367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT128 116 { "DATA_EAR_CACHE_LAT128", {0x50367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT16 117 { "DATA_EAR_CACHE_LAT16", {0x20367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT2048 118 { "DATA_EAR_CACHE_LAT2048", {0xa0367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT256 119 { "DATA_EAR_CACHE_LAT256", {0x60367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT32 120 { "DATA_EAR_CACHE_LAT32", {0x30367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT4 121 { "DATA_EAR_CACHE_LAT4", {0x367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT512 122 { "DATA_EAR_CACHE_LAT512", {0x80367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT64 123 { "DATA_EAR_CACHE_LAT64", {0x40367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT8 124 { "DATA_EAR_CACHE_LAT8", {0x10367} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_CACHE_LAT_NONE 125 { "DATA_EAR_CACHE_LAT_NONE", {0xf0367} , 
0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_EVENTS 126 { "DATA_EAR_EVENTS", {0x67} , 0xf0, 1, {0xffff0007}, NULL}, #define PME_ITA_DATA_EAR_TLB_L2 127 { "DATA_EAR_TLB_L2", {0x20767} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_TLB_SW 128 { "DATA_EAR_TLB_SW", {0x80767} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_EAR_TLB_VHPT 129 { "DATA_EAR_TLB_VHPT", {0x40767} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_DATA_REFERENCES_RETIRED 130 { "DATA_REFERENCES_RETIRED", {0x63} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_DEPENDENCY_ALL_CYCLE 131 { "DEPENDENCY_ALL_CYCLE", {0x6} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_DEPENDENCY_SCOREBOARD_CYCLE 132 { "DEPENDENCY_SCOREBOARD_CYCLE", {0x2} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_DTC_MISSES 133 { "DTC_MISSES", {0x60} , 0xf0, 1, {0xffff0007}, NULL}, #define PME_ITA_DTLB_INSERTS_HPW 134 { "DTLB_INSERTS_HPW", {0x62} , 0xf0, 1, {0xffff0007}, NULL}, #define PME_ITA_DTLB_MISSES 135 { "DTLB_MISSES", {0x61} , 0xf0, 1, {0xffff0007}, NULL}, #define PME_ITA_EXPL_STOPBITS 136 { "EXPL_STOPBITS", {0x2e} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_FP_FLUSH_TO_ZERO 137 { "FP_FLUSH_TO_ZERO", {0xb} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_FP_OPS_RETIRED_HI 138 { "FP_OPS_RETIRED_HI", {0xa} , 0xf0, 3, {0xffff0003}, NULL}, #define PME_ITA_FP_OPS_RETIRED_LO 139 { "FP_OPS_RETIRED_LO", {0x9} , 0xf0, 3, {0xffff0003}, NULL}, #define PME_ITA_FP_SIR_FLUSH 140 { "FP_SIR_FLUSH", {0xc} , 0xf0, 2, {0xffff0003}, NULL}, #define PME_ITA_IA32_INST_RETIRED 141 { "IA32_INST_RETIRED", {0x15} , 0xf0, 2, {0xffff0000}, NULL}, #define PME_ITA_IA64_INST_RETIRED 142 { "IA64_INST_RETIRED", {0x8} , 0x30, 6, {0xffff0003}, NULL}, #define PME_ITA_IA64_TAGGED_INST_RETIRED_PMC8 143 { "IA64_TAGGED_INST_RETIRED_PMC8", {0x30008} , 0x30, 6, {0xffff0003}, NULL}, #define PME_ITA_IA64_TAGGED_INST_RETIRED_PMC9 144 { "IA64_TAGGED_INST_RETIRED_PMC9", {0x20008} , 0x30, 6, {0xffff0003}, NULL}, #define PME_ITA_INST_ACCESS_CYCLE 
145 { "INST_ACCESS_CYCLE", {0x1} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_INST_DISPERSED 146 { "INST_DISPERSED", {0x2d} , 0x30, 6, {0xffff0001}, NULL}, #define PME_ITA_INST_FAILED_CHKS_RETIRED_ALL 147 { "INST_FAILED_CHKS_RETIRED_ALL", {0x30035} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_INST_FAILED_CHKS_RETIRED_FP 148 { "INST_FAILED_CHKS_RETIRED_FP", {0x20035} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_INST_FAILED_CHKS_RETIRED_INT 149 { "INST_FAILED_CHKS_RETIRED_INT", {0x10035} , 0xf0, 1, {0xffff0003}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT1024 150 { "INSTRUCTION_EAR_CACHE_LAT1024", {0x80123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT128 151 { "INSTRUCTION_EAR_CACHE_LAT128", {0x50123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT16 152 { "INSTRUCTION_EAR_CACHE_LAT16", {0x20123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT2048 153 { "INSTRUCTION_EAR_CACHE_LAT2048", {0x90123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT256 154 { "INSTRUCTION_EAR_CACHE_LAT256", {0x60123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT32 155 { "INSTRUCTION_EAR_CACHE_LAT32", {0x30123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT4096 156 { "INSTRUCTION_EAR_CACHE_LAT4096", {0xa0123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT4 157 { "INSTRUCTION_EAR_CACHE_LAT4", {0x123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT512 158 { "INSTRUCTION_EAR_CACHE_LAT512", {0x70123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT64 159 { "INSTRUCTION_EAR_CACHE_LAT64", {0x40123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT8 160 { "INSTRUCTION_EAR_CACHE_LAT8", {0x10123} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_CACHE_LAT_NONE 161 { "INSTRUCTION_EAR_CACHE_LAT_NONE", {0xf0123} , 
0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_EVENTS 162 { "INSTRUCTION_EAR_EVENTS", {0x23} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_TLB_SW 163 { "INSTRUCTION_EAR_TLB_SW", {0x80523} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_INSTRUCTION_EAR_TLB_VHPT 164 { "INSTRUCTION_EAR_TLB_VHPT", {0x40523} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_ISA_TRANSITIONS 165 { "ISA_TRANSITIONS", {0x14} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_ISB_LINES_IN 166 { "ISB_LINES_IN", {0x26} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_ITLB_INSERTS_HPW 167 { "ITLB_INSERTS_HPW", {0x28} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_ITLB_MISSES_FETCH 168 { "ITLB_MISSES_FETCH", {0x27} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_L1D_READ_FORCED_MISSES_RETIRED 169 { "L1D_READ_FORCED_MISSES_RETIRED", {0x6b} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_L1D_READ_MISSES_RETIRED 170 { "L1D_READ_MISSES_RETIRED", {0x66} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_L1D_READS_RETIRED 171 { "L1D_READS_RETIRED", {0x64} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_L1I_DEMAND_READS 172 { "L1I_DEMAND_READS", {0x20} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_L1I_FILLS 173 { "L1I_FILLS", {0x21} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L1I_PREFETCH_READS 174 { "L1I_PREFETCH_READS", {0x24} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_L1_OUTSTANDING_REQ_HI 175 { "L1_OUTSTANDING_REQ_HI", {0x79} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L1_OUTSTANDING_REQ_LO 176 { "L1_OUTSTANDING_REQ_LO", {0x78} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L2_DATA_REFERENCES_ALL 177 { "L2_DATA_REFERENCES_ALL", {0x30069} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_L2_DATA_REFERENCES_READS 178 { "L2_DATA_REFERENCES_READS", {0x10069} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_L2_DATA_REFERENCES_WRITES 179 { "L2_DATA_REFERENCES_WRITES", {0x20069} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_L2_FLUSH_DETAILS_ADDR_CONFLICT 180 { 
"L2_FLUSH_DETAILS_ADDR_CONFLICT", {0x20077} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L2_FLUSH_DETAILS_ALL 181 { "L2_FLUSH_DETAILS_ALL", {0xf0077} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L2_FLUSH_DETAILS_BUS_REJECT 182 { "L2_FLUSH_DETAILS_BUS_REJECT", {0x40077} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L2_FLUSH_DETAILS_FULL_FLUSH 183 { "L2_FLUSH_DETAILS_FULL_FLUSH", {0x80077} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L2_FLUSH_DETAILS_ST_BUFFER 184 { "L2_FLUSH_DETAILS_ST_BUFFER", {0x10077} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L2_FLUSHES 185 { "L2_FLUSHES", {0x76} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L2_INST_DEMAND_READS 186 { "L2_INST_DEMAND_READS", {0x22} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_L2_INST_PREFETCH_READS 187 { "L2_INST_PREFETCH_READS", {0x25} , 0xf0, 1, {0xffff0001}, NULL}, #define PME_ITA_L2_MISSES 188 { "L2_MISSES", {0x6a} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_L2_REFERENCES 189 { "L2_REFERENCES", {0x68} , 0xf0, 3, {0xffff0007}, NULL}, #define PME_ITA_L3_LINES_REPLACED 190 { "L3_LINES_REPLACED", {0x7f} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_MISSES 191 { "L3_MISSES", {0x7c} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_ALL_READS_ALL 192 { "L3_READS_ALL_READS_ALL", {0xf007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_ALL_READS_HIT 193 { "L3_READS_ALL_READS_HIT", {0xd007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_ALL_READS_MISS 194 { "L3_READS_ALL_READS_MISS", {0xe007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_DATA_READS_ALL 195 { "L3_READS_DATA_READS_ALL", {0xb007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_DATA_READS_HIT 196 { "L3_READS_DATA_READS_HIT", {0x9007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_DATA_READS_MISS 197 { "L3_READS_DATA_READS_MISS", {0xa007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_INST_READS_ALL 198 { "L3_READS_INST_READS_ALL", {0x7007d} , 0xf0, 1, 
{0xffff0000}, NULL}, #define PME_ITA_L3_READS_INST_READS_HIT 199 { "L3_READS_INST_READS_HIT", {0x5007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_READS_INST_READS_MISS 200 { "L3_READS_INST_READS_MISS", {0x6007d} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_REFERENCES 201 { "L3_REFERENCES", {0x7b} , 0xf0, 1, {0xffff0007}, NULL}, #define PME_ITA_L3_WRITES_ALL_WRITES_ALL 202 { "L3_WRITES_ALL_WRITES_ALL", {0xf007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_ALL_WRITES_HIT 203 { "L3_WRITES_ALL_WRITES_HIT", {0xd007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_ALL_WRITES_MISS 204 { "L3_WRITES_ALL_WRITES_MISS", {0xe007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_DATA_WRITES_ALL 205 { "L3_WRITES_DATA_WRITES_ALL", {0x7007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_DATA_WRITES_HIT 206 { "L3_WRITES_DATA_WRITES_HIT", {0x5007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_DATA_WRITES_MISS 207 { "L3_WRITES_DATA_WRITES_MISS", {0x6007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_L2_WRITEBACK_ALL 208 { "L3_WRITES_L2_WRITEBACK_ALL", {0xb007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_L2_WRITEBACK_HIT 209 { "L3_WRITES_L2_WRITEBACK_HIT", {0x9007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_L3_WRITES_L2_WRITEBACK_MISS 210 { "L3_WRITES_L2_WRITEBACK_MISS", {0xa007e} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_LOADS_RETIRED 211 { "LOADS_RETIRED", {0x6c} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_MEMORY_CYCLE 212 { "MEMORY_CYCLE", {0x7} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_MISALIGNED_LOADS_RETIRED 213 { "MISALIGNED_LOADS_RETIRED", {0x70} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_MISALIGNED_STORES_RETIRED 214 { "MISALIGNED_STORES_RETIRED", {0x71} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_NOPS_RETIRED 215 { "NOPS_RETIRED", {0x30} , 0x30, 6, {0xffff0003}, NULL}, #define PME_ITA_PIPELINE_ALL_FLUSH_CYCLE 216 { "PIPELINE_ALL_FLUSH_CYCLE", 
{0x4} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_PIPELINE_BACKEND_FLUSH_CYCLE 217 { "PIPELINE_BACKEND_FLUSH_CYCLE", {0x0} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_PIPELINE_FLUSH_ALL 218 { "PIPELINE_FLUSH_ALL", {0xf0033} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_PIPELINE_FLUSH_DTC_FLUSH 219 { "PIPELINE_FLUSH_DTC_FLUSH", {0x40033} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_PIPELINE_FLUSH_IEU_FLUSH 220 { "PIPELINE_FLUSH_IEU_FLUSH", {0x80033} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_PIPELINE_FLUSH_L1D_WAYMP_FLUSH 221 { "PIPELINE_FLUSH_L1D_WAYMP_FLUSH", {0x20033} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_PIPELINE_FLUSH_OTHER_FLUSH 222 { "PIPELINE_FLUSH_OTHER_FLUSH", {0x10033} , 0xf0, 1, {0xffff0000}, NULL}, #define PME_ITA_PREDICATE_SQUASHED_RETIRED 223 { "PREDICATE_SQUASHED_RETIRED", {0x31} , 0x30, 6, {0xffff0003}, NULL}, #define PME_ITA_RSE_LOADS_RETIRED 224 { "RSE_LOADS_RETIRED", {0x72} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_RSE_REFERENCES_RETIRED 225 { "RSE_REFERENCES_RETIRED", {0x65} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_STORES_RETIRED 226 { "STORES_RETIRED", {0x6d} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_UC_LOADS_RETIRED 227 { "UC_LOADS_RETIRED", {0x6e} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_UC_STORES_RETIRED 228 { "UC_STORES_RETIRED", {0x6f} , 0xf0, 2, {0xffff0007}, NULL}, #define PME_ITA_UNSTALLED_BACKEND_CYCLE 229 { "UNSTALLED_BACKEND_CYCLE", {0x5} , 0xf0, 1, {0xffff0000}, NULL}}; #define PME_ITA_EVENT_COUNT 230 papi-5.3.0/src/libpfm4/lib/events/intel_nhm_unc_events.h0000600003276200002170000010477712247131123022733 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, 
sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: nhm_unc (Intel Nehalem uncore) */ static const intel_x86_umask_t nhm_unc_unc_dram_open[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 open commands issued for read or write", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 open commands issued for read or write", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 open commands issued for read or write", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_dram_page_close[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 page close", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 page close", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 page close", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_dram_page_miss[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 page miss", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 page miss", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 page miss", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_dram_pre_all[]={ { .uname = "CH0", .udesc = "DRAM 
Channel 0 precharge all commands", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 precharge all commands", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 precharge all commands", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_dram_read_cas[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 read CAS commands", .ucode = 0x100, }, { .uname = "AUTOPRE_CH0", .udesc = "DRAM Channel 0 read CAS auto page close commands", .ucode = 0x200, }, { .uname = "CH1", .udesc = "DRAM Channel 1 read CAS commands", .ucode = 0x400, }, { .uname = "AUTOPRE_CH1", .udesc = "DRAM Channel 1 read CAS auto page close commands", .ucode = 0x800, }, { .uname = "CH2", .udesc = "DRAM Channel 2 read CAS commands", .ucode = 0x1000, }, { .uname = "AUTOPRE_CH2", .udesc = "DRAM Channel 2 read CAS auto page close commands", .ucode = 0x2000, }, }; static const intel_x86_umask_t nhm_unc_unc_dram_refresh[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 refresh commands", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 refresh commands", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 refresh commands", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_dram_write_cas[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 write CAS commands", .ucode = 0x100, }, { .uname = "AUTOPRE_CH0", .udesc = "DRAM Channel 0 write CAS auto page close commands", .ucode = 0x200, }, { .uname = "CH1", .udesc = "DRAM Channel 1 write CAS commands", .ucode = 0x400, }, { .uname = "AUTOPRE_CH1", .udesc = "DRAM Channel 1 write CAS auto page close commands", .ucode = 0x800, }, { .uname = "CH2", .udesc = "DRAM Channel 2 write CAS commands", .ucode = 0x1000, }, { .uname = "AUTOPRE_CH2", .udesc = "DRAM Channel 2 write CAS auto page close commands", .ucode = 0x2000, }, }; static const intel_x86_umask_t nhm_unc_unc_gq_alloc[]={ { .uname = "READ_TRACKER", .udesc = "GQ read tracker requests", .ucode = 0x100, }, { .uname = "RT_LLC_MISS", .udesc = "GQ read tracker LLC 
misses", .ucode = 0x200, }, { .uname = "RT_TO_LLC_RESP", .udesc = "GQ read tracker LLC requests", .ucode = 0x400, }, { .uname = "RT_TO_RTID_ACQUIRED", .udesc = "GQ read tracker LLC miss to RTID acquired", .ucode = 0x800, }, { .uname = "WT_TO_RTID_ACQUIRED", .udesc = "GQ write tracker LLC miss to RTID acquired", .ucode = 0x1000, }, { .uname = "WRITE_TRACKER", .udesc = "GQ write tracker LLC misses", .ucode = 0x2000, }, { .uname = "PEER_PROBE_TRACKER", .udesc = "GQ peer probe tracker requests", .ucode = 0x4000, }, }; static const intel_x86_umask_t nhm_unc_unc_gq_cycles_full[]={ { .uname = "READ_TRACKER", .udesc = "Cycles GQ read tracker is full.", .ucode = 0x100, }, { .uname = "WRITE_TRACKER", .udesc = "Cycles GQ write tracker is full.", .ucode = 0x200, }, { .uname = "PEER_PROBE_TRACKER", .udesc = "Cycles GQ peer probe tracker is full.", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_gq_cycles_not_empty[]={ { .uname = "READ_TRACKER", .udesc = "Cycles GQ read tracker is busy", .ucode = 0x100, }, { .uname = "WRITE_TRACKER", .udesc = "Cycles GQ write tracker is busy", .ucode = 0x200, }, { .uname = "PEER_PROBE_TRACKER", .udesc = "Cycles GQ peer probe tracker is busy", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_gq_data_from[]={ { .uname = "QPI", .udesc = "Cycles GQ data is imported from Quickpath interface", .ucode = 0x100, }, { .uname = "QMC", .udesc = "Cycles GQ data is imported from Quickpath memory interface", .ucode = 0x200, }, { .uname = "LLC", .udesc = "Cycles GQ data is imported from LLC", .ucode = 0x400, }, { .uname = "CORES_02", .udesc = "Cycles GQ data is imported from Cores 0 and 2", .ucode = 0x800, }, { .uname = "CORES_13", .udesc = "Cycles GQ data is imported from Cores 1 and 3", .ucode = 0x1000, }, }; static const intel_x86_umask_t nhm_unc_unc_gq_data_to[]={ { .uname = "QPI_QMC", .udesc = "Cycles GQ data sent to the QPI or QMC", .ucode = 0x100, }, { .uname = "LLC", .udesc = "Cycles GQ data sent to LLC", .ucode = 0x200, 
}, { .uname = "CORES", .udesc = "Cycles GQ data sent to cores", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_llc_hits[]={ { .uname = "READ", .udesc = "Number of LLC read hits", .ucode = 0x100, }, { .uname = "WRITE", .udesc = "Number of LLC write hits", .ucode = 0x200, }, { .uname = "PROBE", .udesc = "Number of LLC peer probe hits", .ucode = 0x400, }, { .uname = "ANY", .udesc = "Number of LLC hits", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_llc_lines_in[]={ { .uname = "M_STATE", .udesc = "LLC lines allocated in M state", .ucode = 0x100, }, { .uname = "E_STATE", .udesc = "LLC lines allocated in E state", .ucode = 0x200, }, { .uname = "S_STATE", .udesc = "LLC lines allocated in S state", .ucode = 0x400, }, { .uname = "F_STATE", .udesc = "LLC lines allocated in F state", .ucode = 0x800, }, { .uname = "ANY", .udesc = "LLC lines allocated", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_llc_lines_out[]={ { .uname = "M_STATE", .udesc = "LLC lines victimized in M state", .ucode = 0x100, }, { .uname = "E_STATE", .udesc = "LLC lines victimized in E state", .ucode = 0x200, }, { .uname = "S_STATE", .udesc = "LLC lines victimized in S state", .ucode = 0x400, }, { .uname = "I_STATE", .udesc = "LLC lines victimized in I state", .ucode = 0x800, }, { .uname = "F_STATE", .udesc = "LLC lines victimized in F state", .ucode = 0x1000, }, { .uname = "ANY", .udesc = "LLC lines victimized", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_llc_miss[]={ { .uname = "READ", .udesc = "Number of LLC read misses", .ucode = 0x100, }, { .uname = "WRITE", .udesc = "Number of LLC write misses", .ucode = 0x200, }, { .uname = "PROBE", .udesc = "Number of LLC peer probe misses", .ucode = 0x400, }, { .uname = "ANY", .udesc = "Number of LLC misses", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | 
INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_qhl_address_conflicts[]={ { .uname = "2WAY", .udesc = "QHL 2 way address conflicts", .ucode = 0x200, }, { .uname = "3WAY", .udesc = "QHL 3 way address conflicts", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_qhl_conflict_cycles[]={ { .uname = "IOH", .udesc = "QHL IOH Tracker conflict cycles", .ucode = 0x100, }, { .uname = "REMOTE", .udesc = "QHL Remote Tracker conflict cycles", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "QHL Local Tracker conflict cycles", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_qhl_cycles_full[]={ { .uname = "REMOTE", .udesc = "Cycles QHL Remote Tracker is full", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "Cycles QHL Local Tracker is full", .ucode = 0x400, }, { .uname = "IOH", .udesc = "Cycles QHL IOH Tracker is full", .ucode = 0x100, }, }; static const intel_x86_umask_t nhm_unc_unc_qhl_cycles_not_empty[]={ { .uname = "IOH", .udesc = "Cycles QHL IOH is busy", .ucode = 0x100, }, { .uname = "REMOTE", .udesc = "Cycles QHL Remote Tracker is busy", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "Cycles QHL Local Tracker is busy", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_qhl_frc_ack_cnflts[]={ { .uname = "LOCAL", .udesc = "QHL FrcAckCnflts sent to local home", .ucode = 0x400, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_qhl_occupancy[]={ { .uname = "IOH", .udesc = "Cycles QHL IOH Tracker Allocate to Deallocate Read Occupancy", .ucode = 0x100, }, { .uname = "REMOTE", .udesc = "Cycles QHL Remote Tracker Allocate to Deallocate Read Occupancy", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "Cycles QHL Local Tracker Allocate to Deallocate Read Occupancy", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_qhl_requests[]={ { .uname = "LOCAL_READS", .udesc = "Quickpath Home Logic local read requests", .ucode = 0x1000, }, { .uname = "LOCAL_WRITES", .udesc = "Quickpath Home 
Logic local write requests", .ucode = 0x2000, }, { .uname = "REMOTE_READS", .udesc = "Quickpath Home Logic remote read requests", .ucode = 0x400, }, { .uname = "IOH_READS", .udesc = "Quickpath Home Logic IOH read requests", .ucode = 0x100, }, { .uname = "IOH_WRITES", .udesc = "Quickpath Home Logic IOH write requests", .ucode = 0x200, }, { .uname = "REMOTE_WRITES", .udesc = "Quickpath Home Logic remote write requests", .ucode = 0x800, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_busy[]={ { .uname = "READ_CH0", .udesc = "Cycles QMC channel 0 busy with a read request", .ucode = 0x100, }, { .uname = "READ_CH1", .udesc = "Cycles QMC channel 1 busy with a read request", .ucode = 0x200, }, { .uname = "READ_CH2", .udesc = "Cycles QMC channel 2 busy with a read request", .ucode = 0x400, }, { .uname = "WRITE_CH0", .udesc = "Cycles QMC channel 0 busy with a write request", .ucode = 0x800, }, { .uname = "WRITE_CH1", .udesc = "Cycles QMC channel 1 busy with a write request", .ucode = 0x1000, }, { .uname = "WRITE_CH2", .udesc = "Cycles QMC channel 2 busy with a write request", .ucode = 0x2000, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_cancel[]={ { .uname = "CH0", .udesc = "QMC channel 0 cancels", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 cancels", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 cancels", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC cancels", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_critical_priority_reads[]={ { .uname = "CH0", .udesc = "QMC channel 0 critical priority read requests", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 critical priority read requests", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 critical priority read requests", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC critical priority read requests", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const 
intel_x86_umask_t nhm_unc_unc_qmc_high_priority_reads[]={ { .uname = "CH0", .udesc = "QMC channel 0 high priority read requests", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 high priority read requests", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 high priority read requests", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC high priority read requests", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_isoc_full[]={ { .uname = "READ_CH0", .udesc = "Cycles DRAM channel 0 full with isochronous read requests", .ucode = 0x100, }, { .uname = "READ_CH1", .udesc = "Cycles DRAM channel 1 full with isochronous read requests", .ucode = 0x200, }, { .uname = "READ_CH2", .udesc = "Cycles DRAM channel 2 full with isochronous read requests", .ucode = 0x400, }, { .uname = "WRITE_CH0", .udesc = "Cycles DRAM channel 0 full with isochronous write requests", .ucode = 0x800, }, { .uname = "WRITE_CH1", .udesc = "Cycles DRAM channel 1 full with isochronous write requests", .ucode = 0x1000, }, { .uname = "WRITE_CH2", .udesc = "Cycles DRAM channel 2 full with isochronous write requests", .ucode = 0x2000, }, }; static const intel_x86_umask_t nhm_unc_unc_imc_isoc_occupancy[]={ { .uname = "CH0", .udesc = "IMC channel 0 isochronous read request occupancy", .ucode = 0x100, }, { .uname = "CH1", .udesc = "IMC channel 1 isochronous read request occupancy", .ucode = 0x200, }, { .uname = "CH2", .udesc = "IMC channel 2 isochronous read request occupancy", .ucode = 0x400, }, { .uname = "ANY", .udesc = "IMC isochronous read request occupancy", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_normal_full[]={ { .uname = "READ_CH0", .udesc = "Cycles DRAM channel 0 full with normal read requests", .ucode = 0x100, }, { .uname = "READ_CH1", .udesc = "Cycles DRAM channel 1 full with normal read requests", .ucode = 0x200, }, { .uname = "READ_CH2", 
.udesc = "Cycles DRAM channel 2 full with normal read requests", .ucode = 0x400, }, { .uname = "WRITE_CH0", .udesc = "Cycles DRAM channel 0 full with normal write requests", .ucode = 0x800, }, { .uname = "WRITE_CH1", .udesc = "Cycles DRAM channel 1 full with normal write requests", .ucode = 0x1000, }, { .uname = "WRITE_CH2", .udesc = "Cycles DRAM channel 2 full with normal write requests", .ucode = 0x2000, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_normal_reads[]={ { .uname = "CH0", .udesc = "QMC channel 0 normal read requests", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 normal read requests", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 normal read requests", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC normal read requests", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_occupancy[]={ { .uname = "CH0", .udesc = "IMC channel 0 normal read request occupancy", .ucode = 0x100, }, { .uname = "CH1", .udesc = "IMC channel 1 normal read request occupancy", .ucode = 0x200, }, { .uname = "CH2", .udesc = "IMC channel 2 normal read request occupancy", .ucode = 0x400, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_priority_updates[]={ { .uname = "CH0", .udesc = "QMC channel 0 priority updates", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 priority updates", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 priority updates", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC priority updates", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_unc_unc_qmc_writes[]={ { .uname = "FULL_CH0", .udesc = "QMC channel 0 full cache line writes", .ucode = 0x100, .grpid = 0, }, { .uname = "FULL_CH1", .udesc = "QMC channel 1 full cache line writes", .ucode = 0x200, .grpid = 0, }, { .uname = "FULL_CH2", .udesc = "QMC channel 2 full cache line writes", .ucode = 0x400, .grpid = 0, }, { .uname = 
"FULL_ANY", .udesc = "QMC full cache line writes", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "PARTIAL_CH0", .udesc = "QMC channel 0 partial cache line writes", .ucode = 0x800, .grpid = 1, }, { .uname = "PARTIAL_CH1", .udesc = "QMC channel 1 partial cache line writes", .ucode = 0x1000, .grpid = 1, }, { .uname = "PARTIAL_CH2", .udesc = "QMC channel 2 partial cache line writes", .ucode = 0x2000, .grpid = 1, }, { .uname = "PARTIAL_ANY", .udesc = "QMC partial cache line writes", .ucode = 0x3800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, }; static const intel_x86_umask_t nhm_unc_unc_qpi_rx_no_ppt_credit[]={ { .uname = "STALLS_LINK_0", .udesc = "Link 0 snoop stalls due to no PPT entry", .ucode = 0x100, }, { .uname = "STALLS_LINK_1", .udesc = "Link 1 snoop stalls due to no PPT entry", .ucode = 0x200, }, }; static const intel_x86_umask_t nhm_unc_unc_qpi_tx_header[]={ { .uname = "BUSY_LINK_0", .udesc = "Cycles link 0 outbound header busy", .ucode = 0x200, }, { .uname = "BUSY_LINK_1", .udesc = "Cycles link 1 outbound header busy", .ucode = 0x800, }, }; static const intel_x86_umask_t nhm_unc_unc_qpi_tx_stalled_multi_flit[]={ { .uname = "DRS_LINK_0", .udesc = "Cycles QPI outbound link 0 DRS stalled", .ucode = 0x100, }, { .uname = "NCB_LINK_0", .udesc = "Cycles QPI outbound link 0 NCB stalled", .ucode = 0x200, }, { .uname = "NCS_LINK_0", .udesc = "Cycles QPI outbound link 0 NCS stalled", .ucode = 0x400, }, { .uname = "DRS_LINK_1", .udesc = "Cycles QPI outbound link 1 DRS stalled", .ucode = 0x800, }, { .uname = "NCB_LINK_1", .udesc = "Cycles QPI outbound link 1 NCB stalled", .ucode = 0x1000, }, { .uname = "NCS_LINK_1", .udesc = "Cycles QPI outbound link 1 NCS stalled", .ucode = 0x2000, }, { .uname = "LINK_0", .udesc = "Cycles QPI outbound link 0 multi flit stalled", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LINK_1", .udesc = "Cycles QPI outbound link 1 multi flit stalled", .ucode = 0x3800, .uflags= 
INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_unc_unc_qpi_tx_stalled_single_flit[]={ { .uname = "HOME_LINK_0", .udesc = "Cycles QPI outbound link 0 HOME stalled", .ucode = 0x100, }, { .uname = "SNOOP_LINK_0", .udesc = "Cycles QPI outbound link 0 SNOOP stalled", .ucode = 0x200, }, { .uname = "NDR_LINK_0", .udesc = "Cycles QPI outbound link 0 NDR stalled", .ucode = 0x400, }, { .uname = "HOME_LINK_1", .udesc = "Cycles QPI outbound link 1 HOME stalled", .ucode = 0x800, }, { .uname = "SNOOP_LINK_1", .udesc = "Cycles QPI outbound link 1 SNOOP stalled", .ucode = 0x1000, }, { .uname = "NDR_LINK_1", .udesc = "Cycles QPI outbound link 1 NDR stalled", .ucode = 0x2000, }, { .uname = "LINK_0", .udesc = "Cycles QPI outbound link 0 single flit stalled", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LINK_1", .udesc = "Cycles QPI outbound link 1 single flit stalled", .ucode = 0x3800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_unc_unc_snp_resp_to_local_home[]={ { .uname = "I_STATE", .udesc = "Local home snoop response - LLC does not have cache line", .ucode = 0x100, }, { .uname = "S_STATE", .udesc = "Local home snoop response - LLC has cache line in S state", .ucode = 0x200, }, { .uname = "FWD_S_STATE", .udesc = "Local home snoop response - LLC forwarding cache line in S state.", .ucode = 0x400, }, { .uname = "FWD_I_STATE", .udesc = "Local home snoop response - LLC has forwarded a modified cache line", .ucode = 0x800, }, { .uname = "CONFLICT", .udesc = "Local home conflict snoop response", .ucode = 0x1000, }, { .uname = "WB", .udesc = "Local home snoop response - LLC has cache line in the M state", .ucode = 0x2000, }, }; static const intel_x86_umask_t nhm_unc_unc_snp_resp_to_remote_home[]={ { .uname = "I_STATE", .udesc = "Remote home snoop response - LLC does not have cache line", .ucode = 0x100, }, { .uname = "S_STATE", .udesc = "Remote home snoop response - LLC has cache line in S state", .ucode = 0x200, }, { .uname = 
"FWD_S_STATE", .udesc = "Remote home snoop response - LLC forwarding cache line in S state.", .ucode = 0x400, }, { .uname = "FWD_I_STATE", .udesc = "Remote home snoop response - LLC has forwarded a modified cache line", .ucode = 0x800, }, { .uname = "CONFLICT", .udesc = "Remote home conflict snoop response", .ucode = 0x1000, }, { .uname = "WB", .udesc = "Remote home snoop response - LLC has cache line in the M state", .ucode = 0x2000, }, { .uname = "HITM", .udesc = "Remote home snoop response - LLC HITM", .ucode = 0x2400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_nhm_unc_pe[]={ { .name = "UNC_CLK_UNHALTED", .desc = "Uncore clockticks.", .modmsk =0x0, .cntmsk = 0x100000, .code = 0xff, .flags = INTEL_X86_FIXED, }, { .name = "UNC_DRAM_OPEN", .desc = "DRAM open commands issued for read or write", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_dram_open), .ngrp = 1, .umasks = nhm_unc_unc_dram_open, }, { .name = "UNC_DRAM_PAGE_CLOSE", .desc = "DRAM page close due to idle timer expiration", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x61, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_dram_page_close), .ngrp = 1, .umasks = nhm_unc_unc_dram_page_close, }, { .name = "UNC_DRAM_PAGE_MISS", .desc = "DRAM Channel 0 page miss", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_dram_page_miss), .ngrp = 1, .umasks = nhm_unc_unc_dram_page_miss, }, { .name = "UNC_DRAM_PRE_ALL", .desc = "DRAM Channel 0 precharge all commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_dram_pre_all), .ngrp = 1, .umasks = nhm_unc_unc_dram_pre_all, }, { .name = "UNC_DRAM_READ_CAS", .desc = "DRAM Channel 0 read CAS commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_dram_read_cas), .ngrp = 1, .umasks = nhm_unc_unc_dram_read_cas, }, { .name = 
"UNC_DRAM_REFRESH", .desc = "DRAM Channel 0 refresh commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_dram_refresh), .ngrp = 1, .umasks = nhm_unc_unc_dram_refresh, }, { .name = "UNC_DRAM_WRITE_CAS", .desc = "DRAM Channel 0 write CAS commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x64, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_dram_write_cas), .ngrp = 1, .umasks = nhm_unc_unc_dram_write_cas, }, { .name = "UNC_GQ_ALLOC", .desc = "GQ read tracker requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_gq_alloc), .ngrp = 1, .umasks = nhm_unc_unc_gq_alloc, }, { .name = "UNC_GQ_CYCLES_FULL", .desc = "Cycles GQ read tracker is full.", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_gq_cycles_full), .ngrp = 1, .umasks = nhm_unc_unc_gq_cycles_full, }, { .name = "UNC_GQ_CYCLES_NOT_EMPTY", .desc = "Cycles GQ read tracker is busy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x1, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_gq_cycles_not_empty), .ngrp = 1, .umasks = nhm_unc_unc_gq_cycles_not_empty, }, { .name = "UNC_GQ_DATA_FROM", .desc = "Cycles GQ data is imported", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_gq_data_from), .ngrp = 1, .umasks = nhm_unc_unc_gq_data_from, }, { .name = "UNC_GQ_DATA_TO", .desc = "Cycles GQ data is exported", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_gq_data_to), .ngrp = 1, .umasks = nhm_unc_unc_gq_data_to, }, { .name = "UNC_LLC_HITS", .desc = "Number of LLC read hits", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_llc_hits), .ngrp = 1, .umasks = nhm_unc_unc_llc_hits, }, { .name = "UNC_LLC_LINES_IN", .desc = "LLC lines allocated in M state", .modmsk = NHM_UNC_ATTRS, .cntmsk = 
0x1fe00000, .code = 0xa, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_llc_lines_in), .ngrp = 1, .umasks = nhm_unc_unc_llc_lines_in, }, { .name = "UNC_LLC_LINES_OUT", .desc = "LLC lines victimized in M state", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0xb, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_llc_lines_out), .ngrp = 1, .umasks = nhm_unc_unc_llc_lines_out, }, { .name = "UNC_LLC_MISS", .desc = "Number of LLC read misses", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x9, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_llc_miss), .ngrp = 1, .umasks = nhm_unc_unc_llc_miss, }, { .name = "UNC_QHL_ADDRESS_CONFLICTS", .desc = "QHL 2 way address conflicts", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qhl_address_conflicts), .ngrp = 1, .umasks = nhm_unc_unc_qhl_address_conflicts, }, { .name = "UNC_QHL_CONFLICT_CYCLES", .desc = "QHL IOH Tracker conflict cycles", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x25, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qhl_conflict_cycles), .ngrp = 1, .umasks = nhm_unc_unc_qhl_conflict_cycles, }, { .name = "UNC_QHL_CYCLES_FULL", .desc = "Cycles QHL Remote Tracker is full", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x21, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qhl_cycles_full), .ngrp = 1, .umasks = nhm_unc_unc_qhl_cycles_full, }, { .name = "UNC_QHL_CYCLES_NOT_EMPTY", .desc = "Cycles QHL Tracker is not empty", .modmsk =0x0, .cntmsk = 0x1fe00000, .code = 0x22, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qhl_cycles_not_empty), .ngrp = 1, .umasks = nhm_unc_unc_qhl_cycles_not_empty, }, { .name = "UNC_QHL_FRC_ACK_CNFLTS", .desc = "QHL FrcAckCnflts sent to local home", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x33, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qhl_frc_ack_cnflts), .ngrp = 1, .umasks = nhm_unc_unc_qhl_frc_ack_cnflts, }, { .name = "UNC_QHL_OCCUPANCY", .desc = "Cycles QHL Tracker Allocate to Deallocate Read Occupancy", .modmsk = 
NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x23, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qhl_occupancy), .ngrp = 1, .umasks = nhm_unc_unc_qhl_occupancy, }, { .name = "UNC_QHL_REQUESTS", .desc = "Quickpath Home Logic local read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x20, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qhl_requests), .ngrp = 1, .umasks = nhm_unc_unc_qhl_requests, }, { .name = "UNC_QHL_TO_QMC_BYPASS", .desc = "Number of requests to QMC that bypass QHL", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x26, }, { .name = "UNC_QMC_BUSY", .desc = "Cycles QMC busy with a read request", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_busy), .ngrp = 1, .umasks = nhm_unc_unc_qmc_busy, }, { .name = "UNC_QMC_CANCEL", .desc = "QMC cancels", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x30, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_cancel), .ngrp = 1, .umasks = nhm_unc_unc_qmc_cancel, }, { .name = "UNC_QMC_CRITICAL_PRIORITY_READS", .desc = "QMC critical priority read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_critical_priority_reads), .ngrp = 1, .umasks = nhm_unc_unc_qmc_critical_priority_reads, }, { .name = "UNC_QMC_HIGH_PRIORITY_READS", .desc = "QMC high priority read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2d, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_high_priority_reads), .ngrp = 1, .umasks = nhm_unc_unc_qmc_high_priority_reads, }, { .name = "UNC_QMC_ISOC_FULL", .desc = "Cycles DRAM full with isochronous (ISOC) read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_isoc_full), .ngrp = 1, .umasks = nhm_unc_unc_qmc_isoc_full, }, { .name = "UNC_IMC_ISOC_OCCUPANCY", .desc = "IMC isochronous (ISOC) Read Occupancy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2b, .numasks = 
LIBPFM_ARRAY_SIZE(nhm_unc_unc_imc_isoc_occupancy), .ngrp = 1, .umasks = nhm_unc_unc_imc_isoc_occupancy, }, { .name = "UNC_QMC_NORMAL_FULL", .desc = "Cycles DRAM full with normal read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_normal_full), .ngrp = 1, .umasks = nhm_unc_unc_qmc_normal_full, }, { .name = "UNC_QMC_NORMAL_READS", .desc = "QMC normal read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2c, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_normal_reads), .ngrp = 1, .umasks = nhm_unc_unc_qmc_normal_reads, }, { .name = "UNC_QMC_OCCUPANCY", .desc = "QMC Occupancy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_occupancy), .ngrp = 1, .umasks = nhm_unc_unc_qmc_occupancy, }, { .name = "UNC_QMC_PRIORITY_UPDATES", .desc = "QMC priority updates", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x31, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_priority_updates), .ngrp = 1, .umasks = nhm_unc_unc_qmc_priority_updates, }, { .name = "UNC_QMC_WRITES", .desc = "QMC cache line writes", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2f, .flags= INTEL_X86_GRP_EXCL, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qmc_writes), .ngrp = 2, .umasks = nhm_unc_unc_qmc_writes, }, { .name = "UNC_QPI_RX_NO_PPT_CREDIT", .desc = "Link 0 snoop stalls due to no PPT entry", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qpi_rx_no_ppt_credit), .ngrp = 1, .umasks = nhm_unc_unc_qpi_rx_no_ppt_credit, }, { .name = "UNC_QPI_TX_HEADER", .desc = "Cycles link 0 outbound header busy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qpi_tx_header), .ngrp = 1, .umasks = nhm_unc_unc_qpi_tx_header, }, { .name = "UNC_QPI_TX_STALLED_MULTI_FLIT", .desc = "Cycles QPI outbound stalls", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 
0x41, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qpi_tx_stalled_multi_flit), .ngrp = 1, .umasks = nhm_unc_unc_qpi_tx_stalled_multi_flit, }, { .name = "UNC_QPI_TX_STALLED_SINGLE_FLIT", .desc = "Cycles QPI outbound link stalls", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x40, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_qpi_tx_stalled_single_flit), .ngrp = 1, .umasks = nhm_unc_unc_qpi_tx_stalled_single_flit, }, { .name = "UNC_SNP_RESP_TO_LOCAL_HOME", .desc = "Local home snoop response", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x6, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_snp_resp_to_local_home), .ngrp = 1, .umasks = nhm_unc_unc_snp_resp_to_local_home, }, { .name = "UNC_SNP_RESP_TO_REMOTE_HOME", .desc = "Remote home snoop response", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(nhm_unc_unc_snp_resp_to_remote_home), .ngrp = 1, .umasks = nhm_unc_unc_snp_resp_to_remote_home, }, }; /* File: papi-5.3.0/src/libpfm4/lib/events/sparc_ultra3plus_events.h */ static const sparc_entry_t ultra3plus_pe[] = { /* These two must always be first. 
*/ { .name = "Cycle_cnt", .desc = "Accumulated cycles", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x0, }, { .name = "Instr_cnt", .desc = "Number of instructions completed", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x1, }, /* PIC0 events common to all UltraSPARC processors */ { .name = "Dispatch0_IC_miss", .desc = "I-buffer is empty from I-Cache miss", .ctrl = PME_CTRL_S0, .code = 0x2, }, { .name = "IC_ref", .desc = "I-cache references", .ctrl = PME_CTRL_S0, .code = 0x8, }, { .name = "DC_rd", .desc = "D-cache read references (including accesses that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0x9, }, { .name = "DC_wr", .desc = "D-cache store accesses (including cacheable stores that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0xa, }, { .name = "EC_ref", .desc = "E-cache references", .ctrl = PME_CTRL_S0, .code = 0xc, }, { .name = "EC_snoop_inv", .desc = "L2-cache invalidates generated from a snoop by a remote processor", .ctrl = PME_CTRL_S0, .code = 0xe, }, /* PIC1 events common to all UltraSPARC processors */ { .name = "Dispatch0_mispred", .desc = "I-buffer is empty from Branch misprediction", .ctrl = PME_CTRL_S1, .code = 0x2, }, { .name = "EC_wb", .desc = "Dirty sub-blocks that produce writebacks due to L2-cache miss events", .ctrl = PME_CTRL_S1, .code = 0xd, }, { .name = "EC_snoop_cb", .desc = "L2-cache copybacks generated from a snoop by a remote processor", .ctrl = PME_CTRL_S1, .code = 0xe, }, /* PIC0 events common to all UltraSPARC-III/III+/IIIi processors */ { .name = "Dispatch0_br_target", .desc = "I-buffer is empty due to a branch target address calculation", .ctrl = PME_CTRL_S0, .code = 0x3, }, { .name = "Dispatch0_2nd_br", .desc = "Stall cycles due to having two branch instructions line-up in one 4-instruction group causing the second branch in the group to be re-fetched, delaying its entrance into the I-buffer", .ctrl = PME_CTRL_S0, .code = 0x4, }, { .name = "Rstall_storeQ", .desc = "R-stage stall for a store instruction which is the next 
instruction to be executed, but it stalled due to the store queue being full", .ctrl = PME_CTRL_S0, .code = 0x5, }, { .name = "Rstall_IU_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding integer instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S0, .code = 0x6, }, { .name = "EC_write_hit_RTO", .desc = "W-cache exclusive requests that hit L2-cache in S, O, or Os state and thus, do a read-to-own bus transaction", .ctrl = PME_CTRL_S0, .code = 0xd, }, { .name = "EC_rd_miss", .desc = "L2-cache miss events (including atomics) from D-cache events", .ctrl = PME_CTRL_S0, .code = 0xf, }, { .name = "PC_port0_rd", .desc = "P-cache cacheable FP loads to the first port (general purpose load path to D-cache and P-cache via MS pipeline)", .ctrl = PME_CTRL_S0, .code = 0x10, }, { .name = "SI_snoop", .desc = "Counts snoops from remote processor(s) including RTS, RTSR, RTO, RTOR, RS, RSR, RTSM, and WS", .ctrl = PME_CTRL_S0, .code = 0x11, }, { .name = "SI_ciq_flow", .desc = "Counts system clock cycles when the flow control (PauseOut) signal is asserted", .ctrl = PME_CTRL_S0, .code = 0x12, }, { .name = "SI_owned", .desc = "Counts events where owned_in is asserted on bus requests from the local processor", .ctrl = PME_CTRL_S0, .code = 0x13, }, { .name = "SW_count0", .desc = "Counts software-generated occurrences of 'sethi %hi(0xfc000), %g0' instruction", .ctrl = PME_CTRL_S0, .code = 0x14, }, { .name = "IU_Stat_Br_miss_taken", .desc = "Retired branches that were predicted to be taken, but in fact were not taken", .ctrl = PME_CTRL_S0, .code = 0x15, }, { .name = "IU_Stat_Br_Count_taken", .desc = "Retired taken branches", .ctrl = PME_CTRL_S0, .code = 0x16, }, { .name = "Dispatch0_rs_mispred", .desc = "I-buffer is empty due to a Return Address Stack misprediction", .ctrl = PME_CTRL_S0, .code = 0x4, }, { .name = "FA_pipe_completion", .desc = "Instructions that complete execution on the FPG ALU 
pipelines", .ctrl = PME_CTRL_S0, .code = 0x18, }, /* PIC1 events common to all UltraSPARC-III/III+/IIIi processors */ { .name = "IC_miss_cancelled", .desc = "I-cache misses cancelled due to mis-speculation, recycle, or other events", .ctrl = PME_CTRL_S1, .code = 0x3, }, { .name = "Re_FPU_bypass", .desc = "Stall due to recirculation when an FPU bypass condition that does not have a direct bypass path occurs", .ctrl = PME_CTRL_S1, .code = 0x5, }, { .name = "Re_DC_miss", .desc = "Stall due to loads that miss D-cache and get recirculated", .ctrl = PME_CTRL_S1, .code = 0x6, }, { .name = "Re_EC_miss", .desc = "Stall due to loads that miss L2-cache and get recirculated", .ctrl = PME_CTRL_S1, .code = 0x7, }, { .name = "IC_miss", .desc = "I-cache misses, including fetches from mis-speculated execution paths which are later cancelled", .ctrl = PME_CTRL_S1, .code = 0x8, }, { .name = "DC_rd_miss", .desc = "Recirculated loads that miss the D-cache", .ctrl = PME_CTRL_S1, .code = 0x9, }, { .name = "DC_wr_miss", .desc = "D-cache store accesses that miss D-cache", .ctrl = PME_CTRL_S1, .code = 0xa, }, { .name = "Rstall_FP_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding floating-point instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S1, .code = 0xb, }, { .name = "EC_misses", .desc = "E-cache misses", .ctrl = PME_CTRL_S1, .code = 0xc, }, { .name = "EC_ic_miss", .desc = "L2-cache read misses from I-cache requests", .ctrl = PME_CTRL_S1, .code = 0xf, }, { .name = "Re_PC_miss", .desc = "Stall due to recirculation when a prefetch cache miss occurs on a prefetch predicted second load", .ctrl = PME_CTRL_S1, .code = 0x10, }, { .name = "ITLB_miss", .desc = "I-TLB miss traps taken", .ctrl = PME_CTRL_S1, .code = 0x11, }, { .name = "DTLB_miss", .desc = "Memory reference instructions which trap due to D-TLB miss", .ctrl = PME_CTRL_S1, .code = 0x12, }, { .name = "WC_miss", .desc = "W-cache misses", 
.ctrl = PME_CTRL_S1, .code = 0x13, }, { .name = "WC_snoop_cb", .desc = "W-cache copybacks generated by a snoop from a remote processor", .ctrl = PME_CTRL_S1, .code = 0x14, }, { .name = "WC_scrubbed", .desc = "W-cache hits to clean lines", .ctrl = PME_CTRL_S1, .code = 0x15, }, { .name = "WC_wb_wo_read", .desc = "W-cache writebacks not requiring a read", .ctrl = PME_CTRL_S1, .code = 0x16, }, { .name = "PC_soft_hit", .desc = "FP loads that hit a P-cache line that was prefetched by a software-prefetch instruction", .ctrl = PME_CTRL_S1, .code = 0x18, }, { .name = "PC_snoop_inv", .desc = "P-cache invalidates that were generated by a snoop from a remote processor and stores by a local processor", .ctrl = PME_CTRL_S1, .code = 0x19, }, { .name = "PC_hard_hit", .desc = "FP loads that hit a P-cache line that was prefetched by a hardware prefetch", .ctrl = PME_CTRL_S1, .code = 0x1a, }, { .name = "PC_port1_rd", .desc = "P-cache cacheable FP loads to the second port (memory and out-of-pipeline instruction execution loads via the A0 and A1 pipelines)", .ctrl = PME_CTRL_S1, .code = 0x1b, }, { .name = "SW_count1", .desc = "Counts software-generated occurrences of 'sethi %hi(0xfc000), %g0' instruction", .ctrl = PME_CTRL_S1, .code = 0x1c, }, { .name = "IU_Stat_Br_miss_untaken", .desc = "Retired branches that were predicted to be untaken, but in fact were taken", .ctrl = PME_CTRL_S1, .code = 0x1d, }, { .name = "IU_Stat_Br_Count_untaken", .desc = "Retired untaken branches", .ctrl = PME_CTRL_S1, .code = 0x1e, }, { .name = "PC_MS_miss", .desc = "FP loads through the MS pipeline that miss P-cache", .ctrl = PME_CTRL_S1, .code = 0x1f, }, { .name = "Re_RAW_miss", .desc = "Stall due to recirculation when there is a load in the E-stage which has a non-bypassable read-after-write hazard with an earlier store instruction", .ctrl = PME_CTRL_S1, .code = 0x26, }, { .name = "FM_pipe_completion", .desc = "Instructions that complete execution on the FPG Multiply pipelines", .ctrl = PME_CTRL_S0, .code 
= 0x27, }, /* PIC0 memory controller events common to UltraSPARC-III/III+ processors */ { .name = "MC_reads_0", .desc = "Read requests completed to memory bank 0", .ctrl = PME_CTRL_S0, .code = 0x20, }, { .name = "MC_reads_1", .desc = "Read requests completed to memory bank 1", .ctrl = PME_CTRL_S0, .code = 0x21, }, { .name = "MC_reads_2", .desc = "Read requests completed to memory bank 2", .ctrl = PME_CTRL_S0, .code = 0x22, }, { .name = "MC_reads_3", .desc = "Read requests completed to memory bank 3", .ctrl = PME_CTRL_S0, .code = 0x23, }, { .name = "MC_stalls_0", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 0 was busy with a previous request", .ctrl = PME_CTRL_S0, .code = 0x24, }, { .name = "MC_stalls_2", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 2 was busy with a previous request", .ctrl = PME_CTRL_S0, .code = 0x25, }, /* PIC1 memory controller events common to all UltraSPARC-III/III+ processors */ { .name = "MC_writes_0", .desc = "Write requests completed to memory bank 0", .ctrl = PME_CTRL_S1, .code = 0x20, }, { .name = "MC_writes_1", .desc = "Write requests completed to memory bank 1", .ctrl = PME_CTRL_S1, .code = 0x21, }, { .name = "MC_writes_2", .desc = "Write requests completed to memory bank 2", .ctrl = PME_CTRL_S1, .code = 0x22, }, { .name = "MC_writes_3", .desc = "Write requests completed to memory bank 3", .ctrl = PME_CTRL_S1, .code = 0x23, }, { .name = "MC_stalls_1", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 1 was busy with a previous request", .ctrl = PME_CTRL_S1, .code = 0x24, }, { .name = "MC_stalls_3", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 3 was busy with a previous request", .ctrl = PME_CTRL_S1, .code = 0x25, }, /* PIC0 events specific to UltraSPARC-III+ processors */ { .name = "EC_wb_remote", .desc = "Counts the retry event when any victimization for which the processor generates an R_WB 
transaction to non_LPA address region", .ctrl = PME_CTRL_S0, .code = 0x19, }, { .name = "EC_miss_local", .desc = "Counts any transaction to an LPA for which the processor issues an RTS/RTO/RS transaction", .ctrl = PME_CTRL_S0, .code = 0x1a, }, { .name = "EC_miss_mtag_remote", .desc = "Counts any transaction to an LPA in which the processor is required to generate a retry transaction", .ctrl = PME_CTRL_S0, .code = 0x1b, }, /* PIC1 events specific to UltraSPARC-III+/IIIi processors */ { .name = "Re_DC_missovhd", .desc = "Used to measure D-cache stall counts separately for L2-cache hits and misses. This counter is used with the recirculation and cache access events to separately calculate the D-cache loads that hit and miss the L2-cache", .ctrl = PME_CTRL_S1, .code = 0x4, }, /* PIC1 events specific to UltraSPARC-III+ processors */ { .name = "EC_miss_mtag_remote", .desc = "Counts any transaction to an LPA in which the processor is required to generate a retry transaction", .ctrl = PME_CTRL_S1, .code = 0x28, }, { .name = "EC_miss_remote", .desc = "Counts the events triggered whenever the processor generates a remote (R_*) transaction and the address is to a non-LPA portion (remote) of the physical address space, or an R_WS transaction due to block-store/block-store-commit to any address space (LPA or non-LPA), or an R-RTO due to store/swap request on Os state to LPA space", .ctrl = PME_CTRL_S1, .code = 0x29, }, }; #define PME_SPARC_ULTRA3PLUS_EVENT_COUNT (sizeof(ultra3plus_pe)/sizeof(sparc_entry_t)) /* File: papi-5.3.0/src/libpfm4/lib/events/arm_cortex_a8_events.h */ /* * Copyright (c) 2010 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, 
publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ /* * the various event names are the same as those given in the * file linux-2.6/arch/arm/kernel/perf_event.c */ /* * Cortex A8 Event Table */ static const arm_entry_t arm_cortex_a8_pe []={ {.name = "PMNC_SW_INCR", .code = 0x00, .desc = "Incremented by writes to the Software Increment Register" }, {.name = "IFETCH_MISS", .code = 0x01, .desc = "Instruction fetches that cause lowest-level cache miss" }, {.name = "ITLB_MISS", .code = 0x02, .desc = "Instruction fetches that cause lowest-level TLB miss" }, {.name = "DCACHE_REFILL", .code = 0x03, .desc = "Data read or writes that cause lowest-level cache miss" }, {.name = "DCACHE_ACCESS", .code = 0x04, .desc = "Data read or writes that cause lowest-level cache access" }, {.name = "DTLB_REFILL", .code = 0x05, .desc = "Data read or writes that cause lowest-level TLB refill" }, {.name = "DREAD", .code = 0x06, .desc = "Data read architecturally executed" }, {.name = "DWRITE", .code = 0x07, .desc = "Data write architecturally executed" }, {.name = "INSTR_EXECUTED", .code = 0x08, .desc = "Instructions architecturally executed" }, {.name = "EXC_TAKEN", 
.code = 0x09, .desc = "Counts each exception taken" }, {.name = "EXC_EXECUTED", .code = 0x0a, .desc = "Exception returns architecturally executed" }, {.name = "CID_WRITE", .code = 0x0b, .desc = "Instruction writes to Context ID Register, architecturally executed" }, {.name = "PC_WRITE", .code = 0x0c, .desc = "Software change of PC. Equivalent to branches" }, {.name = "PC_IMM_BRANCH", .code = 0x0d, .desc = "Immediate branches architecturally executed" }, {.name = "PC_PROC_RETURN", .code = 0x0e, .desc = "Procedure returns architecturally executed" }, {.name = "UNALIGNED_ACCESS", .code = 0x0f, .desc = "Unaligned accesses architecturally executed" }, {.name = "PC_BRANCH_MIS_PRED", .code = 0x10, .desc = "Branches mispredicted or not predicted" }, {.name = "CLOCK_CYCLES", /* this isn't in the Cortex-A8 tech doc */ .code = 0x11, /* but is in linux kernel */ .desc = "Clock cycles" }, {.name = "PC_BRANCH_MIS_USED", .code = 0x12, .desc = "Branches that could have been predicted" }, {.name = "WRITE_BUFFER_FULL", .code = 0x40, .desc = "Cycles Write buffer full" }, {.name = "L2_STORE_MERGED", .code = 0x41, .desc = "Stores merged in L2" }, {.name = "L2_STORE_BUFF", .code = 0x42, .desc = "Bufferable store transactions to L2" }, {.name = "L2_ACCESS", .code = 0x43, .desc = "Accesses to L2 cache" }, {.name = "L2_CACHE_MISS", .code = 0x44, .desc = "L2 cache misses" }, {.name = "AXI_READ_CYCLES", .code = 0x45, .desc = "Cycles with active AXI read channel transactions" }, {.name = "AXI_WRITE_CYCLES", .code = 0x46, .desc = "Cycles with Active AXI write channel transactions" }, {.name = "MEMORY_REPLAY", .code = 0x47, .desc = "Memory replay events" }, {.name = "UNALIGNED_ACCESS_REPLAY", .code = 0x48, .desc = "Unaligned accesses causing replays" }, {.name = "L1_DATA_MISS", .code = 0x49, .desc = "L1 data misses due to hashing algorithm" }, {.name = "L1_INST_MISS", .code = 0x4a, .desc = "L1 instruction misses due to hashing algorithm" }, {.name = "L1_DATA_COLORING", .code = 0x4b, .desc = 
"L1 data access where page color alias occurs" }, {.name = "L1_NEON_DATA", .code = 0x4c, .desc = "NEON accesses that hit in L1 cache" }, {.name = "L1_NEON_CACH_DATA", .code = 0x4d, .desc = "NEON cache accesses for L1 cache" }, {.name = "L2_NEON", .code = 0x4e, .desc = "L2 accesses caused by NEON" }, {.name = "L2_NEON_HIT", .code = 0x4f, .desc = "L2 hits caused by NEON" }, {.name = "L1_INST", .code = 0x50, .desc = "L1 instruction cache accesses" }, {.name = "PC_RETURN_MIS_PRED", .code = 0x51, .desc = "Return stack mispredictions" }, {.name = "PC_BRANCH_FAILED", .code = 0x52, .desc = "Branch prediction failures" }, {.name = "PC_BRANCH_TAKEN", .code = 0x53, .desc = "Branches predicted taken" }, {.name = "PC_BRANCH_EXECUTED", .code = 0x54, .desc = "Taken branches executed" }, {.name = "OP_EXECUTED", .code = 0x55, .desc = "Operations executed (includes sub-ops in multi-cycle instructions)" }, {.name = "CYCLES_INST_STALL", .code = 0x56, .desc = "Cycles no instruction is available for issue" }, {.name = "CYCLES_INST", .code = 0x57, .desc = "Number of instructions issued in cycle" }, {.name = "CYCLES_NEON_DATA_STALL", .code = 0x58, .desc = "Cycles stalled waiting on NEON MRC data" }, {.name = "CYCLES_NEON_INST_STALL", .code = 0x59, .desc = "Cycles stalled due to full NEON queues" }, {.name = "NEON_CYCLES", .code = 0x5a, .desc = "Cycles NEON and integer processors both not idle" }, {.name = "PMU0_EVENTS", .code = 0x70, .desc = "External PMUEXTIN[0] event" }, {.name = "PMU1_EVENTS", .code = 0x71, .desc = "External PMUEXTIN[1] event" }, {.name = "PMU_EVENTS", .code = 0x72, .desc = "External PMUEXTIN[0] or PMUEXTIN[1] event" }, {.name = "CPU_CYCLES", .code = 0xff, .desc = "CPU cycles" }, }; #define ARM_CORTEX_A8_EVENT_COUNT (sizeof(arm_cortex_a8_pe)/sizeof(arm_entry_t)) /* File: papi-5.3.0/src/libpfm4/lib/events/intel_x86_arch_events.h */ /* * Copyright (c) 2006-2007 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ /* * architected events for architectural perfmon v1 and v2 as defined by the IA-32 developer's manual * Vol 3B, table 18-6 (May 2007) */ static intel_x86_entry_t intel_x86_arch_pe[]={ {.name = "UNHALTED_CORE_CYCLES", .code = 0x003c, .cntmsk = 0x200000000ull, /* temporary */ .desc = "count core clock cycles whenever the clock signal on the specific core is running (not halted)" }, {.name = "INSTRUCTION_RETIRED", .code = 0x00c0, .cntmsk = 0x100000000ull, /* temporary */ .desc = "count the number of instructions at retirement. 
For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction", }, {.name = "UNHALTED_REFERENCE_CYCLES", .code = 0x013c, .cntmsk = 0x400000000ull, /* temporary */ .desc = "count reference clock cycles while the clock signal on the specific core is running. The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions", }, {.name = "LLC_REFERENCES", .code = 0x4f2e, .desc = "count each request originating from the core to reference a cache line in the last level cache. The count may include speculation, but excludes cache line fills due to hardware prefetch", }, {.name = "LLC_MISSES", .code = 0x412e, .desc = "count each cache miss condition for references to the last level cache. The event count may include speculation, but excludes cache line fills due to hardware prefetch", }, {.name = "BRANCH_INSTRUCTIONS_RETIRED", .code = 0x00c4, .desc = "count branch instructions at retirement. Specifically, this event counts the retirement of the last micro-op of a branch instruction", }, {.name = "MISPREDICTED_BRANCH_RETIRED", .code = 0x00c5, .desc = "count mispredicted branch instructions at retirement. Specifically, this event counts at retirement of the last micro-op of a branch instruction in the architectural path of the execution and experienced misprediction in the branch prediction hardware", } }; /* File: papi-5.3.0/src/libpfm4/lib/events/power5+_events.h */ /****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __POWER5p_EVENTS_H__ #define __POWER5p_EVENTS_H__ /* * File: power5+_events.h * CVS: * Author: Corey Ashford * cjashfor@us.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2009. All Rights Reserved. * Contributed by Corey Ashford * * Note: This code was automatically generated and should not be modified by * hand. 
* */ #define POWER5p_PME_PM_LSU_REJECT_RELOAD_CDF 0 #define POWER5p_PME_PM_FPU1_SINGLE 1 #define POWER5p_PME_PM_L3SB_REF 2 #define POWER5p_PME_PM_THRD_PRIO_DIFF_3or4_CYC 3 #define POWER5p_PME_PM_INST_FROM_L275_SHR 4 #define POWER5p_PME_PM_MRK_DATA_FROM_L375_MOD 5 #define POWER5p_PME_PM_DTLB_MISS_4K 6 #define POWER5p_PME_PM_CLB_FULL_CYC 7 #define POWER5p_PME_PM_MRK_ST_CMPL 8 #define POWER5p_PME_PM_LSU_FLUSH_LRQ_FULL 9 #define POWER5p_PME_PM_MRK_DATA_FROM_L275_SHR 10 #define POWER5p_PME_PM_1INST_CLB_CYC 11 #define POWER5p_PME_PM_MEM_SPEC_RD_CANCEL 12 #define POWER5p_PME_PM_MRK_DTLB_MISS_16M 13 #define POWER5p_PME_PM_FPU_FDIV 14 #define POWER5p_PME_PM_FPU_SINGLE 15 #define POWER5p_PME_PM_FPU0_FMA 16 #define POWER5p_PME_PM_SLB_MISS 17 #define POWER5p_PME_PM_LSU1_FLUSH_LRQ 18 #define POWER5p_PME_PM_L2SA_ST_HIT 19 #define POWER5p_PME_PM_DTLB_MISS 20 #define POWER5p_PME_PM_BR_PRED_TA 21 #define POWER5p_PME_PM_MRK_DATA_FROM_L375_MOD_CYC 22 #define POWER5p_PME_PM_CMPLU_STALL_FXU 23 #define POWER5p_PME_PM_EXT_INT 24 #define POWER5p_PME_PM_MRK_LSU1_FLUSH_LRQ 25 #define POWER5p_PME_PM_MRK_ST_GPS 26 #define POWER5p_PME_PM_LSU1_LDF 27 #define POWER5p_PME_PM_FAB_CMD_ISSUED 28 #define POWER5p_PME_PM_LSU0_SRQ_STFWD 29 #define POWER5p_PME_PM_CR_MAP_FULL_CYC 30 #define POWER5p_PME_PM_L2SA_RCST_DISP_FAIL_RC_FULL 31 #define POWER5p_PME_PM_MRK_LSU0_FLUSH_ULD 32 #define POWER5p_PME_PM_LSU_FLUSH_SRQ_FULL 33 #define POWER5p_PME_PM_MEM_RQ_DISP_Q16to19 34 #define POWER5p_PME_PM_FLUSH_IMBAL 35 #define POWER5p_PME_PM_THRD_PRIO_DIFF_minus3or4_CYC 36 #define POWER5p_PME_PM_DATA_FROM_L35_MOD 37 #define POWER5p_PME_PM_MEM_HI_PRIO_WR_CMPL 38 #define POWER5p_PME_PM_FPU1_FDIV 39 #define POWER5p_PME_PM_MEM_RQ_DISP 40 #define POWER5p_PME_PM_FPU0_FRSP_FCONV 41 #define POWER5p_PME_PM_LWSYNC_HELD 42 #define POWER5p_PME_PM_FXU_FIN 43 #define POWER5p_PME_PM_DSLB_MISS 44 #define POWER5p_PME_PM_DATA_FROM_L275_SHR 45 #define POWER5p_PME_PM_FXLS1_FULL_CYC 46 #define POWER5p_PME_PM_THRD_SEL_T0 47 #define 
POWER5p_PME_PM_PTEG_RELOAD_VALID 48 #define POWER5p_PME_PM_MRK_STCX_FAIL 49 #define POWER5p_PME_PM_LSU_LMQ_LHR_MERGE 50 #define POWER5p_PME_PM_2INST_CLB_CYC 51 #define POWER5p_PME_PM_FAB_PNtoVN_DIRECT 52 #define POWER5p_PME_PM_PTEG_FROM_L2MISS 53 #define POWER5p_PME_PM_CMPLU_STALL_LSU 54 #define POWER5p_PME_PM_MRK_DSLB_MISS 55 #define POWER5p_PME_PM_LSU_FLUSH_ULD 56 #define POWER5p_PME_PM_PTEG_FROM_LMEM 57 #define POWER5p_PME_PM_MRK_BRU_FIN 58 #define POWER5p_PME_PM_MEM_WQ_DISP_WRITE 59 #define POWER5p_PME_PM_MRK_DATA_FROM_L275_MOD_CYC 60 #define POWER5p_PME_PM_LSU1_NCLD 61 #define POWER5p_PME_PM_L2SA_RCLD_DISP_FAIL_OTHER 62 #define POWER5p_PME_PM_SNOOP_PW_RETRY_WQ_PWQ 63 #define POWER5p_PME_PM_FPU1_FULL_CYC 64 #define POWER5p_PME_PM_FPR_MAP_FULL_CYC 65 #define POWER5p_PME_PM_L3SA_ALL_BUSY 66 #define POWER5p_PME_PM_3INST_CLB_CYC 67 #define POWER5p_PME_PM_MEM_PWQ_DISP_Q2or3 68 #define POWER5p_PME_PM_L2SA_SHR_INV 69 #define POWER5p_PME_PM_THRESH_TIMEO 70 #define POWER5p_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL 71 #define POWER5p_PME_PM_THRD_SEL_OVER_GCT_IMBAL 72 #define POWER5p_PME_PM_FPU_FSQRT 73 #define POWER5p_PME_PM_PMC1_OVERFLOW 74 #define POWER5p_PME_PM_MRK_LSU0_FLUSH_LRQ 75 #define POWER5p_PME_PM_L3SC_SNOOP_RETRY 76 #define POWER5p_PME_PM_DATA_TABLEWALK_CYC 77 #define POWER5p_PME_PM_THRD_PRIO_6_CYC 78 #define POWER5p_PME_PM_FPU_FEST 79 #define POWER5p_PME_PM_FAB_M1toP1_SIDECAR_EMPTY 80 #define POWER5p_PME_PM_MRK_DATA_FROM_RMEM 81 #define POWER5p_PME_PM_MRK_DATA_FROM_L35_MOD_CYC 82 #define POWER5p_PME_PM_MEM_PWQ_DISP 83 #define POWER5p_PME_PM_FAB_P1toM1_SIDECAR_EMPTY 84 #define POWER5p_PME_PM_LD_MISS_L1_LSU0 85 #define POWER5p_PME_PM_SNOOP_PARTIAL_RTRY_QFULL 86 #define POWER5p_PME_PM_FPU1_STALL3 87 #define POWER5p_PME_PM_GCT_USAGE_80to99_CYC 88 #define POWER5p_PME_PM_WORK_HELD 89 #define POWER5p_PME_PM_INST_CMPL 90 #define POWER5p_PME_PM_LSU1_FLUSH_UST 91 #define POWER5p_PME_PM_FXU_IDLE 92 #define POWER5p_PME_PM_LSU0_FLUSH_ULD 93 #define 
POWER5p_PME_PM_LSU1_REJECT_LMQ_FULL 94 #define POWER5p_PME_PM_GRP_DISP_REJECT 95 #define POWER5p_PME_PM_PTEG_FROM_L25_SHR 96 #define POWER5p_PME_PM_L2SA_MOD_INV 97 #define POWER5p_PME_PM_FAB_CMD_RETRIED 98 #define POWER5p_PME_PM_L3SA_SHR_INV 99 #define POWER5p_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL 100 #define POWER5p_PME_PM_L2SA_RCST_DISP_FAIL_ADDR 101 #define POWER5p_PME_PM_L2SA_RCLD_DISP_FAIL_RC_FULL 102 #define POWER5p_PME_PM_PTEG_FROM_L375_MOD 103 #define POWER5p_PME_PM_MRK_LSU1_FLUSH_UST 104 #define POWER5p_PME_PM_BR_ISSUED 105 #define POWER5p_PME_PM_MRK_GRP_BR_REDIR 106 #define POWER5p_PME_PM_EE_OFF 107 #define POWER5p_PME_PM_IERAT_XLATE_WR_LP 108 #define POWER5p_PME_PM_DTLB_REF_64K 109 #define POWER5p_PME_PM_MEM_RQ_DISP_Q4to7 110 #define POWER5p_PME_PM_MEM_FAST_PATH_RD_DISP 111 #define POWER5p_PME_PM_INST_FROM_L3 112 #define POWER5p_PME_PM_ITLB_MISS 113 #define POWER5p_PME_PM_FXU1_BUSY_FXU0_IDLE 114 #define POWER5p_PME_PM_DTLB_REF_4K 115 #define POWER5p_PME_PM_FXLS_FULL_CYC 116 #define POWER5p_PME_PM_GRP_DISP_VALID 117 #define POWER5p_PME_PM_LSU_FLUSH_UST 118 #define POWER5p_PME_PM_FXU1_FIN 119 #define POWER5p_PME_PM_THRD_PRIO_4_CYC 120 #define POWER5p_PME_PM_MRK_DATA_FROM_L35_MOD 121 #define POWER5p_PME_PM_4INST_CLB_CYC 122 #define POWER5p_PME_PM_MRK_DTLB_REF_16M 123 #define POWER5p_PME_PM_INST_FROM_L375_MOD 124 #define POWER5p_PME_PM_GRP_CMPL 125 #define POWER5p_PME_PM_L2SC_RCST_DISP_FAIL_ADDR 126 #define POWER5p_PME_PM_FPU1_1FLOP 127 #define POWER5p_PME_PM_FPU_FRSP_FCONV 128 #define POWER5p_PME_PM_L3SC_REF 129 #define POWER5p_PME_PM_5INST_CLB_CYC 130 #define POWER5p_PME_PM_THRD_L2MISS_BOTH_CYC 131 #define POWER5p_PME_PM_MEM_PW_GATH 132 #define POWER5p_PME_PM_DTLB_REF_16G 133 #define POWER5p_PME_PM_FAB_DCLAIM_ISSUED 134 #define POWER5p_PME_PM_FAB_PNtoNN_SIDECAR 135 #define POWER5p_PME_PM_GRP_IC_MISS 136 #define POWER5p_PME_PM_INST_FROM_L35_SHR 137 #define POWER5p_PME_PM_LSU_LMQ_FULL_CYC 138 #define POWER5p_PME_PM_MRK_DATA_FROM_L2_CYC 139 #define 
POWER5p_PME_PM_LSU_SRQ_SYNC_CYC 140 #define POWER5p_PME_PM_LSU0_BUSY_REJECT 141 #define POWER5p_PME_PM_LSU_REJECT_ERAT_MISS 142 #define POWER5p_PME_PM_MRK_DATA_FROM_RMEM_CYC 143 #define POWER5p_PME_PM_DATA_FROM_L375_SHR 144 #define POWER5p_PME_PM_PTEG_FROM_L25_MOD 145 #define POWER5p_PME_PM_FPU0_FMOV_FEST 146 #define POWER5p_PME_PM_THRD_PRIO_7_CYC 147 #define POWER5p_PME_PM_LSU1_FLUSH_SRQ 148 #define POWER5p_PME_PM_LD_REF_L1_LSU0 149 #define POWER5p_PME_PM_L2SC_RCST_DISP 150 #define POWER5p_PME_PM_CMPLU_STALL_DIV 151 #define POWER5p_PME_PM_MEM_RQ_DISP_Q12to15 152 #define POWER5p_PME_PM_INST_FROM_L375_SHR 153 #define POWER5p_PME_PM_ST_REF_L1 154 #define POWER5p_PME_PM_L3SB_ALL_BUSY 155 #define POWER5p_PME_PM_FAB_P1toVNorNN_SIDECAR_EMPTY 156 #define POWER5p_PME_PM_MRK_DATA_FROM_L275_SHR_CYC 157 #define POWER5p_PME_PM_FAB_HOLDtoNN_EMPTY 158 #define POWER5p_PME_PM_DATA_FROM_LMEM 159 #define POWER5p_PME_PM_RUN_CYC 160 #define POWER5p_PME_PM_PTEG_FROM_RMEM 161 #define POWER5p_PME_PM_L2SC_RCLD_DISP 162 #define POWER5p_PME_PM_LSU_LRQ_S0_VALID 163 #define POWER5p_PME_PM_LSU0_LDF 164 #define POWER5p_PME_PM_PMC3_OVERFLOW 165 #define POWER5p_PME_PM_MRK_IMR_RELOAD 166 #define POWER5p_PME_PM_MRK_GRP_TIMEO 167 #define POWER5p_PME_PM_ST_MISS_L1 168 #define POWER5p_PME_PM_STOP_COMPLETION 169 #define POWER5p_PME_PM_LSU_BUSY_REJECT 170 #define POWER5p_PME_PM_ISLB_MISS 171 #define POWER5p_PME_PM_CYC 172 #define POWER5p_PME_PM_THRD_ONE_RUN_CYC 173 #define POWER5p_PME_PM_GRP_BR_REDIR_NONSPEC 174 #define POWER5p_PME_PM_LSU1_SRQ_STFWD 175 #define POWER5p_PME_PM_L3SC_MOD_INV 176 #define POWER5p_PME_PM_L2_PREF 177 #define POWER5p_PME_PM_GCT_NOSLOT_BR_MPRED 178 #define POWER5p_PME_PM_MRK_DATA_FROM_L25_MOD 179 #define POWER5p_PME_PM_L2SB_ST_REQ 180 #define POWER5p_PME_PM_L2SB_MOD_INV 181 #define POWER5p_PME_PM_MRK_L1_RELOAD_VALID 182 #define POWER5p_PME_PM_L3SB_HIT 183 #define POWER5p_PME_PM_L2SB_SHR_MOD 184 #define POWER5p_PME_PM_EE_OFF_EXT_INT 185 #define POWER5p_PME_PM_1PLUS_PPC_CMPL 186 
#define POWER5p_PME_PM_L2SC_SHR_MOD 187 #define POWER5p_PME_PM_PMC6_OVERFLOW 188 #define POWER5p_PME_PM_IC_PREF_INSTALL 189 #define POWER5p_PME_PM_LSU_LRQ_FULL_CYC 190 #define POWER5p_PME_PM_TLB_MISS 191 #define POWER5p_PME_PM_GCT_FULL_CYC 192 #define POWER5p_PME_PM_FXU_BUSY 193 #define POWER5p_PME_PM_MRK_DATA_FROM_L3_CYC 194 #define POWER5p_PME_PM_LSU_REJECT_LMQ_FULL 195 #define POWER5p_PME_PM_LSU_SRQ_S0_ALLOC 196 #define POWER5p_PME_PM_GRP_MRK 197 #define POWER5p_PME_PM_INST_FROM_L25_SHR 198 #define POWER5p_PME_PM_DC_PREF_STREAM_ALLOC 199 #define POWER5p_PME_PM_FPU1_FIN 200 #define POWER5p_PME_PM_BR_MPRED_TA 201 #define POWER5p_PME_PM_MRK_DTLB_REF_64K 202 #define POWER5p_PME_PM_RUN_INST_CMPL 203 #define POWER5p_PME_PM_CRQ_FULL_CYC 204 #define POWER5p_PME_PM_L2SA_RCLD_DISP 205 #define POWER5p_PME_PM_SNOOP_WR_RETRY_QFULL 206 #define POWER5p_PME_PM_MRK_DTLB_REF_4K 207 #define POWER5p_PME_PM_LSU_SRQ_S0_VALID 208 #define POWER5p_PME_PM_LSU0_FLUSH_LRQ 209 #define POWER5p_PME_PM_INST_FROM_L275_MOD 210 #define POWER5p_PME_PM_GCT_EMPTY_CYC 211 #define POWER5p_PME_PM_LARX_LSU0 212 #define POWER5p_PME_PM_THRD_PRIO_DIFF_5or6_CYC 213 #define POWER5p_PME_PM_SNOOP_RETRY_1AHEAD 214 #define POWER5p_PME_PM_FPU1_FSQRT 215 #define POWER5p_PME_PM_MRK_LD_MISS_L1_LSU1 216 #define POWER5p_PME_PM_MRK_FPU_FIN 217 #define POWER5p_PME_PM_THRD_PRIO_5_CYC 218 #define POWER5p_PME_PM_MRK_DATA_FROM_LMEM 219 #define POWER5p_PME_PM_SNOOP_TLBIE 220 #define POWER5p_PME_PM_FPU1_FRSP_FCONV 221 #define POWER5p_PME_PM_DTLB_MISS_16G 222 #define POWER5p_PME_PM_L3SB_SNOOP_RETRY 223 #define POWER5p_PME_PM_FAB_VBYPASS_EMPTY 224 #define POWER5p_PME_PM_MRK_DATA_FROM_L275_MOD 225 #define POWER5p_PME_PM_L2SB_RCST_DISP 226 #define POWER5p_PME_PM_6INST_CLB_CYC 227 #define POWER5p_PME_PM_FLUSH 228 #define POWER5p_PME_PM_L2SC_MOD_INV 229 #define POWER5p_PME_PM_FPU_DENORM 230 #define POWER5p_PME_PM_L3SC_HIT 231 #define POWER5p_PME_PM_SNOOP_WR_RETRY_RQ 232 #define POWER5p_PME_PM_LSU1_REJECT_SRQ 233 #define 
POWER5p_PME_PM_L3SC_ALL_BUSY 234 #define POWER5p_PME_PM_IC_PREF_REQ 235 #define POWER5p_PME_PM_MRK_GRP_IC_MISS 236 #define POWER5p_PME_PM_GCT_NOSLOT_IC_MISS 237 #define POWER5p_PME_PM_MRK_DATA_FROM_L3 238 #define POWER5p_PME_PM_GCT_NOSLOT_SRQ_FULL 239 #define POWER5p_PME_PM_CMPLU_STALL_DCACHE_MISS 240 #define POWER5p_PME_PM_THRD_SEL_OVER_ISU_HOLD 241 #define POWER5p_PME_PM_LSU_FLUSH_LRQ 242 #define POWER5p_PME_PM_THRD_PRIO_2_CYC 243 #define POWER5p_PME_PM_L3SA_MOD_INV 244 #define POWER5p_PME_PM_LSU_FLUSH_SRQ 245 #define POWER5p_PME_PM_MRK_LSU_SRQ_INST_VALID 246 #define POWER5p_PME_PM_L3SA_REF 247 #define POWER5p_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL 248 #define POWER5p_PME_PM_FPU0_STALL3 249 #define POWER5p_PME_PM_TB_BIT_TRANS 250 #define POWER5p_PME_PM_GPR_MAP_FULL_CYC 251 #define POWER5p_PME_PM_MRK_LSU_FLUSH_LRQ 252 #define POWER5p_PME_PM_FPU0_STF 253 #define POWER5p_PME_PM_MRK_DTLB_MISS 254 #define POWER5p_PME_PM_FPU1_FMA 255 #define POWER5p_PME_PM_L2SA_MOD_TAG 256 #define POWER5p_PME_PM_LSU1_FLUSH_ULD 257 #define POWER5p_PME_PM_MRK_INST_FIN 258 #define POWER5p_PME_PM_MRK_LSU0_FLUSH_UST 259 #define POWER5p_PME_PM_FPU0_FULL_CYC 260 #define POWER5p_PME_PM_LSU_LRQ_S0_ALLOC 261 #define POWER5p_PME_PM_MRK_LSU1_FLUSH_ULD 262 #define POWER5p_PME_PM_MRK_DTLB_REF 263 #define POWER5p_PME_PM_BR_UNCOND 264 #define POWER5p_PME_PM_THRD_SEL_OVER_L2MISS 265 #define POWER5p_PME_PM_L2SB_SHR_INV 266 #define POWER5p_PME_PM_MEM_LO_PRIO_WR_CMPL 267 #define POWER5p_PME_PM_MRK_DTLB_MISS_64K 268 #define POWER5p_PME_PM_MRK_ST_MISS_L1 269 #define POWER5p_PME_PM_L3SC_MOD_TAG 270 #define POWER5p_PME_PM_GRP_DISP_SUCCESS 271 #define POWER5p_PME_PM_THRD_PRIO_DIFF_1or2_CYC 272 #define POWER5p_PME_PM_IC_DEMAND_L2_BHT_REDIRECT 273 #define POWER5p_PME_PM_LSU_DERAT_MISS 274 #define POWER5p_PME_PM_MEM_WQ_DISP_Q8to15 275 #define POWER5p_PME_PM_FPU0_SINGLE 276 #define POWER5p_PME_PM_THRD_PRIO_1_CYC 277 #define POWER5p_PME_PM_L2SC_RCST_DISP_FAIL_OTHER 278 #define POWER5p_PME_PM_SNOOP_RD_RETRY_RQ 279 
#define POWER5p_PME_PM_FAB_HOLDtoVN_EMPTY 280 #define POWER5p_PME_PM_FPU1_FEST 281 #define POWER5p_PME_PM_SNOOP_DCLAIM_RETRY_QFULL 282 #define POWER5p_PME_PM_MRK_DATA_FROM_L25_SHR_CYC 283 #define POWER5p_PME_PM_MRK_ST_CMPL_INT 284 #define POWER5p_PME_PM_FLUSH_BR_MPRED 285 #define POWER5p_PME_PM_MRK_DTLB_MISS_16G 286 #define POWER5p_PME_PM_FPU_STF 287 #define POWER5p_PME_PM_L2SB_RCLD_DISP_FAIL_ADDR 288 #define POWER5p_PME_PM_CMPLU_STALL_FPU 289 #define POWER5p_PME_PM_THRD_PRIO_DIFF_minus1or2_CYC 290 #define POWER5p_PME_PM_GCT_NOSLOT_CYC 291 #define POWER5p_PME_PM_FXU0_BUSY_FXU1_IDLE 292 #define POWER5p_PME_PM_PTEG_FROM_L35_SHR 293 #define POWER5p_PME_PM_MRK_DTLB_REF_16G 294 #define POWER5p_PME_PM_MRK_LSU_FLUSH_UST 295 #define POWER5p_PME_PM_MRK_DATA_FROM_L25_SHR 296 #define POWER5p_PME_PM_L3SA_HIT 297 #define POWER5p_PME_PM_MRK_DATA_FROM_L35_SHR 298 #define POWER5p_PME_PM_L2SB_RCST_DISP_FAIL_ADDR 299 #define POWER5p_PME_PM_IERAT_XLATE_WR 300 #define POWER5p_PME_PM_L2SA_ST_REQ 301 #define POWER5p_PME_PM_INST_FROM_LMEM 302 #define POWER5p_PME_PM_THRD_SEL_T1 303 #define POWER5p_PME_PM_IC_DEMAND_L2_BR_REDIRECT 304 #define POWER5p_PME_PM_MRK_DATA_FROM_L35_SHR_CYC 305 #define POWER5p_PME_PM_FPU0_1FLOP 306 #define POWER5p_PME_PM_PTEG_FROM_L2 307 #define POWER5p_PME_PM_MEM_PW_CMPL 308 #define POWER5p_PME_PM_THRD_PRIO_DIFF_minus5or6_CYC 309 #define POWER5p_PME_PM_L2SB_RCLD_DISP_FAIL_OTHER 310 #define POWER5p_PME_PM_MRK_DTLB_MISS_4K 311 #define POWER5p_PME_PM_FPU0_FIN 312 #define POWER5p_PME_PM_L3SC_SHR_INV 313 #define POWER5p_PME_PM_GRP_BR_REDIR 314 #define POWER5p_PME_PM_L2SC_RCLD_DISP_FAIL_RC_FULL 315 #define POWER5p_PME_PM_MRK_LSU_FLUSH_SRQ 316 #define POWER5p_PME_PM_PTEG_FROM_L275_SHR 317 #define POWER5p_PME_PM_L2SB_RCLD_DISP_FAIL_RC_FULL 318 #define POWER5p_PME_PM_SNOOP_RD_RETRY_WQ 319 #define POWER5p_PME_PM_FAB_DCLAIM_RETRIED 320 #define POWER5p_PME_PM_LSU0_NCLD 321 #define POWER5p_PME_PM_LSU1_BUSY_REJECT 322 #define POWER5p_PME_PM_FXLS0_FULL_CYC 323 #define 
POWER5p_PME_PM_DTLB_REF_16M 324 #define POWER5p_PME_PM_FPU0_FEST 325 #define POWER5p_PME_PM_GCT_USAGE_60to79_CYC 326 #define POWER5p_PME_PM_DATA_FROM_L25_MOD 327 #define POWER5p_PME_PM_L2SC_RCLD_DISP_FAIL_ADDR 328 #define POWER5p_PME_PM_LSU0_REJECT_ERAT_MISS 329 #define POWER5p_PME_PM_DATA_FROM_L375_MOD 330 #define POWER5p_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC 331 #define POWER5p_PME_PM_DTLB_MISS_64K 332 #define POWER5p_PME_PM_LSU0_REJECT_RELOAD_CDF 333 #define POWER5p_PME_PM_0INST_FETCH 334 #define POWER5p_PME_PM_LSU1_REJECT_RELOAD_CDF 335 #define POWER5p_PME_PM_MEM_WQ_DISP_Q0to7 336 #define POWER5p_PME_PM_L1_PREF 337 #define POWER5p_PME_PM_MRK_DATA_FROM_LMEM_CYC 338 #define POWER5p_PME_PM_BRQ_FULL_CYC 339 #define POWER5p_PME_PM_GRP_IC_MISS_NONSPEC 340 #define POWER5p_PME_PM_PTEG_FROM_L275_MOD 341 #define POWER5p_PME_PM_MRK_LD_MISS_L1_LSU0 342 #define POWER5p_PME_PM_MRK_DATA_FROM_L375_SHR_CYC 343 #define POWER5p_PME_PM_DATA_FROM_L3 344 #define POWER5p_PME_PM_INST_FROM_L2 345 #define POWER5p_PME_PM_LSU_FLUSH 346 #define POWER5p_PME_PM_PMC2_OVERFLOW 347 #define POWER5p_PME_PM_FPU0_DENORM 348 #define POWER5p_PME_PM_FPU1_FMOV_FEST 349 #define POWER5p_PME_PM_INST_FETCH_CYC 350 #define POWER5p_PME_PM_INST_DISP 351 #define POWER5p_PME_PM_LSU_LDF 352 #define POWER5p_PME_PM_DATA_FROM_L25_SHR 353 #define POWER5p_PME_PM_L1_DCACHE_RELOAD_VALID 354 #define POWER5p_PME_PM_MEM_WQ_DISP_DCLAIM 355 #define POWER5p_PME_PM_MRK_GRP_ISSUED 356 #define POWER5p_PME_PM_FPU_FULL_CYC 357 #define POWER5p_PME_PM_INST_FROM_L35_MOD 358 #define POWER5p_PME_PM_FPU_FMA 359 #define POWER5p_PME_PM_THRD_PRIO_3_CYC 360 #define POWER5p_PME_PM_MRK_CRU_FIN 361 #define POWER5p_PME_PM_SNOOP_WR_RETRY_WQ 362 #define POWER5p_PME_PM_CMPLU_STALL_REJECT 363 #define POWER5p_PME_PM_MRK_FXU_FIN 364 #define POWER5p_PME_PM_LSU1_REJECT_ERAT_MISS 365 #define POWER5p_PME_PM_L2SB_RCST_DISP_FAIL_OTHER 366 #define POWER5p_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY 367 #define POWER5p_PME_PM_PMC4_OVERFLOW 368 #define 
POWER5p_PME_PM_L3SA_SNOOP_RETRY 369 #define POWER5p_PME_PM_PTEG_FROM_L35_MOD 370 #define POWER5p_PME_PM_INST_FROM_L25_MOD 371 #define POWER5p_PME_PM_THRD_SMT_HANG 372 #define POWER5p_PME_PM_CMPLU_STALL_ERAT_MISS 373 #define POWER5p_PME_PM_L3SA_MOD_TAG 374 #define POWER5p_PME_PM_INST_FROM_L2MISS 375 #define POWER5p_PME_PM_FLUSH_SYNC 376 #define POWER5p_PME_PM_MRK_GRP_DISP 377 #define POWER5p_PME_PM_MEM_RQ_DISP_Q8to11 378 #define POWER5p_PME_PM_L2SC_ST_HIT 379 #define POWER5p_PME_PM_L2SB_MOD_TAG 380 #define POWER5p_PME_PM_CLB_EMPTY_CYC 381 #define POWER5p_PME_PM_L2SB_ST_HIT 382 #define POWER5p_PME_PM_MEM_NONSPEC_RD_CANCEL 383 #define POWER5p_PME_PM_BR_PRED_CR_TA 384 #define POWER5p_PME_PM_MRK_LSU0_FLUSH_SRQ 385 #define POWER5p_PME_PM_MRK_LSU_FLUSH_ULD 386 #define POWER5p_PME_PM_INST_DISP_ATTEMPT 387 #define POWER5p_PME_PM_INST_FROM_RMEM 388 #define POWER5p_PME_PM_ST_REF_L1_LSU0 389 #define POWER5p_PME_PM_LSU0_DERAT_MISS 390 #define POWER5p_PME_PM_FPU_STALL3 391 #define POWER5p_PME_PM_L2SB_RCLD_DISP 392 #define POWER5p_PME_PM_BR_PRED_CR 393 #define POWER5p_PME_PM_MRK_DATA_FROM_L2 394 #define POWER5p_PME_PM_LSU0_FLUSH_SRQ 395 #define POWER5p_PME_PM_FAB_PNtoNN_DIRECT 396 #define POWER5p_PME_PM_IOPS_CMPL 397 #define POWER5p_PME_PM_L2SA_RCST_DISP 398 #define POWER5p_PME_PM_L2SA_RCST_DISP_FAIL_OTHER 399 #define POWER5p_PME_PM_L2SC_SHR_INV 400 #define POWER5p_PME_PM_SNOOP_RETRY_AB_COLLISION 401 #define POWER5p_PME_PM_FAB_PNtoVN_SIDECAR 402 #define POWER5p_PME_PM_LSU0_REJECT_LMQ_FULL 403 #define POWER5p_PME_PM_LSU_LMQ_S0_ALLOC 404 #define POWER5p_PME_PM_SNOOP_PW_RETRY_RQ 405 #define POWER5p_PME_PM_DTLB_REF 406 #define POWER5p_PME_PM_PTEG_FROM_L3 407 #define POWER5p_PME_PM_FAB_M1toVNorNN_SIDECAR_EMPTY 408 #define POWER5p_PME_PM_LSU_SRQ_EMPTY_CYC 409 #define POWER5p_PME_PM_FPU1_STF 410 #define POWER5p_PME_PM_LSU_LMQ_S0_VALID 411 #define POWER5p_PME_PM_GCT_USAGE_00to59_CYC 412 #define POWER5p_PME_PM_FPU_FMOV_FEST 413 #define POWER5p_PME_PM_DATA_FROM_L2MISS 414 #define 
POWER5p_PME_PM_XER_MAP_FULL_CYC 415 #define POWER5p_PME_PM_GRP_DISP_BLK_SB_CYC 416 #define POWER5p_PME_PM_FLUSH_SB 417 #define POWER5p_PME_PM_MRK_DATA_FROM_L375_SHR 418 #define POWER5p_PME_PM_MRK_GRP_CMPL 419 #define POWER5p_PME_PM_SUSPENDED 420 #define POWER5p_PME_PM_SNOOP_RD_RETRY_QFULL 421 #define POWER5p_PME_PM_GRP_IC_MISS_BR_REDIR_NONSPEC 422 #define POWER5p_PME_PM_DATA_FROM_L35_SHR 423 #define POWER5p_PME_PM_L3SB_MOD_INV 424 #define POWER5p_PME_PM_STCX_FAIL 425 #define POWER5p_PME_PM_LD_MISS_L1_LSU1 426 #define POWER5p_PME_PM_GRP_DISP 427 #define POWER5p_PME_PM_DC_PREF_DST 428 #define POWER5p_PME_PM_FPU1_DENORM 429 #define POWER5p_PME_PM_FPU0_FPSCR 430 #define POWER5p_PME_PM_DATA_FROM_L2 431 #define POWER5p_PME_PM_L2SA_RCLD_DISP_FAIL_ADDR 432 #define POWER5p_PME_PM_FPU_1FLOP 433 #define POWER5p_PME_PM_L2SC_RCLD_DISP_FAIL_OTHER 434 #define POWER5p_PME_PM_FPU0_FSQRT 435 #define POWER5p_PME_PM_L2SC_RCST_DISP_FAIL_RC_FULL 436 #define POWER5p_PME_PM_LD_REF_L1 437 #define POWER5p_PME_PM_INST_FROM_L1 438 #define POWER5p_PME_PM_TLBIE_HELD 439 #define POWER5p_PME_PM_DC_PREF_OUT_OF_STREAMS 440 #define POWER5p_PME_PM_MRK_DATA_FROM_L25_MOD_CYC 441 #define POWER5p_PME_PM_MRK_LSU1_FLUSH_SRQ 442 #define POWER5p_PME_PM_MEM_RQ_DISP_Q0to3 443 #define POWER5p_PME_PM_ST_REF_L1_LSU1 444 #define POWER5p_PME_PM_MRK_LD_MISS_L1 445 #define POWER5p_PME_PM_L1_WRITE_CYC 446 #define POWER5p_PME_PM_L2SC_ST_REQ 447 #define POWER5p_PME_PM_CMPLU_STALL_FDIV 448 #define POWER5p_PME_PM_THRD_SEL_OVER_CLB_EMPTY 449 #define POWER5p_PME_PM_BR_MPRED_CR 450 #define POWER5p_PME_PM_L3SB_MOD_TAG 451 #define POWER5p_PME_PM_MRK_DATA_FROM_L2MISS 452 #define POWER5p_PME_PM_LSU_REJECT_SRQ 453 #define POWER5p_PME_PM_LD_MISS_L1 454 #define POWER5p_PME_PM_INST_FROM_PREF 455 #define POWER5p_PME_PM_STCX_PASS 456 #define POWER5p_PME_PM_DC_INV_L2 457 #define POWER5p_PME_PM_LSU_SRQ_FULL_CYC 458 #define POWER5p_PME_PM_FPU_FIN 459 #define POWER5p_PME_PM_LSU_SRQ_STFWD 460 #define POWER5p_PME_PM_L2SA_SHR_MOD 461 #define 
POWER5p_PME_PM_0INST_CLB_CYC 462 #define POWER5p_PME_PM_FXU0_FIN 463 #define POWER5p_PME_PM_L2SB_RCST_DISP_FAIL_RC_FULL 464 #define POWER5p_PME_PM_THRD_GRP_CMPL_BOTH_CYC 465 #define POWER5p_PME_PM_PMC5_OVERFLOW 466 #define POWER5p_PME_PM_FPU0_FDIV 467 #define POWER5p_PME_PM_PTEG_FROM_L375_SHR 468 #define POWER5p_PME_PM_HV_CYC 469 #define POWER5p_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY 470 #define POWER5p_PME_PM_THRD_PRIO_DIFF_0_CYC 471 #define POWER5p_PME_PM_LR_CTR_MAP_FULL_CYC 472 #define POWER5p_PME_PM_L3SB_SHR_INV 473 #define POWER5p_PME_PM_DATA_FROM_RMEM 474 #define POWER5p_PME_PM_DATA_FROM_L275_MOD 475 #define POWER5p_PME_PM_LSU0_REJECT_SRQ 476 #define POWER5p_PME_PM_LSU1_DERAT_MISS 477 #define POWER5p_PME_PM_MRK_LSU_FIN 478 #define POWER5p_PME_PM_DTLB_MISS_16M 479 #define POWER5p_PME_PM_LSU0_FLUSH_UST 480 #define POWER5p_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY 481 #define POWER5p_PME_PM_L2SC_MOD_TAG 482 static const pme_power_entry_t power5p_pe[] = { [ POWER5p_PME_PM_LSU_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU_REJECT_RELOAD_CDF", .pme_code = 0x2c4090, .pme_short_desc = "LSU reject due to reload CDF or tag update collision", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions because of Critical Data Forward. When critical data arrives from the storage system it is formatted and immediately forwarded, bypassing the data cache, to the destination register using the result bus. Any instruction that requires the result bus in the same cycle is rejected. Tag update rejects are caused when an instruction requires access to the Dcache directory or ERAT in the same system when they are being updated. 
Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_FPU1_SINGLE ] = { .pme_name = "PM_FPU1_SINGLE", .pme_code = 0x20e7, .pme_short_desc = "FPU1 executed single precision instruction", .pme_long_desc = "FPU1 has executed a single precision instruction.", }, [ POWER5p_PME_PM_L3SB_REF ] = { .pme_name = "PM_L3SB_REF", .pme_code = 0x701c4, .pme_short_desc = "L3 slice B references", .pme_long_desc = "Number of attempts made by this chip's cores to find data in the L3. Reported per L3 slice", }, [ POWER5p_PME_PM_THRD_PRIO_DIFF_3or4_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_3or4_CYC", .pme_code = 0x430e5, .pme_short_desc = "Cycles thread priority difference is 3 or 4", .pme_long_desc = "Cycles when this thread's priority is higher than the other thread's priority by 3 or 4.", }, [ POWER5p_PME_PM_INST_FROM_L275_SHR ] = { .pme_name = "PM_INST_FROM_L275_SHR", .pme_code = 0x322096, .pme_short_desc = "Instruction fetched from L2.75 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (T) data from the L2 on a different module than this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L375_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L375_MOD", .pme_code = 0x1c70a7, .pme_short_desc = "Marked data loaded from L3.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on a different module than this processor is located due to a marked load.", }, [ POWER5p_PME_PM_DTLB_MISS_4K ] = { .pme_name = "PM_DTLB_MISS_4K", .pme_code = 0x1c208d, .pme_short_desc = "Data TLB miss for 4K page", .pme_long_desc = "Data TLB references to 4KB pages that missed the TLB. 
Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_CLB_FULL_CYC ] = { .pme_name = "PM_CLB_FULL_CYC", .pme_code = 0x220e5, .pme_short_desc = "Cycles CLB full", .pme_long_desc = "Cycles when both threads' CLBs are full.", }, [ POWER5p_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x100003, .pme_short_desc = "Marked store instruction completed", .pme_long_desc = "A sampled store has completed (data home)", }, [ POWER5p_PME_PM_LSU_FLUSH_LRQ_FULL ] = { .pme_name = "PM_LSU_FLUSH_LRQ_FULL", .pme_code = 0x320e7, .pme_short_desc = "Flush caused by LRQ full", .pme_long_desc = "This thread was flushed at dispatch because its Load Request Queue was full. This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L275_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L275_SHR", .pme_code = 0x3c7097, .pme_short_desc = "Marked data loaded from L2.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T) data from the L2 on a different module than this processor is located due to a marked load.", }, [ POWER5p_PME_PM_1INST_CLB_CYC ] = { .pme_name = "PM_1INST_CLB_CYC", .pme_code = 0x400c1, .pme_short_desc = "Cycles 1 instruction in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5p_PME_PM_MEM_SPEC_RD_CANCEL ] = { .pme_name = "PM_MEM_SPEC_RD_CANCEL", .pme_code = 0x721e6, .pme_short_desc = "Speculative memory read cancelled", .pme_long_desc = "Speculative memory read cancelled (i.e. 
cresp = sourced by L2/L3)", }, [ POWER5p_PME_PM_MRK_DTLB_MISS_16M ] = { .pme_name = "PM_MRK_DTLB_MISS_16M", .pme_code = 0x3c608d, .pme_short_desc = "Marked Data TLB misses for 16M page", .pme_long_desc = "Marked Data TLB misses for 16M page", }, [ POWER5p_PME_PM_FPU_FDIV ] = { .pme_name = "PM_FPU_FDIV", .pme_code = 0x100088, .pme_short_desc = "FPU executed FDIV instruction", .pme_long_desc = "The floating point unit has executed a divide instruction. This could be fdiv, fdivs, fdiv., fdivs.. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_FPU_SINGLE ] = { .pme_name = "PM_FPU_SINGLE", .pme_code = 0x102090, .pme_short_desc = "FPU executed single precision instruction", .pme_long_desc = "FPU is executing single precision instruction. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_FPU0_FMA ] = { .pme_name = "PM_FPU0_FMA", .pme_code = 0xc1, .pme_short_desc = "FPU0 executed multiply-add instruction", .pme_long_desc = "The floating point unit has executed a multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5p_PME_PM_SLB_MISS ] = { .pme_name = "PM_SLB_MISS", .pme_code = 0x280088, .pme_short_desc = "SLB misses", .pme_long_desc = "Total of all Segment Lookaside Buffer (SLB) misses, Instructions + Data.", }, [ POWER5p_PME_PM_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_LSU1_FLUSH_LRQ", .pme_code = 0xc00c6, .pme_short_desc = "LSU1 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5p_PME_PM_L2SA_ST_HIT ] = { .pme_name = "PM_L2SA_ST_HIT", .pme_code = 0x733e0, .pme_short_desc = "L2 slice A store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. 
This event is provided on each of the three L2 slices A, B, and C.", }, [ POWER5p_PME_PM_DTLB_MISS ] = { .pme_name = "PM_DTLB_MISS", .pme_code = 0x800c4, .pme_short_desc = "Data TLB misses", .pme_long_desc = "Data TLB misses, all page sizes.", }, [ POWER5p_PME_PM_BR_PRED_TA ] = { .pme_name = "PM_BR_PRED_TA", .pme_code = 0x230e3, .pme_short_desc = "A conditional branch was predicted, target prediction", .pme_long_desc = "The target address of a branch instruction was predicted.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L375_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L375_MOD_CYC", .pme_code = 0x4c70a7, .pme_short_desc = "Marked load latency from L3.75 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_CMPLU_STALL_FXU ] = { .pme_name = "PM_CMPLU_STALL_FXU", .pme_code = 0x211099, .pme_short_desc = "Completion stall caused by FXU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a fixed point instruction.", }, [ POWER5p_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x400003, .pme_short_desc = "External interrupts", .pme_long_desc = "An interrupt due to an external exception occurred", }, [ POWER5p_PME_PM_MRK_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_LRQ", .pme_code = 0x810c6, .pme_short_desc = "LSU1 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5p_PME_PM_MRK_ST_GPS ] = { .pme_name = "PM_MRK_ST_GPS", .pme_code = 
0x200003, .pme_short_desc = "Marked store sent to GPS", .pme_long_desc = "A sampled store has been sent to the memory subsystem", }, [ POWER5p_PME_PM_LSU1_LDF ] = { .pme_name = "PM_LSU1_LDF", .pme_code = 0xc50c4, .pme_short_desc = "LSU1 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed by LSU1", }, [ POWER5p_PME_PM_FAB_CMD_ISSUED ] = { .pme_name = "PM_FAB_CMD_ISSUED", .pme_code = 0x700c7, .pme_short_desc = "Fabric command issued", .pme_long_desc = "Incremented when a chip issues a command on its SnoopA address bus. Each of the two address busses (SnoopA and SnoopB) is capable of one transaction per fabric cycle (one fabric cycle = 2 cpu cycles in normal 2:1 mode), but each chip can only drive the SnoopA bus, and can only drive one transaction every two fabric cycles (i.e., every four cpu cycles). In MCM-based systems, two chips interleave their accesses to each of the two fabric busses (SnoopA, SnoopB) to reach a peak capability of one transaction per cpu clock cycle. The two chips that drive SnoopB are wired so that the chips refer to the bus as SnoopA but it is connected to the other two chips as SnoopB. Note that this event will only be recorded by the FBC on the chip that sourced the operation. The signal is delivered at FBC speed and the count must be scaled.", }, [ POWER5p_PME_PM_LSU0_SRQ_STFWD ] = { .pme_name = "PM_LSU0_SRQ_STFWD", .pme_code = 0xc60e1, .pme_short_desc = "LSU0 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 0. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. 
If a load hits L1 but becomes a store forward, it is not treated as a load miss.", }, [ POWER5p_PME_PM_CR_MAP_FULL_CYC ] = { .pme_name = "PM_CR_MAP_FULL_CYC", .pme_code = 0x100c4, .pme_short_desc = "Cycles CR logical operation mapper full", .pme_long_desc = "The Conditional Register mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_L2SA_RCST_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SA_RCST_DISP_FAIL_RC_FULL", .pme_code = 0x722e0, .pme_short_desc = "L2 slice A RC store dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a store failed because all RC machines are busy.", }, [ POWER5p_PME_PM_MRK_LSU0_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU0_FLUSH_ULD", .pme_code = 0x810c1, .pme_short_desc = "LSU0 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 0 because it was unaligned (crossed a 64-byte boundary, or a 32-byte boundary if it missed the L1)", }, [ POWER5p_PME_PM_LSU_FLUSH_SRQ_FULL ] = { .pme_name = "PM_LSU_FLUSH_SRQ_FULL", .pme_code = 0x330e0, .pme_short_desc = "Flush caused by SRQ full", .pme_long_desc = "This thread was flushed at dispatch because its Store Request Queue was full. This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5p_PME_PM_MEM_RQ_DISP_Q16to19 ] = { .pme_name = "PM_MEM_RQ_DISP_Q16to19", .pme_code = 0x727e6, .pme_short_desc = "Memory read queue dispatched to queues 16-19", .pme_long_desc = "A memory operation was dispatched to read queue 16,17,18 or 19. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FLUSH_IMBAL ] = { .pme_name = "PM_FLUSH_IMBAL", .pme_code = 0x330e3, .pme_short_desc = "Flush caused by thread GCT imbalance", .pme_long_desc = "This thread has been flushed at dispatch because it is stalled and a GCT imbalance exists. GCT thresholds are set in the TSCR register. This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5p_PME_PM_THRD_PRIO_DIFF_minus3or4_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus3or4_CYC", .pme_code = 0x430e1, .pme_short_desc = "Cycles thread priority difference is -3 or -4", .pme_long_desc = "Cycles when this thread's priority is lower than the other thread's priority by 3 or 4.", }, [ POWER5p_PME_PM_DATA_FROM_L35_MOD ] = { .pme_name = "PM_DATA_FROM_L35_MOD", .pme_code = 0x2c309e, .pme_short_desc = "Data loaded from L3.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5p_PME_PM_MEM_HI_PRIO_WR_CMPL ] = { .pme_name = "PM_MEM_HI_PRIO_WR_CMPL", .pme_code = 0x726e6, .pme_short_desc = "High priority write completed", .pme_long_desc = "A memory write, which was upgraded to high priority, completed. Writes can be upgraded to high priority to ensure that read traffic does not lock out writes. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FPU1_FDIV ] = { .pme_name = "PM_FPU1_FDIV", .pme_code = 0xc4, .pme_short_desc = "FPU1 executed FDIV instruction", .pme_long_desc = "FPU1 has executed a divide instruction. This could be fdiv, fdivs, fdiv., fdivs.", }, [ POWER5p_PME_PM_MEM_RQ_DISP ] = { .pme_name = "PM_MEM_RQ_DISP", .pme_code = 0x701c6, .pme_short_desc = "Memory read queue dispatched", .pme_long_desc = "A memory read was dispatched. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FPU0_FRSP_FCONV ] = { .pme_name = "PM_FPU0_FRSP_FCONV", .pme_code = 0x10c1, .pme_short_desc = "FPU0 executed FRSP or FCONV instructions", .pme_long_desc = "FPU0 has executed a frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5p_PME_PM_LWSYNC_HELD ] = { .pme_name = "PM_LWSYNC_HELD", .pme_code = 0x130e0, .pme_short_desc = "LWSYNC held at dispatch", .pme_long_desc = "Cycles a LWSYNC instruction was held at dispatch. LWSYNC instructions are held at dispatch until all previous loads are done and all previous stores have issued. LWSYNC enters the Store Request Queue and is sent to the storage subsystem but does not wait for a response.", }, [ POWER5p_PME_PM_FXU_FIN ] = { .pme_name = "PM_FXU_FIN", .pme_code = 0x313088, .pme_short_desc = "FXU produced a result", .pme_long_desc = "The fixed point unit (Unit 0 + Unit 1) finished an instruction. Instructions that finish may not necessary complete.", }, [ POWER5p_PME_PM_DSLB_MISS ] = { .pme_name = "PM_DSLB_MISS", .pme_code = 0x800c5, .pme_short_desc = "Data SLB misses", .pme_long_desc = "A SLB miss for a data request occurred. SLB misses trap to the operating system to resolve.", }, [ POWER5p_PME_PM_DATA_FROM_L275_SHR ] = { .pme_name = "PM_DATA_FROM_L275_SHR", .pme_code = 0x3c3097, .pme_short_desc = "Data loaded from L2.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T) data from the L2 on a different module than this processor is located due to a demand load.", }, [ POWER5p_PME_PM_FXLS1_FULL_CYC ] = { .pme_name = "PM_FXLS1_FULL_CYC", .pme_code = 0x110c4, .pme_short_desc = "Cycles FXU1/LS1 queue full", .pme_long_desc = "The issue queue that feeds the Fixed Point unit 1 / Load Store Unit 1 is full. This condition will prevent dispatch groups from being dispatched. 
This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_THRD_SEL_T0 ] = { .pme_name = "PM_THRD_SEL_T0", .pme_code = 0x410c0, .pme_short_desc = "Decode selected thread 0", .pme_long_desc = "Thread selection picked thread 0 for decode.", }, [ POWER5p_PME_PM_PTEG_RELOAD_VALID ] = { .pme_name = "PM_PTEG_RELOAD_VALID", .pme_code = 0x830e4, .pme_short_desc = "PTEG reload valid", .pme_long_desc = "A Page Table Entry was loaded into the TLB.", }, [ POWER5p_PME_PM_MRK_STCX_FAIL ] = { .pme_name = "PM_MRK_STCX_FAIL", .pme_code = 0x820e6, .pme_short_desc = "Marked STCX failed", .pme_long_desc = "A marked stcx (stwcx or stdcx) failed", }, [ POWER5p_PME_PM_LSU_LMQ_LHR_MERGE ] = { .pme_name = "PM_LSU_LMQ_LHR_MERGE", .pme_code = 0xc70e5, .pme_short_desc = "LMQ LHR merges", .pme_long_desc = "A data cache miss occurred for the same real cache line address as an earlier request already in the Load Miss Queue and was merged into the LMQ entry.", }, [ POWER5p_PME_PM_2INST_CLB_CYC ] = { .pme_name = "PM_2INST_CLB_CYC", .pme_code = 0x400c2, .pme_short_desc = "Cycles 2 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5p_PME_PM_FAB_PNtoVN_DIRECT ] = { .pme_name = "PM_FAB_PNtoVN_DIRECT", .pme_code = 0x723e7, .pme_short_desc = "PN to VN beat went straight to its destination", .pme_long_desc = "Fabric Data beats that the base chip takes the inbound PN data and passes it through to the outbound VN bus without going into a sidecar. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_PTEG_FROM_L2MISS ] = { .pme_name = "PM_PTEG_FROM_L2MISS", .pme_code = 0x38309b, .pme_short_desc = "PTEG loaded from L2 miss", .pme_long_desc = "A Page Table Entry was loaded into the TLB but not from the local L2.", }, [ POWER5p_PME_PM_CMPLU_STALL_LSU ] = { .pme_name = "PM_CMPLU_STALL_LSU", .pme_code = 0x211098, .pme_short_desc = "Completion stall caused by LSU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a load/store instruction.", }, [ POWER5p_PME_PM_MRK_DSLB_MISS ] = { .pme_name = "PM_MRK_DSLB_MISS", .pme_code = 0xc50c7, .pme_short_desc = "Marked Data SLB misses", .pme_long_desc = "A Data SLB miss was caused by a marked instruction.", }, [ POWER5p_PME_PM_LSU_FLUSH_ULD ] = { .pme_name = "PM_LSU_FLUSH_ULD", .pme_code = 0x1c0088, .pme_short_desc = "LRQ unaligned load flushes", .pme_long_desc = "A load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1). Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_PTEG_FROM_LMEM ] = { .pme_name = "PM_PTEG_FROM_LMEM", .pme_code = 0x283087, .pme_short_desc = "PTEG loaded from local memory", .pme_long_desc = "A Page Table Entry was loaded into the TLB from memory attached to the same module this processor is located on.", }, [ POWER5p_PME_PM_MRK_BRU_FIN ] = { .pme_name = "PM_MRK_BRU_FIN", .pme_code = 0x200005, .pme_short_desc = "Marked instruction BRU processing finished", .pme_long_desc = "The branch unit finished a marked instruction. Instructions that finish may not necessarily complete.", }, [ POWER5p_PME_PM_MEM_WQ_DISP_WRITE ] = { .pme_name = "PM_MEM_WQ_DISP_WRITE", .pme_code = 0x703c6, .pme_short_desc = "Memory write queue dispatched due to write", .pme_long_desc = "A memory write was dispatched to a write queue. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L275_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L275_MOD_CYC", .pme_code = 0x4c70a3, .pme_short_desc = "Marked load latency from L2.75 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_LSU1_NCLD ] = { .pme_name = "PM_LSU1_NCLD", .pme_code = 0xc50c5, .pme_short_desc = "LSU1 non-cacheable loads", .pme_long_desc = "A non-cacheable load was executed by Unit 1.", }, [ POWER5p_PME_PM_L2SA_RCLD_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SA_RCLD_DISP_FAIL_OTHER", .pme_code = 0x731e0, .pme_short_desc = "L2 slice A RC load dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a load failed for some reason other than Full or Collision conditions.", }, [ POWER5p_PME_PM_SNOOP_PW_RETRY_WQ_PWQ ] = { .pme_name = "PM_SNOOP_PW_RETRY_WQ_PWQ", .pme_code = 0x717c6, .pme_short_desc = "Snoop partial-write retry due to collision with active write or partial-write queue", .pme_long_desc = "A snoop request for a partial write to memory was retried because it matched the cache line of an active write or partial write. When this happens the snoop request is retried and the active write is changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FPU1_FULL_CYC ] = { .pme_name = "PM_FPU1_FULL_CYC", .pme_code = 0x100c7, .pme_short_desc = "Cycles FPU1 issue queue full", .pme_long_desc = "The issue queue for FPU1 cannot accept any more instructions. 
Dispatch to this issue queue is stopped", }, [ POWER5p_PME_PM_FPR_MAP_FULL_CYC ] = { .pme_name = "PM_FPR_MAP_FULL_CYC", .pme_code = 0x100c1, .pme_short_desc = "Cycles FPR mapper full", .pme_long_desc = "The floating point unit has executed an add, mult, sub, compare, fsel, fneg, fabs, fnabs, fres, or frsqrte kind of instruction. These are single FLOP operations.", }, [ POWER5p_PME_PM_L3SA_ALL_BUSY ] = { .pme_name = "PM_L3SA_ALL_BUSY", .pme_code = 0x721e3, .pme_short_desc = "L3 slice A active for every cycle all CI/CO machines busy", .pme_long_desc = "Cycles All Castin/Castout machines are busy.", }, [ POWER5p_PME_PM_3INST_CLB_CYC ] = { .pme_name = "PM_3INST_CLB_CYC", .pme_code = 0x400c3, .pme_short_desc = "Cycles 3 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5p_PME_PM_MEM_PWQ_DISP_Q2or3 ] = { .pme_name = "PM_MEM_PWQ_DISP_Q2or3", .pme_code = 0x734e6, .pme_short_desc = "Memory partial-write queue dispatched to Write Queue 2 or 3", .pme_long_desc = "Memory partial-write queue dispatched to Write Queue 2 or 3. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_L2SA_SHR_INV ] = { .pme_name = "PM_L2SA_SHR_INV", .pme_code = 0x710c0, .pme_short_desc = "L2 slice A transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A, B, and C. 
NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER5p_PME_PM_THRESH_TIMEO ] = { .pme_name = "PM_THRESH_TIMEO", .pme_code = 0x30000b, .pme_short_desc = "Threshold timeout", .pme_long_desc = "The threshold timer expired", }, [ POWER5p_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL ] = { .pme_name = "PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL", .pme_code = 0x713c0, .pme_short_desc = "L2 slice A RC dispatch attempt failed due to all CO busy", .pme_long_desc = "A Read/Claim dispatch was rejected because all Castout machines were busy.", }, [ POWER5p_PME_PM_THRD_SEL_OVER_GCT_IMBAL ] = { .pme_name = "PM_THRD_SEL_OVER_GCT_IMBAL", .pme_code = 0x410c4, .pme_short_desc = "Thread selection overrides caused by GCT imbalance", .pme_long_desc = "Thread selection was overridden because of a GCT imbalance.", }, [ POWER5p_PME_PM_FPU_FSQRT ] = { .pme_name = "PM_FPU_FSQRT", .pme_code = 0x200090, .pme_short_desc = "FPU executed FSQRT instruction", .pme_long_desc = "The floating point unit has executed a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_PMC1_OVERFLOW ] = { .pme_name = "PM_PMC1_OVERFLOW", .pme_code = 0x20000a, .pme_short_desc = "PMC1 Overflow", .pme_long_desc = "Overflows from PMC1 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5p_PME_PM_MRK_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_LRQ", .pme_code = 0x810c2, .pme_short_desc = "LSU0 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 0 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5p_PME_PM_L3SC_SNOOP_RETRY ] = { .pme_name = "PM_L3SC_SNOOP_RETRY", .pme_code = 0x731e5, .pme_short_desc = "L3 slice C snoop retries", .pme_long_desc = "Number of times an L3 retried a snoop because it got two in at the same time (one on snp_a, one on snp_b)", }, [ POWER5p_PME_PM_DATA_TABLEWALK_CYC ] = { .pme_name = "PM_DATA_TABLEWALK_CYC", .pme_code = 0x800c7, .pme_short_desc = "Cycles doing data tablewalks", .pme_long_desc = "Cycles a translation tablewalk is active. While a tablewalk is active any request attempting to access the TLB will be rejected and retried.", }, [ POWER5p_PME_PM_THRD_PRIO_6_CYC ] = { .pme_name = "PM_THRD_PRIO_6_CYC", .pme_code = 0x420e5, .pme_short_desc = "Cycles thread running at priority level 6", .pme_long_desc = "Cycles this thread was running at priority level 6.", }, [ POWER5p_PME_PM_FPU_FEST ] = { .pme_name = "PM_FPU_FEST", .pme_code = 0x1010a8, .pme_short_desc = "FPU executed FEST instruction", .pme_long_desc = "The floating point unit has executed an estimate instruction. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_FAB_M1toP1_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_M1toP1_SIDECAR_EMPTY", .pme_code = 0x702c7, .pme_short_desc = "M1 to P1 sidecar empty", .pme_long_desc = "Fabric cycles when the Minus-1 hip/hop sidecars (sidecars for chip to chip data transfer) are empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_RMEM ] = { .pme_name = "PM_MRK_DATA_FROM_RMEM", .pme_code = 0x1c70a1, .pme_short_desc = "Marked data loaded from remote memory", .pme_long_desc = "The processor's Data Cache was reloaded due to a marked load from memory attached to a different module than this processor is located on.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L35_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L35_MOD_CYC", .pme_code = 0x4c70a6, .pme_short_desc = "Marked load latency from L3.5 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_MEM_PWQ_DISP ] = { .pme_name = "PM_MEM_PWQ_DISP", .pme_code = 0x704c6, .pme_short_desc = "Memory partial-write queue dispatched", .pme_long_desc = "Number of Partial Writes dispatched. The MC provides resources to gather partial cacheline writes (Partial line DMA writes & CI-stores) to up to four different cachelines at a time. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FAB_P1toM1_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_P1toM1_SIDECAR_EMPTY", .pme_code = 0x701c7, .pme_short_desc = "P1 to M1 sidecar empty", .pme_long_desc = "Fabric cycles when the Plus-1 hip/hop sidecars (sidecars for chip to chip data transfer) are empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_LD_MISS_L1_LSU0", .pme_code = 0xc10c2, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by unit 0.", }, [ POWER5p_PME_PM_SNOOP_PARTIAL_RTRY_QFULL ] = { .pme_name = "PM_SNOOP_PARTIAL_RTRY_QFULL", .pme_code = 0x730e6, .pme_short_desc = "Snoop partial write retry due to partial-write queues full", .pme_long_desc = "A snoop request for a partial write to memory was retried because the write queues that handle partial writes were full. When this happens the active writes are changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FPU1_STALL3 ] = { .pme_name = "PM_FPU1_STALL3", .pme_code = 0x20e5, .pme_short_desc = "FPU1 stalled in pipe3", .pme_long_desc = "FPU1 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always).", }, [ POWER5p_PME_PM_GCT_USAGE_80to99_CYC ] = { .pme_name = "PM_GCT_USAGE_80to99_CYC", .pme_code = 0x30001f, .pme_short_desc = "Cycles GCT 80-99% full", .pme_long_desc = "Cycles when the Global Completion Table has between 80% and 99% of its slots used. 
The GCT has 20 entries shared between threads", }, [ POWER5p_PME_PM_WORK_HELD ] = { .pme_name = "PM_WORK_HELD", .pme_code = 0x40000c, .pme_short_desc = "Work held", .pme_long_desc = "RAS Unit has signaled completion to stop and there are groups waiting to complete", }, [ POWER5p_PME_PM_INST_CMPL ] = { .pme_name = "PM_INST_CMPL", .pme_code = 0x100009, .pme_short_desc = "Instructions completed", .pme_long_desc = "Number of PowerPC instructions that completed.", }, [ POWER5p_PME_PM_LSU1_FLUSH_UST ] = { .pme_name = "PM_LSU1_FLUSH_UST", .pme_code = 0xc00c5, .pme_short_desc = "LSU1 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 1 because it was unaligned (crossed a 4K boundary)", }, [ POWER5p_PME_PM_FXU_IDLE ] = { .pme_name = "PM_FXU_IDLE", .pme_code = 0x100012, .pme_short_desc = "FXU idle", .pme_long_desc = "FXU0 and FXU1 are both idle.", }, [ POWER5p_PME_PM_LSU0_FLUSH_ULD ] = { .pme_name = "PM_LSU0_FLUSH_ULD", .pme_code = 0xc00c0, .pme_short_desc = "LSU0 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 0 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1)", }, [ POWER5p_PME_PM_LSU1_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU1_REJECT_LMQ_FULL", .pme_code = 0xc40c5, .pme_short_desc = "LSU1 reject due to LMQ full or missed data coming", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. 
If all eight entries are full, subsequent load instructions are rejected.", }, [ POWER5p_PME_PM_GRP_DISP_REJECT ] = { .pme_name = "PM_GRP_DISP_REJECT", .pme_code = 0x120e4, .pme_short_desc = "Group dispatch rejected", .pme_long_desc = "A group that previously attempted dispatch was rejected.", }, [ POWER5p_PME_PM_PTEG_FROM_L25_SHR ] = { .pme_name = "PM_PTEG_FROM_L25_SHR", .pme_code = 0x183097, .pme_short_desc = "PTEG loaded from L2.5 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (T or SL) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5p_PME_PM_L2SA_MOD_INV ] = { .pme_name = "PM_L2SA_MOD_INV", .pme_code = 0x730e0, .pme_short_desc = "L2 slice A transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_FAB_CMD_RETRIED ] = { .pme_name = "PM_FAB_CMD_RETRIED", .pme_code = 0x710c7, .pme_short_desc = "Fabric command retried", .pme_long_desc = "Incremented when a command issued by a chip on its SnoopA address bus is retried for any reason. The overwhelming majority of retries are due to running out of memory controller queues but retries can also be caused by trying to reference addresses that are in a transient cache state -- e.g. a line is transient after issuing a DCLAIM instruction to a shared line but before the associated store completes. Each chip reports its own counts. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_L3SA_SHR_INV ] = { .pme_name = "PM_L3SA_SHR_INV", .pme_code = 0x710c3, .pme_short_desc = "L3 slice A transition from shared to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is Sx in this L3(i.e. invalidate hit SX and dispatched).", }, [ POWER5p_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL ] = { .pme_name = "PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL", .pme_code = 0x713c1, .pme_short_desc = "L2 slice B RC dispatch attempt failed due to all CO busy", .pme_long_desc = "A Read/Claim dispatch was rejected because all Castout machines were busy.", }, [ POWER5p_PME_PM_L2SA_RCST_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SA_RCST_DISP_FAIL_ADDR", .pme_code = 0x712c0, .pme_short_desc = "L2 slice A RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a store failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5p_PME_PM_L2SA_RCLD_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SA_RCLD_DISP_FAIL_RC_FULL", .pme_code = 0x721e0, .pme_short_desc = "L2 slice A RC load dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a load failed because all RC machines are busy.", }, [ POWER5p_PME_PM_PTEG_FROM_L375_MOD ] = { .pme_name = "PM_PTEG_FROM_L375_MOD", .pme_code = 0x1830a7, .pme_short_desc = "PTEG loaded from L3.75 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L3 of a chip on a different module than this processor is located, due to a demand load.", }, [ POWER5p_PME_PM_MRK_LSU1_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU1_FLUSH_UST", .pme_code = 0x810c5, .pme_short_desc = "LSU1 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", 
}, [ POWER5p_PME_PM_BR_ISSUED ] = { .pme_name = "PM_BR_ISSUED", .pme_code = 0x230e4, .pme_short_desc = "Branches issued", .pme_long_desc = "A branch instruction was issued to the branch unit. A branch that was incorrectly predicted may issue and execute multiple times.", }, [ POWER5p_PME_PM_MRK_GRP_BR_REDIR ] = { .pme_name = "PM_MRK_GRP_BR_REDIR", .pme_code = 0x212091, .pme_short_desc = "Group experienced marked branch redirect", .pme_long_desc = "A group containing a marked (sampled) instruction experienced a branch redirect.", }, [ POWER5p_PME_PM_EE_OFF ] = { .pme_name = "PM_EE_OFF", .pme_code = 0x130e3, .pme_short_desc = "Cycles MSR(EE) bit off", .pme_long_desc = "Cycles MSR(EE) bit was off indicating that interrupts due to external exceptions were masked.", }, [ POWER5p_PME_PM_IERAT_XLATE_WR_LP ] = { .pme_name = "PM_IERAT_XLATE_WR_LP", .pme_code = 0x210c6, .pme_short_desc = "Large page translation written to ierat", .pme_long_desc = "An entry was written into the IERAT as a result of an IERAT miss. This event can be used to count IERAT misses. ERAT misses that are later ignored will not be counted unless the ERAT is written before the instruction stream is changed.", }, [ POWER5p_PME_PM_DTLB_REF_64K ] = { .pme_name = "PM_DTLB_REF_64K", .pme_code = 0x2c2086, .pme_short_desc = "Data TLB reference for 64K page", .pme_long_desc = "Data TLB references for 64KB pages. Includes hits + misses.", }, [ POWER5p_PME_PM_MEM_RQ_DISP_Q4to7 ] = { .pme_name = "PM_MEM_RQ_DISP_Q4to7", .pme_code = 0x712c6, .pme_short_desc = "Memory read queue dispatched to queues 4-7", .pme_long_desc = "A memory operation was dispatched to read queue 4,5,6 or 7. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_MEM_FAST_PATH_RD_DISP ] = { .pme_name = "PM_MEM_FAST_PATH_RD_DISP", .pme_code = 0x731e6, .pme_short_desc = "Fast path memory read dispatched", .pme_long_desc = "Fast path memory read dispatched", }, [ POWER5p_PME_PM_INST_FROM_L3 ] = { .pme_name = "PM_INST_FROM_L3", .pme_code = 0x12208d, .pme_short_desc = "Instruction fetched from L3", .pme_long_desc = "An instruction fetch group was fetched from the local L3. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_ITLB_MISS ] = { .pme_name = "PM_ITLB_MISS", .pme_code = 0x800c0, .pme_short_desc = "Instruction TLB misses", .pme_long_desc = "A TLB miss for an Instruction Fetch has occurred", }, [ POWER5p_PME_PM_FXU1_BUSY_FXU0_IDLE ] = { .pme_name = "PM_FXU1_BUSY_FXU0_IDLE", .pme_code = 0x400012, .pme_short_desc = "FXU1 busy FXU0 idle", .pme_long_desc = "FXU0 was idle while FXU1 was busy.", }, [ POWER5p_PME_PM_DTLB_REF_4K ] = { .pme_name = "PM_DTLB_REF_4K", .pme_code = 0x1c2086, .pme_short_desc = "Data TLB reference for 4K page", .pme_long_desc = "Data TLB references for 4KB pages. Includes hits + misses.", }, [ POWER5p_PME_PM_FXLS_FULL_CYC ] = { .pme_name = "PM_FXLS_FULL_CYC", .pme_code = 0x1110a8, .pme_short_desc = "Cycles FXLS queue is full", .pme_long_desc = "Cycles when the issue queues for one or both FXU/LSU units is full. Use with caution since this is the sum of cycles when Unit 0 was full plus Unit 1 full. It does not indicate when both units were full.", }, [ POWER5p_PME_PM_GRP_DISP_VALID ] = { .pme_name = "PM_GRP_DISP_VALID", .pme_code = 0x120e3, .pme_short_desc = "Group dispatch valid", .pme_long_desc = "A group is available for dispatch. 
This does not mean it was successfully dispatched.", }, [ POWER5p_PME_PM_LSU_FLUSH_UST ] = { .pme_name = "PM_LSU_FLUSH_UST", .pme_code = 0x2c0088, .pme_short_desc = "SRQ unaligned store flushes", .pme_long_desc = "A store was flushed because it was unaligned (crossed a 4K boundary). Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_FXU1_FIN ] = { .pme_name = "PM_FXU1_FIN", .pme_code = 0x130e6, .pme_short_desc = "FXU1 produced a result", .pme_long_desc = "The Fixed Point unit 1 finished an instruction and produced a result. Instructions that finish may not necessarily complete.", }, [ POWER5p_PME_PM_THRD_PRIO_4_CYC ] = { .pme_name = "PM_THRD_PRIO_4_CYC", .pme_code = 0x420e3, .pme_short_desc = "Cycles thread running at priority level 4", .pme_long_desc = "Cycles this thread was running at priority level 4.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L35_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L35_MOD", .pme_code = 0x2c709e, .pme_short_desc = "Marked data loaded from L3.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5p_PME_PM_4INST_CLB_CYC ] = { .pme_name = "PM_4INST_CLB_CYC", .pme_code = 0x400c4, .pme_short_desc = "Cycles 4 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. 
Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5p_PME_PM_MRK_DTLB_REF_16M ] = { .pme_name = "PM_MRK_DTLB_REF_16M", .pme_code = 0x3c6086, .pme_short_desc = "Marked Data TLB reference for 16M page", .pme_long_desc = "Data TLB references by a marked instruction for 16MB pages.", }, [ POWER5p_PME_PM_INST_FROM_L375_MOD ] = { .pme_name = "PM_INST_FROM_L375_MOD", .pme_code = 0x42209d, .pme_short_desc = "Instruction fetched from L3.75 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L3 of a chip on a different module than this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_GRP_CMPL ] = { .pme_name = "PM_GRP_CMPL", .pme_code = 0x300013, .pme_short_desc = "Group completed", .pme_long_desc = "A group completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER5p_PME_PM_L2SC_RCST_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SC_RCST_DISP_FAIL_ADDR", .pme_code = 0x712c2, .pme_short_desc = "L2 slice C RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a store failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5p_PME_PM_FPU1_1FLOP ] = { .pme_name = "PM_FPU1_1FLOP", .pme_code = 0xc7, .pme_short_desc = "FPU1 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "The floating point unit has executed an add, mult, sub, compare, fsel, fneg, fabs, fnabs, fres, or frsqrte kind of instruction. These are single FLOP operations.", }, [ POWER5p_PME_PM_FPU_FRSP_FCONV ] = { .pme_name = "PM_FPU_FRSP_FCONV", .pme_code = 0x2010a8, .pme_short_desc = "FPU executed FRSP or FCONV instructions", .pme_long_desc = "The floating point unit has executed a frsp or convert kind of instruction. 
This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_L3SC_REF ] = { .pme_name = "PM_L3SC_REF", .pme_code = 0x701c5, .pme_short_desc = "L3 slice C references", .pme_long_desc = "Number of attempts made by this chip's cores to find data in the L3. Reported per L3 slice.", }, [ POWER5p_PME_PM_5INST_CLB_CYC ] = { .pme_name = "PM_5INST_CLB_CYC", .pme_code = 0x400c5, .pme_short_desc = "Cycles 5 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5p_PME_PM_THRD_L2MISS_BOTH_CYC ] = { .pme_name = "PM_THRD_L2MISS_BOTH_CYC", .pme_code = 0x410c7, .pme_short_desc = "Cycles both threads in L2 misses", .pme_long_desc = "Cycles that both threads have L2 miss pending. If only one thread has a L2 miss pending the other thread is given priority at decode. If both threads have L2 miss pending decode priority is determined by the number of GCT entries used.", }, [ POWER5p_PME_PM_MEM_PW_GATH ] = { .pme_name = "PM_MEM_PW_GATH", .pme_code = 0x714c6, .pme_short_desc = "Memory partial-write gathered", .pme_long_desc = "Two or more partial-writes have been merged into a single memory write. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_DTLB_REF_16G ] = { .pme_name = "PM_DTLB_REF_16G", .pme_code = 0x4c2086, .pme_short_desc = "Data TLB reference for 16G page", .pme_long_desc = "Data TLB references for 16GB pages. 
Includes hits + misses.", }, [ POWER5p_PME_PM_FAB_DCLAIM_ISSUED ] = { .pme_name = "PM_FAB_DCLAIM_ISSUED", .pme_code = 0x720e7, .pme_short_desc = "dclaim issued", .pme_long_desc = "A DCLAIM command was issued. Each chip reports its own counts. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_FAB_PNtoNN_SIDECAR ] = { .pme_name = "PM_FAB_PNtoNN_SIDECAR", .pme_code = 0x713c7, .pme_short_desc = "PN to NN beat went to sidecar first", .pme_long_desc = "Fabric Data beats that the base chip takes the inbound PN data and forwards it on to the outbound NN data bus after going into a sidecar first. The signal is delivered at FBC speed and the count must be scaled.", }, [ POWER5p_PME_PM_GRP_IC_MISS ] = { .pme_name = "PM_GRP_IC_MISS", .pme_code = 0x120e7, .pme_short_desc = "Group experienced I cache miss", .pme_long_desc = "Number of groups, counted at dispatch, that have encountered an icache miss redirect. Every group constructed from a fetch group that missed the instruction cache will count.", }, [ POWER5p_PME_PM_INST_FROM_L35_SHR ] = { .pme_name = "PM_INST_FROM_L35_SHR", .pme_code = 0x12209d, .pme_short_desc = "Instruction fetched from L3.5 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (S) data from the L3 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_LSU_LMQ_FULL_CYC ] = { .pme_name = "PM_LSU_LMQ_FULL_CYC", .pme_code = 0xc30e7, .pme_short_desc = "Cycles LMQ full", .pme_long_desc = "The Load Miss Queue was full.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L2_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L2_CYC", .pme_code = 0x2c70a0, .pme_short_desc = "Marked load latency from L2", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_LSU_SRQ_SYNC_CYC ] = { .pme_name = "PM_LSU_SRQ_SYNC_CYC", .pme_code = 0x830e5, .pme_short_desc = "SRQ sync duration", .pme_long_desc = "Cycles that a sync instruction is active in the Store Request Queue.", }, [ POWER5p_PME_PM_LSU0_BUSY_REJECT ] = { .pme_name = "PM_LSU0_BUSY_REJECT", .pme_code = 0xc20e1, .pme_short_desc = "LSU0 busy due to reject", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions.", }, [ POWER5p_PME_PM_LSU_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU_REJECT_ERAT_MISS", .pme_code = 0x1c4090, .pme_short_desc = "LSU reject due to ERAT miss", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions due to an ERAT miss. Combined unit 0 + 1. Requests that miss the Derat are rejected and retried until the request hits in the Erat.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_RMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_RMEM_CYC", .pme_code = 0x4c70a1, .pme_short_desc = "Marked load latency from remote memory", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_DATA_FROM_L375_SHR ] = { .pme_name = "PM_DATA_FROM_L375_SHR", .pme_code = 0x3c309e, .pme_short_desc = "Data loaded from L3.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on a different module than this processor is located due to a demand load.", }, [ POWER5p_PME_PM_PTEG_FROM_L25_MOD ] = { .pme_name = "PM_PTEG_FROM_L25_MOD", .pme_code = 0x283097, .pme_short_desc = "PTEG loaded from L2.5 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5p_PME_PM_FPU0_FMOV_FEST ] = { .pme_name = "PM_FPU0_FMOV_FEST", .pme_code = 0x10c0, .pme_short_desc = "FPU0 executed FMOV or FEST instructions", .pme_long_desc = "FPU0 has executed a move kind of instruction or one of the estimate instructions. 
This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ.", }, [ POWER5p_PME_PM_THRD_PRIO_7_CYC ] = { .pme_name = "PM_THRD_PRIO_7_CYC", .pme_code = 0x420e6, .pme_short_desc = "Cycles thread running at priority level 7", .pme_long_desc = "Cycles this thread was running at priority level 7.", }, [ POWER5p_PME_PM_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_LSU1_FLUSH_SRQ", .pme_code = 0xc00c7, .pme_short_desc = "LSU1 SRQ lhs flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER5p_PME_PM_LD_REF_L1_LSU0 ] = { .pme_name = "PM_LD_REF_L1_LSU0", .pme_code = 0xc10c0, .pme_short_desc = "LSU0 L1 D cache load references", .pme_long_desc = "Load references to Level 1 Data Cache, by unit 0.", }, [ POWER5p_PME_PM_L2SC_RCST_DISP ] = { .pme_name = "PM_L2SC_RCST_DISP", .pme_code = 0x702c2, .pme_short_desc = "L2 slice C RC store dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Store was attempted.", }, [ POWER5p_PME_PM_CMPLU_STALL_DIV ] = { .pme_name = "PM_CMPLU_STALL_DIV", .pme_code = 0x411099, .pme_short_desc = "Completion stall caused by DIV instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a fixed point divide instruction. This is a subset of PM_CMPLU_STALL_FXU.", }, [ POWER5p_PME_PM_MEM_RQ_DISP_Q12to15 ] = { .pme_name = "PM_MEM_RQ_DISP_Q12to15", .pme_code = 0x732e6, .pme_short_desc = "Memory read queue dispatched to queues 12-15", .pme_long_desc = "A memory operation was dispatched to read queue 12,13,14 or 15. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_INST_FROM_L375_SHR ] = { .pme_name = "PM_INST_FROM_L375_SHR", .pme_code = 0x32209d, .pme_short_desc = "Instruction fetched from L3.75 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (S) data from the L3 of a chip on a different module than this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_ST_REF_L1 ] = { .pme_name = "PM_ST_REF_L1", .pme_code = 0x2c10a8, .pme_short_desc = "L1 D cache store references", .pme_long_desc = "Store references to the Data Cache. Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_L3SB_ALL_BUSY ] = { .pme_name = "PM_L3SB_ALL_BUSY", .pme_code = 0x721e4, .pme_short_desc = "L3 slice B active for every cycle all CI/CO machines busy", .pme_long_desc = "Cycles All Castin/Castout machines are busy.", }, [ POWER5p_PME_PM_FAB_P1toVNorNN_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_P1toVNorNN_SIDECAR_EMPTY", .pme_code = 0x711c7, .pme_short_desc = "P1 to VN/NN sidecar empty", .pme_long_desc = "Fabric cycles when the Plus-1 jump sidecar (sidecars for mcm to mcm data transfer) is empty. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L275_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L275_SHR_CYC", .pme_code = 0x2c70a3, .pme_short_desc = "Marked load latency from L2.75 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_FAB_HOLDtoNN_EMPTY ] = { .pme_name = "PM_FAB_HOLDtoNN_EMPTY", .pme_code = 0x722e7, .pme_short_desc = "Hold buffer to NN empty", .pme_long_desc = "Fabric cycles when the Next Node out hold-buffers are empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_DATA_FROM_LMEM ] = { .pme_name = "PM_DATA_FROM_LMEM", .pme_code = 0x2c3087, .pme_short_desc = "Data loaded from local memory", .pme_long_desc = "The processor's Data Cache was reloaded from memory attached to the same module this processor is located on.", }, [ POWER5p_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x100005, .pme_short_desc = "Run cycles", .pme_long_desc = "Processor Cycles gated by the run latch. Operating systems use the run latch to indicate when they are doing useful work. The run latch is typically cleared in the OS idle loop. Gating by the run latch filters out the idle loop.", }, [ POWER5p_PME_PM_PTEG_FROM_RMEM ] = { .pme_name = "PM_PTEG_FROM_RMEM", .pme_code = 0x1830a1, .pme_short_desc = "PTEG loaded from remote memory", .pme_long_desc = "A Page Table Entry was loaded into the TLB from memory attached to a different module than this processor is located on.", }, [ POWER5p_PME_PM_L2SC_RCLD_DISP ] = { .pme_name = "PM_L2SC_RCLD_DISP", .pme_code = 0x701c2, .pme_short_desc = "L2 slice C RC load dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Load was attempted", }, [ POWER5p_PME_PM_LSU_LRQ_S0_VALID ] = { .pme_name = "PM_LSU_LRQ_S0_VALID", .pme_code = 0xc60e6, .pme_short_desc = "LRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Load Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin. In SMT mode the LRQ is split between the two threads (16 entries each).", }, [ POWER5p_PME_PM_LSU0_LDF ] = { .pme_name = "PM_LSU0_LDF", .pme_code = 0xc50c0, .pme_short_desc = "LSU0 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed by LSU0", }, [ POWER5p_PME_PM_PMC3_OVERFLOW ] = { .pme_name = "PM_PMC3_OVERFLOW", .pme_code = 0x40000a, .pme_short_desc = "PMC3 Overflow", .pme_long_desc = "Overflows from PMC3 are counted. 
This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5p_PME_PM_MRK_IMR_RELOAD ] = { .pme_name = "PM_MRK_IMR_RELOAD", .pme_code = 0x820e2, .pme_short_desc = "Marked IMR reloaded", .pme_long_desc = "A DL1 reload occurred due to marked load", }, [ POWER5p_PME_PM_MRK_GRP_TIMEO ] = { .pme_name = "PM_MRK_GRP_TIMEO", .pme_code = 0x40000b, .pme_short_desc = "Marked group completion timeout", .pme_long_desc = "The sampling timeout expired indicating that the previously sampled instruction is no longer in the processor", }, [ POWER5p_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0xc10c3, .pme_short_desc = "L1 D cache store misses", .pme_long_desc = "A store missed the dcache. Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_STOP_COMPLETION ] = { .pme_name = "PM_STOP_COMPLETION", .pme_code = 0x300018, .pme_short_desc = "Completion stopped", .pme_long_desc = "RAS Unit has signaled completion to stop", }, [ POWER5p_PME_PM_LSU_BUSY_REJECT ] = { .pme_name = "PM_LSU_BUSY_REJECT", .pme_code = 0x2c2088, .pme_short_desc = "LSU busy due to reject", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions. Combined unit 0 + 1.", }, [ POWER5p_PME_PM_ISLB_MISS ] = { .pme_name = "PM_ISLB_MISS", .pme_code = 0x800c1, .pme_short_desc = "Instruction SLB misses", .pme_long_desc = "A SLB miss for an instruction fetch has occurred", }, [ POWER5p_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0xf, .pme_short_desc = "Processor cycles", .pme_long_desc = "Processor cycles", }, [ POWER5p_PME_PM_THRD_ONE_RUN_CYC ] = { .pme_name = "PM_THRD_ONE_RUN_CYC", .pme_code = 0x10000b, .pme_short_desc = "One of the threads in run cycles", .pme_long_desc = "At least one thread has set its run latch. Operating systems use the run latch to indicate when they are doing useful work. The run latch is typically cleared in the OS idle loop. 
This event does not respect FCWAIT.", }, [ POWER5p_PME_PM_GRP_BR_REDIR_NONSPEC ] = { .pme_name = "PM_GRP_BR_REDIR_NONSPEC", .pme_code = 0x112091, .pme_short_desc = "Group experienced non-speculative branch redirect", .pme_long_desc = "Number of groups, counted at completion, that have encountered a branch redirect.", }, [ POWER5p_PME_PM_LSU1_SRQ_STFWD ] = { .pme_name = "PM_LSU1_SRQ_STFWD", .pme_code = 0xc60e5, .pme_short_desc = "LSU1 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 1. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. If a load that hits L1 but becomes a store forward, then it's not treated as a load miss.", }, [ POWER5p_PME_PM_L3SC_MOD_INV ] = { .pme_name = "PM_L3SC_MOD_INV", .pme_code = 0x730e5, .pme_short_desc = "L3 slice C transition from modified to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is truly M in this L3 (i.e. 
L3 going M=>I) Mu|Me are not included since they are formed due to a previous read op Tx is not included since it is considered shared at this point.", }, [ POWER5p_PME_PM_L2_PREF ] = { .pme_name = "PM_L2_PREF", .pme_code = 0xc50c3, .pme_short_desc = "L2 cache prefetches", .pme_long_desc = "A request to prefetch data into L2 was made", }, [ POWER5p_PME_PM_GCT_NOSLOT_BR_MPRED ] = { .pme_name = "PM_GCT_NOSLOT_BR_MPRED", .pme_code = 0x41009c, .pme_short_desc = "No slot in GCT caused by branch mispredict", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because of a branch misprediction.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L25_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD", .pme_code = 0x2c7097, .pme_short_desc = "Marked data loaded from L2.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5p_PME_PM_L2SB_ST_REQ ] = { .pme_name = "PM_L2SB_ST_REQ", .pme_code = 0x723e1, .pme_short_desc = "L2 slice B store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_L2SB_MOD_INV ] = { .pme_name = "PM_L2SB_MOD_INV", .pme_code = 0x730e1, .pme_short_desc = "L2 slice B transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_MRK_L1_RELOAD_VALID ] = { .pme_name = "PM_MRK_L1_RELOAD_VALID", .pme_code = 0xc70e4, .pme_short_desc = "Marked L1 reload data source valid", .pme_long_desc = "The source information is valid and is for a marked load", }, [ POWER5p_PME_PM_L3SB_HIT ] = { .pme_name = "PM_L3SB_HIT", .pme_code = 0x711c4, .pme_short_desc = "L3 slice B hits", .pme_long_desc = "Number of attempts made by this chip cores that resulted in an L3 hit. Reported per L3 slice", }, [ POWER5p_PME_PM_L2SB_SHR_MOD ] = { .pme_name = "PM_L2SB_SHR_MOD", .pme_code = 0x700c1, .pme_short_desc = "L2 slice B transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_EE_OFF_EXT_INT ] = { .pme_name = "PM_EE_OFF_EXT_INT", .pme_code = 0x130e7, .pme_short_desc = "Cycles MSR(EE) bit off and external interrupt pending", .pme_long_desc = "Cycles when an interrupt due to an external exception is pending but external exceptions were masked.", }, [ POWER5p_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x100013, .pme_short_desc = "One or more PPC instruction completed", .pme_long_desc = "A group containing at least one PPC instruction completed. For microcoded instructions that span multiple groups, this will only occur once.", }, [ POWER5p_PME_PM_L2SC_SHR_MOD ] = { .pme_name = "PM_L2SC_SHR_MOD", .pme_code = 0x700c2, .pme_short_desc = "L2 slice C transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. 
This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_PMC6_OVERFLOW ] = { .pme_name = "PM_PMC6_OVERFLOW", .pme_code = 0x30001a, .pme_short_desc = "PMC6 Overflow", .pme_long_desc = "Overflows from PMC6 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5p_PME_PM_IC_PREF_INSTALL ] = { .pme_name = "PM_IC_PREF_INSTALL", .pme_code = 0x210c7, .pme_short_desc = "Instruction prefetched installed in prefetch buffer", .pme_long_desc = "A prefetch buffer entry (line) is allocated but the request is not a demand fetch.", }, [ POWER5p_PME_PM_LSU_LRQ_FULL_CYC ] = { .pme_name = "PM_LSU_LRQ_FULL_CYC", .pme_code = 0x110c2, .pme_short_desc = "Cycles LRQ full", .pme_long_desc = "Cycles when the LRQ is full.", }, [ POWER5p_PME_PM_TLB_MISS ] = { .pme_name = "PM_TLB_MISS", .pme_code = 0x180088, .pme_short_desc = "TLB misses", .pme_long_desc = "Total of Data TLB misses + Instruction TLB misses", }, [ POWER5p_PME_PM_GCT_FULL_CYC ] = { .pme_name = "PM_GCT_FULL_CYC", .pme_code = 0x100c0, .pme_short_desc = "Cycles GCT full", .pme_long_desc = "The Global Completion Table is completely full.", }, [ POWER5p_PME_PM_FXU_BUSY ] = { .pme_name = "PM_FXU_BUSY", .pme_code = 0x200012, .pme_short_desc = "FXU busy", .pme_long_desc = "Cycles when both FXU0 and FXU1 are busy.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L3_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L3_CYC", .pme_code = 0x2c70a4, .pme_short_desc = "Marked load latency from L3", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_LSU_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU_REJECT_LMQ_FULL", .pme_code = 0x2c4088, .pme_short_desc = "LSU reject due to LMQ full or missed data coming", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. If all the eight entries are full, subsequent load instructions are rejected. Combined unit 0 + 1.", }, [ POWER5p_PME_PM_LSU_SRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_SRQ_S0_ALLOC", .pme_code = 0xc20e7, .pme_short_desc = "SRQ slot 0 allocated", .pme_long_desc = "SRQ Slot zero was allocated", }, [ POWER5p_PME_PM_GRP_MRK ] = { .pme_name = "PM_GRP_MRK", .pme_code = 0x100014, .pme_short_desc = "Group marked in IDU", .pme_long_desc = "A group was sampled (marked). The group is called a marked group. One instruction within the group is tagged for detailed monitoring. The sampled instruction is called a marked instruction. Events associated with the marked instruction are annotated with the marked term.", }, [ POWER5p_PME_PM_INST_FROM_L25_SHR ] = { .pme_name = "PM_INST_FROM_L25_SHR", .pme_code = 0x122096, .pme_short_desc = "Instruction fetched from L2.5 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (T or SL) data from the L2 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions.", }, [ POWER5p_PME_PM_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_DC_PREF_STREAM_ALLOC", .pme_code = 0x830e7, .pme_short_desc = "D cache new prefetch stream allocated", .pme_long_desc = "A new Prefetch Stream was allocated.", }, [ POWER5p_PME_PM_FPU1_FIN ] = { .pme_name = "PM_FPU1_FIN", .pme_code = 0x10c7, .pme_short_desc = "FPU1 produced a result", .pme_long_desc = "FPU1 finished, produced a result. This only indicates finish, not completion. 
Floating Point Stores are included in this count but not Floating Point Loads.", }, [ POWER5p_PME_PM_BR_MPRED_TA ] = { .pme_name = "PM_BR_MPRED_TA", .pme_code = 0x230e6, .pme_short_desc = "Branch mispredictions due to target address", .pme_long_desc = "A branch instruction target was incorrectly predicted. This will result in a branch mispredict flush unless a flush is detected from an older instruction.", }, [ POWER5p_PME_PM_MRK_DTLB_REF_64K ] = { .pme_name = "PM_MRK_DTLB_REF_64K", .pme_code = 0x2c6086, .pme_short_desc = "Marked Data TLB reference for 64K page", .pme_long_desc = "Data TLB references by a marked instruction for 64KB pages.", }, [ POWER5p_PME_PM_RUN_INST_CMPL ] = { .pme_name = "PM_RUN_INST_CMPL", .pme_code = 0x500009, .pme_short_desc = "Run instructions completed", .pme_long_desc = "Number of run instructions completed.", }, [ POWER5p_PME_PM_CRQ_FULL_CYC ] = { .pme_name = "PM_CRQ_FULL_CYC", .pme_code = 0x110c1, .pme_short_desc = "Cycles CR issue queue full", .pme_long_desc = "The issue queue that feeds the Conditional Register unit is full. This condition will prevent dispatch groups from being dispatched. This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_L2SA_RCLD_DISP ] = { .pme_name = "PM_L2SA_RCLD_DISP", .pme_code = 0x701c0, .pme_short_desc = "L2 slice A RC load dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Load was attempted", }, [ POWER5p_PME_PM_SNOOP_WR_RETRY_QFULL ] = { .pme_name = "PM_SNOOP_WR_RETRY_QFULL", .pme_code = 0x710c6, .pme_short_desc = "Snoop read retry due to read queue full", .pme_long_desc = "A snoop request for a write to memory was retried because the write queues were full. When this happens the snoop request is retried and the writes in the write reorder queue are changed to high priority. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_MRK_DTLB_REF_4K ] = { .pme_name = "PM_MRK_DTLB_REF_4K", .pme_code = 0x1c6086, .pme_short_desc = "Marked Data TLB reference for 4K page", .pme_long_desc = "Data TLB references by a marked instruction for 4KB pages.", }, [ POWER5p_PME_PM_LSU_SRQ_S0_VALID ] = { .pme_name = "PM_LSU_SRQ_S0_VALID", .pme_code = 0xc20e6, .pme_short_desc = "SRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Store Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin. In SMT mode the SRQ is split between the two threads (16 entries each).", }, [ POWER5p_PME_PM_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_LSU0_FLUSH_LRQ", .pme_code = 0xc00c2, .pme_short_desc = "LSU0 LRQ flushes", .pme_long_desc = "A load was flushed by unit 0 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5p_PME_PM_INST_FROM_L275_MOD ] = { .pme_name = "PM_INST_FROM_L275_MOD", .pme_code = 0x422096, .pme_short_desc = "Instruction fetched from L2.75 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L2 on a different module than this processor is located. 
Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_GCT_EMPTY_CYC ] = { .pme_name = "PM_GCT_EMPTY_CYC", .pme_code = 0x200004, .pme_short_desc = "Cycles GCT empty", .pme_long_desc = "The Global Completion Table is completely empty", }, [ POWER5p_PME_PM_LARX_LSU0 ] = { .pme_name = "PM_LARX_LSU0", .pme_code = 0x820e7, .pme_short_desc = "Larx executed on LSU0", .pme_long_desc = "A larx (lwarx or ldarx) was executed on side 0 (there is no corresponding unit 1 event since larx instructions can only execute on unit 0)", }, [ POWER5p_PME_PM_THRD_PRIO_DIFF_5or6_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_5or6_CYC", .pme_code = 0x430e6, .pme_short_desc = "Cycles thread priority difference is 5 or 6", .pme_long_desc = "Cycles when this thread's priority is higher than the other thread's priority by 5 or 6.", }, [ POWER5p_PME_PM_SNOOP_RETRY_1AHEAD ] = { .pme_name = "PM_SNOOP_RETRY_1AHEAD", .pme_code = 0x725e6, .pme_short_desc = "Snoop retry due to one ahead collision", .pme_long_desc = "Snoop retry due to one ahead collision", }, [ POWER5p_PME_PM_FPU1_FSQRT ] = { .pme_name = "PM_FPU1_FSQRT", .pme_code = 0xc6, .pme_short_desc = "FPU1 executed FSQRT instruction", .pme_long_desc = "FPU1 has executed a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5p_PME_PM_MRK_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU1", .pme_code = 0x820e4, .pme_short_desc = "LSU1 marked L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by LSU1.", }, [ POWER5p_PME_PM_MRK_FPU_FIN ] = { .pme_name = "PM_MRK_FPU_FIN", .pme_code = 0x300014, .pme_short_desc = "Marked instruction FPU processing finished", .pme_long_desc = "One of the Floating Point Units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ POWER5p_PME_PM_THRD_PRIO_5_CYC ] = { .pme_name = "PM_THRD_PRIO_5_CYC", .pme_code = 0x420e4, .pme_short_desc = "Cycles thread running at priority level 5", .pme_long_desc = "Cycles this thread was running at priority level 5.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_LMEM ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM", .pme_code = 0x2c7087, .pme_short_desc = "Marked data loaded from local memory", .pme_long_desc = "The processor's Data Cache was reloaded due to a marked load from memory attached to the same module this processor is located on.", }, [ POWER5p_PME_PM_SNOOP_TLBIE ] = { .pme_name = "PM_SNOOP_TLBIE", .pme_code = 0x800c3, .pme_short_desc = "Snoop TLBIE", .pme_long_desc = "A tlbie was snooped from another processor.", }, [ POWER5p_PME_PM_FPU1_FRSP_FCONV ] = { .pme_name = "PM_FPU1_FRSP_FCONV", .pme_code = 0x10c5, .pme_short_desc = "FPU1 executed FRSP or FCONV instructions", .pme_long_desc = "FPU1 has executed a frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5p_PME_PM_DTLB_MISS_16G ] = { .pme_name = "PM_DTLB_MISS_16G", .pme_code = 0x4c208d, .pme_short_desc = "Data TLB miss for 16G page", .pme_long_desc = "Data TLB references to 16GB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_L3SB_SNOOP_RETRY ] = { .pme_name = "PM_L3SB_SNOOP_RETRY", .pme_code = 0x731e4, .pme_short_desc = "L3 slice B snoop retries", .pme_long_desc = "Number of times an L3 retried a snoop because it got two in at the same time (one on snp_a, one on snp_b)", }, [ POWER5p_PME_PM_FAB_VBYPASS_EMPTY ] = { .pme_name = "PM_FAB_VBYPASS_EMPTY", .pme_code = 0x731e7, .pme_short_desc = "Vertical bypass buffer empty", .pme_long_desc = "Fabric cycles when the Middle Bypass sidecar is empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L275_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L275_MOD", .pme_code = 0x1c70a3, .pme_short_desc = "Marked data loaded from L2.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 on a different module than this processor is located due to a marked load.", }, [ POWER5p_PME_PM_L2SB_RCST_DISP ] = { .pme_name = "PM_L2SB_RCST_DISP", .pme_code = 0x702c1, .pme_short_desc = "L2 slice B RC store dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Store was attempted.", }, [ POWER5p_PME_PM_6INST_CLB_CYC ] = { .pme_name = "PM_6INST_CLB_CYC", .pme_code = 0x400c6, .pme_short_desc = "Cycles 6 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5p_PME_PM_FLUSH ] = { .pme_name = "PM_FLUSH", .pme_code = 0x110c7, .pme_short_desc = "Flushes", .pme_long_desc = "Flushes occurred including LSU and Branch flushes.", }, [ POWER5p_PME_PM_L2SC_MOD_INV ] = { .pme_name = "PM_L2SC_MOD_INV", .pme_code = 0x730e2, .pme_short_desc = "L2 slice C transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_FPU_DENORM ] = { .pme_name = "PM_FPU_DENORM", .pme_code = 0x102088, .pme_short_desc = "FPU received denormalized data", .pme_long_desc = "The floating point unit has encountered a denormalized operand. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_L3SC_HIT ] = { .pme_name = "PM_L3SC_HIT", .pme_code = 0x711c5, .pme_short_desc = "L3 slice C hits", .pme_long_desc = "Number of attempts made by this chip cores that resulted in an L3 hit. Reported per L3 Slice", }, [ POWER5p_PME_PM_SNOOP_WR_RETRY_RQ ] = { .pme_name = "PM_SNOOP_WR_RETRY_RQ", .pme_code = 0x706c6, .pme_short_desc = "Snoop write/dclaim retry due to collision with active read queue", .pme_long_desc = "A snoop request for a write or dclaim to memory was retried because it matched the cacheline of an active read. This event is sent from the Memory Controller clock domain and must be scaled accordingly", }, [ POWER5p_PME_PM_LSU1_REJECT_SRQ ] = { .pme_name = "PM_LSU1_REJECT_SRQ", .pme_code = 0xc40c4, .pme_short_desc = "LSU1 SRQ lhs rejects", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions because of Load Hit Store conditions. 
Loads are rejected when data is needed from a previous store instruction but store forwarding is not possible because the data is not fully contained in the Store Data Queue or is not yet available in the Store Data Queue.", }, [ POWER5p_PME_PM_L3SC_ALL_BUSY ] = { .pme_name = "PM_L3SC_ALL_BUSY", .pme_code = 0x721e5, .pme_short_desc = "L3 slice C active for every cycle all CI/CO machines busy", .pme_long_desc = "Cycles All Castin/Castout machines are busy.", }, [ POWER5p_PME_PM_IC_PREF_REQ ] = { .pme_name = "PM_IC_PREF_REQ", .pme_code = 0x220e6, .pme_short_desc = "Instruction prefetch requests", .pme_long_desc = "An instruction prefetch request has been made.", }, [ POWER5p_PME_PM_MRK_GRP_IC_MISS ] = { .pme_name = "PM_MRK_GRP_IC_MISS", .pme_code = 0x412091, .pme_short_desc = "Group experienced marked I cache miss", .pme_long_desc = "A group containing a marked (sampled) instruction experienced an instruction cache miss.", }, [ POWER5p_PME_PM_GCT_NOSLOT_IC_MISS ] = { .pme_name = "PM_GCT_NOSLOT_IC_MISS", .pme_code = 0x21009c, .pme_short_desc = "No slot in GCT caused by I cache miss", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because of an Instruction Cache miss.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L3 ] = { .pme_name = "PM_MRK_DATA_FROM_L3", .pme_code = 0x1c708e, .pme_short_desc = "Marked data loaded from L3", .pme_long_desc = "The processor's Data Cache was reloaded from the local L3 due to a marked load.", }, [ POWER5p_PME_PM_GCT_NOSLOT_SRQ_FULL ] = { .pme_name = "PM_GCT_NOSLOT_SRQ_FULL", .pme_code = 0x310084, .pme_short_desc = "No slot in GCT caused by SRQ full", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because the Store Request Queue (SRQ) is full. This happens when the storage subsystem can not process the stores in the SRQ. 
Groups can not be dispatched until a SRQ entry is available.", }, [ POWER5p_PME_PM_CMPLU_STALL_DCACHE_MISS ] = { .pme_name = "PM_CMPLU_STALL_DCACHE_MISS", .pme_code = 0x21109a, .pme_short_desc = "Completion stall caused by D cache miss", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered a Data Cache Miss. Data Cache Miss has higher priority than any other Load/Store delay, so if an instruction encounters multiple delays only the Data Cache Miss will be reported and the entire delay period will be charged to Data Cache Miss. This is a subset of PM_CMPLU_STALL_LSU.", }, [ POWER5p_PME_PM_THRD_SEL_OVER_ISU_HOLD ] = { .pme_name = "PM_THRD_SEL_OVER_ISU_HOLD", .pme_code = 0x410c5, .pme_short_desc = "Thread selection overrides caused by ISU holds", .pme_long_desc = "Thread selection was overridden because of an ISU hold.", }, [ POWER5p_PME_PM_LSU_FLUSH_LRQ ] = { .pme_name = "PM_LSU_FLUSH_LRQ", .pme_code = 0x2c0090, .pme_short_desc = "LRQ flushes", .pme_long_desc = "A load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte. Combined Units 0 and 1.", }, [ POWER5p_PME_PM_THRD_PRIO_2_CYC ] = { .pme_name = "PM_THRD_PRIO_2_CYC", .pme_code = 0x420e1, .pme_short_desc = "Cycles thread running at priority level 2", .pme_long_desc = "Cycles this thread was running at priority level 2.", }, [ POWER5p_PME_PM_L3SA_MOD_INV ] = { .pme_name = "PM_L3SA_MOD_INV", .pme_code = 0x730e3, .pme_short_desc = "L3 slice A transition from modified to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is truly M in this L3 (i.e. L3 going M=>I) Mu|Me are not included since they are formed due to a prev read op. 
Tx is not included since it is considered shared at this point.", }, [ POWER5p_PME_PM_LSU_FLUSH_SRQ ] = { .pme_name = "PM_LSU_FLUSH_SRQ", .pme_code = 0x1c0090, .pme_short_desc = "SRQ flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group. Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_MRK_LSU_SRQ_INST_VALID ] = { .pme_name = "PM_MRK_LSU_SRQ_INST_VALID", .pme_code = 0xc70e6, .pme_short_desc = "Marked instruction valid in SRQ", .pme_long_desc = "This signal is asserted every cycle when a marked request is resident in the Store Request Queue", }, [ POWER5p_PME_PM_L3SA_REF ] = { .pme_name = "PM_L3SA_REF", .pme_code = 0x701c3, .pme_short_desc = "L3 slice A references", .pme_long_desc = "Number of attempts made by this chip cores to find data in the L3. Reported per L3 slice", }, [ POWER5p_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL ] = { .pme_name = "PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL", .pme_code = 0x713c2, .pme_short_desc = "L2 slice C RC dispatch attempt failed due to all CO busy", .pme_long_desc = "A Read/Claim dispatch was rejected because all Castout machines were busy.", }, [ POWER5p_PME_PM_FPU0_STALL3 ] = { .pme_name = "PM_FPU0_STALL3", .pme_code = 0x20e1, .pme_short_desc = "FPU0 stalled in pipe3", .pme_long_desc = "FPU0 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always).", }, [ POWER5p_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x100018, .pme_short_desc = "Time Base bit transition", .pme_long_desc = "When the selected time base bit (as specified in MMCR0[TBSEL])transitions from 0 to 1", }, [ POWER5p_PME_PM_GPR_MAP_FULL_CYC ] = { .pme_name = "PM_GPR_MAP_FULL_CYC", .pme_code = 0x130e5, .pme_short_desc = "Cycles GPR mapper full", .pme_long_desc = "The General Purpose Register mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. 
This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_MRK_LSU_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_LRQ", .pme_code = 0x381088, .pme_short_desc = "Marked LRQ flushes", .pme_long_desc = "A marked load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5p_PME_PM_FPU0_STF ] = { .pme_name = "PM_FPU0_STF", .pme_code = 0x20e2, .pme_short_desc = "FPU0 executed store instruction", .pme_long_desc = "FPU0 has executed a Floating Point Store instruction.", }, [ POWER5p_PME_PM_MRK_DTLB_MISS ] = { .pme_name = "PM_MRK_DTLB_MISS", .pme_code = 0xc50c6, .pme_short_desc = "Marked Data TLB misses", .pme_long_desc = "Data TLB references by a marked instruction that missed the TLB (all page sizes).", }, [ POWER5p_PME_PM_FPU1_FMA ] = { .pme_name = "PM_FPU1_FMA", .pme_code = 0xc5, .pme_short_desc = "FPU1 executed multiply-add instruction", .pme_long_desc = "The floating point unit has executed a multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5p_PME_PM_L2SA_MOD_TAG ] = { .pme_name = "PM_L2SA_MOD_TAG", .pme_code = 0x720e0, .pme_short_desc = "L2 slice A transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_LSU1_FLUSH_ULD ] = { .pme_name = "PM_LSU1_FLUSH_ULD", .pme_code = 0xc00c4, .pme_short_desc = "LSU1 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 1 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1).", }, [ POWER5p_PME_PM_MRK_INST_FIN ] = { .pme_name = "PM_MRK_INST_FIN", .pme_code = 0x300005, .pme_short_desc = "Marked instruction finished", .pme_long_desc = "One of the execution units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER5p_PME_PM_MRK_LSU0_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU0_FLUSH_UST", .pme_code = 0x810c0, .pme_short_desc = "LSU0 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 0 because it was unaligned", }, [ POWER5p_PME_PM_FPU0_FULL_CYC ] = { .pme_name = "PM_FPU0_FULL_CYC", .pme_code = 0x100c3, .pme_short_desc = "Cycles FPU0 issue queue full", .pme_long_desc = "The issue queue for FPU0 cannot accept any more instructions. Dispatch to this issue queue is stopped.", }, [ POWER5p_PME_PM_LSU_LRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LRQ_S0_ALLOC", .pme_code = 0xc60e7, .pme_short_desc = "LRQ slot 0 allocated", .pme_long_desc = "LRQ slot zero was allocated", }, [ POWER5p_PME_PM_MRK_LSU1_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU1_FLUSH_ULD", .pme_code = 0x810c4, .pme_short_desc = "LSU1 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 1 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1)", }, [ POWER5p_PME_PM_MRK_DTLB_REF ] = { .pme_name = "PM_MRK_DTLB_REF", .pme_code = 0xc60e4, .pme_short_desc = "Marked Data TLB reference", .pme_long_desc = "Total number of Data TLB references by a marked instruction for all page sizes. 
Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_BR_UNCOND ] = { .pme_name = "PM_BR_UNCOND", .pme_code = 0x123087, .pme_short_desc = "Unconditional branch", .pme_long_desc = "An unconditional branch was executed.", }, [ POWER5p_PME_PM_THRD_SEL_OVER_L2MISS ] = { .pme_name = "PM_THRD_SEL_OVER_L2MISS", .pme_code = 0x410c3, .pme_short_desc = "Thread selection overrides caused by L2 misses", .pme_long_desc = "Thread selection was overridden because one thread had an L2 miss pending.", }, [ POWER5p_PME_PM_L2SB_SHR_INV ] = { .pme_name = "PM_L2SB_SHR_INV", .pme_code = 0x710c1, .pme_short_desc = "L2 slice B transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A, B, and C. NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER5p_PME_PM_MEM_LO_PRIO_WR_CMPL ] = { .pme_name = "PM_MEM_LO_PRIO_WR_CMPL", .pme_code = 0x736e6, .pme_short_desc = "Low priority write completed", .pme_long_desc = "A memory write, which was not upgraded to high priority, completed. This event is sent from the Memory Controller clock domain and must be scaled accordingly", }, [ POWER5p_PME_PM_MRK_DTLB_MISS_64K ] = { .pme_name = "PM_MRK_DTLB_MISS_64K", .pme_code = 0x2c608d, .pme_short_desc = "Marked Data TLB misses for 64K page", .pme_long_desc = "Data TLB references to 64KB pages by a marked instruction that missed the TLB. 
Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_MRK_ST_MISS_L1 ] = { .pme_name = "PM_MRK_ST_MISS_L1", .pme_code = 0x820e3, .pme_short_desc = "Marked L1 D cache store misses", .pme_long_desc = "A marked store missed the dcache", }, [ POWER5p_PME_PM_L3SC_MOD_TAG ] = { .pme_name = "PM_L3SC_MOD_TAG", .pme_code = 0x720e5, .pme_short_desc = "L3 slice C transition from modified to TAG", .pme_long_desc = "L3 snooper detects someone doing a read to a line that is truly M in this L3 (i.e. L3 going M->T or M->I(go_Mu case); Mu|Me are not included since they are formed due to a prev read op). Tx is not included since it is considered shared at this point.", }, [ POWER5p_PME_PM_GRP_DISP_SUCCESS ] = { .pme_name = "PM_GRP_DISP_SUCCESS", .pme_code = 0x300002, .pme_short_desc = "Group dispatch success", .pme_long_desc = "Number of groups successfully dispatched (not rejected)", }, [ POWER5p_PME_PM_THRD_PRIO_DIFF_1or2_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_1or2_CYC", .pme_code = 0x430e4, .pme_short_desc = "Cycles thread priority difference is 1 or 2", .pme_long_desc = "Cycles when this thread's priority is higher than the other thread's priority by 1 or 2.", }, [ POWER5p_PME_PM_IC_DEMAND_L2_BHT_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BHT_REDIRECT", .pme_code = 0x230e0, .pme_short_desc = "L2 I cache demand request due to BHT redirect", .pme_long_desc = "A demand (not prefetch) miss to the instruction cache was sent to the L2 as a result of a branch prediction redirect (CR mispredict).", }, [ POWER5p_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x280090, .pme_short_desc = "DERAT misses", .pme_long_desc = "Total D-ERAT Misses. Requests that miss the Derat are rejected and retried until the request hits in the Erat. This may result in multiple erat misses for the same instruction. 
Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_MEM_WQ_DISP_Q8to15 ] = { .pme_name = "PM_MEM_WQ_DISP_Q8to15", .pme_code = 0x733e6, .pme_short_desc = "Memory write queue dispatched to queues 8-15", .pme_long_desc = "A memory operation was dispatched to a write queue in the range between 8 and 15. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FPU0_SINGLE ] = { .pme_name = "PM_FPU0_SINGLE", .pme_code = 0x20e3, .pme_short_desc = "FPU0 executed single precision instruction", .pme_long_desc = "FPU0 has executed a single precision instruction.", }, [ POWER5p_PME_PM_THRD_PRIO_1_CYC ] = { .pme_name = "PM_THRD_PRIO_1_CYC", .pme_code = 0x420e0, .pme_short_desc = "Cycles thread running at priority level 1", .pme_long_desc = "Cycles this thread was running at priority level 1. Priority level 1 is the lowest and indicates the thread is sleeping.", }, [ POWER5p_PME_PM_L2SC_RCST_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SC_RCST_DISP_FAIL_OTHER", .pme_code = 0x732e2, .pme_short_desc = "L2 slice C RC store dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a store failed for some reason other than Full or Collision conditions. Rejected dispatches do not count because they have not yet been attempted.", }, [ POWER5p_PME_PM_SNOOP_RD_RETRY_RQ ] = { .pme_name = "PM_SNOOP_RD_RETRY_RQ", .pme_code = 0x705c6, .pme_short_desc = "Snoop read retry due to collision with active read queue", .pme_long_desc = "A snoop request for a read from memory was retried because it matched the cache line of an active read. The snoop request is retried because the L2 may be able to source data via intervention for the 2nd read faster than the MC. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FAB_HOLDtoVN_EMPTY ] = { .pme_name = "PM_FAB_HOLDtoVN_EMPTY", .pme_code = 0x721e7, .pme_short_desc = "Hold buffer to VN empty", .pme_long_desc = "Fabric cycles when the Vertical Node out hold-buffers are empty. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_FPU1_FEST ] = { .pme_name = "PM_FPU1_FEST", .pme_code = 0x10c6, .pme_short_desc = "FPU1 executed FEST instruction", .pme_long_desc = "FPU1 has executed an estimate instruction. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ.", }, [ POWER5p_PME_PM_SNOOP_DCLAIM_RETRY_QFULL ] = { .pme_name = "PM_SNOOP_DCLAIM_RETRY_QFULL", .pme_code = 0x720e6, .pme_short_desc = "Snoop dclaim/flush retry due to write/dclaim queues full", .pme_long_desc = "A snoop request for a dclaim or flush was retried because the write/dclaim queues were full. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L25_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR_CYC", .pme_code = 0x2c70a2, .pme_short_desc = "Marked load latency from L2.5 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_MRK_ST_CMPL_INT ] = { .pme_name = "PM_MRK_ST_CMPL_INT", .pme_code = 0x300003, .pme_short_desc = "Marked store completed with intervention", .pme_long_desc = "A marked store previously sent to the memory subsystem completed (data home) after requiring intervention", }, [ POWER5p_PME_PM_FLUSH_BR_MPRED ] = { .pme_name = "PM_FLUSH_BR_MPRED", .pme_code = 0x110c6, .pme_short_desc = "Flush caused by branch mispredict", .pme_long_desc = "A flush was caused by a branch mispredict.", }, [ POWER5p_PME_PM_MRK_DTLB_MISS_16G ] = { .pme_name = "PM_MRK_DTLB_MISS_16G", .pme_code = 0x4c608d, .pme_short_desc = "Marked Data TLB misses for 16G page", .pme_long_desc = "Data TLB references to 16GB pages by a marked instruction that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_FPU_STF ] = { .pme_name = "PM_FPU_STF", .pme_code = 0x202090, .pme_short_desc = "FPU executed store instruction", .pme_long_desc = "FPU has executed a store instruction. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_L2SB_RCLD_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SB_RCLD_DISP_FAIL_ADDR", .pme_code = 0x711c1, .pme_short_desc = "L2 slice B RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a load failed because of an address conflict. 
Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5p_PME_PM_CMPLU_STALL_FPU ] = { .pme_name = "PM_CMPLU_STALL_FPU", .pme_code = 0x411098, .pme_short_desc = "Completion stall caused by FPU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a floating point instruction.", }, [ POWER5p_PME_PM_THRD_PRIO_DIFF_minus1or2_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus1or2_CYC", .pme_code = 0x430e2, .pme_short_desc = "Cycles thread priority difference is -1 or -2", .pme_long_desc = "Cycles when this thread's priority is lower than the other thread's priority by 1 or 2.", }, [ POWER5p_PME_PM_GCT_NOSLOT_CYC ] = { .pme_name = "PM_GCT_NOSLOT_CYC", .pme_code = 0x100004, .pme_short_desc = "Cycles no GCT slot allocated", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread.", }, [ POWER5p_PME_PM_FXU0_BUSY_FXU1_IDLE ] = { .pme_name = "PM_FXU0_BUSY_FXU1_IDLE", .pme_code = 0x300012, .pme_short_desc = "FXU0 busy FXU1 idle", .pme_long_desc = "FXU0 is busy while FXU1 was idle", }, [ POWER5p_PME_PM_PTEG_FROM_L35_SHR ] = { .pme_name = "PM_PTEG_FROM_L35_SHR", .pme_code = 0x18309e, .pme_short_desc = "PTEG loaded from L3.5 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (S) data from the L3 of a chip on the same module as this processor is located, due to a demand load.", }, [ POWER5p_PME_PM_MRK_DTLB_REF_16G ] = { .pme_name = "PM_MRK_DTLB_REF_16G", .pme_code = 0x4c6086, .pme_short_desc = "Marked Data TLB reference for 16G page", .pme_long_desc = "Data TLB references by a marked instruction for 16GB pages.", }, [ POWER5p_PME_PM_MRK_LSU_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU_FLUSH_UST", .pme_code = 0x2810a8, .pme_short_desc = "Marked unaligned store flushes", .pme_long_desc = "A marked store was flushed because it was unaligned", }, [ 
POWER5p_PME_PM_MRK_DATA_FROM_L25_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR", .pme_code = 0x1c7097, .pme_short_desc = "Marked data loaded from L2.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from the L2 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5p_PME_PM_L3SA_HIT ] = { .pme_name = "PM_L3SA_HIT", .pme_code = 0x711c3, .pme_short_desc = "L3 slice A hits", .pme_long_desc = "Number of attempts made by this chip's cores that resulted in an L3 hit. Reported per L3 slice", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L35_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L35_SHR", .pme_code = 0x1c709e, .pme_short_desc = "Marked data loaded from L3.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5p_PME_PM_L2SB_RCST_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SB_RCST_DISP_FAIL_ADDR", .pme_code = 0x712c1, .pme_short_desc = "L2 slice B RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a store failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5p_PME_PM_IERAT_XLATE_WR ] = { .pme_name = "PM_IERAT_XLATE_WR", .pme_code = 0x220e7, .pme_short_desc = "Translation written to ierat", .pme_long_desc = "An entry was written into the IERAT as a result of an IERAT miss. This event can be used to count IERAT misses. ERAT misses that are later ignored will not be counted unless the ERAT is written before the instruction stream is changed.", }, [ POWER5p_PME_PM_L2SA_ST_REQ ] = { .pme_name = "PM_L2SA_ST_REQ", .pme_code = 0x723e0, .pme_short_desc = "L2 slice A store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. 
Stores are counted after gathering in the L2 store queues. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_INST_FROM_LMEM ] = { .pme_name = "PM_INST_FROM_LMEM", .pme_code = 0x222086, .pme_short_desc = "Instruction fetched from local memory", .pme_long_desc = "An instruction fetch group was fetched from memory attached to the same module this processor is located on. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_THRD_SEL_T1 ] = { .pme_name = "PM_THRD_SEL_T1", .pme_code = 0x410c1, .pme_short_desc = "Decode selected thread 1", .pme_long_desc = "Thread selection picked thread 1 for decode.", }, [ POWER5p_PME_PM_IC_DEMAND_L2_BR_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BR_REDIRECT", .pme_code = 0x230e1, .pme_short_desc = "L2 I cache demand request due to branch redirect", .pme_long_desc = "A demand (not prefetch) miss to the instruction cache was sent to the L2 as a result of a branch prediction redirect (either ALL mispredicted or Target).", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L35_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L35_SHR_CYC", .pme_code = 0x2c70a6, .pme_short_desc = "Marked load latency from L3.5 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_FPU0_1FLOP ] = { .pme_name = "PM_FPU0_1FLOP", .pme_code = 0xc3, .pme_short_desc = "FPU0 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "The floating point unit has executed an add, mult, sub, compare, fsel, fneg, fabs, fnabs, fres, or frsqrte kind of instruction. 
These are single FLOP operations.", }, [ POWER5p_PME_PM_PTEG_FROM_L2 ] = { .pme_name = "PM_PTEG_FROM_L2", .pme_code = 0x183087, .pme_short_desc = "PTEG loaded from L2", .pme_long_desc = "A Page Table Entry was loaded into the TLB from the local L2 due to a demand load", }, [ POWER5p_PME_PM_MEM_PW_CMPL ] = { .pme_name = "PM_MEM_PW_CMPL", .pme_code = 0x724e6, .pme_short_desc = "Memory partial-write completed", .pme_long_desc = "Number of Partial Writes completed. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_THRD_PRIO_DIFF_minus5or6_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus5or6_CYC", .pme_code = 0x430e0, .pme_short_desc = "Cycles thread priority difference is -5 or -6", .pme_long_desc = "Cycles when this thread's priority is lower than the other thread's priority by 5 or 6.", }, [ POWER5p_PME_PM_L2SB_RCLD_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SB_RCLD_DISP_FAIL_OTHER", .pme_code = 0x731e1, .pme_short_desc = "L2 slice B RC load dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a load failed for some reason other than Full or Collision conditions.", }, [ POWER5p_PME_PM_MRK_DTLB_MISS_4K ] = { .pme_name = "PM_MRK_DTLB_MISS_4K", .pme_code = 0x1c608d, .pme_short_desc = "Marked Data TLB misses for 4K page", .pme_long_desc = "Data TLB references to 4KB pages by a marked instruction that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_FPU0_FIN ] = { .pme_name = "PM_FPU0_FIN", .pme_code = 0x10c3, .pme_short_desc = "FPU0 produced a result", .pme_long_desc = "FPU0 finished, produced a result. This only indicates finish, not completion. 
Floating Point Stores are included in this count but not Floating Point Loads.", }, [ POWER5p_PME_PM_L3SC_SHR_INV ] = { .pme_name = "PM_L3SC_SHR_INV", .pme_code = 0x710c5, .pme_short_desc = "L3 slice C transition from shared to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is Sx in this L3 (i.e. invalidate hit SX and dispatched).", }, [ POWER5p_PME_PM_GRP_BR_REDIR ] = { .pme_name = "PM_GRP_BR_REDIR", .pme_code = 0x120e6, .pme_short_desc = "Group experienced branch redirect", .pme_long_desc = "Number of groups, counted at dispatch, that have encountered a branch redirect. Every group constructed from a fetch group that has been redirected will count.", }, [ POWER5p_PME_PM_L2SC_RCLD_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SC_RCLD_DISP_FAIL_RC_FULL", .pme_code = 0x721e2, .pme_short_desc = "L2 slice C RC load dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a load failed because all RC machines are busy.", }, [ POWER5p_PME_PM_MRK_LSU_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_SRQ", .pme_code = 0x481088, .pme_short_desc = "Marked SRQ lhs flushes", .pme_long_desc = "A marked store was flushed because a younger load hits an older store that is already in the SRQ or in the same group.", }, [ POWER5p_PME_PM_PTEG_FROM_L275_SHR ] = { .pme_name = "PM_PTEG_FROM_L275_SHR", .pme_code = 0x383097, .pme_short_desc = "PTEG loaded from L2.75 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (T) data from the L2 on a different module than this processor is located due to a demand load.", }, [ POWER5p_PME_PM_L2SB_RCLD_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SB_RCLD_DISP_FAIL_RC_FULL", .pme_code = 0x721e1, .pme_short_desc = "L2 slice B RC load dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a load failed because all RC machines are busy.", }, [ POWER5p_PME_PM_SNOOP_RD_RETRY_WQ ] = { .pme_name = "PM_SNOOP_RD_RETRY_WQ", .pme_code = 
0x715c6, .pme_short_desc = "Snoop read retry due to collision with active write queue", .pme_long_desc = "A snoop request for a read from memory was retried because it matched the cache line of an active write. The snoop request is retried and the active write is changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_FAB_DCLAIM_RETRIED ] = { .pme_name = "PM_FAB_DCLAIM_RETRIED", .pme_code = 0x730e7, .pme_short_desc = "dclaim retried", .pme_long_desc = "A DCLAIM command was retried. Each chip reports its own counts. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_LSU0_NCLD ] = { .pme_name = "PM_LSU0_NCLD", .pme_code = 0xc50c1, .pme_short_desc = "LSU0 non-cacheable loads", .pme_long_desc = "A non-cacheable load was executed by unit 0.", }, [ POWER5p_PME_PM_LSU1_BUSY_REJECT ] = { .pme_name = "PM_LSU1_BUSY_REJECT", .pme_code = 0xc20e5, .pme_short_desc = "LSU1 busy due to reject", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions.", }, [ POWER5p_PME_PM_FXLS0_FULL_CYC ] = { .pme_name = "PM_FXLS0_FULL_CYC", .pme_code = 0x110c0, .pme_short_desc = "Cycles FXU0/LS0 queue full", .pme_long_desc = "The issue queue that feeds the Fixed Point unit 0 / Load Store Unit 0 is full. This condition will prevent dispatch groups from being dispatched. This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_DTLB_REF_16M ] = { .pme_name = "PM_DTLB_REF_16M", .pme_code = 0x3c2086, .pme_short_desc = "Data TLB reference for 16M page", .pme_long_desc = "Data TLB references for 16MB pages. Includes hits + misses.", }, [ POWER5p_PME_PM_FPU0_FEST ] = { .pme_name = "PM_FPU0_FEST", .pme_code = 0x10c2, .pme_short_desc = "FPU0 executed FEST instruction", .pme_long_desc = "FPU0 has executed an estimate instruction. 
This could be fres* or frsqrte* where XYZ* means XYZ or XYZ.", }, [ POWER5p_PME_PM_GCT_USAGE_60to79_CYC ] = { .pme_name = "PM_GCT_USAGE_60to79_CYC", .pme_code = 0x20001f, .pme_short_desc = "Cycles GCT 60-79% full", .pme_long_desc = "Cycles when the Global Completion Table has between 60% and 79% of its slots used. The GCT has 20 entries shared between threads.", }, [ POWER5p_PME_PM_DATA_FROM_L25_MOD ] = { .pme_name = "PM_DATA_FROM_L25_MOD", .pme_code = 0x2c3097, .pme_short_desc = "Data loaded from L2.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5p_PME_PM_L2SC_RCLD_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SC_RCLD_DISP_FAIL_ADDR", .pme_code = 0x711c2, .pme_short_desc = "L2 slice C RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a load failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5p_PME_PM_LSU0_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU0_REJECT_ERAT_MISS", .pme_code = 0xc40c3, .pme_short_desc = "LSU0 reject due to ERAT miss", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions due to an ERAT miss. 
Requests that miss the Derat are rejected and retried until the request hits in the Erat.", }, [ POWER5p_PME_PM_DATA_FROM_L375_MOD ] = { .pme_name = "PM_DATA_FROM_L375_MOD", .pme_code = 0x1c30a7, .pme_short_desc = "Data loaded from L3.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on a different module than this processor is located due to a demand load.", }, [ POWER5p_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_CYC", .pme_code = 0x200015, .pme_short_desc = "Cycles LMQ and SRQ empty", .pme_long_desc = "Cycles when both the LMQ and SRQ are empty (LSU is idle)", }, [ POWER5p_PME_PM_DTLB_MISS_64K ] = { .pme_name = "PM_DTLB_MISS_64K", .pme_code = 0x2c208d, .pme_short_desc = "Data TLB miss for 64K page", .pme_long_desc = "Data TLB references to 64KB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_LSU0_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU0_REJECT_RELOAD_CDF", .pme_code = 0xc40c2, .pme_short_desc = "LSU0 reject due to reload CDF or tag update collision", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions because of Critical Data Forward. When critical data arrives from the storage system it is formatted and immediately forwarded, bypassing the data cache, to the destination register using the result bus. Any instruction that requires the result bus in the same cycle is rejected. 
Tag update rejects are caused when an instruction requires access to the Dcache directory or ERAT in the same system when they are being updated.", }, [ POWER5p_PME_PM_0INST_FETCH ] = { .pme_name = "PM_0INST_FETCH", .pme_code = 0x42208d, .pme_short_desc = "No instructions fetched", .pme_long_desc = "No instructions were fetched this cycle (due to IFU hold, redirect, or icache miss)", }, [ POWER5p_PME_PM_LSU1_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU1_REJECT_RELOAD_CDF", .pme_code = 0xc40c6, .pme_short_desc = "LSU1 reject due to reload CDF or tag update collision", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions because of Critical Data Forward. When critical data arrives from the storage system it is formatted and immediately forwarded, bypassing the data cache, to the destination register using the result bus. Any instruction that requires the result bus in the same cycle is rejected. Tag update rejects are caused when an instruction requires access to the Dcache directory or ERAT in the same system when they are being updated.", }, [ POWER5p_PME_PM_MEM_WQ_DISP_Q0to7 ] = { .pme_name = "PM_MEM_WQ_DISP_Q0to7", .pme_code = 0x723e6, .pme_short_desc = "Memory write queue dispatched to queues 0-7", .pme_long_desc = "A memory operation was dispatched to a write queue in the range between 0 and 7. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_L1_PREF ] = { .pme_name = "PM_L1_PREF", .pme_code = 0xc70e7, .pme_short_desc = "L1 cache data prefetches", .pme_long_desc = "A request to prefetch data into the L1 was made", }, [ POWER5p_PME_PM_MRK_DATA_FROM_LMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM_CYC", .pme_code = 0x4c70a0, .pme_short_desc = "Marked load latency from local memory", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. 
Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_BRQ_FULL_CYC ] = { .pme_name = "PM_BRQ_FULL_CYC", .pme_code = 0x100c5, .pme_short_desc = "Cycles branch queue full", .pme_long_desc = "Cycles when the issue queue that feeds the branch unit is full. This condition will prevent dispatch groups from being dispatched. This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_GRP_IC_MISS_NONSPEC ] = { .pme_name = "PM_GRP_IC_MISS_NONSPEC", .pme_code = 0x112099, .pme_short_desc = "Group experienced non-speculative I cache miss", .pme_long_desc = "Number of groups, counted at completion, that have encountered an instruction cache miss.", }, [ POWER5p_PME_PM_PTEG_FROM_L275_MOD ] = { .pme_name = "PM_PTEG_FROM_L275_MOD", .pme_code = 0x1830a3, .pme_short_desc = "PTEG loaded from L2.75 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L2 on a different module than this processor is located due to a demand load.", }, [ POWER5p_PME_PM_MRK_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU0", .pme_code = 0x820e0, .pme_short_desc = "LSU0 marked L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by LSU0.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L375_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L375_SHR_CYC", .pme_code = 0x2c70a7, .pme_short_desc = "Marked load latency from L3.75 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_DATA_FROM_L3 ] = { .pme_name = "PM_DATA_FROM_L3", .pme_code = 0x1c308e, .pme_short_desc = "Data loaded from L3", .pme_long_desc = "The processor's Data Cache was reloaded from the local L3 due to a demand load.", }, [ POWER5p_PME_PM_INST_FROM_L2 ] = { .pme_name = "PM_INST_FROM_L2", .pme_code = 0x122086, .pme_short_desc = "Instruction fetched from L2", .pme_long_desc = "An instruction fetch group was fetched from L2. Fetch Groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_LSU_FLUSH ] = { .pme_name = "PM_LSU_FLUSH", .pme_code = 0x110c5, .pme_short_desc = "Flush initiated by LSU", .pme_long_desc = "A flush was initiated by the Load Store Unit", }, [ POWER5p_PME_PM_PMC2_OVERFLOW ] = { .pme_name = "PM_PMC2_OVERFLOW", .pme_code = 0x30000a, .pme_short_desc = "PMC2 Overflow", .pme_long_desc = "Overflows from PMC2 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5p_PME_PM_FPU0_DENORM ] = { .pme_name = "PM_FPU0_DENORM", .pme_code = 0x20e0, .pme_short_desc = "FPU0 received denormalized data", .pme_long_desc = "FPU0 has encountered a denormalized operand.", }, [ POWER5p_PME_PM_FPU1_FMOV_FEST ] = { .pme_name = "PM_FPU1_FMOV_FEST", .pme_code = 0x10c4, .pme_short_desc = "FPU1 executed FMOV or FEST instructions", .pme_long_desc = "FPU1 has executed a move kind of instruction or one of the estimate instructions. 
This could be fmr*, fneg*, fabs*, fnabs*, fres* or frsqrte* where XYZ* means XYZ or XYZ.", }, [ POWER5p_PME_PM_INST_FETCH_CYC ] = { .pme_name = "PM_INST_FETCH_CYC", .pme_code = 0x220e4, .pme_short_desc = "Cycles at least 1 instruction fetched", .pme_long_desc = "Cycles when at least one instruction was sent from the fetch unit to the decode unit.", }, [ POWER5p_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x300009, .pme_short_desc = "Instructions dispatched", .pme_long_desc = "Number of PowerPC instructions successfully dispatched.", }, [ POWER5p_PME_PM_LSU_LDF ] = { .pme_name = "PM_LSU_LDF", .pme_code = 0x1c50a8, .pme_short_desc = "LSU executed Floating Point load instruction", .pme_long_desc = "LSU executed Floating Point load instruction. Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_DATA_FROM_L25_SHR ] = { .pme_name = "PM_DATA_FROM_L25_SHR", .pme_code = 0x1c3097, .pme_short_desc = "Data loaded from L2.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5p_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0xc30e4, .pme_short_desc = "L1 reload data source valid", .pme_long_desc = "The data source information is valid, the data cache has been reloaded. Prior to POWER5+ this included data cache reloads due to prefetch activity. With POWER5+ this now only includes reloads due to demand loads.", }, [ POWER5p_PME_PM_MEM_WQ_DISP_DCLAIM ] = { .pme_name = "PM_MEM_WQ_DISP_DCLAIM", .pme_code = 0x713c6, .pme_short_desc = "Memory write queue dispatched due to dclaim/flush", .pme_long_desc = "A memory dclaim or flush operation was dispatched to a write queue. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_MRK_GRP_ISSUED ] = { .pme_name = "PM_MRK_GRP_ISSUED", .pme_code = 0x100015, .pme_short_desc = "Marked group issued", .pme_long_desc = "A sampled instruction was issued.", }, [ POWER5p_PME_PM_FPU_FULL_CYC ] = { .pme_name = "PM_FPU_FULL_CYC", .pme_code = 0x110090, .pme_short_desc = "Cycles FPU issue queue full", .pme_long_desc = "Cycles when one or both FPU issue queues are full. Combined Unit 0 + 1. Use with caution since this is the sum of cycles when Unit 0 was full plus Unit 1 full. It does not indicate when both units were full.", }, [ POWER5p_PME_PM_INST_FROM_L35_MOD ] = { .pme_name = "PM_INST_FROM_L35_MOD", .pme_code = 0x22209d, .pme_short_desc = "Instruction fetched from L3.5 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L3 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_FPU_FMA ] = { .pme_name = "PM_FPU_FMA", .pme_code = 0x200088, .pme_short_desc = "FPU executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_THRD_PRIO_3_CYC ] = { .pme_name = "PM_THRD_PRIO_3_CYC", .pme_code = 0x420e2, .pme_short_desc = "Cycles thread running at priority level 3", .pme_long_desc = "Cycles this thread was running at priority level 3.", }, [ POWER5p_PME_PM_MRK_CRU_FIN ] = { .pme_name = "PM_MRK_CRU_FIN", .pme_code = 0x400005, .pme_short_desc = "Marked instruction CRU processing finished", .pme_long_desc = "The Condition Register Unit finished a marked instruction. 
Instructions that finish may not necessarily complete.", }, [ POWER5p_PME_PM_SNOOP_WR_RETRY_WQ ] = { .pme_name = "PM_SNOOP_WR_RETRY_WQ", .pme_code = 0x716c6, .pme_short_desc = "Snoop write/dclaim retry due to collision with active write queue", .pme_long_desc = "A snoop request for a write or dclaim to memory was retried because it matched the cache line of an active write. The snoop request is retried and the active write is changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_CMPLU_STALL_REJECT ] = { .pme_name = "PM_CMPLU_STALL_REJECT", .pme_code = 0x41109a, .pme_short_desc = "Completion stall caused by reject", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered a load/store reject. This is a subset of PM_CMPLU_STALL_LSU.", }, [ POWER5p_PME_PM_MRK_FXU_FIN ] = { .pme_name = "PM_MRK_FXU_FIN", .pme_code = 0x200014, .pme_short_desc = "Marked instruction FXU processing finished", .pme_long_desc = "One of the Fixed Point Units finished a marked instruction. Instructions that finish may not necessarily complete.", }, [ POWER5p_PME_PM_LSU1_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU1_REJECT_ERAT_MISS", .pme_code = 0xc40c7, .pme_short_desc = "LSU1 reject due to ERAT miss", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions due to an ERAT miss. Requests that miss the Derat are rejected and retried until the request hits in the Erat.", }, [ POWER5p_PME_PM_L2SB_RCST_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SB_RCST_DISP_FAIL_OTHER", .pme_code = 0x732e1, .pme_short_desc = "L2 slice B RC store dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a store failed for some reason other than Full or Collision conditions. 
Rejected dispatches do not count because they have not yet been attempted.", }, [ POWER5p_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY ] = { .pme_name = "PM_L2SC_RC_DISP_FAIL_CO_BUSY", .pme_code = 0x703c2, .pme_short_desc = "L2 slice C RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy", .pme_long_desc = "A Read/Claim Dispatch was rejected at dispatch because the Castout Machine was busy. In the case of an RC starting up on a miss and the victim is valid, the CO machine must be available for the RC to process the access. If the CO is still busy working on an old castout, then the RC must not-ack the access if it is a miss(re-issued by the CIU). If it is a miss and the CO is available to process the castout, the RC will accept the access. Once the RC has finished, it can restart and process new accesses that result in a hit (or miss that doesn't need a CO) even though the CO is still processing a castout from a previous access.", }, [ POWER5p_PME_PM_PMC4_OVERFLOW ] = { .pme_name = "PM_PMC4_OVERFLOW", .pme_code = 0x10000a, .pme_short_desc = "PMC4 Overflow", .pme_long_desc = "Overflows from PMC4 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5p_PME_PM_L3SA_SNOOP_RETRY ] = { .pme_name = "PM_L3SA_SNOOP_RETRY", .pme_code = 0x731e3, .pme_short_desc = "L3 slice A snoop retries", .pme_long_desc = "Number of times an L3 retried a snoop because it got two in at the same time (one on snp_a, one on snp_b)", }, [ POWER5p_PME_PM_PTEG_FROM_L35_MOD ] = { .pme_name = "PM_PTEG_FROM_L35_MOD", .pme_code = 0x28309e, .pme_short_desc = "PTEG loaded from L3.5 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L3 of a chip on the same module as this processor is located, due to a demand load.", }, [ POWER5p_PME_PM_INST_FROM_L25_MOD ] = { .pme_name = "PM_INST_FROM_L25_MOD", .pme_code = 0x222096, .pme_short_desc = "Instruction fetched from L2.5 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L2 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions.", }, [ POWER5p_PME_PM_THRD_SMT_HANG ] = { .pme_name = "PM_THRD_SMT_HANG", .pme_code = 0x330e7, .pme_short_desc = "SMT hang detected", .pme_long_desc = "A hung thread was detected", }, [ POWER5p_PME_PM_CMPLU_STALL_ERAT_MISS ] = { .pme_name = "PM_CMPLU_STALL_ERAT_MISS", .pme_code = 0x41109b, .pme_short_desc = "Completion stall caused by ERAT miss", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered an ERAT miss. This is a subset of PM_CMPLU_STALL_REJECT.", }, [ POWER5p_PME_PM_L3SA_MOD_TAG ] = { .pme_name = "PM_L3SA_MOD_TAG", .pme_code = 0x720e3, .pme_short_desc = "L3 slice A transition from modified to TAG", .pme_long_desc = "L3 snooper detects someone doing a read to a line that is truly M in this L3(i.e. 
L3 going M->T or M->I(go_Mu case) Mu|Me are not included since they are formed due to a prev read op). Tx is not included since it is considered shared at this point.", }, [ POWER5p_PME_PM_INST_FROM_L2MISS ] = { .pme_name = "PM_INST_FROM_L2MISS", .pme_code = 0x12209b, .pme_short_desc = "Instruction fetched missed L2", .pme_long_desc = "An instruction fetch group was fetched from beyond the local L2.", }, [ POWER5p_PME_PM_FLUSH_SYNC ] = { .pme_name = "PM_FLUSH_SYNC", .pme_code = 0x330e1, .pme_short_desc = "Flush caused by sync", .pme_long_desc = "This thread has been flushed at dispatch due to a sync, lwsync, ptesync, or tlbsync instruction. This allows the other thread to have more machine resources for it to make progress until the sync finishes.", }, [ POWER5p_PME_PM_MRK_GRP_DISP ] = { .pme_name = "PM_MRK_GRP_DISP", .pme_code = 0x100002, .pme_short_desc = "Marked group dispatched", .pme_long_desc = "A group containing a sampled instruction was dispatched", }, [ POWER5p_PME_PM_MEM_RQ_DISP_Q8to11 ] = { .pme_name = "PM_MEM_RQ_DISP_Q8to11", .pme_code = 0x722e6, .pme_short_desc = "Memory read queue dispatched to queues 8-11", .pme_long_desc = "A memory operation was dispatched to read queue 8,9,10 or 11. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_L2SC_ST_HIT ] = { .pme_name = "PM_L2SC_ST_HIT", .pme_code = 0x733e2, .pme_short_desc = "L2 slice C store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_L2SB_MOD_TAG ] = { .pme_name = "PM_L2SB_MOD_TAG", .pme_code = 0x720e1, .pme_short_desc = "L2 slice B transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_CLB_EMPTY_CYC ] = { .pme_name = "PM_CLB_EMPTY_CYC", .pme_code = 0x410c6, .pme_short_desc = "Cycles CLB empty", .pme_long_desc = "Cycles when both threads' CLBs are completely empty.", }, [ POWER5p_PME_PM_L2SB_ST_HIT ] = { .pme_name = "PM_L2SB_ST_HIT", .pme_code = 0x733e1, .pme_short_desc = "L2 slice B store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. This event is provided on each of the three L2 slices A, B and C.", }, [ POWER5p_PME_PM_MEM_NONSPEC_RD_CANCEL ] = { .pme_name = "PM_MEM_NONSPEC_RD_CANCEL", .pme_code = 0x711c6, .pme_short_desc = "Non speculative memory read cancelled", .pme_long_desc = "A non-speculative read was cancelled because the combined response indicated it was sourced from another L2 or L3. This event is sent from the Memory Controller clock domain and must be scaled accordingly", }, [ POWER5p_PME_PM_BR_PRED_CR_TA ] = { .pme_name = "PM_BR_PRED_CR_TA", .pme_code = 0x423087, .pme_short_desc = "A conditional branch was predicted, CR and target prediction", .pme_long_desc = "Both the condition (taken or not taken) and the target address of a branch instruction were predicted.", }, [ POWER5p_PME_PM_MRK_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_SRQ", .pme_code = 0x810c3, .pme_short_desc = "LSU0 marked SRQ lhs flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER5p_PME_PM_MRK_LSU_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU_FLUSH_ULD", .pme_code = 0x1810a8, .pme_short_desc = "Marked unaligned load flushes", .pme_long_desc = "A marked load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ POWER5p_PME_PM_INST_DISP_ATTEMPT ] = { .pme_name = "PM_INST_DISP_ATTEMPT", .pme_code = 0x120e1, .pme_short_desc = "Instructions dispatch attempted", .pme_long_desc = "Number of PowerPC 
Instructions dispatched (attempted, not filtered by success).", }, [ POWER5p_PME_PM_INST_FROM_RMEM ] = { .pme_name = "PM_INST_FROM_RMEM", .pme_code = 0x422086, .pme_short_desc = "Instruction fetched from remote memory", .pme_long_desc = "An instruction fetch group was fetched from memory attached to a different module than this processor is located on. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_ST_REF_L1_LSU0 ] = { .pme_name = "PM_ST_REF_L1_LSU0", .pme_code = 0xc10c1, .pme_short_desc = "LSU0 L1 D cache store references", .pme_long_desc = "Store references to the Data Cache by LSU0.", }, [ POWER5p_PME_PM_LSU0_DERAT_MISS ] = { .pme_name = "PM_LSU0_DERAT_MISS", .pme_code = 0x800c2, .pme_short_desc = "LSU0 DERAT misses", .pme_long_desc = "Total D-ERAT Misses by LSU0. Requests that miss the Derat are rejected and retried until the request hits in the Erat. This may result in multiple erat misses for the same instruction.", }, [ POWER5p_PME_PM_FPU_STALL3 ] = { .pme_name = "PM_FPU_STALL3", .pme_code = 0x202088, .pme_short_desc = "FPU stalled in pipe3", .pme_long_desc = "FPU has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_L2SB_RCLD_DISP ] = { .pme_name = "PM_L2SB_RCLD_DISP", .pme_code = 0x701c1, .pme_short_desc = "L2 slice B RC load dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Load was attempted", }, [ POWER5p_PME_PM_BR_PRED_CR ] = { .pme_name = "PM_BR_PRED_CR", .pme_code = 0x230e2, .pme_short_desc = "A conditional branch was predicted, CR prediction", .pme_long_desc = "A conditional branch instruction was predicted as taken or not taken.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x1c7087, .pme_short_desc = "Marked data loaded from L2", .pme_long_desc = "The processor's Data Cache was reloaded from the local L2 due to a marked load.", }, [ POWER5p_PME_PM_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_LSU0_FLUSH_SRQ", .pme_code = 0xc00c3, .pme_short_desc = "LSU0 SRQ lhs flushes", .pme_long_desc = "A store was flushed by unit 0 because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER5p_PME_PM_FAB_PNtoNN_DIRECT ] = { .pme_name = "PM_FAB_PNtoNN_DIRECT", .pme_code = 0x703c7, .pme_short_desc = "PN to NN beat went straight to its destination", .pme_long_desc = "Fabric Data beats that the base chip takes the inbound PN data and passes it through to the outbound NN bus without going into a sidecar. 
The signal is delivered at FBC speed and the count must be scaled.", }, [ POWER5p_PME_PM_IOPS_CMPL ] = { .pme_name = "PM_IOPS_CMPL", .pme_code = 0x1, .pme_short_desc = "Internal operations completed", .pme_long_desc = "Number of internal operations that completed.", }, [ POWER5p_PME_PM_L2SA_RCST_DISP ] = { .pme_name = "PM_L2SA_RCST_DISP", .pme_code = 0x702c0, .pme_short_desc = "L2 slice A RC store dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Store was attempted.", }, [ POWER5p_PME_PM_L2SA_RCST_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SA_RCST_DISP_FAIL_OTHER", .pme_code = 0x732e0, .pme_short_desc = "L2 slice A RC store dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a store failed for some reason other than Full or Collision conditions. Rejected dispatches do not count because they have not yet been attempted.", }, [ POWER5p_PME_PM_L2SC_SHR_INV ] = { .pme_name = "PM_L2SC_SHR_INV", .pme_code = 0x710c2, .pme_short_desc = "L2 slice C transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A, B, and C. NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER5p_PME_PM_SNOOP_RETRY_AB_COLLISION ] = { .pme_name = "PM_SNOOP_RETRY_AB_COLLISION", .pme_code = 0x735e6, .pme_short_desc = "Snoop retry due to a b collision", .pme_long_desc = "Snoop retry due to a b collision", }, [ POWER5p_PME_PM_FAB_PNtoVN_SIDECAR ] = { .pme_name = "PM_FAB_PNtoVN_SIDECAR", .pme_code = 0x733e7, .pme_short_desc = "PN to VN beat went to sidecar first", .pme_long_desc = "Fabric data beats that the base chip takes the inbound PN data and forwards it on to the outbound VN data bus after going into a sidecar first. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_LSU0_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU0_REJECT_LMQ_FULL", .pme_code = 0xc40c1, .pme_short_desc = "LSU0 reject due to LMQ full or missed data coming", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. If all eight entries are full, subsequent load instructions are rejected.", }, [ POWER5p_PME_PM_LSU_LMQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LMQ_S0_ALLOC", .pme_code = 0xc30e6, .pme_short_desc = "LMQ slot 0 allocated", .pme_long_desc = "The first entry in the LMQ was allocated.", }, [ POWER5p_PME_PM_SNOOP_PW_RETRY_RQ ] = { .pme_name = "PM_SNOOP_PW_RETRY_RQ", .pme_code = 0x707c6, .pme_short_desc = "Snoop partial-write retry due to collision with active read queue", .pme_long_desc = "A snoop request for a partial write to memory was retried because it matched the cache line of an active read. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_DTLB_REF ] = { .pme_name = "PM_DTLB_REF", .pme_code = 0xc20e4, .pme_short_desc = "Data TLB references", .pme_long_desc = "Total number of Data TLB references for all page sizes. Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_PTEG_FROM_L3 ] = { .pme_name = "PM_PTEG_FROM_L3", .pme_code = 0x18308e, .pme_short_desc = "PTEG loaded from L3", .pme_long_desc = "A Page Table Entry was loaded into the TLB from the local L3 due to a demand load.", }, [ POWER5p_PME_PM_FAB_M1toVNorNN_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_M1toVNorNN_SIDECAR_EMPTY", .pme_code = 0x712c7, .pme_short_desc = "M1 to VN/NN sidecar empty", .pme_long_desc = "Fabric cycles when the Minus-1 jump sidecar (sidecars for mcm to mcm data transfer) is empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5p_PME_PM_LSU_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_SRQ_EMPTY_CYC", .pme_code = 0x400015, .pme_short_desc = "Cycles SRQ empty", .pme_long_desc = "Cycles the Store Request Queue is empty", }, [ POWER5p_PME_PM_FPU1_STF ] = { .pme_name = "PM_FPU1_STF", .pme_code = 0x20e6, .pme_short_desc = "FPU1 executed store instruction", .pme_long_desc = "FPU1 has executed a Floating Point Store instruction.", }, [ POWER5p_PME_PM_LSU_LMQ_S0_VALID ] = { .pme_name = "PM_LSU_LMQ_S0_VALID", .pme_code = 0xc30e5, .pme_short_desc = "LMQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle when the first entry in the LMQ is valid. The LMQ had eight entries that are allocated FIFO", }, [ POWER5p_PME_PM_GCT_USAGE_00to59_CYC ] = { .pme_name = "PM_GCT_USAGE_00to59_CYC", .pme_code = 0x10001f, .pme_short_desc = "Cycles GCT less than 60% full", .pme_long_desc = "Cycles when the Global Completion Table has fewer than 60% of its slots used. The GCT has 20 entries shared between threads.", }, [ POWER5p_PME_PM_FPU_FMOV_FEST ] = { .pme_name = "PM_FPU_FMOV_FEST", .pme_code = 0x301088, .pme_short_desc = "FPU executed FMOV or FEST instructions", .pme_long_desc = "The floating point unit has executed a move kind of instruction or one of the estimate instructions. This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ.. Combined Unit 0 + Unit 1.", }, [ POWER5p_PME_PM_DATA_FROM_L2MISS ] = { .pme_name = "PM_DATA_FROM_L2MISS", .pme_code = 0x3c309b, .pme_short_desc = "Data loaded missed L2", .pme_long_desc = "The processor's Data Cache was reloaded but not from the local L2.", }, [ POWER5p_PME_PM_XER_MAP_FULL_CYC ] = { .pme_name = "PM_XER_MAP_FULL_CYC", .pme_code = 0x100c2, .pme_short_desc = "Cycles XER mapper full", .pme_long_desc = "The XER mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. 
This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_GRP_DISP_BLK_SB_CYC ] = { .pme_name = "PM_GRP_DISP_BLK_SB_CYC", .pme_code = 0x130e1, .pme_short_desc = "Cycles group dispatch blocked by scoreboard", .pme_long_desc = "A scoreboard operation on a non-renamed resource has blocked dispatch.", }, [ POWER5p_PME_PM_FLUSH_SB ] = { .pme_name = "PM_FLUSH_SB", .pme_code = 0x330e2, .pme_short_desc = "Flush caused by scoreboard operation", .pme_long_desc = "This thread has been flushed at dispatch because its scoreboard bit is set indicating that a non-renamed resource is being updated. This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L375_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L375_SHR", .pme_code = 0x3c709e, .pme_short_desc = "Marked data loaded from L3.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on a different module than this processor is located due to a marked load.", }, [ POWER5p_PME_PM_MRK_GRP_CMPL ] = { .pme_name = "PM_MRK_GRP_CMPL", .pme_code = 0x400013, .pme_short_desc = "Marked group completed", .pme_long_desc = "A group containing a sampled instruction completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER5p_PME_PM_SUSPENDED ] = { .pme_name = "PM_SUSPENDED", .pme_code = 0x0, .pme_short_desc = "Suspended", .pme_long_desc = "The counter is suspended (does not count).", }, [ POWER5p_PME_PM_SNOOP_RD_RETRY_QFULL ] = { .pme_name = "PM_SNOOP_RD_RETRY_QFULL", .pme_code = 0x700c6, .pme_short_desc = "Snoop read retry due to read queue full", .pme_long_desc = "A snoop request for a read from memory was retried because the read queues were full. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_GRP_IC_MISS_BR_REDIR_NONSPEC ] = { .pme_name = "PM_GRP_IC_MISS_BR_REDIR_NONSPEC", .pme_code = 0x120e5, .pme_short_desc = "Group experienced non-speculative I cache miss or branch redirect", .pme_long_desc = "Group experienced non-speculative I cache miss or branch redirect", }, [ POWER5p_PME_PM_DATA_FROM_L35_SHR ] = { .pme_name = "PM_DATA_FROM_L35_SHR", .pme_code = 0x1c309e, .pme_short_desc = "Data loaded from L3.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5p_PME_PM_L3SB_MOD_INV ] = { .pme_name = "PM_L3SB_MOD_INV", .pme_code = 0x730e4, .pme_short_desc = "L3 slice B transition from modified to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is truly M in this L3 (i.e. L3 going M=>I). Mu|Me are not included since they are formed due to a prev read op. 
Tx is not included since it is considered shared at this point.", }, [ POWER5p_PME_PM_STCX_FAIL ] = { .pme_name = "PM_STCX_FAIL", .pme_code = 0x820e1, .pme_short_desc = "STCX failed", .pme_long_desc = "A stcx (stwcx or stdcx) failed", }, [ POWER5p_PME_PM_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_LD_MISS_L1_LSU1", .pme_code = 0xc10c5, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by unit 1.", }, [ POWER5p_PME_PM_GRP_DISP ] = { .pme_name = "PM_GRP_DISP", .pme_code = 0x200002, .pme_short_desc = "Group dispatches", .pme_long_desc = "A group was dispatched", }, [ POWER5p_PME_PM_DC_PREF_DST ] = { .pme_name = "PM_DC_PREF_DST", .pme_code = 0x830e6, .pme_short_desc = "DST (Data Stream Touch) stream start", .pme_long_desc = "A prefetch stream was started using the DST instruction.", }, [ POWER5p_PME_PM_FPU1_DENORM ] = { .pme_name = "PM_FPU1_DENORM", .pme_code = 0x20e4, .pme_short_desc = "FPU1 received denormalized data", .pme_long_desc = "FPU1 has encountered a denormalized operand.", }, [ POWER5p_PME_PM_FPU0_FPSCR ] = { .pme_name = "PM_FPU0_FPSCR", .pme_code = 0x30e0, .pme_short_desc = "FPU0 executed FPSCR instruction", .pme_long_desc = "FPU0 has executed FPSCR move related instruction. This could be mtfsfi*, mtfsb0*, mtfsb1*, mffs*, mtfsf*, mcrsf* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5p_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x1c3087, .pme_short_desc = "Data loaded from L2", .pme_long_desc = "The processor's Data Cache was reloaded from the local L2 due to a demand load.", }, [ POWER5p_PME_PM_L2SA_RCLD_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SA_RCLD_DISP_FAIL_ADDR", .pme_code = 0x711c0, .pme_short_desc = "L2 slice A RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a load failed because of an address conflict. 
Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5p_PME_PM_FPU_1FLOP ] = { .pme_name = "PM_FPU_1FLOP", .pme_code = 0x100090, .pme_short_desc = "FPU executed one flop instruction", .pme_long_desc = "The floating point unit has executed an add, mult, sub, compare, fsel, fneg, fabs, fnabs, fres, or frsqrte kind of instruction. These are single FLOP operations.", }, [ POWER5p_PME_PM_L2SC_RCLD_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SC_RCLD_DISP_FAIL_OTHER", .pme_code = 0x731e2, .pme_short_desc = "L2 slice C RC load dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a load failed for some reason other than Full or Collision conditions.", }, [ POWER5p_PME_PM_FPU0_FSQRT ] = { .pme_name = "PM_FPU0_FSQRT", .pme_code = 0xc2, .pme_short_desc = "FPU0 executed FSQRT instruction", .pme_long_desc = "FPU0 has executed a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5p_PME_PM_L2SC_RCST_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SC_RCST_DISP_FAIL_RC_FULL", .pme_code = 0x722e1, .pme_short_desc = "L2 slice C RC store dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a store failed because all RC machines are busy.", }, [ POWER5p_PME_PM_LD_REF_L1 ] = { .pme_name = "PM_LD_REF_L1", .pme_code = 0x1c10a8, .pme_short_desc = "L1 D cache load references", .pme_long_desc = "Load references to the Level 1 Data Cache. Combined unit 0 + 1.", }, [ POWER5p_PME_PM_INST_FROM_L1 ] = { .pme_name = "PM_INST_FROM_L1", .pme_code = 0x22208d, .pme_short_desc = "Instruction fetched from L1", .pme_long_desc = "An instruction fetch group was fetched from L1. 
Fetch Groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_TLBIE_HELD ] = { .pme_name = "PM_TLBIE_HELD", .pme_code = 0x130e4, .pme_short_desc = "TLBIE held at dispatch", .pme_long_desc = "Cycles a TLBIE instruction was held at dispatch.", }, [ POWER5p_PME_PM_DC_PREF_OUT_OF_STREAMS ] = { .pme_name = "PM_DC_PREF_OUT_OF_STREAMS", .pme_code = 0xc50c2, .pme_short_desc = "D cache out of prefetch streams", .pme_long_desc = "A new prefetch stream was detected but no more stream entries were available.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L25_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD_CYC", .pme_code = 0x4c70a2, .pme_short_desc = "Marked load latency from L2.5 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5p_PME_PM_MRK_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_SRQ", .pme_code = 0x810c7, .pme_short_desc = "LSU1 marked SRQ lhs flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER5p_PME_PM_MEM_RQ_DISP_Q0to3 ] = { .pme_name = "PM_MEM_RQ_DISP_Q0to3", .pme_code = 0x702c6, .pme_short_desc = "Memory read queue dispatched to queues 0-3", .pme_long_desc = "A memory operation was dispatched to read queue 0,1,2, or 3. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5p_PME_PM_ST_REF_L1_LSU1 ] = { .pme_name = "PM_ST_REF_L1_LSU1", .pme_code = 0xc10c4, .pme_short_desc = "LSU1 L1 D cache store references", .pme_long_desc = "Store references to the Data Cache by LSU1.", }, [ POWER5p_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x182088, .pme_short_desc = "Marked L1 D cache load misses", .pme_long_desc = "Marked L1 D cache load misses", }, [ POWER5p_PME_PM_L1_WRITE_CYC ] = { .pme_name = "PM_L1_WRITE_CYC", .pme_code = 0x230e7, .pme_short_desc = "Cycles writing to instruction L1", .pme_long_desc = "Cycles that a cache line was written to the instruction cache.", }, [ POWER5p_PME_PM_L2SC_ST_REQ ] = { .pme_name = "PM_L2SC_ST_REQ", .pme_code = 0x723e2, .pme_short_desc = "L2 slice C store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_CMPLU_STALL_FDIV ] = { .pme_name = "PM_CMPLU_STALL_FDIV", .pme_code = 0x21109b, .pme_short_desc = "Completion stall caused by FDIV or FQRT instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a floating point divide or square root instruction. 
This is a subset of PM_CMPLU_STALL_FPU.", }, [ POWER5p_PME_PM_THRD_SEL_OVER_CLB_EMPTY ] = { .pme_name = "PM_THRD_SEL_OVER_CLB_EMPTY", .pme_code = 0x410c2, .pme_short_desc = "Thread selection overrides caused by CLB empty", .pme_long_desc = "Thread selection was overridden because one thread's CLB was empty.", }, [ POWER5p_PME_PM_BR_MPRED_CR ] = { .pme_name = "PM_BR_MPRED_CR", .pme_code = 0x230e5, .pme_short_desc = "Branch mispredictions due to CR bit setting", .pme_long_desc = "A conditional branch instruction was incorrectly predicted as taken or not taken. The branch execution unit detects a branch mispredict because the CR value is opposite of the predicted value. This will result in a branch redirect flush if not overridden by a flush of an older instruction.", }, [ POWER5p_PME_PM_L3SB_MOD_TAG ] = { .pme_name = "PM_L3SB_MOD_TAG", .pme_code = 0x720e4, .pme_short_desc = "L3 slice B transition from modified to TAG", .pme_long_desc = "L3 snooper detects someone doing a read to a line that is truly M in this L3(i.e. L3 going M->T or M->I(go_Mu case); Mu|Me are not included since they are formed due to a prev read op). Tx is not included since it is considered shared at this point.", }, [ POWER5p_PME_PM_MRK_DATA_FROM_L2MISS ] = { .pme_name = "PM_MRK_DATA_FROM_L2MISS", .pme_code = 0x3c709b, .pme_short_desc = "Marked data loaded missed L2", .pme_long_desc = "DL1 was reloaded from beyond L2 due to a marked demand load.", }, [ POWER5p_PME_PM_LSU_REJECT_SRQ ] = { .pme_name = "PM_LSU_REJECT_SRQ", .pme_code = 0x1c4088, .pme_short_desc = "LSU SRQ lhs rejects", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions because of Load Hit Store conditions. Loads are rejected when data is needed from a previous store instruction but store forwarding is not possible because the data is not fully contained in the Store Data Queue or is not yet available in the Store Data Queue. 
Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x3c1088, .pme_short_desc = "L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache. Combined unit 0 + 1.", }, [ POWER5p_PME_PM_INST_FROM_PREF ] = { .pme_name = "PM_INST_FROM_PREF", .pme_code = 0x32208d, .pme_short_desc = "Instruction fetched from prefetch", .pme_long_desc = "An instruction fetch group was fetched from the prefetch buffer. Fetch groups can contain up to 8 instructions", }, [ POWER5p_PME_PM_STCX_PASS ] = { .pme_name = "PM_STCX_PASS", .pme_code = 0x820e5, .pme_short_desc = "Stcx passes", .pme_long_desc = "A stcx (stwcx or stdcx) instruction was successful", }, [ POWER5p_PME_PM_DC_INV_L2 ] = { .pme_name = "PM_DC_INV_L2", .pme_code = 0xc10c7, .pme_short_desc = "L1 D cache entries invalidated from L2", .pme_long_desc = "A dcache invalidate was received from the L2 because a line in L2 was castout.", }, [ POWER5p_PME_PM_LSU_SRQ_FULL_CYC ] = { .pme_name = "PM_LSU_SRQ_FULL_CYC", .pme_code = 0x110c3, .pme_short_desc = "Cycles SRQ full", .pme_long_desc = "Cycles the Store Request Queue is full.", }, [ POWER5p_PME_PM_FPU_FIN ] = { .pme_name = "PM_FPU_FIN", .pme_code = 0x401088, .pme_short_desc = "FPU produced a result", .pme_long_desc = "FPU finished, produced a result. This only indicates finish, not completion. Combined Unit 0 + Unit 1. Floating Point Stores are included in this count but not Floating Point Loads.", }, [ POWER5p_PME_PM_LSU_SRQ_STFWD ] = { .pme_name = "PM_LSU_SRQ_STFWD", .pme_code = 0x2c6088, .pme_short_desc = "SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. If a load hits L1 but becomes a store forward, it is not treated as a load miss. 
Combined Unit 0 + 1.", }, [ POWER5p_PME_PM_L2SA_SHR_MOD ] = { .pme_name = "PM_L2SA_SHR_MOD", .pme_code = 0x700c0, .pme_short_desc = "L2 slice A transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A, B, and C.", }, [ POWER5p_PME_PM_0INST_CLB_CYC ] = { .pme_name = "PM_0INST_CLB_CYC", .pme_code = 0x400c0, .pme_short_desc = "Cycles no instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5p_PME_PM_FXU0_FIN ] = { .pme_name = "PM_FXU0_FIN", .pme_code = 0x130e2, .pme_short_desc = "FXU0 produced a result", .pme_long_desc = "The Fixed Point unit 0 finished an instruction and produced a result. 
Instructions that finish may not necessarily complete.", }, [ POWER5p_PME_PM_L2SB_RCST_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SB_RCST_DISP_FAIL_RC_FULL", .pme_code = 0x722e2, .pme_short_desc = "L2 slice B RC store dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a store failed because all RC machines are busy.", }, [ POWER5p_PME_PM_THRD_GRP_CMPL_BOTH_CYC ] = { .pme_name = "PM_THRD_GRP_CMPL_BOTH_CYC", .pme_code = 0x200013, .pme_short_desc = "Cycles group completed by both threads", .pme_long_desc = "Cycles that both threads completed.", }, [ POWER5p_PME_PM_PMC5_OVERFLOW ] = { .pme_name = "PM_PMC5_OVERFLOW", .pme_code = 0x10001a, .pme_short_desc = "PMC5 Overflow", .pme_long_desc = "Overflows from PMC5 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5p_PME_PM_FPU0_FDIV ] = { .pme_name = "PM_FPU0_FDIV", .pme_code = 0xc0, .pme_short_desc = "FPU0 executed FDIV instruction", .pme_long_desc = "FPU0 has executed a divide instruction. This could be fdiv, fdivs, fdiv. 
fdivs.", }, [ POWER5p_PME_PM_PTEG_FROM_L375_SHR ] = { .pme_name = "PM_PTEG_FROM_L375_SHR", .pme_code = 0x38309e, .pme_short_desc = "PTEG loaded from L3.75 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (S) data from the L3 of a chip on a different module than this processor is located, due to a demand load.", }, [ POWER5p_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x20000b, .pme_short_desc = "Hypervisor Cycles", .pme_long_desc = "Cycles when the processor is executing in Hypervisor (MSR[HV] = 1 and MSR[PR]=0)", }, [ POWER5p_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY ] = { .pme_name = "PM_L2SA_RC_DISP_FAIL_CO_BUSY", .pme_code = 0x703c0, .pme_short_desc = "L2 slice A RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy", .pme_long_desc = "A Read/Claim Dispatch was rejected at dispatch because the Castout Machine was busy. In the case of an RC starting up on a miss and the victim is valid, the CO machine must be available for the RC to process the access. If the CO is still busy working on an old castout, then the RC must not-ack the access if it is a miss(re-issued by the CIU). If it is a miss and the CO is available to process the castout, the RC will accept the access. Once the RC has finished, it can restart and process new accesses that result in a hit (or miss that doesn't need a CO) even though the CO is still processing a castout from a previous access.", }, [ POWER5p_PME_PM_THRD_PRIO_DIFF_0_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_0_CYC", .pme_code = 0x430e3, .pme_short_desc = "Cycles no thread priority difference", .pme_long_desc = "Cycles when this thread's priority is equal to the other thread's priority.", }, [ POWER5p_PME_PM_LR_CTR_MAP_FULL_CYC ] = { .pme_name = "PM_LR_CTR_MAP_FULL_CYC", .pme_code = 0x100c6, .pme_short_desc = "Cycles LR/CTR mapper full", .pme_long_desc = "The LR/CTR mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. 
This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5p_PME_PM_L3SB_SHR_INV ] = { .pme_name = "PM_L3SB_SHR_INV", .pme_code = 0x710c4, .pme_short_desc = "L3 slice B transition from shared to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is Sx in this L3(i.e. invalidate hit SX and dispatched).", }, [ POWER5p_PME_PM_DATA_FROM_RMEM ] = { .pme_name = "PM_DATA_FROM_RMEM", .pme_code = 0x1c30a1, .pme_short_desc = "Data loaded from remote memory", .pme_long_desc = "The processor's Data Cache was reloaded from memory attached to a different module than this processor is located on.", }, [ POWER5p_PME_PM_DATA_FROM_L275_MOD ] = { .pme_name = "PM_DATA_FROM_L275_MOD", .pme_code = 0x1c30a3, .pme_short_desc = "Data loaded from L2.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 on a different module than this processor is located due to a demand load.", }, [ POWER5p_PME_PM_LSU0_REJECT_SRQ ] = { .pme_name = "PM_LSU0_REJECT_SRQ", .pme_code = 0xc40c0, .pme_short_desc = "LSU0 SRQ lhs rejects", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions because of Load Hit Store conditions. Loads are rejected when data is needed from a previous store instruction but store forwarding is not possible because the data is not fully contained in the Store Data Queue or is not yet available in the Store Data Queue.", }, [ POWER5p_PME_PM_LSU1_DERAT_MISS ] = { .pme_name = "PM_LSU1_DERAT_MISS", .pme_code = 0x800c6, .pme_short_desc = "LSU1 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 1 missed the ERAT and resulted in an ERAT reload. 
Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ POWER5p_PME_PM_MRK_LSU_FIN ] = { .pme_name = "PM_MRK_LSU_FIN", .pme_code = 0x400014, .pme_short_desc = "Marked instruction LSU processing finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER5p_PME_PM_DTLB_MISS_16M ] = { .pme_name = "PM_DTLB_MISS_16M", .pme_code = 0x3c208d, .pme_short_desc = "Data TLB miss for 16M page", .pme_long_desc = "Data TLB references to 16MB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER5p_PME_PM_LSU0_FLUSH_UST ] = { .pme_name = "PM_LSU0_FLUSH_UST", .pme_code = 0xc00c1, .pme_short_desc = "LSU0 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 0 because it was unaligned (crossed a 4K boundary).", }, [ POWER5p_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY ] = { .pme_name = "PM_L2SB_RC_DISP_FAIL_CO_BUSY", .pme_code = 0x703c1, .pme_short_desc = "L2 slice B RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy", .pme_long_desc = "A Read/Claim Dispatch was rejected at dispatch because the Castout Machine was busy. In the case of an RC starting up on a miss and the victim is valid, the CO machine must be available for the RC to process the access. If the CO is still busy working on an old castout, then the RC must not-ack the access if it is a miss(re-issued by the CIU). If it is a miss and the CO is available to process the castout, the RC will accept the access. 
Once the RC has finished, it can restart and process new accesses that result in a hit (or miss that doesn't need a CO) even though the CO is still processing a castout from a previous access.", }, [ POWER5p_PME_PM_L2SC_MOD_TAG ] = { .pme_name = "PM_L2SC_MOD_TAG", .pme_code = 0x720e2, .pme_short_desc = "L2 slice C transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A, B, and C.", } }; #endif papi-5.3.0/src/libpfm4/lib/events/intel_core_events.h0000600003276200002170000014744012247131123022226 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: core (Intel Core) */ static const intel_x86_umask_t core_rs_uops_dispatched_cycles[]={ { .uname = "PORT_0", .udesc = "On port 0", .ucode = 0x100, }, { .uname = "PORT_1", .udesc = "On port 1", .ucode = 0x200, }, { .uname = "PORT_2", .udesc = "On port 2", .ucode = 0x400, }, { .uname = "PORT_3", .udesc = "On port 3", .ucode = 0x800, }, { .uname = "PORT_4", .udesc = "On port 4", .ucode = 0x1000, }, { .uname = "PORT_5", .udesc = "On port 5", .ucode = 0x2000, }, { .uname = "ANY", .udesc = "On any port", .uequiv = "PORT_0:PORT_1:PORT_2:PORT_3:PORT_4:PORT_5", .ucode = 0x3f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_load_block[]={ { .uname = "STA", .udesc = "Loads blocked by a preceding store with unknown address", .ucode = 0x200, }, { .uname = "STD", .udesc = "Loads blocked by a preceding store with unknown data", .ucode = 0x400, }, { .uname = "OVERLAP_STORE", .udesc = "Loads that partially overlap an earlier store, or 4K aliased with a previous store", .ucode = 0x800, }, { .uname = "UNTIL_RETIRE", .udesc = "Loads blocked until retirement", .ucode = 0x1000, }, { .uname = "L1D", .udesc = "Loads blocked by the L1 data cache", .ucode = 0x2000, }, }; static const intel_x86_umask_t core_store_block[]={ { .uname = "ORDER", .udesc = "Cycles while store is waiting for a preceding store to be globally observed", .ucode = 0x200, }, { .uname = "SNOOP", .udesc = "A store is blocked due to a conflict with an external or internal snoop", .ucode = 0x800, }, }; static const intel_x86_umask_t core_sse_pre_exec[]={ { .uname = "NTA", .udesc = "Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1", .udesc = "Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, 
}, { .uname = "L2", .udesc = "Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STORES", .udesc = "Streaming SIMD Extensions (SSE) Weakly-ordered store instructions executed", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t core_dtlb_misses[]={ { .uname = "ANY", .udesc = "Any memory access that missed the DTLB", .ucode = 0x100, .uflags= INTEL_X86_DFL, }, { .uname = "MISS_LD", .udesc = "DTLB misses due to load operations", .ucode = 0x200, }, { .uname = "L0_MISS_LD", .udesc = "L0 DTLB misses due to load operations", .ucode = 0x400, }, { .uname = "MISS_ST", .udesc = "DTLB misses due to store operations", .ucode = 0x800, }, }; static const intel_x86_umask_t core_memory_disambiguation[]={ { .uname = "RESET", .udesc = "Memory disambiguation reset cycles", .ucode = 0x100, }, { .uname = "SUCCESS", .udesc = "Number of loads that were successfully disambiguated", .ucode = 0x200, }, }; static const intel_x86_umask_t core_page_walks[]={ { .uname = "COUNT", .udesc = "Number of page-walks executed", .ucode = 0x100, }, { .uname = "CYCLES", .udesc = "Duration of page-walks in core cycles", .ucode = 0x200, }, }; static const intel_x86_umask_t core_delayed_bypass[]={ { .uname = "FP", .udesc = "Delayed bypass to FP operation", .ucode = 0x0, }, { .uname = "SIMD", .udesc = "Delayed bypass to SIMD operation", .ucode = 0x100, }, { .uname = "LOAD", .udesc = "Delayed bypass to load operation", .ucode = 0x200, }, }; static const intel_x86_umask_t core_l2_ads[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t core_l2_lines_in[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "BOTH_CORES", .udesc = "Both 
cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 0, }, { .uname = "ANY", .udesc = "All inclusive", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "PREFETCH", .udesc = "Hardware prefetch only", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, { .uname = "EXCL_PREFETCH", .udesc = "Exclude hardware prefetch", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, }; static const intel_x86_umask_t core_l2_ifetch[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 0, }, { .uname = "MESI", .udesc = "Any cacheline access", .uequiv = "M_STATE:E_STATE:S_STATE:I_STATE", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, .grpid = 1, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, .grpid = 1, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, .grpid = 1, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, .grpid = 1, }, }; static const intel_x86_umask_t core_l2_ld[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 0, }, { .uname = "ANY", .udesc = "All inclusive", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "PREFETCH", .udesc = "Hardware prefetch only", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, { .uname = "EXCL_PREFETCH", .udesc = "Exclude hardware prefetch", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, { .uname = "MESI", .udesc = "Any cacheline access", .uequiv = "M_STATE:E_STATE:S_STATE:I_STATE", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid 
= 2, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, .grpid = 2, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, .grpid = 2, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, .grpid = 2, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, .grpid = 2, }, }; static const intel_x86_umask_t core_cpu_clk_unhalted[]={ { .uname = "CORE_P", .udesc = "Core cycles when core is not halted", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "BUS", .udesc = "Bus cycles when core is not halted. This event can give a measurement of the elapsed time. This event has a constant ratio with CPU_CLK_UNHALTED:REF event, which is the maximum bus to processor frequency ratio", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NO_OTHER", .udesc = "Bus cycles when core is active and the other is halted", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t core_l1d_cache_ld[]={ { .uname = "MESI", .udesc = "Any cacheline access", .uequiv = "M_STATE:E_STATE:S_STATE:I_STATE", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, }, }; static const intel_x86_umask_t core_l1d_split[]={ { .uname = "LOADS", .udesc = "Cache line split loads from the L1 data cache", .ucode = 0x100, }, { .uname = "STORES", .udesc = "Cache line split stores to the L1 data cache", .ucode = 0x200, }, }; static const intel_x86_umask_t core_sse_pre_miss[]={ { .uname = "NTA", .udesc = "Streaming SIMD Extensions (SSE) Prefetch NTA instructions missing all cache levels", .ucode = 0x0, }, { .uname = "L1", .udesc = "Streaming SIMD Extensions (SSE) PrefetchT0 instructions missing all cache levels", 
.ucode = 0x100, }, { .uname = "L2", .udesc = "Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions missing all cache levels", .ucode = 0x200, }, }; static const intel_x86_umask_t core_l1d_prefetch[]={ { .uname = "REQUESTS", .udesc = "L1 data cache prefetch requests", .ucode = 0x1000, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_bus_request_outstanding[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 0, }, { .uname = "THIS_AGENT", .udesc = "This agent", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "ALL_AGENTS", .udesc = "Any agent on the bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, }; static const intel_x86_umask_t core_bus_bnr_drv[]={ { .uname = "THIS_AGENT", .udesc = "This agent", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL_AGENTS", .udesc = "Any agent on the bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t core_ext_snoop[]={ { .uname = "ANY", .udesc = "Any external snoop response", .ucode = 0xb00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "CLEAN", .udesc = "External snoop CLEAN response", .ucode = 0x100, .grpid = 0, }, { .uname = "HIT", .udesc = "External snoop HIT response", .ucode = 0x200, .grpid = 0, }, { .uname = "HITM", .udesc = "External snoop HITM response", .ucode = 0x800, .grpid = 0, }, { .uname = "THIS_AGENT", .udesc = "This agent", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "ALL_AGENTS", .udesc = "Any agent on the bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, }; static const intel_x86_umask_t core_cmp_snoop[]={ { .uname = "ANY", .udesc = "L1 data cache is snooped by other core", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | 
INTEL_X86_DFL, .grpid = 0, }, { .uname = "SHARE", .udesc = "L1 data cache is snooped for sharing by other core", .ucode = 0x100, .grpid = 0, }, { .uname = "INVALIDATE", .udesc = "L1 data cache is snooped for Invalidation by other core", .ucode = 0x200, .grpid = 0, }, { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, }; static const intel_x86_umask_t core_itlb[]={ { .uname = "SMALL_MISS", .udesc = "ITLB small page misses", .ucode = 0x200, }, { .uname = "LARGE_MISS", .udesc = "ITLB large page misses", .ucode = 0x1000, }, { .uname = "FLUSH", .udesc = "ITLB flushes", .ucode = 0x4000, }, { .uname = "MISSES", .udesc = "ITLB misses", .ucode = 0x1200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t core_inst_queue[]={ { .uname = "FULL", .udesc = "Cycles during which the instruction queue is full", .ucode = 0x200, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_macro_insts[]={ { .uname = "DECODED", .udesc = "Instructions decoded", .ucode = 0x100, }, { .uname = "CISC_DECODED", .udesc = "CISC instructions decoded", .ucode = 0x800, }, }; static const intel_x86_umask_t core_esp[]={ { .uname = "SYNCH", .udesc = "ESP register content synchronization", .ucode = 0x100, }, { .uname = "ADDITIONS", .udesc = "ESP register automatic additions", .ucode = 0x200, }, }; static const intel_x86_umask_t core_simd_uop_type_exec[]={ { .uname = "MUL", .udesc = "SIMD packed multiply micro-ops executed", .ucode = 0x100, }, { .uname = "SHIFT", .udesc = "SIMD packed shift micro-ops executed", .ucode = 0x200, }, { .uname = "PACK", .udesc = "SIMD pack micro-ops executed", .ucode = 0x400, }, { .uname = "UNPACK", .udesc = "SIMD unpack micro-ops executed", .ucode = 0x800, }, { .uname = "LOGICAL", .udesc = "SIMD packed logical micro-ops executed", .ucode = 0x1000, }, { .uname = "ARITHMETIC", .udesc = 
"SIMD packed arithmetic micro-ops executed", .ucode = 0x2000, }, }; static const intel_x86_umask_t core_inst_retired[]={ { .uname = "ANY_P", .udesc = "Instructions retired (Precise Event)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "LOADS", .udesc = "Instructions retired, which contain a load", .ucode = 0x100, }, { .uname = "STORES", .udesc = "Instructions retired, which contain a store", .ucode = 0x200, }, { .uname = "OTHER", .udesc = "Instructions retired, with no load or store operation", .ucode = 0x400, }, }; static const intel_x86_umask_t core_x87_ops_retired[]={ { .uname = "FXCH", .udesc = "FXCH instructions retired", .ucode = 0x100, }, { .uname = "ANY", .udesc = "Retired floating-point computational operations (Precise Event)", .ucode = 0xfe00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_uops_retired[]={ { .uname = "LD_IND_BR", .udesc = "Fused load+op or load+indirect branch retired", .ucode = 0x100, }, { .uname = "STD_STA", .udesc = "Fused store address + data retired", .ucode = 0x200, }, { .uname = "MACRO_FUSION", .udesc = "Retired instruction pairs fused into one micro-op", .ucode = 0x400, }, { .uname = "NON_FUSED", .udesc = "Non-fused micro-ops retired", .ucode = 0x800, }, { .uname = "FUSED", .udesc = "Fused micro-ops retired", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY", .udesc = "Micro-ops retired", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_machine_nukes[]={ { .uname = "SMC", .udesc = "Self-Modifying Code detected", .ucode = 0x100, }, { .uname = "MEM_ORDER", .udesc = "Execution pipeline restart due to memory ordering conflict or memory disambiguation misprediction", .ucode = 0x400, }, }; static const intel_x86_umask_t core_br_inst_retired[]={ { .uname = "ANY", .udesc = "Retired branch instructions", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = 
"PRED_NOT_TAKEN", .udesc = "Retired branch instructions that were predicted not-taken", .ucode = 0x100, }, { .uname = "MISPRED_NOT_TAKEN", .udesc = "Retired branch instructions that were mispredicted not-taken", .ucode = 0x200, }, { .uname = "PRED_TAKEN", .udesc = "Retired branch instructions that were predicted taken", .ucode = 0x400, }, { .uname = "MISPRED_TAKEN", .udesc = "Retired branch instructions that were mispredicted taken", .ucode = 0x800, }, { .uname = "TAKEN", .udesc = "Retired taken branch instructions", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t core_simd_inst_retired[]={ { .uname = "PACKED_SINGLE", .udesc = "Retired Streaming SIMD Extensions (SSE) packed-single instructions", .ucode = 0x100, }, { .uname = "SCALAR_SINGLE", .udesc = "Retired Streaming SIMD Extensions (SSE) scalar-single instructions", .ucode = 0x200, }, { .uname = "PACKED_DOUBLE", .udesc = "Retired Streaming SIMD Extensions 2 (SSE2) packed-double instructions", .ucode = 0x400, }, { .uname = "SCALAR_DOUBLE", .udesc = "Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions", .ucode = 0x800, }, { .uname = "VECTOR", .udesc = "Retired Streaming SIMD Extensions 2 (SSE2) vector integer instructions", .ucode = 0x1000, }, { .uname = "ANY", .udesc = "Retired Streaming SIMD instructions (Precise Event)", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_simd_comp_inst_retired[]={ { .uname = "PACKED_SINGLE", .udesc = "Retired computational Streaming SIMD Extensions (SSE) packed-single instructions", .ucode = 0x100, }, { .uname = "SCALAR_SINGLE", .udesc = "Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions", .ucode = 0x200, }, { .uname = "PACKED_DOUBLE", .udesc = "Retired computational Streaming SIMD Extensions 2 (SSE2) packed-double instructions", .ucode = 0x400, }, { .uname = "SCALAR_DOUBLE", .udesc = "Retired computational Streaming SIMD 
Extensions 2 (SSE2) scalar-double instructions", .ucode = 0x800, }, }; static const intel_x86_umask_t core_mem_load_retired[]={ { .uname = "L1D_MISS", .udesc = "Retired loads that miss the L1 data cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_PEBS, }, { .uname = "L1D_LINE_MISS", .udesc = "L1 data cache line missed by retired loads (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_PEBS, }, { .uname = "L2_MISS", .udesc = "Retired loads that miss the L2 cache (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_PEBS, }, { .uname = "L2_LINE_MISS", .udesc = "L2 cache line missed by retired loads (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_PEBS, }, { .uname = "DTLB_MISS", .udesc = "Retired loads that miss the DTLB (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_PEBS, }, }; static const intel_x86_umask_t core_fp_mmx_trans[]={ { .uname = "TO_FP", .udesc = "Transitions from MMX (TM) Instructions to Floating Point Instructions", .ucode = 0x200, }, { .uname = "TO_MMX", .udesc = "Transitions from Floating Point to MMX (TM) Instructions", .ucode = 0x100, }, }; static const intel_x86_umask_t core_rat_stalls[]={ { .uname = "ROB_READ_PORT", .udesc = "ROB read port stalls cycles", .ucode = 0x100, }, { .uname = "PARTIAL_CYCLES", .udesc = "Partial register stall cycles", .ucode = 0x200, }, { .uname = "FLAGS", .udesc = "Flag stall cycles", .ucode = 0x400, }, { .uname = "FPSW", .udesc = "FPU status word stall", .ucode = 0x800, }, { .uname = "ANY", .udesc = "All RAT stall cycles", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_seg_rename_stalls[]={ { .uname = "ES", .udesc = "Segment rename stalls - ES ", .ucode = 0x100, }, { .uname = "DS", .udesc = "Segment rename stalls - DS", .ucode = 0x200, }, { .uname = "FS", .udesc = "Segment rename stalls - FS", .ucode = 0x400, }, { .uname = "GS", .udesc = "Segment rename stalls - GS", .ucode = 0x800, }, { .uname = "ANY", .udesc = "Any (ES/DS/FS/GS) segment rename 
stall", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_seg_reg_renames[]={ { .uname = "ES", .udesc = "Segment renames - ES", .ucode = 0x100, }, { .uname = "DS", .udesc = "Segment renames - DS", .ucode = 0x200, }, { .uname = "FS", .udesc = "Segment renames - FS", .ucode = 0x400, }, { .uname = "GS", .udesc = "Segment renames - GS", .ucode = 0x800, }, { .uname = "ANY", .udesc = "Any (ES/DS/FS/GS) segment rename", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t core_resource_stalls[]={ { .uname = "ROB_FULL", .udesc = "Cycles during which the ROB is full", .ucode = 0x100, }, { .uname = "RS_FULL", .udesc = "Cycles during which the RS is full", .ucode = 0x200, }, { .uname = "LD_ST", .udesc = "Cycles during which the pipeline has exceeded load or store limit or waiting to commit all stores", .ucode = 0x400, }, { .uname = "FPCW", .udesc = "Cycles stalled due to FPU control word write", .ucode = 0x800, }, { .uname = "BR_MISS_CLEAR", .udesc = "Cycles stalled due to branch misprediction", .ucode = 0x1000, }, { .uname = "ANY", .udesc = "Resource related stalls", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_entry_t intel_core_pe[]={ { .name = "UNHALTED_CORE_CYCLES", .desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x200000003ull, .code = 0x3c, }, { .name = "INSTRUCTION_RETIRED", .desc = "Count the number of instructions at retirement", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x100000003ull, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_X86_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x100000003ull, .code = 0xc0, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED2_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* 
pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "LLC_REFERENCES", .desc = "Count each request originating from the core to reference a cache line in the last level cache. The count may include speculation, but excludes cache line fills due to hardware prefetch. Alias to L2_RQSTS:SELF_DEMAND_MESI", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4f2e, }, { .name = "LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for LLC_REFERENCES", .modmsk = INTEL_X86_ATTRS, .equiv = "LLC_REFERENCES", .cntmsk = 0x3, .code = 0x4f2e, }, { .name = "LLC_MISSES", .desc = "Count each cache miss condition for references to the last level cache. The event count may include speculation, but excludes cache line fills due to hardware prefetch. Alias to event L2_RQSTS:SELF_DEMAND_I_STATE", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x412e, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for LLC_MISSES", .modmsk = INTEL_X86_ATTRS, .equiv = "LLC_MISSES", .cntmsk = 0x3, .code = 0x412e, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Count branch instructions at retirement. Specifically, this event counts the retirement of the last micro-op of a branch instruction.", .modmsk = INTEL_X86_ATTRS, .equiv = "BR_INST_RETIRED:ANY", .cntmsk = 0x3, .code = 0xc4, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Count mispredicted branch instructions at retirement. 
Specifically, this event counts at retirement of the last micro-op of a branch instruction in the architectural path of the execution and experienced misprediction in the branch prediction hardware.", .modmsk = INTEL_X86_ATTRS, .equiv = "BR_INST_RETIRED_MISPRED", .cntmsk = 0x3, .code = 0xc5, .flags= INTEL_X86_PEBS, }, { .name = "RS_UOPS_DISPATCHED_CYCLES", .desc = "Cycles micro-ops dispatched for execution", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0xa1, .numasks = LIBPFM_ARRAY_SIZE(core_rs_uops_dispatched_cycles), .ngrp = 1, .umasks = core_rs_uops_dispatched_cycles, }, { .name = "RS_UOPS_DISPATCHED", .desc = "Number of micro-ops dispatched for execution", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xa0, }, { .name = "RS_UOPS_DISPATCHED_NONE", .desc = "Number of cycles in which no micro-ops are dispatched for execution", .modmsk =0x0, .equiv = "RS_UOPS_DISPATCHED:i=1:c=1", .cntmsk = 0x3, .code = 0xa0 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), }, { .name = "LOAD_BLOCK", .desc = "Loads blocked", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(core_load_block), .ngrp = 1, .umasks = core_load_block, }, { .name = "SB_DRAIN_CYCLES", .desc = "Cycles while stores are blocked due to store buffer drain", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x104, }, { .name = "STORE_BLOCK", .desc = "Cycles while store is waiting", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(core_store_block), .ngrp = 1, .umasks = core_store_block, }, { .name = "SEGMENT_REG_LOADS", .desc = "Number of segment register loads", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6, }, { .name = "SSE_PRE_EXEC", .desc = "Streaming SIMD Extensions (SSE) Prefetch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(core_sse_pre_exec), .ngrp = 1, .umasks = core_sse_pre_exec, }, { .name = "DTLB_MISSES", .desc = "Memory accesses that missed the 
DTLB", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(core_dtlb_misses), .ngrp = 1, .umasks = core_dtlb_misses, }, { .name = "MEMORY_DISAMBIGUATION", .desc = "Memory disambiguation", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x9, .numasks = LIBPFM_ARRAY_SIZE(core_memory_disambiguation), .ngrp = 1, .umasks = core_memory_disambiguation, }, { .name = "PAGE_WALKS", .desc = "Number of page-walks executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc, .numasks = LIBPFM_ARRAY_SIZE(core_page_walks), .ngrp = 1, .umasks = core_page_walks, }, { .name = "FP_COMP_OPS_EXE", .desc = "Floating point computational micro-ops executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x10, }, { .name = "FP_ASSIST", .desc = "Floating point assists", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x11, }, { .name = "MUL", .desc = "Multiply operations executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x12, }, { .name = "DIV", .desc = "Divide operations executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x13, }, { .name = "CYCLES_DIV_BUSY", .desc = "Cycles the divider is busy", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x14, }, { .name = "IDLE_DURING_DIV", .desc = "Cycles the divider is busy and all other execution units are idle", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x18, }, { .name = "DELAYED_BYPASS", .desc = "Delayed bypass", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x19, .numasks = LIBPFM_ARRAY_SIZE(core_delayed_bypass), .ngrp = 1, .umasks = core_delayed_bypass, }, { .name = "L2_ADS", .desc = "Cycles L2 address bus is in use", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x21, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ads), .ngrp = 1, .umasks = core_l2_ads, }, { .name = "L2_DBUS_BUSY_RD", .desc = "Cycles the L2 transfers data to the core", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x23, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ads), .ngrp = 1, .umasks = core_l2_ads, /* 
identical to actual umasks list for this event */ }, { .name = "L2_LINES_IN", .desc = "L2 cache misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(core_l2_lines_in), .ngrp = 2, .umasks = core_l2_lines_in, }, { .name = "L2_M_LINES_IN", .desc = "L2 cache line modifications", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x25, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ads), .ngrp = 1, .umasks = core_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_OUT", .desc = "L2 cache lines evicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x26, .numasks = LIBPFM_ARRAY_SIZE(core_l2_lines_in), .ngrp = 2, .umasks = core_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "L2_M_LINES_OUT", .desc = "Modified lines evicted from the L2 cache", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(core_l2_lines_in), .ngrp = 2, .umasks = core_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "L2_IFETCH", .desc = "L2 cacheable instruction fetch requests", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ifetch), .ngrp = 2, .umasks = core_l2_ifetch, }, { .name = "L2_LD", .desc = "L2 cache reads", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ld), .ngrp = 3, .umasks = core_l2_ld, }, { .name = "L2_ST", .desc = "L2 store requests", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ifetch), .ngrp = 2, .umasks = core_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_LOCK", .desc = "L2 locked accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2b, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ifetch), .ngrp = 2, .umasks = core_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_RQSTS", .desc = "L2 cache requests", .modmsk = 
INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ld), .ngrp = 3, .umasks = core_l2_ld, /* identical to actual umasks list for this event */ }, { .name = "L2_REJECT_BUSQ", .desc = "Rejected L2 cache requests", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x30, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ld), .ngrp = 3, .umasks = core_l2_ld, /* identical to actual umasks list for this event */ }, { .name = "L2_NO_REQ", .desc = "Cycles no L2 cache requests are pending", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x32, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ads), .ngrp = 1, .umasks = core_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "EIST_TRANS", .desc = "Number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3a, }, { .name = "THERMAL_TRIP", .desc = "Number of thermal trips", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc03b, }, { .name = "CPU_CLK_UNHALTED", .desc = "Core cycles when core is not halted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(core_cpu_clk_unhalted), .ngrp = 1, .umasks = core_cpu_clk_unhalted, }, { .name = "L1D_CACHE_LD", .desc = "L1 cacheable data reads", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x40, .numasks = LIBPFM_ARRAY_SIZE(core_l1d_cache_ld), .ngrp = 1, .umasks = core_l1d_cache_ld, }, { .name = "L1D_CACHE_ST", .desc = "L1 cacheable data writes", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x41, .numasks = LIBPFM_ARRAY_SIZE(core_l1d_cache_ld), .ngrp = 1, .umasks = core_l1d_cache_ld, /* identical to actual umasks list for this event */ }, { .name = "L1D_CACHE_LOCK", .desc = "L1 data cacheable locked reads", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(core_l1d_cache_ld), .ngrp = 1, .umasks = core_l1d_cache_ld, /* identical to actual umasks list for this event */ }, { .name = "L1D_ALL_REF", .desc = "All 
references to the L1 data cache", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x143, }, { .name = "L1D_ALL_CACHE_REF", .desc = "L1 Data cacheable reads and writes", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x243, }, { .name = "L1D_REPL", .desc = "Cache lines allocated in the L1 data cache", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf45, }, { .name = "L1D_M_REPL", .desc = "Modified cache lines allocated in the L1 data cache", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x46, }, { .name = "L1D_M_EVICT", .desc = "Modified cache lines evicted from the L1 data cache", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x47, }, { .name = "L1D_PEND_MISS", .desc = "Total number of outstanding L1 data cache misses at any cycle", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x48, }, { .name = "L1D_SPLIT", .desc = "Cache line split from L1 data cache", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x49, .numasks = LIBPFM_ARRAY_SIZE(core_l1d_split), .ngrp = 1, .umasks = core_l1d_split, }, { .name = "SSE_PRE_MISS", .desc = "Streaming SIMD Extensions (SSE) instructions missing all cache levels", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(core_sse_pre_miss), .ngrp = 1, .umasks = core_sse_pre_miss, }, { .name = "LOAD_HIT_PRE", .desc = "Load operations conflicting with a software prefetch to the same address", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4c, }, { .name = "L1D_PREFETCH", .desc = "L1 data cache prefetch", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4e, .numasks = LIBPFM_ARRAY_SIZE(core_l1d_prefetch), .ngrp = 1, .umasks = core_l1d_prefetch, }, { .name = "BUS_REQUEST_OUTSTANDING", .desc = "Number of pending full cache line read transactions on the bus occurring in each cycle", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, }, { .name = "BUS_BNR_DRV", .desc 
= "Number of Bus Not Ready signals asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x61, .numasks = LIBPFM_ARRAY_SIZE(core_bus_bnr_drv), .ngrp = 1, .umasks = core_bus_bnr_drv, }, { .name = "BUS_DRDY_CLOCKS", .desc = "Bus cycles when data is sent on the bus", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(core_bus_bnr_drv), .ngrp = 1, .umasks = core_bus_bnr_drv, /* identical to actual umasks list for this event */ }, { .name = "BUS_LOCK_CLOCKS", .desc = "Bus cycles when a LOCK signal is asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_DATA_RCV", .desc = "Bus cycles while processor receives data", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x64, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ads), .ngrp = 1, .umasks = core_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_BRD", .desc = "Burst read bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_RFO", .desc = "RFO bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_WB", .desc = "Explicit writeback bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IFETCH", .desc = "Instruction-fetch bus transactions", .modmsk = 
INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_INVAL", .desc = "Invalidate bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_PWR", .desc = "Partial write bus transaction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6a, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_P", .desc = "Partial bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6b, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IO", .desc = "IO bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_DEF", .desc = "Deferred bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_BURST", .desc = "Burst (full cache-line) bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6e, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_MEM", .desc = "Memory bus 
transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6f, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_ANY", .desc = "All bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x70, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = "EXT_SNOOP", .desc = "External snoops responses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x77, .numasks = LIBPFM_ARRAY_SIZE(core_ext_snoop), .ngrp = 2, .umasks = core_ext_snoop, }, { .name = "CMP_SNOOP", .desc = "L1 data cache is snooped by other core", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x78, .numasks = LIBPFM_ARRAY_SIZE(core_cmp_snoop), .ngrp = 2, .umasks = core_cmp_snoop, }, { .name = "BUS_HIT_DRV", .desc = "HIT signal asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7a, .numasks = LIBPFM_ARRAY_SIZE(core_bus_bnr_drv), .ngrp = 1, .umasks = core_bus_bnr_drv, /* identical to actual umasks list for this event */ }, { .name = "BUS_HITM_DRV", .desc = "HITM signal asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7b, .numasks = LIBPFM_ARRAY_SIZE(core_bus_bnr_drv), .ngrp = 1, .umasks = core_bus_bnr_drv, /* identical to actual umasks list for this event */ }, { .name = "BUSQ_EMPTY", .desc = "Bus queue is empty", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(core_bus_bnr_drv), .ngrp = 1, .umasks = core_bus_bnr_drv, /* identical to actual umasks list for this event */ }, { .name = "SNOOP_STALL_DRV", .desc = "Bus stalled for snoops", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7e, .numasks = LIBPFM_ARRAY_SIZE(core_bus_request_outstanding), .ngrp = 2, .umasks = core_bus_request_outstanding, /* identical to actual umasks list for this event */ }, { .name = 
"BUS_IO_WAIT", .desc = "IO requests waiting in the bus queue", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(core_l2_ads), .ngrp = 1, .umasks = core_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "L1I_READS", .desc = "Instruction fetches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x80, }, { .name = "L1I_MISSES", .desc = "Instruction Fetch Unit misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x81, }, { .name = "ITLB", .desc = "ITLB small page misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x82, .numasks = LIBPFM_ARRAY_SIZE(core_itlb), .ngrp = 1, .umasks = core_itlb, }, { .name = "INST_QUEUE", .desc = "Cycles during which the instruction queue is full", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x83, .numasks = LIBPFM_ARRAY_SIZE(core_inst_queue), .ngrp = 1, .umasks = core_inst_queue, }, { .name = "CYCLES_L1I_MEM_STALLED", .desc = "Cycles during which instruction fetches are stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x86, }, { .name = "ILD_STALL", .desc = "Instruction Length Decoder stall cycles due to a length changing prefix", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x87, }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x88, }, { .name = "BR_MISSP_EXEC", .desc = "Mispredicted branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x89, }, { .name = "BR_BAC_MISSP_EXEC", .desc = "Branch instructions mispredicted at decoding", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8a, }, { .name = "BR_CND_EXEC", .desc = "Conditional branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8b, }, { .name = "BR_CND_MISSP_EXEC", .desc = "Mispredicted conditional branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8c, }, { .name = "BR_IND_EXEC", .desc = "Indirect branch instructions executed", 
.modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8d, }, { .name = "BR_IND_MISSP_EXEC", .desc = "Mispredicted indirect branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8e, }, { .name = "BR_RET_EXEC", .desc = "RET instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8f, }, { .name = "BR_RET_MISSP_EXEC", .desc = "Mispredicted RET instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x90, }, { .name = "BR_RET_BAC_MISSP_EXEC", .desc = "RET instructions executed mispredicted at decoding", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x91, }, { .name = "BR_CALL_EXEC", .desc = "CALL instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x92, }, { .name = "BR_CALL_MISSP_EXEC", .desc = "Mispredicted CALL instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x93, }, { .name = "BR_IND_CALL_EXEC", .desc = "Indirect CALL instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x94, }, { .name = "BR_TKN_BUBBLE_1", .desc = "Branch predicted taken with bubble I", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x97, }, { .name = "BR_TKN_BUBBLE_2", .desc = "Branch predicted taken with bubble II", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x98, }, { .name = "MACRO_INSTS", .desc = "Instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xaa, .numasks = LIBPFM_ARRAY_SIZE(core_macro_insts), .ngrp = 1, .umasks = core_macro_insts, }, { .name = "ESP", .desc = "ESP register content synchronization", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xab, .numasks = LIBPFM_ARRAY_SIZE(core_esp), .ngrp = 1, .umasks = core_esp, }, { .name = "SIMD_UOPS_EXEC", .desc = "SIMD micro-ops executed (excluding stores)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb0, }, { .name = "SIMD_SAT_UOP_EXEC", .desc = "SIMD saturated arithmetic micro-ops executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb1, }, { .name = 
"SIMD_UOP_TYPE_EXEC", .desc = "SIMD packed multiply micro-ops executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb3, .numasks = LIBPFM_ARRAY_SIZE(core_simd_uop_type_exec), .ngrp = 1, .umasks = core_simd_uop_type_exec, }, { .name = "INST_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(core_inst_retired), .ngrp = 1, .umasks = core_inst_retired, }, { .name = "X87_OPS_RETIRED", .desc = "FXCH instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc1, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(core_x87_ops_retired), .ngrp = 1, .umasks = core_x87_ops_retired, }, { .name = "UOPS_RETIRED", .desc = "Fused load+op or load+indirect branch retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc2, .numasks = LIBPFM_ARRAY_SIZE(core_uops_retired), .ngrp = 1, .umasks = core_uops_retired, }, { .name = "MACHINE_NUKES", .desc = "Self-Modifying Code detected", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc3, .numasks = LIBPFM_ARRAY_SIZE(core_machine_nukes), .ngrp = 1, .umasks = core_machine_nukes, }, { .name = "BR_INST_RETIRED", .desc = "Retired branch instructions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc4, .numasks = LIBPFM_ARRAY_SIZE(core_br_inst_retired), .ngrp = 1, .umasks = core_br_inst_retired, }, { .name = "BR_INST_RETIRED_MISPRED", .desc = "Retired mispredicted branch instructions (Precise_Event)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc5, .flags= INTEL_X86_PEBS, }, { .name = "CYCLES_INT_MASKED", .desc = "Cycles during which interrupts are disabled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x1c6, }, { .name = "CYCLES_INT_PENDING_AND_MASKED", .desc = "Cycles during which interrupts are pending and disabled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2c6, }, { .name = "SIMD_INST_RETIRED", .desc = "Retired Streaming SIMD Extensions (SSE) packed-single instructions", .modmsk = 
INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc7, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(core_simd_inst_retired), .ngrp = 1, .umasks = core_simd_inst_retired, }, { .name = "HW_INT_RCV", .desc = "Hardware interrupts received", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc8, }, { .name = "ITLB_MISS_RETIRED", .desc = "Retired instructions that missed the ITLB", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc9, }, { .name = "SIMD_COMP_INST_RETIRED", .desc = "Retired computational Streaming SIMD Extensions (SSE) packed-single instructions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xca, .numasks = LIBPFM_ARRAY_SIZE(core_simd_comp_inst_retired), .ngrp = 1, .umasks = core_simd_comp_inst_retired, }, { .name = "MEM_LOAD_RETIRED", .desc = "Retired loads that miss the L1 data cache", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0xcb, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(core_mem_load_retired), .ngrp = 1, .umasks = core_mem_load_retired, }, { .name = "FP_MMX_TRANS", .desc = "Transitions from MMX (TM) Instructions to Floating Point Instructions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(core_fp_mmx_trans), .ngrp = 1, .umasks = core_fp_mmx_trans, }, { .name = "SIMD_ASSIST", .desc = "SIMD assists invoked", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcd, }, { .name = "SIMD_INSTR_RETIRED", .desc = "SIMD Instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xce, }, { .name = "SIMD_SAT_INSTR_RETIRED", .desc = "Saturated arithmetic instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcf, }, { .name = "RAT_STALLS", .desc = "ROB read port stalls cycles", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd2, .numasks = LIBPFM_ARRAY_SIZE(core_rat_stalls), .ngrp = 1, .umasks = core_rat_stalls, }, { .name = "SEG_RENAME_STALLS", .desc = "Segment rename stalls - ES ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd4, .numasks = 
LIBPFM_ARRAY_SIZE(core_seg_rename_stalls), .ngrp = 1, .umasks = core_seg_rename_stalls, }, { .name = "SEG_REG_RENAMES", .desc = "Segment renames - ES", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd5, .numasks = LIBPFM_ARRAY_SIZE(core_seg_reg_renames), .ngrp = 1, .umasks = core_seg_reg_renames, }, { .name = "RESOURCE_STALLS", .desc = "Cycles during which the ROB is full", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xdc, .numasks = LIBPFM_ARRAY_SIZE(core_resource_stalls), .ngrp = 1, .umasks = core_resource_stalls, }, { .name = "BR_INST_DECODED", .desc = "Branch instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe0, }, { .name = "BOGUS_BR", .desc = "Bogus branches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe4, }, { .name = "BACLEARS", .desc = "BACLEARS asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe6, }, { .name = "PREF_RQSTS_UP", .desc = "Upward prefetches issued from the DPL", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf0, }, { .name = "PREF_RQSTS_DN", .desc = "Downward prefetches issued from the DPL", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf8, }, };
papi-5.3.0/src/libpfm4/lib/events/amd64_events_k7.h
/*
 * Copyright (c) 2011 Google, Inc
 * Contributed by Stephane Eranian
 *
 * Regenerated from previous version by:
 *
 * Copyright (c) 2006, 2007 Advanced Micro Devices, Inc.
 * Contributed by Ray Bryant
 * Contributed by Robert Richter
 * Modified for K7 by Vince Weaver
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * This file is part of libpfm, a performance monitoring support library for
 * applications on Linux.
 *
 * This file has been automatically generated.
* * PMU: amd64_k7 (AMD64 K7) */ /* * Definitions taken from "AMD Athlon Processor x86 Code Optimization Guide" * Table 11 February 2002 */ static const amd64_umask_t amd64_k7_data_cache_refills[]={ { .uname = "L2_INVALID", .udesc = "Invalid line from L2", .ucode = 0x1, }, { .uname = "L2_SHARED", .udesc = "Shared-state line from L2", .ucode = 0x2, }, { .uname = "L2_EXCLUSIVE", .udesc = "Exclusive-state line from L2", .ucode = 0x4, }, { .uname = "L2_OWNED", .udesc = "Owned-state line from L2", .ucode = 0x8, }, { .uname = "L2_MODIFIED", .udesc = "Modified-state line from L2", .ucode = 0x10, }, { .uname = "ALL", .udesc = "Shared, Exclusive, Owned, Modified State Refills", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k7_data_cache_refills_from_system[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "ALL", .udesc = "Invalid, Shared, Exclusive, Owned, Modified", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_entry_t amd64_k7_pe[]={ { .name = "DATA_CACHE_ACCESSES", .desc = "Data Cache Accesses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x40, }, { .name = "DATA_CACHE_MISSES", .desc = "Data Cache Misses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x41, }, { .name = "DATA_CACHE_REFILLS", .desc = "Data Cache Refills from L2", .modmsk = AMD64_BASIC_ATTRS, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(amd64_k7_data_cache_refills), .ngrp = 1, .umasks = amd64_k7_data_cache_refills, }, { .name = "DATA_CACHE_REFILLS_FROM_SYSTEM", .desc = "Data Cache Refills from System", .modmsk = AMD64_BASIC_ATTRS, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(amd64_k7_data_cache_refills_from_system), .ngrp = 1, .umasks = amd64_k7_data_cache_refills_from_system, }, { .name 
= "DATA_CACHE_LINES_EVICTED", .desc = "Data Cache Lines Evicted", .modmsk = AMD64_BASIC_ATTRS, .code = 0x44, .numasks = LIBPFM_ARRAY_SIZE(amd64_k7_data_cache_refills_from_system), .ngrp = 1, .umasks = amd64_k7_data_cache_refills_from_system, /* identical to actual umasks list for this event */ }, { .name = "L1_DTLB_MISS_AND_L2_DTLB_HIT", .desc = "L1 DTLB Miss and L2 DTLB Hit", .modmsk = AMD64_BASIC_ATTRS, .code = 0x45, }, { .name = "L1_DTLB_AND_L2_DTLB_MISS", .desc = "L1 DTLB and L2 DTLB Miss", .modmsk = AMD64_BASIC_ATTRS, .code = 0x46, }, { .name = "MISALIGNED_ACCESSES", .desc = "Misaligned Accesses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x47, }, { .name = "CPU_CLK_UNHALTED", .desc = "CPU Clocks not Halted", .modmsk = AMD64_BASIC_ATTRS, .code = 0x76, }, { .name = "INSTRUCTION_CACHE_FETCHES", .desc = "Instruction Cache Fetches", .modmsk = AMD64_BASIC_ATTRS, .code = 0x80, }, { .name = "INSTRUCTION_CACHE_MISSES", .desc = "Instruction Cache Misses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x81, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_HIT", .desc = "L1 ITLB Miss and L2 ITLB Hit", .modmsk = AMD64_BASIC_ATTRS, .code = 0x84, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_MISS", .desc = "L1 ITLB Miss and L2 ITLB Miss", .modmsk = AMD64_BASIC_ATTRS, .code = 0x85, }, { .name = "RETIRED_INSTRUCTIONS", .desc = "Retired Instructions (includes exceptions, interrupts, resyncs)", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc0, }, { .name = "RETIRED_UOPS", .desc = "Retired uops", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc1, }, { .name = "RETIRED_BRANCH_INSTRUCTIONS", .desc = "Retired Branch Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc2, }, { .name = "RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", .desc = "Retired Mispredicted Branch Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc3, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS", .desc = "Retired Taken Branch Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc4, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED", .desc = 
"Retired Taken Branch Instructions Mispredicted", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc5, }, { .name = "RETIRED_FAR_CONTROL_TRANSFERS", .desc = "Retired Far Control Transfers", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc6, }, { .name = "RETIRED_BRANCH_RESYNCS", .desc = "Retired Branch Resyncs (only non-control transfer branches)", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc7, }, { .name = "INTERRUPTS_MASKED_CYCLES", .desc = "Interrupts-Masked Cycles", .modmsk = AMD64_BASIC_ATTRS, .code = 0xcd, }, { .name = "INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING", .desc = "Interrupts-Masked Cycles with Interrupt Pending", .modmsk = AMD64_BASIC_ATTRS, .code = 0xce, }, { .name = "INTERRUPTS_TAKEN", .desc = "Interrupts Taken", .modmsk = AMD64_BASIC_ATTRS, .code = 0xcf, }, };
papi-5.3.0/src/libpfm4/lib/events/amd64_events_fam12h.h
/*
 * Copyright (c) 2011 University of Tennessee
 * Contributed by Vince Weaver
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * PMU: amd64_fam12h (AMD64 Fam12h) */ static const amd64_umask_t amd64_fam12h_dispatched_fpu[]={ { .uname = "OPS_ADD", .udesc = "Add pipe ops excluding load ops and SSE move ops", .ucode = 0x1, }, { .uname = "OPS_MULTIPLY", .udesc = "Multiply pipe ops excluding load ops and SSE move ops", .ucode = 0x2, }, { .uname = "OPS_STORE", .udesc = "Store pipe ops excluding load ops and SSE move ops", .ucode = 0x4, }, { .uname = "OPS_ADD_PIPE_LOAD_OPS", .udesc = "Add pipe load ops and SSE move ops", .ucode = 0x8, }, { .uname = "OPS_MULTIPLY_PIPE_LOAD_OPS", .udesc = "Multiply pipe load ops and SSE move ops", .ucode = 0x10, }, { .uname = "OPS_STORE_PIPE_LOAD_OPS", .udesc = "Store pipe load ops and SSE move ops", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_retired_sse_operations[]={ { .uname = "SINGLE_ADD_SUB_OPS", .udesc = "Single precision add/subtract ops", .ucode = 0x1, }, { .uname = "SINGLE_MUL_OPS", .udesc = "Single precision multiply ops", .ucode = 0x2, }, { .uname = "SINGLE_DIV_OPS", .udesc = "Single precision divide/square root ops", .ucode = 0x4, }, { .uname = "DOUBLE_ADD_SUB_OPS", .udesc = "Double precision add/subtract ops", .ucode = 0x8, }, { .uname = "DOUBLE_MUL_OPS", .udesc = "Double precision multiply ops", .ucode = 0x10, }, { .uname = "DOUBLE_DIV_OPS", .udesc = "Double precision divide/square root ops", .ucode = 0x20, }, { .uname = "OP_TYPE", .udesc = "Op type: 0=uops. 
1=FLOPS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_retired_move_ops[]={ { .uname = "LOW_QW_MOVE_UOPS", .udesc = "Merging low quadword move uops", .ucode = 0x1, }, { .uname = "HIGH_QW_MOVE_UOPS", .udesc = "Merging high quadword move uops", .ucode = 0x2, }, { .uname = "ALL_OTHER_MERGING_MOVE_UOPS", .udesc = "All other merging move uops", .ucode = 0x4, }, { .uname = "ALL_OTHER_MOVE_UOPS", .udesc = "All other move uops", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_retired_serializing_ops[]={ { .uname = "SSE_BOTTOM_EXECUTING_UOPS", .udesc = "SSE bottom-executing uops retired", .ucode = 0x1, }, { .uname = "SSE_BOTTOM_SERIALIZING_UOPS", .udesc = "SSE bottom-serializing uops retired", .ucode = 0x2, }, { .uname = "X87_BOTTOM_EXECUTING_UOPS", .udesc = "X87 bottom-executing uops retired", .ucode = 0x4, }, { .uname = "X87_BOTTOM_SERIALIZING_UOPS", .udesc = "X87 bottom-serializing uops retired", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_fp_scheduler_cycles[]={ { .uname = "BOTTOM_EXECUTE_CYCLES", .udesc = "Number of cycles a bottom-execute uop is in the FP scheduler", .ucode = 0x1, }, { .uname = "BOTTOM_SERIALIZING_CYCLES", .udesc = "Number of cycles a bottom-serializing uop is in the FP scheduler", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_segment_register_loads[]={ { .uname = "ES", .udesc = "ES", .ucode = 0x1, }, { .uname = "CS", .udesc = "CS", .ucode = 0x2, }, { .uname = "SS", .udesc = "SS", .ucode = 0x4, }, { .uname = "DS", .udesc = "DS", .ucode = 0x8, }, { .uname 
= "FS", .udesc = "FS", .ucode = 0x10, }, { .uname = "GS", .udesc = "GS", .ucode = 0x20, }, { .uname = "HS", .udesc = "HS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_locked_ops[]={ { .uname = "EXECUTED", .udesc = "The number of locked instructions executed", .ucode = 0x1, }, { .uname = "CYCLES_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in speculative phase", .ucode = 0x2, }, { .uname = "CYCLES_NON_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in non-speculative phase (including cache miss penalty)", .ucode = 0x4, }, { .uname = "CYCLES_WAITING", .udesc = "The number of cycles waiting for a cache hit (cache miss penalty).", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_cancelled_store_to_load_forward_operations[]={ { .uname = "ADDRESS_MISMATCHES", .udesc = "Address mismatches (starting byte not the same).", .ucode = 0x1, }, { .uname = "STORE_IS_SMALLER_THAN_LOAD", .udesc = "Store is smaller than load.", .ucode = 0x2, }, { .uname = "MISALIGNED", .udesc = "Misaligned.", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_data_cache_refills[]={ { .uname = "SYSTEM", .udesc = "Refill from the Northbridge", .ucode = 0x1, }, { .uname = "L2_SHARED", .udesc = "Shared-state line from L2", .ucode = 0x2, }, { .uname = "L2_EXCLUSIVE", .udesc = "Exclusive-state line from L2", .ucode = 0x4, }, { .uname = "L2_OWNED", .udesc = "Owned-state line from L2", .ucode = 0x8, }, { .uname = "L2_MODIFIED", .udesc = "Modified-state line from L2", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const 
amd64_umask_t amd64_fam12h_data_cache_refills_from_northbridge[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_data_cache_lines_evicted[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "BY_PREFETCHNTA", .udesc = "Cache line evicted was brought into the cache by a PrefetchNTA instruction.", .ucode = 0x20, }, { .uname = "NOT_BY_PREFETCHNTA", .udesc = "Cache line evicted was not brought into the cache by a PrefetchNTA instruction.", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_l1_dtlb_miss_and_l2_dtlb_hit[]={ { .uname = "L2_4K_TLB_HIT", .udesc = "L2 4K TLB hit", .ucode = 0x1, }, { .uname = "L2_2M_TLB_HIT", .udesc = "L2 2M TLB hit", .ucode = 0x2, }, { .uname = "L2_1G_TLB_HIT", .udesc = "L2 1G TLB hit", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_l1_dtlb_and_l2_dtlb_miss[]={ { .uname = "4K_TLB_RELOAD", .udesc = "4K TLB reload", .ucode = 0x1, }, { .uname = "2M_TLB_RELOAD", .udesc = "2M TLB reload", .ucode = 0x2, }, { .uname = "1G_TLB_RELOAD", .udesc = "1G TLB reload", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, 
.uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_prefetch_instructions_dispatched[]={ { .uname = "LOAD", .udesc = "Load (Prefetch, PrefetchT0/T1/T2)", .ucode = 0x1, }, { .uname = "STORE", .udesc = "Store (PrefetchW)", .ucode = 0x2, }, { .uname = "NTA", .udesc = "NTA (PrefetchNTA)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_dcache_misses_by_locked_instructions[]={ { .uname = "DATA_CACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .udesc = "Data cache misses by locked instructions", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x2, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_l1_dtlb_hit[]={ { .uname = "L1_4K_TLB_HIT", .udesc = "L1 4K TLB hit", .ucode = 0x1, }, { .uname = "L1_2M_TLB_HIT", .udesc = "L1 2M TLB hit", .ucode = 0x2, }, { .uname = "L1_1G_TLB_HIT", .udesc = "L1 1G TLB hit", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_ineffective_sw_prefetches[]={ { .uname = "SW_PREFETCH_HIT_IN_L1", .udesc = "Software prefetch hit in the L1.", .ucode = 0x1, }, { .uname = "SW_PREFETCH_HIT_IN_L2", .udesc = "Software prefetch hit in L2.", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x9, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_memory_requests[]={ { .uname = "NON_CACHEABLE", .udesc = "Requests to non-cacheable (UC) memory", .ucode = 0x1, }, { .uname = "WRITE_COMBINING", .udesc = "Requests to write-combining (WC) memory or WC buffer flushes to WB memory", .ucode = 0x2, }, { .uname = "CACHE_DISABLED", .udesc = "Requests to cache-disabled (CD) memory", .ucode = 0x4, }, { .uname = "STREAMING_STORE", .udesc = "Streaming store (SS) requests", .ucode 
= 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x87, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_data_prefetches[]={ { .uname = "CANCELLED", .udesc = "Cancelled prefetches", .ucode = 0x1, }, { .uname = "ATTEMPTED", .udesc = "Prefetch attempts", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_northbridge_read_responses[]={ { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x1, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x2, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "DATA_ERROR", .udesc = "Data Error", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_octwords_written_to_system[]={ { .uname = "OCTWORD_WRITE_TRANSFER", .udesc = "Octword write transfer", .ucode = 0x1, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_requests_to_l2[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB fill (page table walks)", .ucode = 0x4, }, { .uname = "SNOOP", .udesc = "Tag snoop request", .ucode = 0x8, }, { .uname = "CANCELLED", .udesc = "Cancelled request", .ucode = 0x10, }, { .uname = "HW_PREFETCH_FROM_DC", .udesc = "Hardware prefetch from DC", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_l2_cache_miss[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill (includes possible 
replays, whereas EventSelect 041h does not)", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB page table walk", .ucode = 0x4, }, { .uname = "HW_PREFETCH_FROM_DC", .udesc = "Hardware prefetch from DC", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_l2_fill_writeback[]={ { .uname = "L2_FILLS", .udesc = "L2 fills (victims from L1 caches, TLB page table walks and data prefetches)", .ucode = 0x1, }, { .uname = "L2_WRITEBACKS", .udesc = "L2 Writebacks to system.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_l1_itlb_miss_and_l2_itlb_miss[]={ { .uname = "4K_PAGE_FETCHES", .udesc = "Instruction fetches to a 4K page.", .ucode = 0x1, }, { .uname = "2M_PAGE_FETCHES", .udesc = "Instruction fetches to a 2M page.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_instruction_cache_lines_invalidated[]={ { .uname = "INVALIDATING_PROBE_NO_IN_FLIGHT", .udesc = "Invalidating probe that did not hit any in-flight instructions.", .ucode = 0x1, }, { .uname = "INVALIDATING_PROBE_ONE_OR_MORE_IN_FLIGHT", .udesc = "Invalidating probe that hit one or more in-flight instructions.", .ucode = 0x2, }, { .uname = "SMC_NO_INFLIGHT", .udesc = "SMC that did not hit any in-flight instructions.", .ucode = 0x4, }, { .uname = "SMC_INFLIGHT", .udesc = "SMC that hit one or more in-flight instructions.", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_retired_mmx_and_fp_instructions[]={ { .uname = "X87", .udesc = "X87 instructions", .ucode = 0x1, }, { .uname = "MMX_AND_3DNOW", .udesc = "MMX and 3DNow! 
instructions", .ucode = 0x2, }, { .uname = "SSE_AND_SSE2", .udesc = "SSE and SSE2 instructions", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_interrupt_events[]={ { .uname = "FIXED_AND_LPA", .udesc = "Fixed and LPA", .ucode = 0x1, }, { .uname = "LPA", .udesc = "LPA", .ucode = 0x2, }, { .uname = "SMI", .udesc = "SMI", .ucode = 0x4, }, { .uname = "NMI", .udesc = "NMI", .ucode = 0x8, }, { .uname = "INIT", .udesc = "INIT", .ucode = 0x10, }, { .uname = "STARTUP", .udesc = "STARTUP", .ucode = 0x20, }, { .uname = "INT", .udesc = "INT", .ucode = 0x40, }, { .uname = "EOI", .udesc = "EOI", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_sideband_signals[]={ { .uname = "STOPGRANT", .udesc = "STOPGRANT", .ucode = 0x2, }, { .uname = "SHUTDOWN", .udesc = "SHUTDOWN", .ucode = 0x4, }, { .uname = "WBINVD", .udesc = "WBINVD", .ucode = 0x8, }, { .uname = "INVD", .udesc = "INVD", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1e, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_fpu_exceptions[]={ { .uname = "X87_RECLASS_MICROFAULTS", .udesc = "X87 reclass microfaults", .ucode = 0x1, }, { .uname = "SSE_RETYPE_MICROFAULTS", .udesc = "SSE retype microfaults", .ucode = 0x2, }, { .uname = "SSE_RECLASS_MICROFAULTS", .udesc = "SSE reclass microfaults", .ucode = 0x4, }, { .uname = "SSE_AND_X87_MICROTRAPS", .udesc = "SSE and x87 microtraps", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_dram_accesses_page[]={ { .uname = "DCT0_HIT", .udesc = "DCT0 Page hit", .ucode = 0x1, }, { .uname = "DCT0_MISS", .udesc = "DCT0 Page Miss", .ucode = 0x2, }, { .uname 
= "DCT0_CONFLICT", .udesc = "DCT0 Page Conflict", .ucode = 0x4, }, { .uname = "DCT1_PAGE_HIT", .udesc = "DCT1 Page hit", .ucode = 0x8, }, { .uname = "DCT1_PAGE_MISS", .udesc = "DCT1 Page Miss", .ucode = 0x10, }, { .uname = "DCT1_PAGE_CONFLICT", .udesc = "DCT1 Page Conflict", .ucode = 0x20, }, { .uname = "WRITE_REQUEST", .udesc = "Write request.", .ucode = 0x40, }, { .uname = "READ_REQUEST", .udesc = "Read request.", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_memory_controller_page_table_events[]={ { .uname = "PAGE_TABLE_OVERFLOW", .udesc = "Page Table Overflow", .ucode = 0x1, }, { .uname = "STALE_TABLE_ENTRY_HITS", .udesc = "Number of stale table entry hits. (hit on a page closed too soon).", .ucode = 0x2, }, { .uname = "PAGE_TABLE_IDLE_CYCLE_LIMIT_INCREMENTED", .udesc = "Page table idle cycle limit incremented.", .ucode = 0x4, }, { .uname = "PAGE_TABLE_IDLE_CYCLE_LIMIT_DECREMENTED", .udesc = "Page table idle cycle limit decremented.", .ucode = 0x8, }, { .uname = "PAGE_TABLE_CLOSED_INACTIVITY", .udesc = "Page table is closed due to row inactivity.", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_memory_controller_slot_misses[]={ { .uname = "DCT0_RBD", .udesc = "DCT0 RBD.", .ucode = 0x10, }, { .uname = "DCT1_RBD", .udesc = "DCT1 RBD.", .ucode = 0x20, }, { .uname = "DCT0_PREFETCH", .udesc = "DCT0 Prefetch.", .ucode = 0x40, }, { .uname = "DCT1_PREFETCH", .udesc = "DCT1 Prefetch.", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf0, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_memory_controller_turnarounds[]={ { .uname = "DCT0_READ_TO_WRITE", .udesc = "DCT0 read-to-write turnaround.", .ucode = 0x1, }, { .uname = "DCT0_WRITE_TO_READ", .udesc 
= "DCT0 write-to-read turnaround", .ucode = 0x2, }, { .uname = "DCT1_READ_TO_WRITE", .udesc = "DCT1 read-to-write turnaround.", .ucode = 0x8, }, { .uname = "DCT1_WRITE_TO_READ", .udesc = "DCT1 write-to-read turnaround", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1b, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_memory_rbd_queue[]={ { .uname = "COUNTER_REACHED", .udesc = "D18F2x[1,0]94[DcqBypassMax] counter reached.", .ucode = 0x4, }, { .uname = "BANK_CLOSED", .udesc = "Bank is closed due to bank conflict with an outstanding request in the RBD queue.", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xc, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_thermal_status[]={ { .uname = "MEMHOT_L_ASSERTIONS", .udesc = "MEMHOT_L assertions.", .ucode = 0x1, }, { .uname = "HTC_TRANSITIONS", .udesc = "Number of times the HTC transitions from inactive to active.", .ucode = 0x4, }, { .uname = "CLOCKS_HTC_P_STATE_INACTIVE", .udesc = "Number of clocks HTC P-state is inactive.", .ucode = 0x20, }, { .uname = "CLOCKS_HTC_P_STATE_ACTIVE", .udesc = "Number of clocks HTC P-state is active", .ucode = 0x40, }, { .uname = "PROCHOT_L_ASSERTIONS", .udesc = "PROCHOT_L asserted by an external source and the assertion causes a P-state change.", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xe5, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_cpu_io_requests_to_memory_io[]={ { .uname = "I_O_TO_I_O", .udesc = "IO to IO", .ucode = 0x1, }, { .uname = "I_O_TO_MEM", .udesc = "IO to Mem", .ucode = 0x2, }, { .uname = "CPU_TO_I_O", .udesc = "CPU to IO", .ucode = 0x4, }, { .uname = "CPU_TO_MEM", .udesc = "CPU to Mem", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x0f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t 
amd64_fam12h_cache_block[]={ { .uname = "VICTIM_WRITEBACK", .udesc = "Victim Block (Writeback)", .ucode = 0x1, }, { .uname = "DCACHE_LOAD_MISS", .udesc = "Read Block (Dcache load miss refill)", .ucode = 0x4, }, { .uname = "SHARED_ICACHE_REFILL", .udesc = "Read Block Shared (Icache refill)", .ucode = 0x8, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read Block Modified (Dcache store miss refill)", .ucode = 0x10, }, { .uname = "READ_TO_DIRTY", .udesc = "Change-to-Dirty (first store to clean block already in cache)", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3d, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_sized_commands[]={ { .uname = "NON_POSTED_WRITE_BYTE", .udesc = "Non-Posted SzWr Byte (1-32 bytes) Legacy or mapped IO, typically 1-4 bytes", .ucode = 0x1, }, { .uname = "NON_POSTED_WRITE_DWORD", .udesc = "Non-Posted SzWr DW (1-16 dwords) Legacy or mapped IO, typically 1 DWORD", .ucode = 0x2, }, { .uname = "POSTED_WRITE_BYTE", .udesc = "Posted SzWr Byte (1-32 bytes) Subcache-line DMA writes, size varies; also flushes of partially-filled Write Combining buffer", .ucode = 0x4, }, { .uname = "POSTED_WRITE_DWORD", .udesc = "Posted SzWr DW (1-16 dwords) Block-oriented DMA writes, often cache-line sized; also processor Write Combining buffer flushes", .ucode = 0x8, }, { .uname = "READ_BYTE_4_BYTES", .udesc = "SzRd Byte (4 bytes) Legacy or mapped IO", .ucode = 0x10, }, { .uname = "READ_DWORD_1_16_DWORDS", .udesc = "SzRd DW (1-16 dwords) Block-oriented DMA reads, typically cache-line size", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_probe[]={ { .uname = "MISS", .udesc = "Probe miss", .ucode = 0x1, }, { .uname = "HIT_CLEAN", .udesc = "Probe hit clean", .ucode = 0x2, }, { .uname = "HIT_DIRTY_NO_MEMORY_CANCEL", .udesc = "Probe hit dirty without memory cancel 
(probed by Sized Write or Change2Dirty)", .ucode = 0x4, }, { .uname = "HIT_DIRTY_WITH_MEMORY_CANCEL", .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)", .ucode = 0x8, }, { .uname = "UPSTREAM_HIGH_PRIORITY_READS", .udesc = "Upstream high priority reads.", .ucode = 0x10, }, { .uname = "UPSTREAM_LOW_PRIORITY_READS", .udesc = "Upstream low priority reads.", .ucode = 0x20, }, { .uname = "UPSTREAM_LOW_PRIORITY_WRITES", .udesc = "Upstream low priority writes.", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xbf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_dev[]={ { .uname = "DEV_HIT", .udesc = "DEV hit", .ucode = 0x10, }, { .uname = "DEV_MISS", .udesc = "DEV miss", .ucode = 0x20, }, { .uname = "DEV_ERROR", .udesc = "DEV error", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x70, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_memory_controller_requests[]={ { .uname = "32_BYTES_WRITES", .udesc = "32 Bytes Sized Writes", .ucode = 0x8, }, { .uname = "64_BYTES_WRITES", .udesc = "64 Bytes Sized Writes", .ucode = 0x10, }, { .uname = "32_BYTES_READS", .udesc = "32 Bytes Sized Reads", .ucode = 0x20, }, { .uname = "64_BYTES_READS", .udesc = "64 Byte Sized Reads", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x78, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam12h_page_size_mismatches[]={ { .uname = "GUEST_LARGER", .udesc = "Guest page size is larger than the host page size.", .ucode = 0x1, }, { .uname = "MTRR_MISMATCH", .udesc = "MTRR mismatch.", .ucode = 0x2, }, { .uname = "HOST_LARGER", .udesc = "Host page size is larger than the guest page size.", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t 
amd64_fam12h_retired_x87_ops[]={ { .uname = "ADD_SUB_OPS", .udesc = "Add/subtract ops", .ucode = 0x1, }, { .uname = "MUL_OPS", .udesc = "Multiply ops", .ucode = 0x2, }, { .uname = "DIV_OPS", .udesc = "Divide ops", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_entry_t amd64_fam12h_pe[]={ { .name = "DISPATCHED_FPU", .desc = "Dispatched FPU Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_dispatched_fpu), .ngrp = 1, .umasks = amd64_fam12h_dispatched_fpu, }, { .name = "CYCLES_NO_FPU_OPS_RETIRED", .desc = "Cycles in which the FPU is Empty", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1, }, { .name = "DISPATCHED_FPU_OPS_FAST_FLAG", .desc = "Dispatched Fast Flag FPU Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2, }, { .name = "RETIRED_SSE_OPERATIONS", .desc = "Retired SSE Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_retired_sse_operations), .ngrp = 1, .umasks = amd64_fam12h_retired_sse_operations, }, { .name = "RETIRED_MOVE_OPS", .desc = "Retired Move Ops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_retired_move_ops), .ngrp = 1, .umasks = amd64_fam12h_retired_move_ops, }, { .name = "RETIRED_SERIALIZING_OPS", .desc = "Retired Serializing Ops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_retired_serializing_ops), .ngrp = 1, .umasks = amd64_fam12h_retired_serializing_ops, }, { .name = "FP_SCHEDULER_CYCLES", .desc = "Number of Cycles that a Serializing uop is in the FP Scheduler", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_fp_scheduler_cycles), .ngrp = 1, .umasks = amd64_fam12h_fp_scheduler_cycles, }, { .name = "SEGMENT_REGISTER_LOADS", .desc = "Segment Register Loads", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x20, .numasks = 
LIBPFM_ARRAY_SIZE(amd64_fam12h_segment_register_loads), .ngrp = 1, .umasks = amd64_fam12h_segment_register_loads, }, { .name = "PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE", .desc = "Pipeline Restart Due to Self-Modifying Code", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x21, }, { .name = "PIPELINE_RESTART_DUE_TO_PROBE_HIT", .desc = "Pipeline Restart Due to Probe Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x22, }, { .name = "LS_BUFFER_2_FULL_CYCLES", .desc = "LS Buffer 2 Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x23, }, { .name = "LOCKED_OPS", .desc = "Locked Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_locked_ops), .ngrp = 1, .umasks = amd64_fam12h_locked_ops, }, { .name = "RETIRED_CLFLUSH_INSTRUCTIONS", .desc = "Retired CLFLUSH Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x26, }, { .name = "RETIRED_CPUID_INSTRUCTIONS", .desc = "Retired CPUID Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x27, }, { .name = "CANCELLED_STORE_TO_LOAD_FORWARD_OPERATIONS", .desc = "Cancelled Store to Load Forward Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_cancelled_store_to_load_forward_operations), .ngrp = 1, .umasks = amd64_fam12h_cancelled_store_to_load_forward_operations, }, { .name = "SMIS_RECEIVED", .desc = "SMIs Received", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2b, }, { .name = "DATA_CACHE_ACCESSES", .desc = "Data Cache Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x40, }, { .name = "DATA_CACHE_MISSES", .desc = "Data Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x41, }, { .name = "DATA_CACHE_REFILLS", .desc = "Data Cache Refills from L2 or Northbridge", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_data_cache_refills), .ngrp = 1, .umasks = amd64_fam12h_data_cache_refills, }, { .name = "DATA_CACHE_REFILLS_FROM_SYSTEM", .desc = "Data Cache Refills from the Northbridge", .modmsk = AMD64_FAM10H_ATTRS, 
.code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_data_cache_refills_from_northbridge), .ngrp = 1, .umasks = amd64_fam12h_data_cache_refills_from_northbridge, }, { .name = "DATA_CACHE_LINES_EVICTED", .desc = "Data Cache Lines Evicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x44, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_data_cache_lines_evicted), .ngrp = 1, .umasks = amd64_fam12h_data_cache_lines_evicted, }, { .name = "L1_DTLB_MISS_AND_L2_DTLB_HIT", .desc = "L1 DTLB Miss and L2 DTLB Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x45, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_l1_dtlb_miss_and_l2_dtlb_hit), .ngrp = 1, .umasks = amd64_fam12h_l1_dtlb_miss_and_l2_dtlb_hit, }, { .name = "L1_DTLB_AND_L2_DTLB_MISS", .desc = "L1 DTLB and L2 DTLB Miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x46, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_l1_dtlb_and_l2_dtlb_miss), .ngrp = 1, .umasks = amd64_fam12h_l1_dtlb_and_l2_dtlb_miss, }, { .name = "MISALIGNED_ACCESSES", .desc = "Misaligned Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x47, }, { .name = "MICROARCHITECTURAL_LATE_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Late Cancel of an Access", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x48, }, { .name = "MICROARCHITECTURAL_EARLY_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Early Cancel of an Access", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x49, }, { .name = "PREFETCH_INSTRUCTIONS_DISPATCHED", .desc = "Prefetch Instructions Dispatched", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_prefetch_instructions_dispatched), .ngrp = 1, .umasks = amd64_fam12h_prefetch_instructions_dispatched, }, { .name = "DCACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .desc = "DCACHE Misses by Locked Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_dcache_misses_by_locked_instructions), .ngrp = 1, .umasks = amd64_fam12h_dcache_misses_by_locked_instructions, }, { .name = "L1_DTLB_HIT", .desc = "L1 DTLB Hit", 
.modmsk = AMD64_FAM10H_ATTRS, .code = 0x4d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_l1_dtlb_hit), .ngrp = 1, .umasks = amd64_fam12h_l1_dtlb_hit, }, { .name = "INEFFECTIVE_SW_PREFETCHES", .desc = "Ineffective Software Prefetches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x52, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_ineffective_sw_prefetches), .ngrp = 1, .umasks = amd64_fam12h_ineffective_sw_prefetches, }, { .name = "GLOBAL_TLB_FLUSHES", .desc = "Global TLB Flushes", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x54, }, { .name = "MEMORY_REQUESTS", .desc = "Memory Requests by Type", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_memory_requests), .ngrp = 1, .umasks = amd64_fam12h_memory_requests, }, { .name = "DATA_PREFETCHES", .desc = "Data Prefetcher", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_data_prefetches), .ngrp = 1, .umasks = amd64_fam12h_data_prefetches, }, { .name = "NORTHBRIDGE_READ_RESPONSES", .desc = "Northbridge Read Responses by Coherency State", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_northbridge_read_responses), .ngrp = 1, .umasks = amd64_fam12h_northbridge_read_responses, }, { .name = "OCTWORDS_WRITTEN_TO_SYSTEM", .desc = "Octwords Written to System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_octwords_written_to_system), .ngrp = 1, .umasks = amd64_fam12h_octwords_written_to_system, }, { .name = "CPU_CLK_UNHALTED", .desc = "CPU Clocks not Halted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x76, }, { .name = "REQUESTS_TO_L2", .desc = "Requests to L2 Cache", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_requests_to_l2), .ngrp = 1, .umasks = amd64_fam12h_requests_to_l2, }, { .name = "L2_CACHE_MISS", .desc = "L2 Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7e, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_l2_cache_miss), .ngrp = 1, .umasks = 
amd64_fam12h_l2_cache_miss, }, { .name = "L2_FILL_WRITEBACK", .desc = "L2 Fill/Writeback", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_l2_fill_writeback), .ngrp = 1, .umasks = amd64_fam12h_l2_fill_writeback, }, { .name = "PAGE_SIZE_MISMATCHES", .desc = "Page Size Mismatches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x165, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_page_size_mismatches), .ngrp = 1, .umasks = amd64_fam12h_page_size_mismatches, }, { .name = "INSTRUCTION_CACHE_FETCHES", .desc = "Instruction Cache Fetches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x80, }, { .name = "INSTRUCTION_CACHE_MISSES", .desc = "Instruction Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x81, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_L2", .desc = "Instruction Cache Refills from L2", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x82, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM", .desc = "Instruction Cache Refills from System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x83, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_HIT", .desc = "L1 ITLB Miss and L2 ITLB Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x84, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_MISS", .desc = "L1 ITLB Miss and L2 ITLB Miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_l1_itlb_miss_and_l2_itlb_miss), .ngrp = 1, .umasks = amd64_fam12h_l1_itlb_miss_and_l2_itlb_miss, }, { .name = "PIPELINE_RESTART_DUE_TO_INSTRUCTION_STREAM_PROBE", .desc = "Pipeline Restart Due to Instruction Stream Probe", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x86, }, { .name = "INSTRUCTION_FETCH_STALL", .desc = "Instruction Fetch Stall", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x87, }, { .name = "RETURN_STACK_HITS", .desc = "Return Stack Hits", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x88, }, { .name = "RETURN_STACK_OVERFLOWS", .desc = "Return Stack Overflows", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x89, }, { .name = "INSTRUCTION_CACHE_VICTIMS", .desc = "Instruction Cache 
Victims", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x8b, }, { .name = "INSTRUCTION_CACHE_LINES_INVALIDATED", .desc = "Instruction Cache Lines Invalidated", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x8c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_instruction_cache_lines_invalidated), .ngrp = 1, .umasks = amd64_fam12h_instruction_cache_lines_invalidated, }, { .name = "ITLB_RELOADS", .desc = "ITLB Reloads", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x99, }, { .name = "ITLB_RELOADS_ABORTED", .desc = "ITLB Reloads Aborted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x9a, }, { .name = "RETIRED_INSTRUCTIONS", .desc = "Retired Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc0, }, { .name = "RETIRED_UOPS", .desc = "Retired uops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc1, }, { .name = "RETIRED_BRANCH_INSTRUCTIONS", .desc = "Retired Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc2, }, { .name = "RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", .desc = "Retired Mispredicted Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc3, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS", .desc = "Retired Taken Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc4, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED", .desc = "Retired Taken Branch Instructions Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc5, }, { .name = "RETIRED_FAR_CONTROL_TRANSFERS", .desc = "Retired Far Control Transfers", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc6, }, { .name = "RETIRED_BRANCH_RESYNCS", .desc = "Retired Branch Resyncs", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc7, }, { .name = "RETIRED_NEAR_RETURNS", .desc = "Retired Near Returns", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc8, }, { .name = "RETIRED_NEAR_RETURNS_MISPREDICTED", .desc = "Retired Near Returns Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc9, }, { .name = "RETIRED_INDIRECT_BRANCHES_MISPREDICTED", .desc = "Retired Indirect Branches Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 
0xca, }, { .name = "RETIRED_MMX_AND_FP_INSTRUCTIONS", .desc = "Retired MMX/FP Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_retired_mmx_and_fp_instructions), .ngrp = 1, .umasks = amd64_fam12h_retired_mmx_and_fp_instructions, }, { .name = "INTERRUPTS_MASKED_CYCLES", .desc = "Interrupts-Masked Cycles", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcd, }, { .name = "INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING", .desc = "Interrupts-Masked Cycles with Interrupt Pending", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xce, }, { .name = "INTERRUPTS_TAKEN", .desc = "Interrupts Taken", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcf, }, { .name = "DECODER_EMPTY", .desc = "Decoder Empty", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd0, }, { .name = "DISPATCH_STALLS", .desc = "Dispatch Stalls", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd1, }, { .name = "DISPATCH_STALL_FOR_BRANCH_ABORT", .desc = "Dispatch Stall for Branch Abort to Retire", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd2, }, { .name = "DISPATCH_STALL_FOR_SERIALIZATION", .desc = "Dispatch Stall for Serialization", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd3, }, { .name = "DISPATCH_STALL_FOR_SEGMENT_LOAD", .desc = "Dispatch Stall for Segment Load", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd4, }, { .name = "DISPATCH_STALL_FOR_REORDER_BUFFER_FULL", .desc = "Dispatch Stall for Reorder Buffer Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd5, }, { .name = "DISPATCH_STALL_FOR_RESERVATION_STATION_FULL", .desc = "Dispatch Stall for Reservation Station Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd6, }, { .name = "DISPATCH_STALL_FOR_FPU_FULL", .desc = "Dispatch Stall for FPU Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd7, }, { .name = "DISPATCH_STALL_FOR_LS_FULL", .desc = "Dispatch Stall for LS Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd8, }, { .name = "DISPATCH_STALL_WAITING_FOR_ALL_QUIET", .desc = "Dispatch Stall Waiting for All Quiet", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd9, }, 
{ .name = "DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RSYNC", .desc = "Dispatch Stall for Far Transfer or Resync to Retire", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xda, }, { .name = "FPU_EXCEPTIONS", .desc = "FPU Exceptions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_fpu_exceptions), .ngrp = 1, .umasks = amd64_fam12h_fpu_exceptions, }, { .name = "DR0_BREAKPOINT_MATCHES", .desc = "DR0 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdc, }, { .name = "DR1_BREAKPOINT_MATCHES", .desc = "DR1 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdd, }, { .name = "DR2_BREAKPOINT_MATCHES", .desc = "DR2 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xde, }, { .name = "DR3_BREAKPOINT_MATCHES", .desc = "DR3 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdf, }, { .name = "RETIRED_X87_OPS", .desc = "Retired x87 Floating Point Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1c0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_retired_x87_ops), .ngrp = 1, .umasks = amd64_fam12h_retired_x87_ops, }, { .name = "LFENCE_INST_RETIRED", .desc = "LFENCE Instructions Retired", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1d3, }, { .name = "SFENCE_INST_RETIRED", .desc = "SFENCE Instructions Retired", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1d4, }, { .name = "MFENCE_INST_RETIRED", .desc = "MFENCE Instructions Retired", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1d5, }, { .name = "DRAM_ACCESSES_PAGE", .desc = "DRAM Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_dram_accesses_page), .ngrp = 1, .umasks = amd64_fam12h_dram_accesses_page, }, { .name = "MEMORY_CONTROLLER_0_PAGE", .desc = "DRAM Controller 0 Page Table Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_memory_controller_page_table_events), .ngrp = 1, .umasks = amd64_fam12h_memory_controller_page_table_events, }, { .name = "MEMORY_CONTROLLER_SLOT_MISSES", .desc = 
"Memory Controller DRAM Command Slots Missed", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe2, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_memory_controller_slot_misses), .ngrp = 1, .umasks = amd64_fam12h_memory_controller_slot_misses, }, { .name = "MEMORY_CONTROLLER_TURNAROUNDS", .desc = "Memory Controller Turnarounds", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_memory_controller_turnarounds), .ngrp = 1, .umasks = amd64_fam12h_memory_controller_turnarounds, }, { .name = "MEMORY_CONTROLLER_RBD_QUEUE", .desc = "Memory Controller RBD Queue Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_memory_rbd_queue), .ngrp = 1, .umasks = amd64_fam12h_memory_rbd_queue, }, { .name = "MEMORY_CONTROLLER_1_PAGE", .desc = "DRAM Controller 1 Page Table Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe5, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_memory_controller_page_table_events), .ngrp = 1, .umasks = amd64_fam12h_memory_controller_page_table_events, }, { .name = "THERMAL_STATUS", .desc = "Thermal Status", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe8, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_thermal_status), .ngrp = 1, .umasks = amd64_fam12h_thermal_status, }, { .name = "CPU_IO_REQUESTS_TO_MEMORY_IO", .desc = "CPU/IO Requests to Memory/IO", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_cpu_io_requests_to_memory_io), .ngrp = 1, .umasks = amd64_fam12h_cpu_io_requests_to_memory_io, }, { .name = "CACHE_BLOCK", .desc = "Cache Block Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_cache_block), .ngrp = 1, .umasks = amd64_fam12h_cache_block, }, { .name = "SIZED_COMMANDS", .desc = "Sized Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xeb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_sized_commands), .ngrp = 1, .umasks = amd64_fam12h_sized_commands, }, { .name = "PROBE", .desc = "Probe Responses and Upstream Requests", 
.modmsk = AMD64_FAM10H_ATTRS, .code = 0xec, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_probe), .ngrp = 1, .umasks = amd64_fam12h_probe, }, { .name = "DEV", .desc = "DEV Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xee, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_dev), .ngrp = 1, .umasks = amd64_fam12h_dev, }, { .name = "MEMORY_CONTROLLER_REQUESTS", .desc = "Memory Controller Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1f0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_memory_controller_requests), .ngrp = 1, .umasks = amd64_fam12h_memory_controller_requests, }, { .name = "SIDEBAND_SIGNALS", .desc = "Sideband Signals and Special Cycles", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_sideband_signals), .ngrp = 1, .umasks = amd64_fam12h_sideband_signals, }, { .name = "Interrupt Events", .desc = "Interrupt Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1ea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam12h_interrupt_events), .ngrp = 1, .umasks = amd64_fam12h_interrupt_events, }, }; papi-5.3.0/src/libpfm4/lib/events/sparc_ultra4plus_events.h0000600003276200002170000004364312247131124023413 0ustar ralphundrgradstatic const sparc_entry_t ultra4plus_pe[] = { /* These two must always be first. 
*/ { .name = "Cycle_cnt", .desc = "Accumulated cycles", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x0, }, { .name = "Instr_cnt", .desc = "Number of instructions completed", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x1, }, /* PIC0 UltraSPARC-IV+ events */ { .name = "Dispatch0_IC_miss", .desc = "I-buffer is empty from I-Cache miss", .ctrl = PME_CTRL_S0, .code = 0x2, }, { .name = "IU_stat_jmp_correct_pred", .desc = "Retired non-annulled register indirect jumps predicted correctly", .ctrl = PME_CTRL_S0, .code = 0x3, }, { .name = "Dispatch0_2nd_br", .desc = "Stall cycles due to having two branch instructions line up in one 4-instruction group causing the second branch in the group to be re-fetched, delaying its entrance into the I-buffer", .ctrl = PME_CTRL_S0, .code = 0x4, }, { .name = "Rstall_storeQ", .desc = "R-stage stall for a store instruction which is the next instruction to be executed, but is stalled due to the store queue being full", .ctrl = PME_CTRL_S0, .code = 0x5, }, { .name = "Rstall_IU_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding integer instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S0, .code = 0x6, }, { .name = "IU_stat_ret_correct_pred", .desc = "Retired non-annulled returns predicted correctly", .ctrl = PME_CTRL_S0, .code = 0x7, }, { .name = "IC_ref", .desc = "I-cache references", .ctrl = PME_CTRL_S0, .code = 0x8, }, { .name = "DC_rd", .desc = "D-cache read references (including accesses that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0x9, }, { .name = "Rstall_FP_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding floating-point instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S0, .code = 0xa, }, { .name = "SW_pf_instr", .desc = "Retired SW prefetch instructions", .ctrl = PME_CTRL_S0, .code = 0xb, }, { .name = "L2_ref", .desc = "L2-cache 
references", .ctrl = PME_CTRL_S0, .code = 0xc, }, { .name = "L2_write_hit_RTO", .desc = "L2-cache exclusive requests that hit L2-cache in S, O, or Os state and thus, do a read-to-own bus transaction", .ctrl = PME_CTRL_S0, .code = 0xd, }, { .name = "L2_snoop_inv_sh", .desc = "L2 cache lines that were written back to the L3 cache due to requests from both cores", .ctrl = PME_CTRL_S0, .code = 0xe, }, { .name = "L2_rd_miss", .desc = "L2-cache miss events (including atomics) from D-cache events", .ctrl = PME_CTRL_S0, .code = 0xf, }, { .name = "PC_rd", .desc = "P-cache cacheable loads", .ctrl = PME_CTRL_S0, .code = 0x10, }, { .name = "SI_snoop_sh", .desc = "Counts snoops from remote processor(s) including RTS, RTSR, RTO, RTOR, RS, RSR, RTSM, and WS", .ctrl = PME_CTRL_S0, .code = 0x11, }, { .name = "SI_ciq_flow_sh", .desc = "Counts system clock cycles when the flow control (PauseOut) signal is asserted", .ctrl = PME_CTRL_S0, .code = 0x12, }, { .name = "Re_DC_miss", .desc = "Stall due to loads that miss D-cache and get recirculated", .ctrl = PME_CTRL_S0, .code = 0x13, }, { .name = "SW_count_NOP0", .desc = "Retired, non-annulled special software NOP instructions (which is equivalent to 'sethi %hi(0xfc000), %g0' instruction)", .ctrl = PME_CTRL_S0, .code = 0x14, }, { .name = "IU_Stat_Br_miss_taken", .desc = "Retired branches that were predicted to be taken, but in fact were not taken", .ctrl = PME_CTRL_S0, .code = 0x15, }, { .name = "IU_Stat_Br_Count_taken", .desc = "Retired taken branches", .ctrl = PME_CTRL_S0, .code = 0x16, }, { .name = "HW_pf_exec", .desc = "Hardware prefetches enqueued in the prefetch queue", .ctrl = PME_CTRL_S0, .code = 0x17, }, { .name = "FA_pipe_completion", .desc = "Instructions that complete execution on the FPG ALU pipelines", .ctrl = PME_CTRL_S0, .code = 0x18, }, { .name = "SSM_L3_wb_remote", .desc = "L3 cache line victimizations from this core which generate R_WB transactions to non-LPA (remote physical address) regions", .ctrl = PME_CTRL_S0, 
.code = 0x19, }, { .name = "SSM_L3_miss_local", .desc = "L3 cache misses to LPA (local physical address) from this core which generate an RTS, RTO, or RS transaction", .ctrl = PME_CTRL_S0, .code = 0x1a, }, { .name = "SSM_L3_miss_mtag_remote", .desc = "L3 cache misses to LPA (local physical address) from this core which generate retry (R_*) transactions including R_RTS, R_RTO, and R_RS", .ctrl = PME_CTRL_S0, .code = 0x1b, }, { .name = "SW_pf_str_trapped", .desc = "Strong software prefetch instructions trapping due to TLB miss", .ctrl = PME_CTRL_S0, .code = 0x1c, }, { .name = "SW_pf_PC_installed", .desc = "Software prefetch instructions that installed lines in the P-cache", .ctrl = PME_CTRL_S0, .code = 0x1d, }, { .name = "IPB_to_IC_fill", .desc = "I-cache fills from the instruction prefetch buffer", .ctrl = PME_CTRL_S0, .code = 0x1e, }, { .name = "L2_write_miss", .desc = "L2-cache misses from this core by cacheable store requests", .ctrl = PME_CTRL_S0, .code = 0x1f, }, { .name = "MC_reads_0_sh", .desc = "Read requests completed to memory bank 0", .ctrl = PME_CTRL_S0, .code = 0x20, }, { .name = "MC_reads_1_sh", .desc = "Read requests completed to memory bank 1", .ctrl = PME_CTRL_S0, .code = 0x21, }, { .name = "MC_reads_2_sh", .desc = "Read requests completed to memory bank 2", .ctrl = PME_CTRL_S0, .code = 0x22, }, { .name = "MC_reads_3_sh", .desc = "Read requests completed to memory bank 3", .ctrl = PME_CTRL_S0, .code = 0x23, }, { .name = "MC_stalls_0_sh", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 0 was busy with a previous request", .ctrl = PME_CTRL_S0, .code = 0x24, }, { .name = "MC_stalls_2_sh", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 2 was busy with a previous request", .ctrl = PME_CTRL_S0, .code = 0x25, }, { .name = "L2_hit_other_half", .desc = "L2 cache hits from this core to the ways filled by the other core when the cache is in the pseudo-split mode", .ctrl = PME_CTRL_S0, .code = 
0x26, }, { .name = "L3_rd_miss", .desc = "L3 cache misses sent out to SIU from this core by cacheable I-cache, D-cache, P-cache, and W-cache (excluding block store) requests", .ctrl = PME_CTRL_S0, .code = 0x28, }, { .name = "Re_L2_miss", .desc = "Stall cycles due to recirculation of cacheable loads that miss both D-cache and L2 cache", .ctrl = PME_CTRL_S0, .code = 0x29, }, { .name = "IC_miss_cancelled", .desc = "I-cache miss requests cancelled due to new fetch stream", .ctrl = PME_CTRL_S0, .code = 0x2a, }, { .name = "DC_wr_miss", .desc = "D-cache store accesses that miss D-cache", .ctrl = PME_CTRL_S0, .code = 0x2b, }, { .name = "L3_hit_I_state_sh", .desc = "Tag hits in L3 cache when the line is in I state", .ctrl = PME_CTRL_S0, .code = 0x2c, }, { .name = "SI_RTS_src_data", .desc = "Local RTS transactions due to I-cache, D-cache, or P-cache requests from this core where data is from the cache of another processor on the system, not from memory", .ctrl = PME_CTRL_S0, .code = 0x2d, }, { .name = "L2_IC_miss", .desc = "L2 cache misses from this core by cacheable I-cache requests", .ctrl = PME_CTRL_S0, .code = 0x2e, }, { .name = "SSM_new_transaction_sh", .desc = "New SSM transactions (RTSU, RTOU, UGM) observed by this processor on the Fireplane Interconnect", .ctrl = PME_CTRL_S0, .code = 0x2f, }, { .name = "L2_SW_pf_miss", .desc = "L2 cache misses by software prefetch requests from this core", .ctrl = PME_CTRL_S0, .code = 0x30, }, { .name = "L2_wb", .desc = "L2 cache lines that were written back to the L3 cache because of requests from this core", .ctrl = PME_CTRL_S0, .code = 0x31, }, { .name = "L2_wb_sh", .desc = "L2 cache lines that were written back to the L3 cache because of requests from both cores", .ctrl = PME_CTRL_S0, .code = 0x32, }, { .name = "L2_snoop_cb_sh", .desc = "L2 cache lines that were copied back due to other processors", .ctrl = PME_CTRL_S0, .code = 0x33, }, /* PIC1 UltraSPARC-IV+ events */ { .name = "Dispatch0_other", .desc = "Stall cycles due to 
the event that no instructions are dispatched because the I-queue is empty due to various other events, including branch target address fetch and various events which cause an instruction to be refetched", .ctrl = PME_CTRL_S1, .code = 0x2, }, { .name = "DC_wr", .desc = "D-cache write references by cacheable stores (excluding block stores)", .ctrl = PME_CTRL_S1, .code = 0x3, }, { .name = "Re_DC_missovhd", .desc = "Stall cycles due to D-cache load miss", .ctrl = PME_CTRL_S1, .code = 0x4, }, { .name = "Re_FPU_bypass", .desc = "Stall due to recirculation when an FPU bypass condition that does not have a direct bypass path occurs", .ctrl = PME_CTRL_S1, .code = 0x5, }, { .name = "L3_write_hit_RTO", .desc = "L3 cache hits in O, Os, or S state by cacheable store requests from this core that do a read-to-own (RTO) bus transaction", .ctrl = PME_CTRL_S1, .code = 0x6, }, { .name = "L2L3_snoop_inv_sh", .desc = "L2 and L3 cache lines that were invalidated due to other processors doing RTO, RTOR, RTOU, or WS transactions", .ctrl = PME_CTRL_S1, .code = 0x7, }, { .name = "IC_L2_req", .desc = "I-cache requests sent to L2 cache", .ctrl = PME_CTRL_S1, .code = 0x8, }, { .name = "DC_rd_miss", .desc = "Cacheable loads (excluding atomics and block loads) that miss D-cache as well as P-cache (for FP loads)", .ctrl = PME_CTRL_S1, .code = 0x9, }, { .name = "L2_hit_I_state_sh", .desc = "Tag hits in L2 cache when the line is in I state", .ctrl = PME_CTRL_S1, .code = 0xa, }, { .name = "L3_write_miss_RTO", .desc = "L3 cache misses from this core by cacheable store requests that do a read-to-own (RTO) bus transaction. 
This count does not include RTO requests for prefetch (fcn=2,3/22,23) instructions", .ctrl = PME_CTRL_S1, .code = 0xb, }, { .name = "L2_miss", .desc = "L2 cache misses from this core by cacheable I-cache, D-cache, P-cache, and W-cache (excluding block stores) requests", .ctrl = PME_CTRL_S1, .code = 0xc, }, { .name = "SI_owned_sh", .desc = "Number of times owned_in is asserted on bus requests from the local processor", .ctrl = PME_CTRL_S1, .code = 0xd, }, { .name = "SI_RTO_src_data", .desc = "Number of local RTO transactions due to W-cache or P-cache requests from this core where data is from the cache of another processor on the system, not from memory", .ctrl = PME_CTRL_S1, .code = 0xe, }, { .name = "SW_pf_duplicate", .desc = "Number of software prefetch instructions that were dropped because the prefetch request matched an outstanding request in the prefetch queue or the request hit the P-cache", .ctrl = PME_CTRL_S1, .code = 0xf, }, { .name = "IU_stat_jmp_mispred", .desc = "Number of retired non-annulled register indirect jumps mispredicted", .ctrl = PME_CTRL_S1, .code = 0x10, }, { .name = "ITLB_miss", .desc = "I-TLB misses", .ctrl = PME_CTRL_S1, .code = 0x11, }, { .name = "DTLB_miss", .desc = "D-TLB misses", .ctrl = PME_CTRL_S1, .code = 0x12, }, { .name = "WC_miss", .desc = "W-cache misses", .ctrl = PME_CTRL_S1, .code = 0x13, }, { .name = "IC_fill", .desc = "Number of I-cache fills excluding fills from the instruction prefetch buffer. 
This is the best approximation of the number of I-cache misses for instructions that were actually executed", .ctrl = PME_CTRL_S1, .code = 0x14, }, { .name = "IU_stat_ret_mispred", .desc = "Number of retired non-annulled returns mispredicted", .ctrl = PME_CTRL_S1, .code = 0x15, }, { .name = "Re_L3_miss", .desc = "Stall cycles due to recirculation of cacheable loads that miss D-cache, L2, and L3 cache", .ctrl = PME_CTRL_S1, .code = 0x16, }, { .name = "Re_PFQ_full", .desc = "Stall cycles due to recirculation of prefetch instructions because the prefetch queue (PFQ) was full", .ctrl = PME_CTRL_S1, .code = 0x17, }, { .name = "PC_soft_hit", .desc = "Number of cacheable FP loads that hit a P-cache line that was prefetched by a software prefetch instruction", .ctrl = PME_CTRL_S1, .code = 0x18, }, { .name = "PC_inv", .desc = "Number of P-cache lines that were invalidated due to external snoops, internal stores, and L2 evictions", .ctrl = PME_CTRL_S1, .code = 0x19, }, { .name = "PC_hard_hit", .desc = "Number of FP loads that hit a P-cache line that was fetched by a FP load or a hardware prefetch, irrespective of whether the loads hit or miss the D-cache", .ctrl = PME_CTRL_S1, .code = 0x1a, }, { .name = "IC_pf", .desc = "Number of I-cache prefetch requests sent to L2 cache", .ctrl = PME_CTRL_S1, .code = 0x1b, }, { .name = "SW_count_NOP1", .desc = "Retired, non-annulled special software NOP instructions (which is equivalent to 'sethi %hi(0xfc000), %g0' instruction)", .ctrl = PME_CTRL_S1, .code = 0x1c, }, { .name = "IU_stat_br_miss_untaken", .desc = "Number of retired non-annulled conditional branches that were predicted to be not taken, but in fact were taken", .ctrl = PME_CTRL_S1, .code = 0x1d, }, { .name = "IU_stat_br_count_taken", .desc = "Number of retired non-annulled conditional branches that were taken", .ctrl = PME_CTRL_S1, .code = 0x1e, }, { .name = "PC_miss", .desc = "Number of cacheable FP loads that miss P-cache, irrespective of whether the loads hit or miss the 
D-cache", .ctrl = PME_CTRL_S1, .code = 0x1f, }, { .name = "MC_writes_0_sh", .desc = "Number of write requests completed to memory bank 0", .ctrl = PME_CTRL_S1, .code = 0x20, }, { .name = "MC_writes_1_sh", .desc = "Number of write requests completed to memory bank 1", .ctrl = PME_CTRL_S1, .code = 0x21, }, { .name = "MC_writes_2_sh", .desc = "Number of write requests completed to memory bank 2", .ctrl = PME_CTRL_S1, .code = 0x22, }, { .name = "MC_writes_3_sh", .desc = "Number of write requests completed to memory bank 3", .ctrl = PME_CTRL_S1, .code = 0x23, }, { .name = "MC_stalls_1_sh", .desc = "Number of processor cycles that requests were stalled in the MCU queues because bank 1 was busy with a previous request", .ctrl = PME_CTRL_S1, .code = 0x24, }, { .name = "MC_stalls_3_sh", .desc = "Number of processor cycles that requests were stalled in the MCU queues because bank 3 was busy with a previous request", .ctrl = PME_CTRL_S1, .code = 0x25, }, { .name = "Re_RAW_miss", .desc = "Stall cycles due to recirculation when there is a load instruction in the E-stage of the pipeline which has a non-bypassable read-after-write (RAW) hazard with an earlier store instruction", .ctrl = PME_CTRL_S1, .code = 0x26, }, { .name = "FM_pipe_completion", .desc = "Number of retired instructions that complete execution on the Floating-Point/Graphics Multiply pipeline", .ctrl = PME_CTRL_S1, .code = 0x27, }, { .name = "SSM_L3_miss_mtag_remote", .desc = "Number of L3 cache misses to LPA (local physical address) from this core which generate retry (R_*) transactions including R_RTS, R_RTO, and R_RS", .ctrl = PME_CTRL_S1, .code = 0x28, }, { .name = "SSM_L3_miss_remote", .desc = "Number of L3 cache misses from this core which generate retry (R_*) transactions to non-LPA (non-local physical address) address space, or R_WS transactions due to block store (BST) / block store commit (BSTC) to any address space (LPA or non-LPA), or R_RTO due to atomic request on Os state to LPA space.", .ctrl = 
PME_CTRL_S1, .code = 0x29, }, { .name = "SW_pf_exec", .desc = "Number of retired, non-trapping software prefetch instructions that completed, i.e. number of retired prefetch instructions that were not dropped due to the prefetch queue being full", .ctrl = PME_CTRL_S1, .code = 0x2a, }, { .name = "SW_pf_str_exec", .desc = "Number of retired, non-trapping strong prefetch instructions that completed", .ctrl = PME_CTRL_S1, .code = 0x2b, }, { .name = "SW_pf_dropped", .desc = "Number of software prefetch instructions dropped due to TLB miss or due to the prefetch queue being full", .ctrl = PME_CTRL_S1, .code = 0x2c, }, { .name = "SW_pf_L2_installed", .desc = "Number of software prefetch instructions that installed lines in the L2 cache", .ctrl = PME_CTRL_S1, .code = 0x2d, }, { .name = "L2_HW_pf_miss", .desc = "Number of L2 cache misses by hardware prefetch requests from this core", .ctrl = PME_CTRL_S1, .code = 0x2f, }, { .name = "L3_miss", .desc = "Number of L3 cache misses sent out to SIU from this core by cacheable I-cache, D-cache, P-cache, and W-cache (excluding block stores) requests", .ctrl = PME_CTRL_S1, .code = 0x31, }, { .name = "L3_IC_miss", .desc = "Number of L3 cache misses by cacheable I-cache requests from this core", .ctrl = PME_CTRL_S1, .code = 0x32, }, { .name = "L3_SW_pf_miss", .desc = "Number of L3 cache misses by software prefetch requests from this core", .ctrl = PME_CTRL_S1, .code = 0x33, }, { .name = "L3_hit_other_half", .desc = "Number of L3 cache hits from this core to the ways filled by the other core when the cache is in pseudo-split mode", .ctrl = PME_CTRL_S1, .code = 0x34, }, { .name = "L3_wb", .desc = "Number of L3 cache lines that were written back because of requests from this core", .ctrl = PME_CTRL_S1, .code = 0x35, }, { .name = "L3_wb_sh", .desc = "Number of L3 cache lines that were written back because of requests from both cores", .ctrl = PME_CTRL_S1, .code = 0x36, }, { .name = "L2L3_snoop_cb_sh", .desc = "Total number of L2 and L3 
cache lines that were copied back due to other processors", .ctrl = PME_CTRL_S1, .code = 0x37, }, }; #define PME_SPARC_ULTRA4PLUS_EVENT_COUNT (sizeof(ultra4plus_pe)/sizeof(sparc_entry_t)) papi-5.3.0/src/libpfm4/lib/events/intel_atom_events.h0000600003276200002170000011174212247131123022232 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. 
* * PMU: atom (Intel Atom) */ static const intel_x86_umask_t atom_l2_reject_busq[]={ { .uname = "MESI", .udesc = "Any cacheline access", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, .grpid = 0, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, .grpid = 0, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, .grpid = 0, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, .grpid = 0, }, { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .grpid = 1, }, { .uname = "ANY", .udesc = "All inclusive", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 2, }, { .uname = "PREFETCH", .udesc = "Hardware prefetch only", .ucode = 0x1000, .grpid = 2, }, }; static const intel_x86_umask_t atom_icache[]={ { .uname = "ACCESSES", .udesc = "Instruction fetches, including uncacheable fetches", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISSES", .udesc = "Count all instruction fetches that miss the icache or produce memory requests. This includes uncacheable fetches. 
Any instruction fetch miss is counted only once and not once for every cycle it is outstanding", .ucode = 0x200, }, }; static const intel_x86_umask_t atom_l2_lock[]={ { .uname = "MESI", .udesc = "Any cacheline access", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, .grpid = 0, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, .grpid = 0, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, .grpid = 0, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, .grpid = 0, }, { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .grpid = 1, }, }; static const intel_x86_umask_t atom_uops_retired[]={ { .uname = "ANY", .udesc = "Micro-ops retired", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "STALLED_CYCLES", .udesc = "Cycles no micro-ops retired", .ucode = 0x1000 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "STALLS", .udesc = "Periods no micro-ops retired", .ucode = 0x1000 | INTEL_X86_MOD_EDGE | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_l2_m_lines_out[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .grpid = 0, }, { .uname = "ANY", .udesc = "All inclusive", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "PREFETCH", .udesc = "Hardware prefetch only", .ucode = 0x1000, .grpid = 1, }, }; static const intel_x86_umask_t atom_simd_comp_inst_retired[]={ { .uname = "PACKED_SINGLE", .udesc = "Retired computational Streaming SIMD Extensions (SSE) packed-single instructions", .ucode = 
0x100, }, { .uname = "SCALAR_SINGLE", .udesc = "Retired computational Streaming SIMD Extensions (SSE) scalar-single instructions", .ucode = 0x200, }, { .uname = "PACKED_DOUBLE", .udesc = "Retired computational Streaming SIMD Extensions 2 (SSE2) packed-double instructions", .ucode = 0x400, }, { .uname = "SCALAR_DOUBLE", .udesc = "Retired computational Streaming SIMD Extensions 2 (SSE2) scalar-double instructions", .ucode = 0x800, }, }; static const intel_x86_umask_t atom_simd_sat_uop_exec[]={ { .uname = "S", .udesc = "SIMD saturated arithmetic micro-ops executed", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AR", .udesc = "SIMD saturated arithmetic micro-ops retired", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_inst_retired[]={ { .uname = "ANY_P", .udesc = "Instructions retired using generic counter (precise event)", .ucode = 0x0, .uflags= INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t atom_l1d_cache[]={ { .uname = "LD", .udesc = "L1 Cacheable Data Reads", .ucode = 0x2100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ST", .udesc = "L1 Cacheable Data Writes", .ucode = 0x2200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_mul[]={ { .uname = "S", .udesc = "Multiply operations executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AR", .udesc = "Multiply operations retired", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_div[]={ { .uname = "S", .udesc = "Divide operations executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AR", .udesc = "Divide operations retired", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_bus_trans_p[]={ { .uname = "THIS_AGENT", .udesc = "This agent", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "ALL_AGENTS", .udesc = "Any agent on the bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, .grpid = 0, }, { .uname = 
"SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .grpid = 1, }, }; static const intel_x86_umask_t atom_bus_io_wait[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, }, }; static const intel_x86_umask_t atom_bus_hitm_drv[]={ { .uname = "THIS_AGENT", .udesc = "This agent", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL_AGENTS", .udesc = "Any agent on the bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_itlb[]={ { .uname = "FLUSH", .udesc = "ITLB flushes", .ucode = 0x400, }, { .uname = "MISSES", .udesc = "ITLB misses", .ucode = 0x200, }, }; static const intel_x86_umask_t atom_simd_uop_type_exec[]={ { .uname = "MUL_S", .udesc = "SIMD packed multiply micro-ops executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MUL_AR", .udesc = "SIMD packed multiply micro-ops retired", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SHIFT_S", .udesc = "SIMD packed shift micro-ops executed", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SHIFT_AR", .udesc = "SIMD packed shift micro-ops retired", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACK_S", .udesc = "SIMD packed micro-ops executed", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACK_AR", .udesc = "SIMD packed micro-ops retired", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "UNPACK_S", .udesc = "SIMD unpacked micro-ops executed", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "UNPACK_AR", .udesc = "SIMD unpacked micro-ops retired", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOGICAL_S", .udesc = "SIMD packed logical micro-ops executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOGICAL_AR", .udesc = 
"SIMD packed logical micro-ops retired", .ucode = 0x9000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ARITHMETIC_S", .udesc = "SIMD packed arithmetic micro-ops executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ARITHMETIC_AR", .udesc = "SIMD packed arithmetic micro-ops retired", .ucode = 0xa000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_simd_inst_retired[]={ { .uname = "PACKED_SINGLE", .udesc = "Retired Streaming SIMD Extensions (SSE) packed-single instructions", .ucode = 0x100, }, { .uname = "SCALAR_SINGLE", .udesc = "Retired Streaming SIMD Extensions (SSE) scalar-single instructions", .ucode = 0x200, }, { .uname = "PACKED_DOUBLE", .udesc = "Retired Streaming SIMD Extensions 2 (SSE2) packed-double instructions", .ucode = 0x400, }, { .uname = "SCALAR_DOUBLE", .udesc = "Retired Streaming SIMD Extensions 2 (SSE2) scalar-double instructions", .ucode = 0x800, }, { .uname = "VECTOR", .udesc = "Retired Streaming SIMD Extensions 2 (SSE2) vector instructions", .ucode = 0x1000, }, { .uname = "ANY", .udesc = "Retired Streaming SIMD instructions", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t atom_prefetch[]={ { .uname = "PREFETCHT0", .udesc = "Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed", .ucode = 0x100, }, { .uname = "SW_L2", .udesc = "Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed", .ucode = 0x600, }, { .uname = "PREFETCHNTA", .udesc = "Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed", .ucode = 0x800, }, }; static const intel_x86_umask_t atom_l2_rqsts[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .grpid = 0, }, { .uname = "ANY", .udesc = "All inclusive", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "PREFETCH", .udesc = "Hardware 
prefetch only", .ucode = 0x1000, .grpid = 1, }, { .uname = "MESI", .udesc = "Any cacheline access", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 2, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, .grpid = 2, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, .grpid = 2, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, .grpid = 2, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, .grpid = 2, }, }; static const intel_x86_umask_t atom_simd_uops_exec[]={ { .uname = "S", .udesc = "Number of SIMD saturated arithmetic micro-ops executed", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AR", .udesc = "Number of SIMD saturated arithmetic micro-ops retired", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_machine_clears[]={ { .uname = "SMC", .udesc = "Self-Modifying Code detected", .ucode = 0x100, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t atom_br_inst_retired[]={ { .uname = "ANY", .udesc = "Retired branch instructions", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "PRED_NOT_TAKEN", .udesc = "Retired branch instructions that were predicted not-taken", .ucode = 0x100, }, { .uname = "MISPRED_NOT_TAKEN", .udesc = "Retired branch instructions that were mispredicted not-taken", .ucode = 0x200, }, { .uname = "PRED_TAKEN", .udesc = "Retired branch instructions that were predicted taken", .ucode = 0x400, }, { .uname = "MISPRED_TAKEN", .udesc = "Retired branch instructions that were mispredicted taken", .ucode = 0x800, }, { .uname = "MISPRED", .udesc = "Retired mispredicted branch instructions", .ucode = 0xa00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN", .udesc = "Retired taken branch instructions", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY1", .udesc = "Retired branch instructions", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t 
atom_macro_insts[]={ { .uname = "NON_CISC_DECODED", .udesc = "Non-CISC macro instructions decoded", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DECODED", .udesc = "All instructions decoded", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t atom_segment_reg_loads[]={ { .uname = "ANY", .udesc = "Number of segment register loads", .ucode = 0x8000, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t atom_baclears[]={ { .uname = "ANY", .udesc = "BACLEARS asserted", .ucode = 0x100, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t atom_cycles_int_masked[]={ { .uname = "CYCLES_INT_MASKED", .udesc = "Cycles during which interrupts are disabled", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CYCLES_INT_PENDING_AND_MASKED", .udesc = "Cycles during which interrupts are pending and disabled", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_fp_assist[]={ { .uname = "S", .udesc = "Floating point assists for executed instructions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AR", .udesc = "Floating point assists for retired instructions", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_data_tlb_misses[]={ { .uname = "DTLB_MISS", .udesc = "Memory accesses that missed the DTLB", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DTLB_MISS_LD", .udesc = "DTLB misses due to load operations", .ucode = 0x500, }, { .uname = "L0_DTLB_MISS_LD", .udesc = "L0 (micro-TLB) misses due to load operations", .ucode = 0x900, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DTLB_MISS_ST", .udesc = "DTLB misses due to store operations", .ucode = 0x600, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_store_forwards[]={ { .uname = "GOOD", .udesc = "Good store forwards", .ucode = 0x8100, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t atom_cpu_clk_unhalted[]={ { .uname = "CORE_P", 
.udesc = "Core cycles when core is not halted", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "BUS", .udesc = "Bus cycles when core is not halted. This event can give a measurement of the elapsed time. This event has a constant ratio with the CPU_CLK_UNHALTED:REF event, which is the maximum bus to processor frequency ratio", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NO_OTHER", .udesc = "Bus cycles when core is active and other is halted", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_mem_load_retired[]={ { .uname = "L2_HIT", .udesc = "Retired loads that hit the L2 cache (precise event)", .ucode = 0x100, .uflags= INTEL_X86_PEBS, }, { .uname = "L2_MISS", .udesc = "Retired loads that miss the L2 cache (precise event)", .ucode = 0x200, .uflags= INTEL_X86_PEBS, }, { .uname = "DTLB_MISS", .udesc = "Retired loads that miss the DTLB (precise event)", .ucode = 0x400, .uflags= INTEL_X86_PEBS, }, }; static const intel_x86_umask_t atom_x87_comp_ops_exe[]={ { .uname = "ANY_S", .udesc = "Floating point computational micro-ops executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_AR", .udesc = "Floating point computational micro-ops retired", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t atom_page_walks[]={ { .uname = "WALKS", .udesc = "Number of page-walks executed", .uequiv = "CYCLES", .ucode = 0x300 | INTEL_X86_MOD_EDGE, .modhw = _INTEL_X86_ATTR_E, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CYCLES", .udesc = "Duration of page-walks in core cycles", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_atom_pe[]={ { .name = "UNHALTED_CORE_CYCLES", .desc = "Unhalted core cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x200000003ull, .code = 0x3c, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED3_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* pseudo encoding */ .flags 
= INTEL_X86_FIXED, }, { .name = "INSTRUCTION_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x100000003ull, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_V3_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x10003, .code = 0xc0, }, { .name = "LLC_REFERENCES", .desc = "Last level of cache references", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x4f2e, }, { .name = "LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for LLC_REFERENCES", .modmsk = INTEL_V3_ATTRS, .equiv = "LLC_REFERENCES", .cntmsk = 0x3, .code = 0x4f2e, }, { .name = "LLC_MISSES", .desc = "Last level of cache misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x412e, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for LLC_MISSES", .modmsk = INTEL_V3_ATTRS, .equiv = "LLC_MISSES", .cntmsk = 0x3, .code = 0x412e, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Branch instructions retired", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_INST_RETIRED:ANY", .cntmsk = 0x3, .code = 0xc4, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Mispredicted branch instruction retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc5, .flags= INTEL_X86_PEBS, }, { .name = "SIMD_INSTR_RETIRED", .desc = "SIMD Instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xce, }, { .name = "L2_REJECT_BUSQ", .desc = "Rejected L2 cache requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x30, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_reject_busq), .ngrp = 3, .umasks = atom_l2_reject_busq, }, { .name = "SIMD_SAT_INSTR_RETIRED", .desc = "Saturated arithmetic instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xcf, }, { .name = "ICACHE", .desc = "Instruction fetches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(atom_icache), .ngrp = 1, .umasks = atom_icache, }, { .name = "L2_LOCK", .desc = "L2 locked accesses", 
.modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x2b, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_lock), .ngrp = 2, .umasks = atom_l2_lock, }, { .name = "UOPS_RETIRED", .desc = "Micro-ops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc2, .numasks = LIBPFM_ARRAY_SIZE(atom_uops_retired), .ngrp = 1, .umasks = atom_uops_retired, }, { .name = "L2_M_LINES_OUT", .desc = "Modified lines evicted from the L2 cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, }, { .name = "SIMD_COMP_INST_RETIRED", .desc = "Retired computational Streaming SIMD Extensions (SSE) instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xca, .numasks = LIBPFM_ARRAY_SIZE(atom_simd_comp_inst_retired), .ngrp = 1, .umasks = atom_simd_comp_inst_retired, }, { .name = "SNOOP_STALL_DRV", .desc = "Bus stalled for snoops", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x7e, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_BURST", .desc = "Burst (full cache-line) bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x6e, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "SIMD_SAT_UOP_EXEC", .desc = "SIMD saturated arithmetic micro-ops executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xb1, .numasks = LIBPFM_ARRAY_SIZE(atom_simd_sat_uop_exec), .ngrp = 1, .umasks = atom_simd_sat_uop_exec, }, { .name = "BUS_TRANS_IO", .desc = "IO bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_RFO", .desc = "RFO bus transactions", .modmsk = INTEL_V3_ATTRS, 
.cntmsk = 0x3, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "SIMD_ASSIST", .desc = "SIMD assists invoked", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xcd, }, { .name = "INST_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(atom_inst_retired), .ngrp = 1, .umasks = atom_inst_retired, }, { .name = "L1D_CACHE", .desc = "L1 Cacheable Data Reads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x40, .numasks = LIBPFM_ARRAY_SIZE(atom_l1d_cache), .ngrp = 1, .umasks = atom_l1d_cache, }, { .name = "MUL", .desc = "Multiply operations executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x12, .numasks = LIBPFM_ARRAY_SIZE(atom_mul), .ngrp = 1, .umasks = atom_mul, }, { .name = "DIV", .desc = "Divide operations executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x13, .numasks = LIBPFM_ARRAY_SIZE(atom_div), .ngrp = 1, .umasks = atom_div, }, { .name = "BUS_TRANS_P", .desc = "Partial bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x6b, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_trans_p), .ngrp = 2, .umasks = atom_bus_trans_p, }, { .name = "BUS_IO_WAIT", .desc = "IO requests waiting in the bus queue", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_io_wait), .ngrp = 1, .umasks = atom_bus_io_wait, }, { .name = "L2_M_LINES_IN", .desc = "L2 cache line modifications", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x25, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_io_wait), .ngrp = 1, .umasks = atom_bus_io_wait, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_IN", .desc = "L2 cache misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to 
actual umasks list for this event */ }, { .name = "BUSQ_EMPTY", .desc = "Bus queue is empty", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_io_wait), .ngrp = 1, .umasks = atom_bus_io_wait, /* identical to actual umasks list for this event */ }, { .name = "L2_IFETCH", .desc = "L2 cacheable instruction fetch requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_lock), .ngrp = 2, .umasks = atom_l2_lock, /* identical to actual umasks list for this event */ }, { .name = "BUS_HITM_DRV", .desc = "HITM signal asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x7b, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_hitm_drv), .ngrp = 1, .umasks = atom_bus_hitm_drv, }, { .name = "ITLB", .desc = "ITLB hits", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x82, .numasks = LIBPFM_ARRAY_SIZE(atom_itlb), .ngrp = 1, .umasks = atom_itlb, }, { .name = "BUS_TRANS_MEM", .desc = "Memory bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x6f, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_PWR", .desc = "Partial write bus transaction", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x6a, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "BR_INST_DECODED", .desc = "Branch instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x1e0, }, { .name = "BUS_TRANS_INVAL", .desc = "Invalidate bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "SIMD_UOP_TYPE_EXEC", .desc = "SIMD micro-ops executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xb3, .numasks 
= LIBPFM_ARRAY_SIZE(atom_simd_uop_type_exec), .ngrp = 1, .umasks = atom_simd_uop_type_exec, }, { .name = "SIMD_INST_RETIRED", .desc = "Retired Streaming SIMD Extensions (SSE) instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc7, .numasks = LIBPFM_ARRAY_SIZE(atom_simd_inst_retired), .ngrp = 1, .umasks = atom_simd_inst_retired, }, { .name = "CYCLES_DIV_BUSY", .desc = "Cycles the divider is busy", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x14, }, { .name = "PREFETCH", .desc = "Streaming SIMD Extensions (SSE) PrefetchT0 instructions executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(atom_prefetch), .ngrp = 1, .umasks = atom_prefetch, }, { .name = "L2_RQSTS", .desc = "L2 cache requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_rqsts), .ngrp = 3, .umasks = atom_l2_rqsts, }, { .name = "SIMD_UOPS_EXEC", .desc = "SIMD micro-ops executed (excluding stores)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xb0, .numasks = LIBPFM_ARRAY_SIZE(atom_simd_uops_exec), .ngrp = 1, .umasks = atom_simd_uops_exec, }, { .name = "HW_INT_RCV", .desc = "Hardware interrupts received (warning overcounts by 2x)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x1c8, }, { .name = "BUS_TRANS_BRD", .desc = "Burst read bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_trans_p), .ngrp = 2, .umasks = atom_bus_trans_p, /* identical to actual umasks list for this event */ }, { .name = "BOGUS_BR", .desc = "Bogus branches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xe4, }, { .name = "BUS_DATA_RCV", .desc = "Bus cycles while processor receives data", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x64, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_io_wait), .ngrp = 1, .umasks = atom_bus_io_wait, /* identical to actual umasks list for this event */ }, { .name = "MACHINE_CLEARS", .desc = "Self-Modifying Code detected", 
.modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc3, .numasks = LIBPFM_ARRAY_SIZE(atom_machine_clears), .ngrp = 1, .umasks = atom_machine_clears, }, { .name = "BR_INST_RETIRED", .desc = "Retired branch instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc4, .numasks = LIBPFM_ARRAY_SIZE(atom_br_inst_retired), .ngrp = 1, .umasks = atom_br_inst_retired, }, { .name = "L2_ADS", .desc = "Cycles L2 address bus is in use", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x21, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_io_wait), .ngrp = 1, .umasks = atom_bus_io_wait, /* identical to actual umasks list for this event */ }, { .name = "EIST_TRANS", .desc = "Number of Enhanced Intel SpeedStep(R) Technology (EIST) transitions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x3a, }, { .name = "BUS_TRANS_WB", .desc = "Explicit writeback bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "MACRO_INSTS", .desc = "Macro-instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xaa, .numasks = LIBPFM_ARRAY_SIZE(atom_macro_insts), .ngrp = 1, .umasks = atom_macro_insts, }, { .name = "L2_LINES_OUT", .desc = "L2 cache lines evicted. 
", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x26, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "L2_LD", .desc = "L2 cache reads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_rqsts), .ngrp = 3, .umasks = atom_l2_rqsts, /* identical to actual umasks list for this event */ }, { .name = "SEGMENT_REG_LOADS", .desc = "Number of segment register loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x6, .numasks = LIBPFM_ARRAY_SIZE(atom_segment_reg_loads), .ngrp = 1, .umasks = atom_segment_reg_loads, }, { .name = "L2_NO_REQ", .desc = "Cycles no L2 cache requests are pending", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x32, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_io_wait), .ngrp = 1, .umasks = atom_bus_io_wait, /* identical to actual umasks list for this event */ }, { .name = "THERMAL_TRIP", .desc = "Number of thermal trips", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc03b, }, { .name = "EXT_SNOOP", .desc = "External snoops", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x77, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_lock), .ngrp = 2, .umasks = atom_l2_lock, /* identical to actual umasks list for this event */ }, { .name = "BACLEARS", .desc = "Branch address calculator", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xe6, .numasks = LIBPFM_ARRAY_SIZE(atom_baclears), .ngrp = 1, .umasks = atom_baclears, }, { .name = "CYCLES_INT_MASKED", .desc = "Cycles during which interrupts are disabled", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc6, .numasks = LIBPFM_ARRAY_SIZE(atom_cycles_int_masked), .ngrp = 1, .umasks = atom_cycles_int_masked, }, { .name = "FP_ASSIST", .desc = "Floating point assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x11, .numasks = LIBPFM_ARRAY_SIZE(atom_fp_assist), .ngrp = 1, .umasks = atom_fp_assist, }, { .name = "L2_ST", .desc = "L2 store requests", 
.modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_lock), .ngrp = 2, .umasks = atom_l2_lock, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_DEF", .desc = "Deferred bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "DATA_TLB_MISSES", .desc = "Memory accesses that missed the DTLB", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(atom_data_tlb_misses), .ngrp = 1, .umasks = atom_data_tlb_misses, }, { .name = "BUS_BNR_DRV", .desc = "Number of Bus Not Ready signals asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x61, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_hitm_drv), .ngrp = 1, .umasks = atom_bus_hitm_drv, /* identical to actual umasks list for this event */ }, { .name = "STORE_FORWARDS", .desc = "All store forwards", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x2, .numasks = LIBPFM_ARRAY_SIZE(atom_store_forwards), .ngrp = 1, .umasks = atom_store_forwards, }, { .name = "CPU_CLK_UNHALTED", .desc = "Core cycles when core is not halted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(atom_cpu_clk_unhalted), .ngrp = 1, .umasks = atom_cpu_clk_unhalted, }, { .name = "BUS_TRANS_ANY", .desc = "All bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x70, .numasks = LIBPFM_ARRAY_SIZE(atom_l2_m_lines_out), .ngrp = 2, .umasks = atom_l2_m_lines_out, /* identical to actual umasks list for this event */ }, { .name = "MEM_LOAD_RETIRED", .desc = "Retired loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xcb, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(atom_mem_load_retired), .ngrp = 1, .umasks = atom_mem_load_retired, }, { .name = "X87_COMP_OPS_EXE", .desc = "Floating point computational micro-ops executed", .modmsk = 
INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x10, .numasks = LIBPFM_ARRAY_SIZE(atom_x87_comp_ops_exe), .ngrp = 1, .umasks = atom_x87_comp_ops_exe, }, { .name = "PAGE_WALKS", .desc = "Number of page-walks executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0xc, .numasks = LIBPFM_ARRAY_SIZE(atom_page_walks), .ngrp = 1, .umasks = atom_page_walks, }, { .name = "BUS_LOCK_CLOCKS", .desc = "Bus cycles when a LOCK signal is asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_trans_p), .ngrp = 2, .umasks = atom_bus_trans_p, /* identical to actual umasks list for this event */ }, { .name = "BUS_REQUEST_OUTSTANDING", .desc = "Outstanding cacheable data read bus requests duration", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_trans_p), .ngrp = 2, .umasks = atom_bus_trans_p, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IFETCH", .desc = "Instruction-fetch bus transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_trans_p), .ngrp = 2, .umasks = atom_bus_trans_p, /* identical to actual umasks list for this event */ }, { .name = "BUS_HIT_DRV", .desc = "HIT signal asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x7a, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_hitm_drv), .ngrp = 1, .umasks = atom_bus_hitm_drv, /* identical to actual umasks list for this event */ }, { .name = "BUS_DRDY_CLOCKS", .desc = "Bus cycles when data is sent on the bus", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_hitm_drv), .ngrp = 1, .umasks = atom_bus_hitm_drv, /* identical to actual umasks list for this event */ }, { .name = "L2_DBUS_BUSY", .desc = "Cycles the L2 cache data bus is busy", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x22, .numasks = LIBPFM_ARRAY_SIZE(atom_bus_io_wait), .ngrp = 1, .umasks = atom_bus_io_wait, /* identical to actual umasks list for 
this event */ }, }; papi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_cbo_events.h0000600003276200002170000005745412247131123024102 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
* * PMU: snbep_unc_cbo (Intel SandyBridge-EP C-Box uncore PMU) */ #define CBO_FILT_MESIF(a, b, c, d) \ { .uname = "STATE_"#a,\ .udesc = #b" cacheline state",\ .ufilters[0] = 1ULL << (18 + (c)),\ .grpid = d, \ } #define CBO_FILT_MESIFS(d) \ CBO_FILT_MESIF(I, Invalid, 0, d), \ CBO_FILT_MESIF(S, Shared, 1, d), \ CBO_FILT_MESIF(E, Exclusive, 2, d), \ CBO_FILT_MESIF(M, Modified, 3, d), \ CBO_FILT_MESIF(F, Forward, 4, d), \ { .uname = "STATE_MESIF",\ .udesc = "Any cache line state",\ .ufilters[0] = 0x1fULL << 18,\ .grpid = d, \ .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, \ } #define CBO_FILT_OPC(d) \ { .uname = "OPC_RFO",\ .udesc = "Demand data RFO (combine with any OPCODE umask)",\ .ufilters[0] = 0x180ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_CRD",\ .udesc = "Demand code read (combine with any OPCODE umask)",\ .ufilters[0] = 0x181ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_DRD",\ .udesc = "Demand data read (combine with any OPCODE umask)",\ .ufilters[0] = 0x182ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PRD",\ .udesc = "Partial reads (UC) (combine with any OPCODE umask)",\ .ufilters[0] = 0x187ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_WCILF",\ .udesc = "Full Stream store (combine with any OPCODE umask)", \ .ufilters[0] = 0x18cULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_WCIL",\ .udesc = "Partial Stream store (combine with any OPCODE umask)", \ .ufilters[0] = 0x18dULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PF_RFO",\ .udesc = "Prefetch RFO into LLC but do not pass to L2 (includes hints) (combine with any OPCODE umask)", \ .ufilters[0] = 0x190ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PF_CODE",\ .udesc = "Prefetch code into LLC but do not pass to L2 (includes hints) (combine with any OPCODE umask)", \ .ufilters[0] = 0x191ULL << 23, \ 
.uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PF_DATA",\ .udesc = "Prefetch data into LLC but do not pass to L2 (includes hints) (combine with any OPCODE umask)", \ .ufilters[0] = 0x192ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PCIWILF",\ .udesc = "PCIe write (non-allocating) (combine with any OPCODE umask)", \ .ufilters[0] = 0x194ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PCIPRD",\ .udesc = "PCIe UC read (combine with any OPCODE umask)", \ .ufilters[0] = 0x195ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PCIITOM",\ .udesc = "PCIe write (allocating) (combine with any OPCODE umask)", \ .ufilters[0] = 0x19cULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PCIRDCUR",\ .udesc = "PCIe read current (combine with any OPCODE umask)", \ .ufilters[0] = 0x19eULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_WBMTOI",\ .udesc = "Request writeback modified invalidate line (combine with any OPCODE umask)", \ .ufilters[0] = 0x1c4ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_WBMTOE",\ .udesc = "Request writeback modified set to exclusive (combine with any OPCODE umask)", \ .ufilters[0] = 0x1c5ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_ITOM",\ .udesc = "Request invalidate line (combine with any OPCODE umask)", \ .ufilters[0] = 0x1c8ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PCINSRD",\ .udesc = "PCIe non-snoop read (combine with any OPCODE umask)", \ .ufilters[0] = 0x1e4ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PCINSWR",\ .udesc = "PCIe non-snoop write (partial) (combine with any OPCODE umask)", \ .ufilters[0] = 0x1e5ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ }, \ { .uname = "OPC_PCINSWRF",\ .udesc = "PCIe non-snoop write (full) (combine with any OPCODE 
umask)", \ .ufilters[0] = 0x1e6ULL << 23, \ .uflags = INTEL_X86_NCOMBO, \ .grpid = d, \ } static const intel_x86_umask_t snbep_unc_c_llc_lookup[]={ { .uname = "DATA_READ", .udesc = "Data read requests", .grpid = 0, .uflags = INTEL_X86_NCOMBO, .ucode = 0x300, }, { .uname = "WRITE", .udesc = "Write requests. Includes all write transactions (cached, uncached)", .grpid = 0, .uflags = INTEL_X86_NCOMBO, .ucode = 0x500, }, { .uname = "REMOTE_SNOOP", .udesc = "External snoop request", .grpid = 0, .uflags = INTEL_X86_NCOMBO, .ucode = 0x900, }, { .uname = "NID", .udesc = "Match a given RTID destination NID", .uflags = INTEL_X86_NCOMBO, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .grpid = 0, .ucode = 0x4100, }, CBO_FILT_MESIFS(1), }; static const intel_x86_umask_t snbep_unc_c_llc_victims[]={ { .uname = "M_STATE", .udesc = "Lines in M state", .ucode = 0x100, }, { .uname = "E_STATE", .udesc = "Lines in E state", .ucode = 0x200, }, { .uname = "S_STATE", .udesc = "Lines in S state", .ucode = 0x400, }, { .uname = "MISS", .udesc = "TBD", .ucode = 0x800, }, { .uname = "NID", .udesc = "Victimized Lines matching the NID filter", .ucode = 0x4000, }, }; static const intel_x86_umask_t snbep_unc_c_misc[]={ { .uname = "RSPI_WAS_FSE", .udesc = "Silent snoop eviction", .ucode = 0x100, }, { .uname = "WC_ALIASING", .udesc = "Write combining aliasing", .ucode = 0x200, }, { .uname = "STARTED", .udesc = "TBD", .ucode = 0x400, }, { .uname = "RFO_HIT_S", .udesc = "RFO hits in S state", .ucode = 0x800, }, }; static const intel_x86_umask_t snbep_unc_c_ring_ad_used[]={ { .uname = "UP_EVEN", .udesc = "Up and Even ring polarity filter", .ucode = 0x100, }, { .uname = "UP_ODD", .udesc = "Up and odd ring polarity filter", .ucode = 0x200, }, { .uname = "DOWN_EVEN", .udesc = "Down and even ring polarity filter", .ucode = 0x400, }, { .uname = "DOWN_ODD", .udesc = "Down and odd ring polarity filter", .ucode = 0x800, }, }; static const intel_x86_umask_t snbep_unc_c_ring_bounces[]={ { .uname = "AK_CORE", .udesc = 
"Acknowledgment to core", .ucode = 0x200, }, { .uname = "BL_CORE", .udesc = "Data response to core", .ucode = 0x400, }, { .uname = "IV_CORE", .udesc = "Snoops of processor cache", .ucode = 0x800, }, }; static const intel_x86_umask_t snbep_unc_c_ring_iv_used[]={ { .uname = "ANY", .udesc = "Any filter", .ucode = 0xf00, .uflags = INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_unc_c_rxr_ext_starved[]={ { .uname = "IRQ", .udesc = "Irq externally starved, therefore blocking the IPQ", .ucode = 0x100, }, { .uname = "IPQ", .udesc = "IPQ externally starved, therefore blocking the IRQ", .ucode = 0x200, }, { .uname = "ISMQ", .udesc = "ISMQ externally starved, therefore blocking both IRQ and IPQ", .ucode = 0x400, }, { .uname = "ISMQ_BIDS", .udesc = "Number of time the ISMQ bids", .ucode = 0x800, }, }; static const intel_x86_umask_t snbep_unc_c_rxr_inserts[]={ { .uname = "IPQ", .udesc = "IPQ", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IRQ", .udesc = "IRQ", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IRQ_REJECTED", .udesc = "IRQ rejected", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "VFIFO", .udesc = "Counts the number of allocated into the IRQ ordering FIFO", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_c_rxr_ipq_retry[]={ { .uname = "ADDR_CONFLICT", .udesc = "Address conflict", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ANY", .udesc = "Any Reject", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FULL", .udesc = "No Egress credits", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "QPI_CREDITS", .udesc = "No QPI credits", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_c_rxr_irq_retry[]={ { .uname = "ADDR_CONFLICT", .udesc = "Address conflict", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ANY", .udesc = "Any reject", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | 
INTEL_X86_DFL, }, { .uname = "FULL", .udesc = "No Egress credits", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "QPI_CREDITS", .udesc = "No QPI credits", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RTID", .udesc = "No RTIDs", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_c_rxr_ismq_retry[]={ { .uname = "ANY", .udesc = "Any reject", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FULL", .udesc = "No Egress credits", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IIO_CREDITS", .udesc = "No IIO credits", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "QPI_CREDITS", .udesc = "No QPI credits", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RTID", .udesc = "No RTIDs", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_c_rxr_occupancy[]={ { .uname = "IPQ", .udesc = "IPQ", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IRQ", .udesc = "IRQ", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IRQ_REJECTED", .udesc = "IRQ rejected", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "VFIFO", .udesc = "Number of used entries in the IRQ ordering FIFO in each cycle", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_c_tor_inserts[]={ { .uname = "EVICTION", .udesc = "Number of eviction transactions inserted into the TOR", .ucode = 0x400, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "MISS_ALL", .udesc = "Number of miss requests inserted into the TOR", .ucode = 0xa00, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "MISS_OPCODE", .udesc = "Number of miss transactions inserted into the TOR that match an opcode (must provide opc_* umask)", .ucode = 0x300, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NID_ALL", .udesc = "Number of NID-matched transactions inserted into the 
TOR", .ucode = 0x4800, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "NID_EVICTION", .udesc = "Number of NID-matched eviction transactions inserted into the TOR (must provide nf=X modifier)", .ucode = 0x4400, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "NID_MISS_ALL", .udesc = "Number of NID-matched miss transactions that were inserted into the TOR (must provide nf=X modifier)", .ucode = 0x4a00, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "NID_MISS_OPCODE", .udesc = "Number of NID and opcode matched miss transactions inserted into the TOR (must provide opc_* umask and nf=X modifier)", .ucode = 0x4300, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NID_OPCODE", .udesc = "Number of transactions inserted into the TOR that match a NID and opcode (must provide opc_* umask and nf=X modifier)", .ucode = 0x4100, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NID_WB", .udesc = "Number of NID-matched write back transactions inserted into the TOR (must provide nf=X modifier)", .ucode = 0x5000, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "OPCODE", .udesc = "Number of transactions inserted into the TOR that match an opcode (must provide opc_* umask)", .ucode = 0x100, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WB", .udesc = "Number of write transactions inserted into the TOR", .ucode = 0x1000, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, CBO_FILT_OPC(1) }; static const intel_x86_umask_t snbep_unc_c_tor_occupancy[]={ { .uname = "ALL", .udesc = "All valid TOR entries", .ucode = 0x800, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL | INTEL_X86_EXCL_GRP_GT, }, { .uname = "EVICTION", 
.udesc = "Number of outstanding eviction transactions in the TOR", .ucode = 0x400, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "MISS_ALL", .udesc = "Number of outstanding miss requests in the TOR", .ucode = 0xa00, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "MISS_OPCODE", .udesc = "Number of TOR entries that match a NID and an opcode (must provide opc_* umask)", .ucode = 0x300, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NID_ALL", .udesc = "Number of NID-matched outstanding requests in the TOR (must provide nf=X modifier)", .ucode = 0x4800, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "NID_EVICTION", .udesc = "Number of NID-matched outstanding requests in the TOR (must provide a nf=X modifier)", .ucode = 0x4400, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "NID_MISS_ALL", .udesc = "Number of NID-matched outstanding miss requests in the TOR (must provide a nf=X modifier)", .ucode = 0x4a00, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO | INTEL_X86_EXCL_GRP_GT, }, { .uname = "NID_MISS_OPCODE", .udesc = "Number of NID-matched outstanding miss requests in the TOR that match an opcode (must provide nf=X modifier and opc_* umask)", .ucode = 0x4300, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NID_OPCODE", .udesc = "Number of NID-matched TOR entries that match an opcode (must provide nf=X modifier and opc_* umask)", .ucode = 0x4100, .grpid = 0, .umodmsk_req = _SNBEP_UNC_ATTR_NF, .uflags = INTEL_X86_NCOMBO, }, { .uname = "OPCODE", .udesc = "Number of TOR entries that match an opcode (must provide opc_* umask)", .ucode = 0x100, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, CBO_FILT_OPC(1) }; static const intel_x86_umask_t snbep_unc_c_txr_inserts[]={ { .uname = "AD_CACHE", .udesc = "Counts the number of ring 
transactions from Cachebo to AD ring", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "AK_CACHE", .udesc = "Counts the number of ring transactions from Cachebo to AK ring", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "BL_CACHE", .udesc = "Counts the number of ring transactions from Cachebo to BL ring", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IV_CACHE", .udesc = "Counts the number of ring transactions from Cachebo to IV ring", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "AD_CORE", .udesc = "Counts the number of ring transactions from Corebo to AD ring", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "AK_CORE", .udesc = "Counts the number of ring transactions from Corebo to AK ring", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "BL_CORE", .udesc = "Counts the number of ring transactions from Corebo to BL ring", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snbep_unc_c_pe[]={ { .name = "UNC_C_CLOCKTICKS", .desc = "C-box Uncore clockticks", .modmsk = 0x0, .cntmsk = 0xf, .code = 0x00, .flags = INTEL_X86_FIXED, }, { .name = "UNC_C_COUNTER0_OCCUPANCY", .desc = "Counter 0 occupancy. Counts the occupancy related information by filtering CB0 occupancy count captured in counter 0.", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0xe, .code = 0x1f, }, { .name = "UNC_C_ISMQ_DRD_MISS_OCC", .desc = "TBD", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0x3, .code = 0x21, }, { .name = "UNC_C_LLC_LOOKUP", .desc = "Cache lookups. 
Counts number of times the LLC is accessed from L2 for code, data, prefetches (Must set filter mask bit 0 and select )", .modmsk = SNBEP_UNC_CBO_NID_ATTRS, .cntmsk = 0x3, .code = 0x34, .ngrp = 2, .flags = INTEL_X86_NO_AUTOENCODE, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_llc_lookup), .umasks = snbep_unc_c_llc_lookup, }, { .name = "UNC_C_LLC_VICTIMS", .desc = "Lines victimized", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0x3, .code = 0x37, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_llc_victims), .ngrp = 1, .umasks = snbep_unc_c_llc_victims, }, { .name = "UNC_C_MISC", .desc = "Miscellaneous C-Box events", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0x3, .code = 0x39, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_misc), .ngrp = 1, .umasks = snbep_unc_c_misc, }, { .name = "UNC_C_RING_AD_USED", .desc = "Address ring in use. Counts number of cycles ring is being used at this ring stop", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0xc, .code = 0x1b, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_ring_ad_used), .ngrp = 1, .umasks = snbep_unc_c_ring_ad_used, }, { .name = "UNC_C_RING_AK_USED", .desc = "Acknowledgement ring in use. Counts number of cycles ring is being used at this ring stop", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0xc, .code = 0x1c, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_ring_ad_used), /* identical to RING_AD_USED */ .ngrp = 1, .umasks = snbep_unc_c_ring_ad_used, }, { .name = "UNC_C_RING_BL_USED", .desc = "Bus or Data ring in use. 
Counts number of cycles ring is being used at this ring stop", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0xc, .code = 0x1d, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_ring_ad_used), /* identical to RING_AD_USED */ .ngrp = 1, .umasks = snbep_unc_c_ring_ad_used, }, { .name = "UNC_C_RING_BOUNCES", .desc = "Number of LLC responses that bounced in the ring", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0x3, .code = 0x05, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_ring_bounces), .ngrp = 1, .umasks = snbep_unc_c_ring_bounces, }, { .name = "UNC_C_RING_IV_USED", .desc = "Invalidate ring in use. Counts number of cycles ring is being used at this ring stop", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0xc, .code = 0x1e, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_ring_iv_used), .ngrp = 1, .umasks = snbep_unc_c_ring_iv_used, }, { .name = "UNC_C_RING_SRC_THRTL", .desc = "TBD", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0x3, .code = 0x07, }, { .name = "UNC_C_RXR_EXT_STARVED", .desc = "Ingress arbiter blocking cycles", .modmsk = SNBEP_UNC_CBO_ATTRS, .cntmsk = 0x3, .code = 0x12, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_rxr_ext_starved), .ngrp = 1, .umasks = snbep_unc_c_rxr_ext_starved, }, { .name = "UNC_C_RXR_INSERTS", .desc = "Ingress Allocations", .code = 0x13, .cntmsk = 0x3, .ngrp = 1, .modmsk = SNBEP_UNC_CBO_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_rxr_inserts), .umasks = snbep_unc_c_rxr_inserts }, { .name = "UNC_C_RXR_IPQ_RETRY", .desc = "Probe Queue Retries", .code = 0x31, .cntmsk = 0x3, .ngrp = 1, .modmsk = SNBEP_UNC_CBO_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_rxr_ipq_retry), .umasks = snbep_unc_c_rxr_ipq_retry }, { .name = "UNC_C_RXR_IRQ_RETRY", .desc = "Ingress Request Queue Rejects", .code = 0x32, .cntmsk = 0x3, .ngrp = 1, .modmsk = SNBEP_UNC_CBO_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_rxr_irq_retry), .umasks = snbep_unc_c_rxr_irq_retry }, { .name = "UNC_C_RXR_ISMQ_RETRY", .desc = "ISMQ Retries", .code = 0x33, .cntmsk = 0x3, .ngrp = 1, .modmsk = SNBEP_UNC_CBO_ATTRS, 
.numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_rxr_ismq_retry), .umasks = snbep_unc_c_rxr_ismq_retry }, { .name = "UNC_C_RXR_OCCUPANCY", .desc = "Ingress Occupancy", .code = 0x11, .cntmsk = 0x1, .ngrp = 1, .modmsk = SNBEP_UNC_CBO_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_rxr_inserts), .umasks = snbep_unc_c_rxr_inserts, /* identical to snbep_unc_c_rxr_inserts */ }, { .name = "UNC_C_TOR_INSERTS", .desc = "TOR Inserts", .code = 0x35, .cntmsk = 0x3, .ngrp = 2, .modmsk = SNBEP_UNC_CBO_NID_ATTRS, .flags = INTEL_X86_NO_AUTOENCODE, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_tor_inserts), .umasks = snbep_unc_c_tor_inserts }, { .name = "UNC_C_TOR_OCCUPANCY", .desc = "TOR Occupancy", .code = 0x36, .cntmsk = 0x1, .ngrp = 2, .modmsk = SNBEP_UNC_CBO_NID_ATTRS, .flags = INTEL_X86_NO_AUTOENCODE, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_tor_occupancy), .umasks = snbep_unc_c_tor_occupancy }, { .name = "UNC_C_TXR_ADS_USED", .desc = "Egress events", .code = 0x04, .cntmsk = 0x3, .modmsk = SNBEP_UNC_CBO_ATTRS, }, { .name = "UNC_C_TXR_INSERTS", .desc = "Egress allocations", .code = 0x02, .cntmsk = 0x3, .ngrp = 1, .modmsk = SNBEP_UNC_CBO_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_c_txr_inserts), .umasks = snbep_unc_c_txr_inserts }, }; papi-5.3.0/src/libpfm4/lib/events/intel_p6_events.h0000600003276200002170000006105412247131123021617 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the 
Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: p6 (Intel P6 Processor Family) */ static const intel_x86_umask_t p6_l2_ifetch[]={ { .uname = "I", .udesc = "Invalid state", .ucode = 0x100, }, { .uname = "S", .udesc = "Shared state", .ucode = 0x200, }, { .uname = "E", .udesc = "Exclusive state", .ucode = 0x400, }, { .uname = "M", .udesc = "Modified state", .ucode = 0x800, }, }; static const intel_x86_umask_t p6_bus_drdy_clocks[]={ { .uname = "SELF", .udesc = "Clocks when processor is driving bus", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "Clocks when any agent is driving bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t p6_mmx_instr_type_exec[]={ { .uname = "MUL", .udesc = "MMX packed multiply instructions executed", .ucode = 0x100, }, { .uname = "SHIFT", .udesc = "MMX packed shift instructions executed", .ucode = 0x200, }, { .uname = "PACK", .udesc = "MMX pack operation instructions executed", .ucode = 0x400, }, { .uname = "UNPACK", .udesc = "MMX unpack operation instructions executed", .ucode = 0x800, }, { .uname = "LOGICAL", .udesc = "MMX packed logical instructions executed", .ucode = 0x1000, }, { .uname = "ARITH", .udesc = "MMX packed arithmetic instructions executed", .ucode = 0x2000, }, }; static const intel_x86_umask_t p6_fp_mmx_trans[]={ { .uname = "TO_FP", .udesc = "From MMX instructions to 
floating-point instructions", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TO_MMX", .udesc = "From floating-point instructions to MMX instructions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t p6_seg_rename_stalls[]={ { .uname = "ES", .udesc = "Segment register ES", .ucode = 0x100, }, { .uname = "DS", .udesc = "Segment register DS", .ucode = 0x200, }, { .uname = "FS", .udesc = "Segment register FS", .ucode = 0x400, }, { .uname = "GS", .udesc = "Segment register GS", .ucode = 0x800, }, }; static const intel_x86_umask_t p6_emon_kni_pref_dispatched[]={ { .uname = "NTA", .udesc = "Prefetch NTA", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "T1", .udesc = "Prefetch T1", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "T2", .udesc = "Prefetch T2", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WEAK", .udesc = "Weakly ordered stores", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t p6_emon_kni_inst_retired[]={ { .uname = "PACKED_SCALAR", .udesc = "Packed and scalar instructions", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SCALAR", .udesc = "Scalar only", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_p6_pe[]={ { .name = "CPU_CLK_UNHALTED", .desc = "Number of cycles during which the processor is not halted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x79, }, { .name = "INST_RETIRED", .desc = "Number of instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc0, }, { .name = "DATA_MEM_REFS", .desc = "All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores but also internal retries. 80-bit floating point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. 
Memory accesses are only counted when they are actually performed (such as a load that gets squashed because a previous cache miss is outstanding to the same address, and which finally gets performed, is only counted once). Does not include I/O accesses or other non-memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x43, }, { .name = "DCU_LINES_IN", .desc = "Total lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x45, }, { .name = "DCU_M_LINES_IN", .desc = "Number of M state lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x46, }, { .name = "DCU_M_LINES_OUT", .desc = "Number of M state lines evicted from the DCU. This includes evictions via snoop HITM, intervention or replacement", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x47, }, { .name = "DCU_MISS_OUTSTANDING", .desc = "Weighted number of cycles while a DCU miss is outstanding, incremented by the number of cache misses at any particular time. Cacheable read requests only are considered. Uncacheable requests are excluded. Read-for-ownerships are counted, as well as line fills, invalidates, and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x48, }, { .name = "IFU_IFETCH", .desc = "Number of instruction fetches, both cacheable and noncacheable including UC fetches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x80, }, { .name = "IFU_IFETCH_MISS", .desc = "Number of instruction fetch misses. All instruction fetches that do not hit the IFU (i.e., that produce memory requests). Includes UC accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x81, }, { .name = "ITLB_MISS", .desc = "Number of ITLB misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x85, }, { .name = "IFU_MEM_STALL", .desc = "Number of cycles instruction fetch is stalled for any reason. 
Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x86, }, { .name = "ILD_STALL", .desc = "Number of cycles that the instruction length decoder is stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x87, }, { .name = "L2_IFETCH", .desc = "Number of L2 instruction fetches. This event indicates that a normal instruction fetch was received by the L2. The count includes only L2 cacheable instruction fetches: it does not include UC instruction fetches. It does not include ITLB miss accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(p6_l2_ifetch), .ngrp = 1, .umasks = p6_l2_ifetch, }, { .name = "L2_ST", .desc = "Number of L2 data stores. This event indicates that a normal, unlocked, store memory access was received by the L2. Specifically, it indicates that the DCU sent a read-for-ownership request to the L2. It also includes Invalid to Modified requests sent by the DCU to the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT memory accesses. 
It does include L2 cacheable TLB miss memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(p6_l2_ifetch), .ngrp = 1, .umasks = p6_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_M_LINES_INM", .desc = "Number of modified lines allocated in the L2", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x25, }, { .name = "L2_RQSTS", .desc = "Total number of L2 requests", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(p6_l2_ifetch), .ngrp = 1, .umasks = p6_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_ADS", .desc = "Number of L2 address strobes", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x21, }, { .name = "L2_DBUS_BUSY", .desc = "Number of cycles during which the L2 cache data bus was busy", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x22, }, { .name = "L2_DBUS_BUSY_RD", .desc = "Number of cycles during which the data bus was busy transferring read data from L2 to the processor", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x23, }, { .name = "BUS_DRDY_CLOCKS", .desc = "Number of clocks during which DRDY# is asserted. Utilization of the external system data bus during data transfers", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, }, { .name = "BUS_LOCK_CLOCKS", .desc = "Number of clocks during which LOCK# is asserted on the external system bus", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_REQ_OUTSTANDING", .desc = "Number of bus requests outstanding. 
This counter is incremented by the number of cacheable read bus requests outstanding in any given cycle", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x60, }, { .name = "BUS_TRANS_BRD", .desc = "Number of burst read transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_RFO", .desc = "Number of completed read for ownership transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_WB", .desc = "Number of completed write back transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_IFETCH", .desc = "Number of completed instruction fetch transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_INVAL", .desc = "Number of completed invalidate transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_PWR", .desc = "Number of completed partial write transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6a, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_P", .desc = "Number of completed partial transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 
0x3, .code = 0x6b, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IO", .desc = "Number of completed I/O transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_DEF", .desc = "Number of completed deferred transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_BURST", .desc = "Number of completed burst transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6e, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_ANY", .desc = "Number of all completed bus transactions. Address bus utilization can be calculated knowing the minimum address bus occupancy. 
Includes special cycles, etc.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x70, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_MEM", .desc = "Number of completed memory transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6f, .numasks = LIBPFM_ARRAY_SIZE(p6_bus_drdy_clocks), .ngrp = 1, .umasks = p6_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_DATA_RECV", .desc = "Number of bus clock cycles during which this processor is receiving data", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x64, }, { .name = "BUS_BNR_DRV", .desc = "Number of bus clock cycles during which this processor is driving the BNR# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x61, }, { .name = "BUS_HIT_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HIT# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7a, }, { .name = "BUS_HITM_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HITM# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7b, }, { .name = "BUS_SNOOP_STALL", .desc = "Number of clock cycles during which the bus is snoop stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7e, }, { .name = "FLOPS", .desc = "Number of computational floating-point operations retired. Excludes floating-point computational operations that cause traps or assists. Includes internal sub-operations for complex floating-point instructions like transcendentals. Excludes floating point loads and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0xc1, }, { .name = "FP_COMP_OPS_EXE", .desc = "Number of computational floating-point operations executed. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. 
This number does not include the number of cycles, but the number of operations. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x10, }, { .name = "FP_ASSIST", .desc = "Number of floating-point exception cases handled by microcode.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x11, }, { .name = "MUL", .desc = "Number of multiplies. This count includes integer as well as FP multiplies and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x12, }, { .name = "DIV", .desc = "Number of divides. This count includes integer as well as FP divides and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x13, }, { .name = "CYCLES_DIV_BUSY", .desc = "Number of cycles during which the divider is busy, and cannot accept new divides. This includes integer and FP divides, FPREM, FPSQRT, etc. and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x14, }, { .name = "LD_BLOCKS", .desc = "Number of load operations delayed due to store buffer blocks. Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3, }, { .name = "SB_DRAINS", .desc = "Number of store buffer drain cycles. Incremented every cycle the store buffer is draining. Draining is caused by serializing operations like CPUID, synchronizing operations like XCHG, interrupt acknowledgment, as well as other conditions (such as cache flushing).", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4, }, { .name = "MISALIGN_MEM_REF", .desc = "Number of misaligned data memory references. 
Incremented by 1 every cycle during which either the processor's load or store pipeline dispatches a misaligned micro-op. Counting is performed if it is the first or second half or if it is blocked, squashed, or missed. In this context, misaligned means crossing a 64-bit boundary", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x5, }, { .name = "UOPS_RETIRED", .desc = "Number of micro-ops retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc2, }, { .name = "INST_DECODED", .desc = "Number of instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd0, }, { .name = "HW_INT_RX", .desc = "Number of hardware interrupts received", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc8, }, { .name = "CYCLES_INT_MASKED", .desc = "Number of processor cycles for which interrupts are disabled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc6, }, { .name = "CYCLES_INT_PENDING_AND_MASKED", .desc = "Number of processor cycles for which interrupts are disabled and interrupts are pending.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc7, }, { .name = "BR_INST_RETIRED", .desc = "Number of branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc4, }, { .name = "BR_MISS_PRED_RETIRED", .desc = "Number of mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc5, }, { .name = "BR_TAKEN_RETIRED", .desc = "Number of taken branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc9, }, { .name = "BR_MISS_PRED_TAKEN_RET", .desc = "Number of taken mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xca, }, { .name = "BR_INST_DECODED", .desc = "Number of branch instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe0, }, { .name = "BTB_MISSES", .desc = "Number of branches for which the BTB did not produce a prediction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe2, }, { .name = "BR_BOGUS", .desc = "Number of bogus 
branches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe4, }, { .name = "BACLEARS", .desc = "Number of times BACLEAR is asserted. This is the number of times that a static branch prediction was made, in which the branch decoder decided to make a branch prediction because the BTB did not", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe6, }, { .name = "RESOURCE_STALLS", .desc = "Incremented by 1 during every cycle for which there is a resource related stall. Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc. In addition to resource related stalls, this event counts some other events. Includes stalls arising during branch misprediction recovery, such as if retirement of the mispredicted branch is delayed and stalls arising while store buffer is draining from synchronizing operations", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xa2, }, { .name = "PARTIAL_RAT_STALLS", .desc = "Number of cycles or events for partial stalls. 
This includes flag partial stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd2, }, { .name = "SEGMENT_REG_LOADS", .desc = "Number of segment register loads.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6, }, { .name = "MMX_SAT_INSTR_EXEC", .desc = "Number of MMX saturating instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb1, }, { .name = "MMX_UOPS_EXEC", .desc = "Number of MMX micro-ops executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb2, }, { .name = "MMX_INSTR_TYPE_EXEC", .desc = "Number of MMX instructions executed by type", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb3, .numasks = LIBPFM_ARRAY_SIZE(p6_mmx_instr_type_exec), .ngrp = 1, .umasks = p6_mmx_instr_type_exec, }, { .name = "FP_MMX_TRANS", .desc = "Number of MMX transitions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(p6_fp_mmx_trans), .ngrp = 1, .umasks = p6_fp_mmx_trans, }, { .name = "MMX_ASSIST", .desc = "Number of MMX assists (that is, the number of EMMS instructions executed)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcd, }, { .name = "SEG_RENAME_STALLS", .desc = "Number of Segment Register Renaming Stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd4, .numasks = LIBPFM_ARRAY_SIZE(p6_seg_rename_stalls), .ngrp = 1, .umasks = p6_seg_rename_stalls, }, { .name = "SEG_REG_RENAMES", .desc = "Number of Segment Register Renames", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd5, .numasks = LIBPFM_ARRAY_SIZE(p6_seg_rename_stalls), .ngrp = 1, .umasks = p6_seg_rename_stalls, /* identical to actual umasks list for this event */ }, { .name = "RET_SEG_RENAMES", .desc = "Number of segment register rename events retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd6, }, { .name = "EMON_KNI_PREF_DISPATCHED", .desc = "Number of Streaming SIMD extensions prefetch/weakly-ordered instructions dispatched (speculative prefetches are included in counting). 
Pentium III and later", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(p6_emon_kni_pref_dispatched), .ngrp = 1, .umasks = p6_emon_kni_pref_dispatched, }, { .name = "EMON_KNI_PREF_MISS", .desc = "Number of prefetch/weakly-ordered instructions that miss all caches. Pentium III and later", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(p6_emon_kni_pref_dispatched), .ngrp = 1, .umasks = p6_emon_kni_pref_dispatched, /* identical to actual umasks list for this event */ }, { .name = "L2_LD", .desc = "Number of L2 data loads. This event indicates that a normal, unlocked, load memory access was received by the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT memory accesses. It does include L2 cacheable TLB miss memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(p6_l2_ifetch), .ngrp = 1, .umasks = p6_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_IN", .desc = "Number of lines allocated in the L2", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x24, }, { .name = "L2_LINES_OUT", .desc = "Number of lines removed from the L2 for any reason", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x26, }, { .name = "L2_M_LINES_OUTM", .desc = "Number of modified lines removed from the L2 for any reason", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x27, }, { .name = "EMON_KNI_INST_RETIRED", .desc = "Number of SSE instructions retired. Pentium III and later", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd8, .numasks = LIBPFM_ARRAY_SIZE(p6_emon_kni_inst_retired), .ngrp = 1, .umasks = p6_emon_kni_inst_retired, }, { .name = "EMON_KNI_COMP_INST_RET", .desc = "Number of SSE computation instructions retired. 
Pentium III and later", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd9, .numasks = LIBPFM_ARRAY_SIZE(p6_emon_kni_inst_retired), .ngrp = 1, .umasks = p6_emon_kni_inst_retired, /* identical to actual umasks list for this event */ }, }; papi-5.3.0/src/libpfm4/lib/events/ppc970mp_events.h0000600003276200002170000021202212247131124021450 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __PPC970MP_EVENTS_H__ #define __PPC970MP_EVENTS_H__ /* * File: ppc970mp_events.h * CVS: * Author: Corey Ashford * cjashfor@us.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2009. All Rights Reserved. * Contributed by Corey Ashford * * Note: This code was automatically generated and should not be modified by * hand. * */ #define PPC970MP_PME_PM_LSU_REJECT_RELOAD_CDF 0 #define PPC970MP_PME_PM_MRK_LSU_SRQ_INST_VALID 1 #define PPC970MP_PME_PM_FPU1_SINGLE 2 #define PPC970MP_PME_PM_FPU0_STALL3 3 #define PPC970MP_PME_PM_TB_BIT_TRANS 4 #define PPC970MP_PME_PM_GPR_MAP_FULL_CYC 5 #define PPC970MP_PME_PM_MRK_ST_CMPL 6 #define PPC970MP_PME_PM_FPU0_STF 7 #define PPC970MP_PME_PM_FPU1_FMA 8 #define PPC970MP_PME_PM_LSU1_FLUSH_ULD 9 #define PPC970MP_PME_PM_MRK_INST_FIN 10 #define PPC970MP_PME_PM_MRK_LSU0_FLUSH_UST 11 #define PPC970MP_PME_PM_LSU_LRQ_S0_ALLOC 12 #define PPC970MP_PME_PM_FPU_FDIV 13 #define PPC970MP_PME_PM_FPU0_FULL_CYC 14 #define PPC970MP_PME_PM_FPU_SINGLE 15 #define PPC970MP_PME_PM_FPU0_FMA 16 #define PPC970MP_PME_PM_MRK_LSU1_FLUSH_ULD 17 #define PPC970MP_PME_PM_LSU1_FLUSH_LRQ 18 #define PPC970MP_PME_PM_DTLB_MISS 19 #define PPC970MP_PME_PM_CMPLU_STALL_FXU 20 #define PPC970MP_PME_PM_MRK_ST_MISS_L1 21 #define PPC970MP_PME_PM_EXT_INT 22 #define PPC970MP_PME_PM_MRK_LSU1_FLUSH_LRQ 23 #define PPC970MP_PME_PM_MRK_ST_GPS 24 #define PPC970MP_PME_PM_GRP_DISP_SUCCESS 25 #define PPC970MP_PME_PM_LSU1_LDF 26 #define PPC970MP_PME_PM_LSU0_SRQ_STFWD 27 #define PPC970MP_PME_PM_CR_MAP_FULL_CYC 28 #define 
PPC970MP_PME_PM_MRK_LSU0_FLUSH_ULD 29 #define PPC970MP_PME_PM_LSU_DERAT_MISS 30 #define PPC970MP_PME_PM_FPU0_SINGLE 31 #define PPC970MP_PME_PM_FPU1_FDIV 32 #define PPC970MP_PME_PM_FPU1_FEST 33 #define PPC970MP_PME_PM_FPU0_FRSP_FCONV 34 #define PPC970MP_PME_PM_GCT_EMPTY_SRQ_FULL 35 #define PPC970MP_PME_PM_MRK_ST_CMPL_INT 36 #define PPC970MP_PME_PM_FLUSH_BR_MPRED 37 #define PPC970MP_PME_PM_FXU_FIN 38 #define PPC970MP_PME_PM_FPU_STF 39 #define PPC970MP_PME_PM_DSLB_MISS 40 #define PPC970MP_PME_PM_FXLS1_FULL_CYC 41 #define PPC970MP_PME_PM_CMPLU_STALL_FPU 42 #define PPC970MP_PME_PM_LSU_LMQ_LHR_MERGE 43 #define PPC970MP_PME_PM_MRK_STCX_FAIL 44 #define PPC970MP_PME_PM_FXU0_BUSY_FXU1_IDLE 45 #define PPC970MP_PME_PM_CMPLU_STALL_LSU 46 #define PPC970MP_PME_PM_MRK_DATA_FROM_L25_SHR 47 #define PPC970MP_PME_PM_LSU_FLUSH_ULD 48 #define PPC970MP_PME_PM_MRK_BRU_FIN 49 #define PPC970MP_PME_PM_IERAT_XLATE_WR 50 #define PPC970MP_PME_PM_GCT_EMPTY_BR_MPRED 51 #define PPC970MP_PME_PM_LSU0_BUSY 52 #define PPC970MP_PME_PM_DATA_FROM_MEM 53 #define PPC970MP_PME_PM_FPR_MAP_FULL_CYC 54 #define PPC970MP_PME_PM_FPU1_FULL_CYC 55 #define PPC970MP_PME_PM_FPU0_FIN 56 #define PPC970MP_PME_PM_GRP_BR_REDIR 57 #define PPC970MP_PME_PM_GCT_EMPTY_IC_MISS 58 #define PPC970MP_PME_PM_THRESH_TIMEO 59 #define PPC970MP_PME_PM_FPU_FSQRT 60 #define PPC970MP_PME_PM_MRK_LSU0_FLUSH_LRQ 61 #define PPC970MP_PME_PM_PMC1_OVERFLOW 62 #define PPC970MP_PME_PM_FXLS0_FULL_CYC 63 #define PPC970MP_PME_PM_FPU0_ALL 64 #define PPC970MP_PME_PM_DATA_TABLEWALK_CYC 65 #define PPC970MP_PME_PM_FPU0_FEST 66 #define PPC970MP_PME_PM_DATA_FROM_L25_MOD 67 #define PPC970MP_PME_PM_LSU0_REJECT_ERAT_MISS 68 #define PPC970MP_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC 69 #define PPC970MP_PME_PM_LSU0_REJECT_RELOAD_CDF 70 #define PPC970MP_PME_PM_FPU_FEST 71 #define PPC970MP_PME_PM_0INST_FETCH 72 #define PPC970MP_PME_PM_LD_MISS_L1_LSU0 73 #define PPC970MP_PME_PM_LSU1_REJECT_RELOAD_CDF 74 #define PPC970MP_PME_PM_L1_PREF 75 #define PPC970MP_PME_PM_FPU1_STALL3 76 
#define PPC970MP_PME_PM_BRQ_FULL_CYC 77 #define PPC970MP_PME_PM_PMC8_OVERFLOW 78 #define PPC970MP_PME_PM_PMC7_OVERFLOW 79 #define PPC970MP_PME_PM_WORK_HELD 80 #define PPC970MP_PME_PM_MRK_LD_MISS_L1_LSU0 81 #define PPC970MP_PME_PM_FXU_IDLE 82 #define PPC970MP_PME_PM_INST_CMPL 83 #define PPC970MP_PME_PM_LSU1_FLUSH_UST 84 #define PPC970MP_PME_PM_LSU0_FLUSH_ULD 85 #define PPC970MP_PME_PM_LSU_FLUSH 86 #define PPC970MP_PME_PM_INST_FROM_L2 87 #define PPC970MP_PME_PM_LSU1_REJECT_LMQ_FULL 88 #define PPC970MP_PME_PM_PMC2_OVERFLOW 89 #define PPC970MP_PME_PM_FPU0_DENORM 90 #define PPC970MP_PME_PM_FPU1_FMOV_FEST 91 #define PPC970MP_PME_PM_INST_FETCH_CYC 92 #define PPC970MP_PME_PM_GRP_DISP_REJECT 93 #define PPC970MP_PME_PM_LSU_LDF 94 #define PPC970MP_PME_PM_INST_DISP 95 #define PPC970MP_PME_PM_DATA_FROM_L25_SHR 96 #define PPC970MP_PME_PM_L1_DCACHE_RELOAD_VALID 97 #define PPC970MP_PME_PM_MRK_GRP_ISSUED 98 #define PPC970MP_PME_PM_FPU_FMA 99 #define PPC970MP_PME_PM_MRK_CRU_FIN 100 #define PPC970MP_PME_PM_CMPLU_STALL_REJECT 101 #define PPC970MP_PME_PM_MRK_LSU1_FLUSH_UST 102 #define PPC970MP_PME_PM_MRK_FXU_FIN 103 #define PPC970MP_PME_PM_LSU1_REJECT_ERAT_MISS 104 #define PPC970MP_PME_PM_BR_ISSUED 105 #define PPC970MP_PME_PM_PMC4_OVERFLOW 106 #define PPC970MP_PME_PM_EE_OFF 107 #define PPC970MP_PME_PM_INST_FROM_L25_MOD 108 #define PPC970MP_PME_PM_CMPLU_STALL_ERAT_MISS 109 #define PPC970MP_PME_PM_ITLB_MISS 110 #define PPC970MP_PME_PM_FXU1_BUSY_FXU0_IDLE 111 #define PPC970MP_PME_PM_GRP_DISP_VALID 112 #define PPC970MP_PME_PM_MRK_GRP_DISP 113 #define PPC970MP_PME_PM_LSU_FLUSH_UST 114 #define PPC970MP_PME_PM_FXU1_FIN 115 #define PPC970MP_PME_PM_GRP_CMPL 116 #define PPC970MP_PME_PM_FPU_FRSP_FCONV 117 #define PPC970MP_PME_PM_MRK_LSU0_FLUSH_SRQ 118 #define PPC970MP_PME_PM_CMPLU_STALL_OTHER 119 #define PPC970MP_PME_PM_LSU_LMQ_FULL_CYC 120 #define PPC970MP_PME_PM_ST_REF_L1_LSU0 121 #define PPC970MP_PME_PM_LSU0_DERAT_MISS 122 #define PPC970MP_PME_PM_LSU_SRQ_SYNC_CYC 123 #define 
PPC970MP_PME_PM_FPU_STALL3 124 #define PPC970MP_PME_PM_LSU_REJECT_ERAT_MISS 125 #define PPC970MP_PME_PM_MRK_DATA_FROM_L2 126 #define PPC970MP_PME_PM_LSU0_FLUSH_SRQ 127 #define PPC970MP_PME_PM_FPU0_FMOV_FEST 128 #define PPC970MP_PME_PM_IOPS_CMPL 129 #define PPC970MP_PME_PM_LD_REF_L1_LSU0 130 #define PPC970MP_PME_PM_LSU1_FLUSH_SRQ 131 #define PPC970MP_PME_PM_CMPLU_STALL_DIV 132 #define PPC970MP_PME_PM_GRP_BR_MPRED 133 #define PPC970MP_PME_PM_LSU_LMQ_S0_ALLOC 134 #define PPC970MP_PME_PM_LSU0_REJECT_LMQ_FULL 135 #define PPC970MP_PME_PM_ST_REF_L1 136 #define PPC970MP_PME_PM_MRK_VMX_FIN 137 #define PPC970MP_PME_PM_LSU_SRQ_EMPTY_CYC 138 #define PPC970MP_PME_PM_FPU1_STF 139 #define PPC970MP_PME_PM_RUN_CYC 140 #define PPC970MP_PME_PM_LSU_LMQ_S0_VALID 141 #define PPC970MP_PME_PM_LSU0_LDF 142 #define PPC970MP_PME_PM_LSU_LRQ_S0_VALID 143 #define PPC970MP_PME_PM_PMC3_OVERFLOW 144 #define PPC970MP_PME_PM_MRK_IMR_RELOAD 145 #define PPC970MP_PME_PM_MRK_GRP_TIMEO 146 #define PPC970MP_PME_PM_FPU_FMOV_FEST 147 #define PPC970MP_PME_PM_GRP_DISP_BLK_SB_CYC 148 #define PPC970MP_PME_PM_XER_MAP_FULL_CYC 149 #define PPC970MP_PME_PM_ST_MISS_L1 150 #define PPC970MP_PME_PM_STOP_COMPLETION 151 #define PPC970MP_PME_PM_MRK_GRP_CMPL 152 #define PPC970MP_PME_PM_ISLB_MISS 153 #define PPC970MP_PME_PM_SUSPENDED 154 #define PPC970MP_PME_PM_CYC 155 #define PPC970MP_PME_PM_LD_MISS_L1_LSU1 156 #define PPC970MP_PME_PM_STCX_FAIL 157 #define PPC970MP_PME_PM_LSU1_SRQ_STFWD 158 #define PPC970MP_PME_PM_GRP_DISP 159 #define PPC970MP_PME_PM_L2_PREF 160 #define PPC970MP_PME_PM_FPU1_DENORM 161 #define PPC970MP_PME_PM_DATA_FROM_L2 162 #define PPC970MP_PME_PM_FPU0_FPSCR 163 #define PPC970MP_PME_PM_MRK_DATA_FROM_L25_MOD 164 #define PPC970MP_PME_PM_FPU0_FSQRT 165 #define PPC970MP_PME_PM_LD_REF_L1 166 #define PPC970MP_PME_PM_MRK_L1_RELOAD_VALID 167 #define PPC970MP_PME_PM_1PLUS_PPC_CMPL 168 #define PPC970MP_PME_PM_INST_FROM_L1 169 #define PPC970MP_PME_PM_EE_OFF_EXT_INT 170 #define PPC970MP_PME_PM_PMC6_OVERFLOW 171 
#define PPC970MP_PME_PM_LSU_LRQ_FULL_CYC 172 #define PPC970MP_PME_PM_IC_PREF_INSTALL 173 #define PPC970MP_PME_PM_DC_PREF_OUT_OF_STREAMS 174 #define PPC970MP_PME_PM_MRK_LSU1_FLUSH_SRQ 175 #define PPC970MP_PME_PM_GCT_FULL_CYC 176 #define PPC970MP_PME_PM_INST_FROM_MEM 177 #define PPC970MP_PME_PM_FLUSH_LSU_BR_MPRED 178 #define PPC970MP_PME_PM_FXU_BUSY 179 #define PPC970MP_PME_PM_ST_REF_L1_LSU1 180 #define PPC970MP_PME_PM_MRK_LD_MISS_L1 181 #define PPC970MP_PME_PM_L1_WRITE_CYC 182 #define PPC970MP_PME_PM_LSU1_BUSY 183 #define PPC970MP_PME_PM_LSU_REJECT_LMQ_FULL 184 #define PPC970MP_PME_PM_CMPLU_STALL_FDIV 185 #define PPC970MP_PME_PM_FPU_ALL 186 #define PPC970MP_PME_PM_LSU_SRQ_S0_ALLOC 187 #define PPC970MP_PME_PM_INST_FROM_L25_SHR 188 #define PPC970MP_PME_PM_GRP_MRK 189 #define PPC970MP_PME_PM_BR_MPRED_CR 190 #define PPC970MP_PME_PM_DC_PREF_STREAM_ALLOC 191 #define PPC970MP_PME_PM_FPU1_FIN 192 #define PPC970MP_PME_PM_LSU_REJECT_SRQ 193 #define PPC970MP_PME_PM_BR_MPRED_TA 194 #define PPC970MP_PME_PM_CRQ_FULL_CYC 195 #define PPC970MP_PME_PM_LD_MISS_L1 196 #define PPC970MP_PME_PM_INST_FROM_PREF 197 #define PPC970MP_PME_PM_STCX_PASS 198 #define PPC970MP_PME_PM_DC_INV_L2 199 #define PPC970MP_PME_PM_LSU_SRQ_FULL_CYC 200 #define PPC970MP_PME_PM_LSU0_FLUSH_LRQ 201 #define PPC970MP_PME_PM_LSU_SRQ_S0_VALID 202 #define PPC970MP_PME_PM_LARX_LSU0 203 #define PPC970MP_PME_PM_GCT_EMPTY_CYC 204 #define PPC970MP_PME_PM_FPU1_ALL 205 #define PPC970MP_PME_PM_FPU1_FSQRT 206 #define PPC970MP_PME_PM_FPU_FIN 207 #define PPC970MP_PME_PM_LSU_SRQ_STFWD 208 #define PPC970MP_PME_PM_MRK_LD_MISS_L1_LSU1 209 #define PPC970MP_PME_PM_FXU0_FIN 210 #define PPC970MP_PME_PM_MRK_FPU_FIN 211 #define PPC970MP_PME_PM_PMC5_OVERFLOW 212 #define PPC970MP_PME_PM_SNOOP_TLBIE 213 #define PPC970MP_PME_PM_FPU1_FRSP_FCONV 214 #define PPC970MP_PME_PM_FPU0_FDIV 215 #define PPC970MP_PME_PM_LD_REF_L1_LSU1 216 #define PPC970MP_PME_PM_HV_CYC 217 #define PPC970MP_PME_PM_LR_CTR_MAP_FULL_CYC 218 #define PPC970MP_PME_PM_FPU_DENORM 
219 #define PPC970MP_PME_PM_LSU0_REJECT_SRQ 220 #define PPC970MP_PME_PM_LSU1_REJECT_SRQ 221 #define PPC970MP_PME_PM_LSU1_DERAT_MISS 222 #define PPC970MP_PME_PM_IC_PREF_REQ 223 #define PPC970MP_PME_PM_MRK_LSU_FIN 224 #define PPC970MP_PME_PM_MRK_DATA_FROM_MEM 225 #define PPC970MP_PME_PM_CMPLU_STALL_DCACHE_MISS 226 #define PPC970MP_PME_PM_LSU0_FLUSH_UST 227 #define PPC970MP_PME_PM_LSU_FLUSH_LRQ 228 #define PPC970MP_PME_PM_LSU_FLUSH_SRQ 229 static const pme_power_entry_t ppc970mp_pe[] = { [ PPC970MP_PME_PM_LSU_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU_REJECT_RELOAD_CDF", .pme_code = 0x6920, .pme_short_desc = "LSU reject due to reload CDF or tag update collision", .pme_long_desc = "LSU reject due to reload CDF or tag update collision", }, [ PPC970MP_PME_PM_MRK_LSU_SRQ_INST_VALID ] = { .pme_name = "PM_MRK_LSU_SRQ_INST_VALID", .pme_code = 0x936, .pme_short_desc = "Marked instruction valid in SRQ", .pme_long_desc = "This signal is asserted every cycle when a marked request is resident in the Store Request Queue", }, [ PPC970MP_PME_PM_FPU1_SINGLE ] = { .pme_name = "PM_FPU1_SINGLE", .pme_code = 0x127, .pme_short_desc = "FPU1 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing single precision instruction.", }, [ PPC970MP_PME_PM_FPU0_STALL3 ] = { .pme_name = "PM_FPU0_STALL3", .pme_code = 0x121, .pme_short_desc = "FPU0 stalled in pipe3", .pme_long_desc = "This signal indicates that fp0 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
", }, [ PPC970MP_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x8005, .pme_short_desc = "Time Base bit transition", .pme_long_desc = "When the selected time base bit (as specified in MMCR0[TBSEL])transitions from 0 to 1 ", }, [ PPC970MP_PME_PM_GPR_MAP_FULL_CYC ] = { .pme_name = "PM_GPR_MAP_FULL_CYC", .pme_code = 0x335, .pme_short_desc = "Cycles GPR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the gpr mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970MP_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x1003, .pme_short_desc = "Marked store instruction completed", .pme_long_desc = "A sampled store has completed (data home)", }, [ PPC970MP_PME_PM_FPU0_STF ] = { .pme_name = "PM_FPU0_STF", .pme_code = 0x122, .pme_short_desc = "FPU0 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a store instruction.", }, [ PPC970MP_PME_PM_FPU1_FMA ] = { .pme_name = "PM_FPU1_FMA", .pme_code = 0x105, .pme_short_desc = "FPU1 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970MP_PME_PM_LSU1_FLUSH_ULD ] = { .pme_name = "PM_LSU1_FLUSH_ULD", .pme_code = 0x804, .pme_short_desc = "LSU1 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 1 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970MP_PME_PM_MRK_INST_FIN ] = { .pme_name = "PM_MRK_INST_FIN", .pme_code = 0x7005, .pme_short_desc = "Marked instruction finished", .pme_long_desc = "One of the execution units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ PPC970MP_PME_PM_MRK_LSU0_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU0_FLUSH_UST", .pme_code = 0x711, .pme_short_desc = "LSU0 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 0 because it was unaligned", }, [ PPC970MP_PME_PM_LSU_LRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LRQ_S0_ALLOC", .pme_code = 0x826, .pme_short_desc = "LRQ slot 0 allocated", .pme_long_desc = "LRQ slot zero was allocated", }, [ PPC970MP_PME_PM_FPU_FDIV ] = { .pme_name = "PM_FPU_FDIV", .pme_code = 0x1100, .pme_short_desc = "FPU executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when FPU is executing a divide instruction. This could be fdiv, fdivs, fdiv. fdivs. Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_FPU0_FULL_CYC ] = { .pme_name = "PM_FPU0_FULL_CYC", .pme_code = 0x303, .pme_short_desc = "Cycles FPU0 issue queue full", .pme_long_desc = "The issue queue for FPU unit 0 cannot accept any more instructions. Issue is stopped", }, [ PPC970MP_PME_PM_FPU_SINGLE ] = { .pme_name = "PM_FPU_SINGLE", .pme_code = 0x5120, .pme_short_desc = "FPU executed single precision instruction", .pme_long_desc = "FPU is executing single precision instruction. Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_FPU0_FMA ] = { .pme_name = "PM_FPU0_FMA", .pme_code = 0x101, .pme_short_desc = "FPU0 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing multiply-add kind of instruction. 
This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970MP_PME_PM_MRK_LSU1_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU1_FLUSH_ULD", .pme_code = 0x714, .pme_short_desc = "LSU1 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 1 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970MP_PME_PM_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_LSU1_FLUSH_LRQ", .pme_code = 0x806, .pme_short_desc = "LSU1 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970MP_PME_PM_DTLB_MISS ] = { .pme_name = "PM_DTLB_MISS", .pme_code = 0x704, .pme_short_desc = "Data TLB misses", .pme_long_desc = "A TLB miss for a data request occurred. Requests that miss the TLB may be retried until the instruction is in the next to complete group (unless HID4 is set to allow speculative tablewalks). 
This may result in multiple TLB misses for the same instruction.", }, [ PPC970MP_PME_PM_CMPLU_STALL_FXU ] = { .pme_name = "PM_CMPLU_STALL_FXU", .pme_code = 0x508b, .pme_short_desc = "Completion stall caused by FXU instruction", .pme_long_desc = "Completion stall caused by FXU instruction", }, [ PPC970MP_PME_PM_MRK_ST_MISS_L1 ] = { .pme_name = "PM_MRK_ST_MISS_L1", .pme_code = 0x723, .pme_short_desc = "Marked L1 D cache store misses", .pme_long_desc = "A marked store missed the dcache", }, [ PPC970MP_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x8002, .pme_short_desc = "External interrupts", .pme_long_desc = "An external interrupt occurred", }, [ PPC970MP_PME_PM_MRK_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_LRQ", .pme_code = 0x716, .pme_short_desc = "LSU1 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970MP_PME_PM_MRK_ST_GPS ] = { .pme_name = "PM_MRK_ST_GPS", .pme_code = 0x6003, .pme_short_desc = "Marked store sent to GPS", .pme_long_desc = "A sampled store has been sent to the memory subsystem", }, [ PPC970MP_PME_PM_GRP_DISP_SUCCESS ] = { .pme_name = "PM_GRP_DISP_SUCCESS", .pme_code = 0x5001, .pme_short_desc = "Group dispatch success", .pme_long_desc = "Number of groups successfully dispatched (not rejected)", }, [ PPC970MP_PME_PM_LSU1_LDF ] = { .pme_name = "PM_LSU1_LDF", .pme_code = 0x734, .pme_short_desc = "LSU1 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 1", }, [ PPC970MP_PME_PM_LSU0_SRQ_STFWD ] = { .pme_name = "PM_LSU0_SRQ_STFWD", .pme_code = 0x820, .pme_short_desc = "LSU0 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 0", }, [ PPC970MP_PME_PM_CR_MAP_FULL_CYC ] = { .pme_name 
= "PM_CR_MAP_FULL_CYC", .pme_code = 0x304, .pme_short_desc = "Cycles CR logical operation mapper full", .pme_long_desc = "The ISU sends a signal indicating that the cr mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970MP_PME_PM_MRK_LSU0_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU0_FLUSH_ULD", .pme_code = 0x710, .pme_short_desc = "LSU0 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 0 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970MP_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x6700, .pme_short_desc = "DERAT misses", .pme_long_desc = "Total D-ERAT Misses (Unit 0 + Unit 1). Requests that miss the Derat are rejected and retried until the request hits in the Erat. This may result in multiple erat misses for the same instruction.", }, [ PPC970MP_PME_PM_FPU0_SINGLE ] = { .pme_name = "PM_FPU0_SINGLE", .pme_code = 0x123, .pme_short_desc = "FPU0 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing single precision instruction.", }, [ PPC970MP_PME_PM_FPU1_FDIV ] = { .pme_name = "PM_FPU1_FDIV", .pme_code = 0x104, .pme_short_desc = "FPU1 executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp1 is executing a divide instruction. This could be fdiv, fdivs, fdiv. fdivs.", }, [ PPC970MP_PME_PM_FPU1_FEST ] = { .pme_name = "PM_FPU1_FEST", .pme_code = 0x116, .pme_short_desc = "FPU1 executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. 
", }, [ PPC970MP_PME_PM_FPU0_FRSP_FCONV ] = { .pme_name = "PM_FPU0_FRSP_FCONV", .pme_code = 0x111, .pme_short_desc = "FPU0 executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when fp0 is executing frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970MP_PME_PM_GCT_EMPTY_SRQ_FULL ] = { .pme_name = "PM_GCT_EMPTY_SRQ_FULL", .pme_code = 0x200b, .pme_short_desc = "GCT empty caused by SRQ full", .pme_long_desc = "GCT empty caused by SRQ full", }, [ PPC970MP_PME_PM_MRK_ST_CMPL_INT ] = { .pme_name = "PM_MRK_ST_CMPL_INT", .pme_code = 0x3003, .pme_short_desc = "Marked store completed with intervention", .pme_long_desc = "A marked store previously sent to the memory subsystem completed (data home) after requiring intervention", }, [ PPC970MP_PME_PM_FLUSH_BR_MPRED ] = { .pme_name = "PM_FLUSH_BR_MPRED", .pme_code = 0x316, .pme_short_desc = "Flush caused by branch mispredict", .pme_long_desc = "Flush caused by branch mispredict", }, [ PPC970MP_PME_PM_FXU_FIN ] = { .pme_name = "PM_FXU_FIN", .pme_code = 0x3330, .pme_short_desc = "FXU produced a result", .pme_long_desc = "The fixed point unit (Unit 0 + Unit 1) finished an instruction. Instructions that finish may not necessary complete.", }, [ PPC970MP_PME_PM_FPU_STF ] = { .pme_name = "PM_FPU_STF", .pme_code = 0x6120, .pme_short_desc = "FPU executed store instruction", .pme_long_desc = "FPU is executing a store instruction. Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_DSLB_MISS ] = { .pme_name = "PM_DSLB_MISS", .pme_code = 0x705, .pme_short_desc = "Data SLB misses", .pme_long_desc = "A SLB miss for a data request occurred. SLB misses trap to the operating system to resolve", }, [ PPC970MP_PME_PM_FXLS1_FULL_CYC ] = { .pme_name = "PM_FXLS1_FULL_CYC", .pme_code = 0x314, .pme_short_desc = "Cycles FXU1/LS1 queue full", .pme_long_desc = "The issue queue for FXU/LSU unit 0 cannot accept any more instructions. 
Issue is stopped", }, [ PPC970MP_PME_PM_CMPLU_STALL_FPU ] = { .pme_name = "PM_CMPLU_STALL_FPU", .pme_code = 0x704b, .pme_short_desc = "Completion stall caused by FPU instruction", .pme_long_desc = "Completion stall caused by FPU instruction", }, [ PPC970MP_PME_PM_LSU_LMQ_LHR_MERGE ] = { .pme_name = "PM_LSU_LMQ_LHR_MERGE", .pme_code = 0x935, .pme_short_desc = "LMQ LHR merges", .pme_long_desc = "A dcache miss occurred for the same real cache line address as an earlier request already in the Load Miss Queue and was merged into the LMQ entry.", }, [ PPC970MP_PME_PM_MRK_STCX_FAIL ] = { .pme_name = "PM_MRK_STCX_FAIL", .pme_code = 0x726, .pme_short_desc = "Marked STCX failed", .pme_long_desc = "A marked stcx (stwcx or stdcx) failed", }, [ PPC970MP_PME_PM_FXU0_BUSY_FXU1_IDLE ] = { .pme_name = "PM_FXU0_BUSY_FXU1_IDLE", .pme_code = 0x7002, .pme_short_desc = "FXU0 busy FXU1 idle", .pme_long_desc = "FXU0 is busy while FXU1 was idle", }, [ PPC970MP_PME_PM_CMPLU_STALL_LSU ] = { .pme_name = "PM_CMPLU_STALL_LSU", .pme_code = 0x504b, .pme_short_desc = "Completion stall caused by LSU instruction", .pme_long_desc = "Completion stall caused by LSU instruction", }, [ PPC970MP_PME_PM_MRK_DATA_FROM_L25_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR", .pme_code = 0x5937, .pme_short_desc = "Marked data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a marked demand load", }, [ PPC970MP_PME_PM_LSU_FLUSH_ULD ] = { .pme_name = "PM_LSU_FLUSH_ULD", .pme_code = 0x1800, .pme_short_desc = "LRQ unaligned load flushes", .pme_long_desc = "A load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970MP_PME_PM_MRK_BRU_FIN ] = { .pme_name = "PM_MRK_BRU_FIN", .pme_code = 0x2005, .pme_short_desc = "Marked instruction BRU processing finished", .pme_long_desc = "The branch unit finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ PPC970MP_PME_PM_IERAT_XLATE_WR ] = { .pme_name = "PM_IERAT_XLATE_WR", .pme_code = 0x430, .pme_short_desc = "Translation written to ierat", .pme_long_desc = "This signal will be asserted each time the I-ERAT is written. This indicates that an ERAT miss has been serviced. ERAT misses will initiate a sequence resulting in the ERAT being written. ERAT misses that are later ignored will not be counted unless the ERAT is written before the instruction stream is changed. This should be a fairly accurate count of ERAT misses (best available).", }, [ PPC970MP_PME_PM_GCT_EMPTY_BR_MPRED ] = { .pme_name = "PM_GCT_EMPTY_BR_MPRED", .pme_code = 0x708c, .pme_short_desc = "GCT empty due to branch mispredict", .pme_long_desc = "GCT empty due to branch mispredict", }, [ PPC970MP_PME_PM_LSU0_BUSY ] = { .pme_name = "PM_LSU0_BUSY", .pme_code = 0x823, .pme_short_desc = "LSU0 busy", .pme_long_desc = "LSU unit 0 is busy rejecting instructions", }, [ PPC970MP_PME_PM_DATA_FROM_MEM ] = { .pme_name = "PM_DATA_FROM_MEM", .pme_code = 0x2837, .pme_short_desc = "Data loaded from memory", .pme_long_desc = "Data loaded from memory", }, [ PPC970MP_PME_PM_FPR_MAP_FULL_CYC ] = { .pme_name = "PM_FPR_MAP_FULL_CYC", .pme_code = 0x301, .pme_short_desc = "Cycles FPR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the FPR mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970MP_PME_PM_FPU1_FULL_CYC ] = { .pme_name = "PM_FPU1_FULL_CYC", .pme_code = 0x307, .pme_short_desc = "Cycles FPU1 issue queue full", .pme_long_desc = "The issue queue for FPU unit 1 cannot accept any more instructions. 
Issue is stopped", }, [ PPC970MP_PME_PM_FPU0_FIN ] = { .pme_name = "PM_FPU0_FIN", .pme_code = 0x113, .pme_short_desc = "FPU0 produced a result", .pme_long_desc = "fp0 finished, produced a result. This only indicates finish, not completion. ", }, [ PPC970MP_PME_PM_GRP_BR_REDIR ] = { .pme_name = "PM_GRP_BR_REDIR", .pme_code = 0x326, .pme_short_desc = "Group experienced branch redirect", .pme_long_desc = "Group experienced branch redirect", }, [ PPC970MP_PME_PM_GCT_EMPTY_IC_MISS ] = { .pme_name = "PM_GCT_EMPTY_IC_MISS", .pme_code = 0x508c, .pme_short_desc = "GCT empty due to I cache miss", .pme_long_desc = "GCT empty due to I cache miss", }, [ PPC970MP_PME_PM_THRESH_TIMEO ] = { .pme_name = "PM_THRESH_TIMEO", .pme_code = 0x2003, .pme_short_desc = "Threshold timeout", .pme_long_desc = "The threshold timer expired", }, [ PPC970MP_PME_PM_FPU_FSQRT ] = { .pme_name = "PM_FPU_FSQRT", .pme_code = 0x6100, .pme_short_desc = "FPU executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when FPU is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_MRK_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_LRQ", .pme_code = 0x712, .pme_short_desc = "LSU0 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 0 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970MP_PME_PM_PMC1_OVERFLOW ] = { .pme_name = "PM_PMC1_OVERFLOW", .pme_code = 0x200a, .pme_short_desc = "PMC1 Overflow", .pme_long_desc = "PMC1 Overflow", }, [ PPC970MP_PME_PM_FXLS0_FULL_CYC ] = { .pme_name = "PM_FXLS0_FULL_CYC", .pme_code = 0x310, .pme_short_desc = "Cycles FXU0/LS0 queue full", .pme_long_desc = "The issue queue for FXU/LSU unit 0 cannot accept any more instructions. 
Issue is stopped", }, [ PPC970MP_PME_PM_FPU0_ALL ] = { .pme_name = "PM_FPU0_ALL", .pme_code = 0x103, .pme_short_desc = "FPU0 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo", }, [ PPC970MP_PME_PM_DATA_TABLEWALK_CYC ] = { .pme_name = "PM_DATA_TABLEWALK_CYC", .pme_code = 0x707, .pme_short_desc = "Cycles doing data tablewalks", .pme_long_desc = "This signal is asserted every cycle when a tablewalk is active. While a tablewalk is active any request attempting to access the TLB will be rejected and retried.", }, [ PPC970MP_PME_PM_FPU0_FEST ] = { .pme_name = "PM_FPU0_FEST", .pme_code = 0x112, .pme_short_desc = "FPU0 executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. 
", }, [ PPC970MP_PME_PM_DATA_FROM_L25_MOD ] = { .pme_name = "PM_DATA_FROM_L25_MOD", .pme_code = 0x6837, .pme_short_desc = "Data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a demand load", }, [ PPC970MP_PME_PM_LSU0_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU0_REJECT_ERAT_MISS", .pme_code = 0x923, .pme_short_desc = "LSU0 reject due to ERAT miss", .pme_long_desc = "LSU0 reject due to ERAT miss", }, [ PPC970MP_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_CYC", .pme_code = 0x2002, .pme_short_desc = "Cycles LMQ and SRQ empty", .pme_long_desc = "Cycles when both the LMQ and SRQ are empty (LSU is idle)", }, [ PPC970MP_PME_PM_LSU0_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU0_REJECT_RELOAD_CDF", .pme_code = 0x922, .pme_short_desc = "LSU0 reject due to reload CDF or tag update collision", .pme_long_desc = "LSU0 reject due to reload CDF or tag update collision", }, [ PPC970MP_PME_PM_FPU_FEST ] = { .pme_name = "PM_FPU_FEST", .pme_code = 0x3110, .pme_short_desc = "FPU executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. 
Combined Unit 0 + Unit 1.", }, [ PPC970MP_PME_PM_0INST_FETCH ] = { .pme_name = "PM_0INST_FETCH", .pme_code = 0x442d, .pme_short_desc = "No instructions fetched", .pme_long_desc = "No instructions were fetched this cycle (due to IFU hold, redirect, or icache miss)", }, [ PPC970MP_PME_PM_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_LD_MISS_L1_LSU0", .pme_code = 0x812, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "A load, executing on unit 0, missed the dcache", }, [ PPC970MP_PME_PM_LSU1_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU1_REJECT_RELOAD_CDF", .pme_code = 0x926, .pme_short_desc = "LSU1 reject due to reload CDF or tag update collision", .pme_long_desc = "LSU1 reject due to reload CDF or tag update collision", }, [ PPC970MP_PME_PM_L1_PREF ] = { .pme_name = "PM_L1_PREF", .pme_code = 0x731, .pme_short_desc = "L1 cache data prefetches", .pme_long_desc = "A request to prefetch data into the L1 was made", }, [ PPC970MP_PME_PM_FPU1_STALL3 ] = { .pme_name = "PM_FPU1_STALL3", .pme_code = 0x125, .pme_short_desc = "FPU1 stalled in pipe3", .pme_long_desc = "This signal indicates that fp1 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
", }, [ PPC970MP_PME_PM_BRQ_FULL_CYC ] = { .pme_name = "PM_BRQ_FULL_CYC", .pme_code = 0x305, .pme_short_desc = "Cycles branch queue full", .pme_long_desc = "The ISU sends a signal indicating that the issue queue that feeds the ifu br unit cannot accept any more group (queue is full of groups).", }, [ PPC970MP_PME_PM_PMC8_OVERFLOW ] = { .pme_name = "PM_PMC8_OVERFLOW", .pme_code = 0x100a, .pme_short_desc = "PMC8 Overflow", .pme_long_desc = "PMC8 Overflow", }, [ PPC970MP_PME_PM_PMC7_OVERFLOW ] = { .pme_name = "PM_PMC7_OVERFLOW", .pme_code = 0x800a, .pme_short_desc = "PMC7 Overflow", .pme_long_desc = "PMC7 Overflow", }, [ PPC970MP_PME_PM_WORK_HELD ] = { .pme_name = "PM_WORK_HELD", .pme_code = 0x2001, .pme_short_desc = "Work held", .pme_long_desc = "RAS Unit has signaled completion to stop and there are groups waiting to complete", }, [ PPC970MP_PME_PM_MRK_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU0", .pme_code = 0x720, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "A marked load, executing on unit 0, missed the dcache", }, [ PPC970MP_PME_PM_FXU_IDLE ] = { .pme_name = "PM_FXU_IDLE", .pme_code = 0x5002, .pme_short_desc = "FXU idle", .pme_long_desc = "FXU0 and FXU1 are both idle", }, [ PPC970MP_PME_PM_INST_CMPL ] = { .pme_name = "PM_INST_CMPL", .pme_code = 0x1, .pme_short_desc = "Instructions completed", .pme_long_desc = "Number of Eligible Instructions that completed. 
", }, [ PPC970MP_PME_PM_LSU1_FLUSH_UST ] = { .pme_name = "PM_LSU1_FLUSH_UST", .pme_code = 0x805, .pme_short_desc = "LSU1 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", }, [ PPC970MP_PME_PM_LSU0_FLUSH_ULD ] = { .pme_name = "PM_LSU0_FLUSH_ULD", .pme_code = 0x800, .pme_short_desc = "LSU0 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 0 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970MP_PME_PM_LSU_FLUSH ] = { .pme_name = "PM_LSU_FLUSH", .pme_code = 0x315, .pme_short_desc = "Flush initiated by LSU", .pme_long_desc = "Flush initiated by LSU", }, [ PPC970MP_PME_PM_INST_FROM_L2 ] = { .pme_name = "PM_INST_FROM_L2", .pme_code = 0x1426, .pme_short_desc = "Instructions fetched from L2", .pme_long_desc = "An instruction fetch group was fetched from L2. Fetch Groups can contain up to 8 instructions", }, [ PPC970MP_PME_PM_LSU1_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU1_REJECT_LMQ_FULL", .pme_code = 0x925, .pme_short_desc = "LSU1 reject due to LMQ full or missed data coming", .pme_long_desc = "LSU1 reject due to LMQ full or missed data coming", }, [ PPC970MP_PME_PM_PMC2_OVERFLOW ] = { .pme_name = "PM_PMC2_OVERFLOW", .pme_code = 0x300a, .pme_short_desc = "PMC2 Overflow", .pme_long_desc = "PMC2 Overflow", }, [ PPC970MP_PME_PM_FPU0_DENORM ] = { .pme_name = "PM_FPU0_DENORM", .pme_code = 0x120, .pme_short_desc = "FPU0 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ PPC970MP_PME_PM_FPU1_FMOV_FEST ] = { .pme_name = "PM_FPU1_FMOV_FEST", .pme_code = 0x114, .pme_short_desc = "FPU1 executing FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when fp1 is executing a move kind of instruction or one of the estimate instructions.. 
This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ", }, [ PPC970MP_PME_PM_INST_FETCH_CYC ] = { .pme_name = "PM_INST_FETCH_CYC", .pme_code = 0x424, .pme_short_desc = "Cycles at least 1 instruction fetched", .pme_long_desc = "Asserted each cycle when the IFU sends at least one instruction to the IDU. ", }, [ PPC970MP_PME_PM_GRP_DISP_REJECT ] = { .pme_name = "PM_GRP_DISP_REJECT", .pme_code = 0x324, .pme_short_desc = "Group dispatch rejected", .pme_long_desc = "A group that previously attempted dispatch was rejected.", }, [ PPC970MP_PME_PM_LSU_LDF ] = { .pme_name = "PM_LSU_LDF", .pme_code = 0x8730, .pme_short_desc = "LSU executed Floating Point load instruction", .pme_long_desc = "LSU executed Floating Point load instruction", }, [ PPC970MP_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x320, .pme_short_desc = "Instructions dispatched", .pme_long_desc = "The ISU sends the number of instructions dispatched.", }, [ PPC970MP_PME_PM_DATA_FROM_L25_SHR ] = { .pme_name = "PM_DATA_FROM_L25_SHR", .pme_code = 0x5837, .pme_short_desc = "Data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a demand load", }, [ PPC970MP_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0x834, .pme_short_desc = "L1 reload data source valid", .pme_long_desc = "The data source information is valid", }, [ PPC970MP_PME_PM_MRK_GRP_ISSUED ] = { .pme_name = "PM_MRK_GRP_ISSUED", .pme_code = 0x6005, .pme_short_desc = "Marked group issued", .pme_long_desc = "A sampled instruction was issued", }, [ PPC970MP_PME_PM_FPU_FMA ] = { .pme_name = "PM_FPU_FMA", .pme_code = 0x2100, .pme_short_desc = "FPU executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs. 
Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_MRK_CRU_FIN ] = { .pme_name = "PM_MRK_CRU_FIN", .pme_code = 0x4005, .pme_short_desc = "Marked instruction CRU processing finished", .pme_long_desc = "The Condition Register Unit finished a marked instruction. Instructions that finish may not necessarily complete", }, [ PPC970MP_PME_PM_CMPLU_STALL_REJECT ] = { .pme_name = "PM_CMPLU_STALL_REJECT", .pme_code = 0x70cb, .pme_short_desc = "Completion stall caused by reject", .pme_long_desc = "Completion stall caused by reject", }, [ PPC970MP_PME_PM_MRK_LSU1_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU1_FLUSH_UST", .pme_code = 0x715, .pme_short_desc = "LSU1 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", }, [ PPC970MP_PME_PM_MRK_FXU_FIN ] = { .pme_name = "PM_MRK_FXU_FIN", .pme_code = 0x6004, .pme_short_desc = "Marked instruction FXU processing finished", .pme_long_desc = "Marked instruction FXU processing finished", }, [ PPC970MP_PME_PM_LSU1_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU1_REJECT_ERAT_MISS", .pme_code = 0x927, .pme_short_desc = "LSU1 reject due to ERAT miss", .pme_long_desc = "LSU1 reject due to ERAT miss", }, [ PPC970MP_PME_PM_BR_ISSUED ] = { .pme_name = "PM_BR_ISSUED", .pme_code = 0x431, .pme_short_desc = "Branches issued", .pme_long_desc = "This signal will be asserted each time the ISU issues a branch instruction. 
This signal will be asserted each time the ISU selects a branch instruction to issue.", }, [ PPC970MP_PME_PM_PMC4_OVERFLOW ] = { .pme_name = "PM_PMC4_OVERFLOW", .pme_code = 0x500a, .pme_short_desc = "PMC4 Overflow", .pme_long_desc = "PMC4 Overflow", }, [ PPC970MP_PME_PM_EE_OFF ] = { .pme_name = "PM_EE_OFF", .pme_code = 0x333, .pme_short_desc = "Cycles MSR(EE) bit off", .pme_long_desc = "The number of Cycles MSR(EE) bit was off.", }, [ PPC970MP_PME_PM_INST_FROM_L25_MOD ] = { .pme_name = "PM_INST_FROM_L25_MOD", .pme_code = 0x6426, .pme_short_desc = "Instruction fetched from L2.5 modified", .pme_long_desc = "Instruction fetched from L2.5 modified", }, [ PPC970MP_PME_PM_CMPLU_STALL_ERAT_MISS ] = { .pme_name = "PM_CMPLU_STALL_ERAT_MISS", .pme_code = 0x704c, .pme_short_desc = "Completion stall caused by ERAT miss", .pme_long_desc = "Completion stall caused by ERAT miss", }, [ PPC970MP_PME_PM_ITLB_MISS ] = { .pme_name = "PM_ITLB_MISS", .pme_code = 0x700, .pme_short_desc = "Instruction TLB misses", .pme_long_desc = "A TLB miss for an Instruction Fetch has occurred", }, [ PPC970MP_PME_PM_FXU1_BUSY_FXU0_IDLE ] = { .pme_name = "PM_FXU1_BUSY_FXU0_IDLE", .pme_code = 0x4002, .pme_short_desc = "FXU1 busy FXU0 idle", .pme_long_desc = "FXU0 was idle while FXU1 was busy", }, [ PPC970MP_PME_PM_GRP_DISP_VALID ] = { .pme_name = "PM_GRP_DISP_VALID", .pme_code = 0x323, .pme_short_desc = "Group dispatch valid", .pme_long_desc = "Dispatch has been attempted for a valid group. Some groups may be rejected. 
The total number of successful dispatches is the number of dispatch valid minus dispatch reject.", }, [ PPC970MP_PME_PM_MRK_GRP_DISP ] = { .pme_name = "PM_MRK_GRP_DISP", .pme_code = 0x1002, .pme_short_desc = "Marked group dispatched", .pme_long_desc = "A group containing a sampled instruction was dispatched", }, [ PPC970MP_PME_PM_LSU_FLUSH_UST ] = { .pme_name = "PM_LSU_FLUSH_UST", .pme_code = 0x2800, .pme_short_desc = "SRQ unaligned store flushes", .pme_long_desc = "A store was flushed because it was unaligned", }, [ PPC970MP_PME_PM_FXU1_FIN ] = { .pme_name = "PM_FXU1_FIN", .pme_code = 0x336, .pme_short_desc = "FXU1 produced a result", .pme_long_desc = "The Fixed Point unit 1 finished an instruction and produced a result", }, [ PPC970MP_PME_PM_GRP_CMPL ] = { .pme_name = "PM_GRP_CMPL", .pme_code = 0x7003, .pme_short_desc = "Group completed", .pme_long_desc = "A group completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ PPC970MP_PME_PM_FPU_FRSP_FCONV ] = { .pme_name = "PM_FPU_FRSP_FCONV", .pme_code = 0x7110, .pme_short_desc = "FPU executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when executing frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs. 
Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_MRK_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_SRQ", .pme_code = 0x713, .pme_short_desc = "LSU0 marked SRQ flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ PPC970MP_PME_PM_CMPLU_STALL_OTHER ] = { .pme_name = "PM_CMPLU_STALL_OTHER", .pme_code = 0x100b, .pme_short_desc = "Completion stall caused by other reason", .pme_long_desc = "Completion stall caused by other reason", }, [ PPC970MP_PME_PM_LSU_LMQ_FULL_CYC ] = { .pme_name = "PM_LSU_LMQ_FULL_CYC", .pme_code = 0x837, .pme_short_desc = "Cycles LMQ full", .pme_long_desc = "The LMQ was full", }, [ PPC970MP_PME_PM_ST_REF_L1_LSU0 ] = { .pme_name = "PM_ST_REF_L1_LSU0", .pme_code = 0x811, .pme_short_desc = "LSU0 L1 D cache store references", .pme_long_desc = "A store executed on unit 0", }, [ PPC970MP_PME_PM_LSU0_DERAT_MISS ] = { .pme_name = "PM_LSU0_DERAT_MISS", .pme_code = 0x702, .pme_short_desc = "LSU0 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 0 missed the ERAT and resulted in an ERAT reload. Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ PPC970MP_PME_PM_LSU_SRQ_SYNC_CYC ] = { .pme_name = "PM_LSU_SRQ_SYNC_CYC", .pme_code = 0x735, .pme_short_desc = "SRQ sync duration", .pme_long_desc = "This signal is asserted every cycle when a sync is in the SRQ.", }, [ PPC970MP_PME_PM_FPU_STALL3 ] = { .pme_name = "PM_FPU_STALL3", .pme_code = 0x2120, .pme_short_desc = "FPU stalled in pipe3", .pme_long_desc = "FPU has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_LSU_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU_REJECT_ERAT_MISS", .pme_code = 0x5920, .pme_short_desc = "LSU reject due to ERAT miss", .pme_long_desc = "LSU reject due to ERAT miss", }, [ PPC970MP_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x1937, .pme_short_desc = "Marked data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a marked demand load", }, [ PPC970MP_PME_PM_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_LSU0_FLUSH_SRQ", .pme_code = 0x803, .pme_short_desc = "LSU0 SRQ flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ PPC970MP_PME_PM_FPU0_FMOV_FEST ] = { .pme_name = "PM_FPU0_FMOV_FEST", .pme_code = 0x110, .pme_short_desc = "FPU0 executed FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a move kind of instruction or one of the estimate instructions.. This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ", }, [ PPC970MP_PME_PM_IOPS_CMPL ] = { .pme_name = "PM_IOPS_CMPL", .pme_code = 0x1001, .pme_short_desc = "IOPS instructions completed", .pme_long_desc = "Number of IOPS Instructions that completed.", }, [ PPC970MP_PME_PM_LD_REF_L1_LSU0 ] = { .pme_name = "PM_LD_REF_L1_LSU0", .pme_code = 0x810, .pme_short_desc = "LSU0 L1 D cache load references", .pme_long_desc = "A load executed on unit 0", }, [ PPC970MP_PME_PM_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_LSU1_FLUSH_SRQ", .pme_code = 0x807, .pme_short_desc = "LSU1 SRQ flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group. 
", }, [ PPC970MP_PME_PM_CMPLU_STALL_DIV ] = { .pme_name = "PM_CMPLU_STALL_DIV", .pme_code = 0x708b, .pme_short_desc = "Completion stall caused by DIV instruction", .pme_long_desc = "Completion stall caused by DIV instruction", }, [ PPC970MP_PME_PM_GRP_BR_MPRED ] = { .pme_name = "PM_GRP_BR_MPRED", .pme_code = 0x327, .pme_short_desc = "Group experienced a branch mispredict", .pme_long_desc = "Group experienced a branch mispredict", }, [ PPC970MP_PME_PM_LSU_LMQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LMQ_S0_ALLOC", .pme_code = 0x836, .pme_short_desc = "LMQ slot 0 allocated", .pme_long_desc = "The first entry in the LMQ was allocated.", }, [ PPC970MP_PME_PM_LSU0_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU0_REJECT_LMQ_FULL", .pme_code = 0x921, .pme_short_desc = "LSU0 reject due to LMQ full or missed data coming", .pme_long_desc = "LSU0 reject due to LMQ full or missed data coming", }, [ PPC970MP_PME_PM_ST_REF_L1 ] = { .pme_name = "PM_ST_REF_L1", .pme_code = 0x7810, .pme_short_desc = "L1 D cache store references", .pme_long_desc = "Total DL1 Store references", }, [ PPC970MP_PME_PM_MRK_VMX_FIN ] = { .pme_name = "PM_MRK_VMX_FIN", .pme_code = 0x3005, .pme_short_desc = "Marked instruction VMX processing finished", .pme_long_desc = "Marked instruction VMX processing finished", }, [ PPC970MP_PME_PM_LSU_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_SRQ_EMPTY_CYC", .pme_code = 0x4003, .pme_short_desc = "Cycles SRQ empty", .pme_long_desc = "The Store Request Queue is empty", }, [ PPC970MP_PME_PM_FPU1_STF ] = { .pme_name = "PM_FPU1_STF", .pme_code = 0x126, .pme_short_desc = "FPU1 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing a store instruction.", }, [ PPC970MP_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x1005, .pme_short_desc = "Run cycles", .pme_long_desc = "Processor Cycles gated by the run latch", }, [ PPC970MP_PME_PM_LSU_LMQ_S0_VALID ] = { .pme_name = "PM_LSU_LMQ_S0_VALID", .pme_code = 0x835, .pme_short_desc = "LMQ 
slot 0 valid", .pme_long_desc = "This signal is asserted every cycle when the first entry in the LMQ is valid. The LMQ had eight entries that are allocated FIFO", }, [ PPC970MP_PME_PM_LSU0_LDF ] = { .pme_name = "PM_LSU0_LDF", .pme_code = 0x730, .pme_short_desc = "LSU0 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 0", }, [ PPC970MP_PME_PM_LSU_LRQ_S0_VALID ] = { .pme_name = "PM_LSU_LRQ_S0_VALID", .pme_code = 0x822, .pme_short_desc = "LRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Load Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin.", }, [ PPC970MP_PME_PM_PMC3_OVERFLOW ] = { .pme_name = "PM_PMC3_OVERFLOW", .pme_code = 0x400a, .pme_short_desc = "PMC3 Overflow", .pme_long_desc = "PMC3 Overflow", }, [ PPC970MP_PME_PM_MRK_IMR_RELOAD ] = { .pme_name = "PM_MRK_IMR_RELOAD", .pme_code = 0x722, .pme_short_desc = "Marked IMR reloaded", .pme_long_desc = "A DL1 reload occurred due to marked load", }, [ PPC970MP_PME_PM_MRK_GRP_TIMEO ] = { .pme_name = "PM_MRK_GRP_TIMEO", .pme_code = 0x5005, .pme_short_desc = "Marked group completion timeout", .pme_long_desc = "The sampling timeout expired indicating that the previously sampled instruction is no longer in the processor", }, [ PPC970MP_PME_PM_FPU_FMOV_FEST ] = { .pme_name = "PM_FPU_FMOV_FEST", .pme_code = 0x8110, .pme_short_desc = "FPU executing FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when executing a move kind of instruction or one of the estimate instructions. This could be fmr*, fneg*, fabs*, fnabs*, fres* or frsqrte* where XYZ* means XYZ or XYZ. 
Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_GRP_DISP_BLK_SB_CYC ] = { .pme_name = "PM_GRP_DISP_BLK_SB_CYC", .pme_code = 0x331, .pme_short_desc = "Cycles group dispatch blocked by scoreboard", .pme_long_desc = "The ISU sends a signal indicating that dispatch is blocked by scoreboard.", }, [ PPC970MP_PME_PM_XER_MAP_FULL_CYC ] = { .pme_name = "PM_XER_MAP_FULL_CYC", .pme_code = 0x302, .pme_short_desc = "Cycles XER mapper full", .pme_long_desc = "The ISU sends a signal indicating that the xer mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970MP_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0x813, .pme_short_desc = "L1 D cache store misses", .pme_long_desc = "A store missed the dcache", }, [ PPC970MP_PME_PM_STOP_COMPLETION ] = { .pme_name = "PM_STOP_COMPLETION", .pme_code = 0x3001, .pme_short_desc = "Completion stopped", .pme_long_desc = "RAS Unit has signaled completion to stop", }, [ PPC970MP_PME_PM_MRK_GRP_CMPL ] = { .pme_name = "PM_MRK_GRP_CMPL", .pme_code = 0x4004, .pme_short_desc = "Marked group completed", .pme_long_desc = "A group containing a sampled instruction completed. 
Microcoded instructions that span multiple groups will generate this event once per group.", }, [ PPC970MP_PME_PM_ISLB_MISS ] = { .pme_name = "PM_ISLB_MISS", .pme_code = 0x701, .pme_short_desc = "Instruction SLB misses", .pme_long_desc = "An SLB miss for an instruction fetch has occurred", }, [ PPC970MP_PME_PM_SUSPENDED ] = { .pme_name = "PM_SUSPENDED", .pme_code = 0x0, .pme_short_desc = "Suspended", .pme_long_desc = "Suspended", }, [ PPC970MP_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0x7, .pme_short_desc = "Processor cycles", .pme_long_desc = "Processor cycles", }, [ PPC970MP_PME_PM_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_LD_MISS_L1_LSU1", .pme_code = 0x816, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "A load, executing on unit 1, missed the dcache", }, [ PPC970MP_PME_PM_STCX_FAIL ] = { .pme_name = "PM_STCX_FAIL", .pme_code = 0x721, .pme_short_desc = "STCX failed", .pme_long_desc = "A stcx (stwcx or stdcx) failed", }, [ PPC970MP_PME_PM_LSU1_SRQ_STFWD ] = { .pme_name = "PM_LSU1_SRQ_STFWD", .pme_code = 0x824, .pme_short_desc = "LSU1 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 1", }, [ PPC970MP_PME_PM_GRP_DISP ] = { .pme_name = "PM_GRP_DISP", .pme_code = 0x2004, .pme_short_desc = "Group dispatches", .pme_long_desc = "A group was dispatched", }, [ PPC970MP_PME_PM_L2_PREF ] = { .pme_name = "PM_L2_PREF", .pme_code = 0x733, .pme_short_desc = "L2 cache prefetches", .pme_long_desc = "A request to prefetch data into L2 was made", }, [ PPC970MP_PME_PM_FPU1_DENORM ] = { .pme_name = "PM_FPU1_DENORM", .pme_code = 0x124, .pme_short_desc = "FPU1 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ PPC970MP_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x1837, .pme_short_desc = "Data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a demand load", }, [ 
PPC970MP_PME_PM_FPU0_FPSCR ] = { .pme_name = "PM_FPU0_FPSCR", .pme_code = 0x130, .pme_short_desc = "FPU0 executed FPSCR instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing fpscr move related instruction. This could be mtfsfi*, mtfsb0*, mtfsb1*. mffs*, mtfsf*, mcrsf* where XYZ* means XYZ, XYZs, XYZ., XYZs", }, [ PPC970MP_PME_PM_MRK_DATA_FROM_L25_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD", .pme_code = 0x6937, .pme_short_desc = "Marked data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a marked demand load", }, [ PPC970MP_PME_PM_FPU0_FSQRT ] = { .pme_name = "PM_FPU0_FSQRT", .pme_code = 0x102, .pme_short_desc = "FPU0 executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp0 is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970MP_PME_PM_LD_REF_L1 ] = { .pme_name = "PM_LD_REF_L1", .pme_code = 0x8810, .pme_short_desc = "L1 D cache load references", .pme_long_desc = "Total DL1 Load references", }, [ PPC970MP_PME_PM_MRK_L1_RELOAD_VALID ] = { .pme_name = "PM_MRK_L1_RELOAD_VALID", .pme_code = 0x934, .pme_short_desc = "Marked L1 reload data source valid", .pme_long_desc = "The source information is valid and is for a marked load", }, [ PPC970MP_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x5003, .pme_short_desc = "One or more PPC instruction completed", .pme_long_desc = "A group containing at least one PPC instruction completed. For microcoded instructions that span multiple groups, this will only occur once.", }, [ PPC970MP_PME_PM_INST_FROM_L1 ] = { .pme_name = "PM_INST_FROM_L1", .pme_code = 0x142d, .pme_short_desc = "Instruction fetched from L1", .pme_long_desc = "An instruction fetch group was fetched from L1. 
Fetch Groups can contain up to 8 instructions", }, [ PPC970MP_PME_PM_EE_OFF_EXT_INT ] = { .pme_name = "PM_EE_OFF_EXT_INT", .pme_code = 0x337, .pme_short_desc = "Cycles MSR(EE) bit off and external interrupt pending", .pme_long_desc = "Cycles MSR(EE) bit off and external interrupt pending", }, [ PPC970MP_PME_PM_PMC6_OVERFLOW ] = { .pme_name = "PM_PMC6_OVERFLOW", .pme_code = 0x700a, .pme_short_desc = "PMC6 Overflow", .pme_long_desc = "PMC6 Overflow", }, [ PPC970MP_PME_PM_LSU_LRQ_FULL_CYC ] = { .pme_name = "PM_LSU_LRQ_FULL_CYC", .pme_code = 0x312, .pme_short_desc = "Cycles LRQ full", .pme_long_desc = "The ISU sends this signal when the LRQ is full.", }, [ PPC970MP_PME_PM_IC_PREF_INSTALL ] = { .pme_name = "PM_IC_PREF_INSTALL", .pme_code = 0x427, .pme_short_desc = "Instruction prefetched installed in prefetch", .pme_long_desc = "New line coming into the prefetch buffer", }, [ PPC970MP_PME_PM_DC_PREF_OUT_OF_STREAMS ] = { .pme_name = "PM_DC_PREF_OUT_OF_STREAMS", .pme_code = 0x732, .pme_short_desc = "D cache out of streams", .pme_long_desc = "out of streams", }, [ PPC970MP_PME_PM_MRK_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_SRQ", .pme_code = 0x717, .pme_short_desc = "LSU1 marked SRQ flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ PPC970MP_PME_PM_GCT_FULL_CYC ] = { .pme_name = "PM_GCT_FULL_CYC", .pme_code = 0x300, .pme_short_desc = "Cycles GCT full", .pme_long_desc = "The ISU sends a signal indicating the gct is full. 
", }, [ PPC970MP_PME_PM_INST_FROM_MEM ] = { .pme_name = "PM_INST_FROM_MEM", .pme_code = 0x2426, .pme_short_desc = "Instruction fetched from memory", .pme_long_desc = "Instruction fetched from memory", }, [ PPC970MP_PME_PM_FLUSH_LSU_BR_MPRED ] = { .pme_name = "PM_FLUSH_LSU_BR_MPRED", .pme_code = 0x317, .pme_short_desc = "Flush caused by LSU or branch mispredict", .pme_long_desc = "Flush caused by LSU or branch mispredict", }, [ PPC970MP_PME_PM_FXU_BUSY ] = { .pme_name = "PM_FXU_BUSY", .pme_code = 0x6002, .pme_short_desc = "FXU busy", .pme_long_desc = "FXU0 and FXU1 are both busy", }, [ PPC970MP_PME_PM_ST_REF_L1_LSU1 ] = { .pme_name = "PM_ST_REF_L1_LSU1", .pme_code = 0x815, .pme_short_desc = "LSU1 L1 D cache store references", .pme_long_desc = "A store executed on unit 1", }, [ PPC970MP_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x1720, .pme_short_desc = "Marked L1 D cache load misses", .pme_long_desc = "Marked L1 D cache load misses", }, [ PPC970MP_PME_PM_L1_WRITE_CYC ] = { .pme_name = "PM_L1_WRITE_CYC", .pme_code = 0x434, .pme_short_desc = "Cycles writing to instruction L1", .pme_long_desc = "This signal is asserted each cycle a cache write is active.", }, [ PPC970MP_PME_PM_LSU1_BUSY ] = { .pme_name = "PM_LSU1_BUSY", .pme_code = 0x827, .pme_short_desc = "LSU1 busy", .pme_long_desc = "LSU unit 0 is busy rejecting instructions ", }, [ PPC970MP_PME_PM_LSU_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU_REJECT_LMQ_FULL", .pme_code = 0x2920, .pme_short_desc = "LSU reject due to LMQ full or missed data coming", .pme_long_desc = "LSU reject due to LMQ full or missed data coming", }, [ PPC970MP_PME_PM_CMPLU_STALL_FDIV ] = { .pme_name = "PM_CMPLU_STALL_FDIV", .pme_code = 0x504c, .pme_short_desc = "Completion stall caused by FDIV or FQRT instruction", .pme_long_desc = "Completion stall caused by FDIV or FQRT instruction", }, [ PPC970MP_PME_PM_FPU_ALL ] = { .pme_name = "PM_FPU_ALL", .pme_code = 0x5100, .pme_short_desc = "FPU executed add, mult, sub, 
cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo. Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_LSU_SRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_SRQ_S0_ALLOC", .pme_code = 0x825, .pme_short_desc = "SRQ slot 0 allocated", .pme_long_desc = "SRQ Slot zero was allocated", }, [ PPC970MP_PME_PM_INST_FROM_L25_SHR ] = { .pme_name = "PM_INST_FROM_L25_SHR", .pme_code = 0x5426, .pme_short_desc = "Instruction fetched from L2.5 shared", .pme_long_desc = "Instruction fetched from L2.5 shared", }, [ PPC970MP_PME_PM_GRP_MRK ] = { .pme_name = "PM_GRP_MRK", .pme_code = 0x5004, .pme_short_desc = "Group marked in IDU", .pme_long_desc = "A group was sampled (marked)", }, [ PPC970MP_PME_PM_BR_MPRED_CR ] = { .pme_name = "PM_BR_MPRED_CR", .pme_code = 0x432, .pme_short_desc = "Branch mispredictions due to CR bit setting", .pme_long_desc = "This signal is asserted when the branch execution unit detects a branch mispredict because the CR value is opposite of the predicted value. This signal is asserted after a branch issue event and will result in a branch redirect flush if not overridden by a flush of an older instruction.", }, [ PPC970MP_PME_PM_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_DC_PREF_STREAM_ALLOC", .pme_code = 0x737, .pme_short_desc = "D cache new prefetch stream allocated", .pme_long_desc = "A new Prefetch Stream was allocated", }, [ PPC970MP_PME_PM_FPU1_FIN ] = { .pme_name = "PM_FPU1_FIN", .pme_code = 0x117, .pme_short_desc = "FPU1 produced a result", .pme_long_desc = "fp1 finished, produced a result. This only indicates finish, not completion. 
", }, [ PPC970MP_PME_PM_LSU_REJECT_SRQ ] = { .pme_name = "PM_LSU_REJECT_SRQ", .pme_code = 0x1920, .pme_short_desc = "LSU SRQ rejects", .pme_long_desc = "LSU SRQ rejects", }, [ PPC970MP_PME_PM_BR_MPRED_TA ] = { .pme_name = "PM_BR_MPRED_TA", .pme_code = 0x433, .pme_short_desc = "Branch mispredictions due to target address", .pme_long_desc = "branch miss predict due to a target address prediction. This signal will be asserted each time the branch execution unit detects an incorrect target address prediction. This signal will be asserted after a valid branch execution unit issue and will cause a branch mispredict flush unless a flush is detected from an older instruction.", }, [ PPC970MP_PME_PM_CRQ_FULL_CYC ] = { .pme_name = "PM_CRQ_FULL_CYC", .pme_code = 0x311, .pme_short_desc = "Cycles CR issue queue full", .pme_long_desc = "The ISU sends a signal indicating that the issue queue that feeds the ifu cr unit cannot accept any more group (queue is full of groups).", }, [ PPC970MP_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x3810, .pme_short_desc = "L1 D cache load misses", .pme_long_desc = "Total DL1 Load references that miss the DL1", }, [ PPC970MP_PME_PM_INST_FROM_PREF ] = { .pme_name = "PM_INST_FROM_PREF", .pme_code = 0x342d, .pme_short_desc = "Instructions fetched from prefetch", .pme_long_desc = "An instruction fetch group was fetched from the prefetch buffer. 
Fetch Groups can contain up to 8 instructions", }, [ PPC970MP_PME_PM_STCX_PASS ] = { .pme_name = "PM_STCX_PASS", .pme_code = 0x725, .pme_short_desc = "Stcx passes", .pme_long_desc = "A stcx (stwcx or stdcx) instruction was successful", }, [ PPC970MP_PME_PM_DC_INV_L2 ] = { .pme_name = "PM_DC_INV_L2", .pme_code = 0x817, .pme_short_desc = "L1 D cache entries invalidated from L2", .pme_long_desc = "A dcache invalidated was received from the L2 because a line in L2 was castout.", }, [ PPC970MP_PME_PM_LSU_SRQ_FULL_CYC ] = { .pme_name = "PM_LSU_SRQ_FULL_CYC", .pme_code = 0x313, .pme_short_desc = "Cycles SRQ full", .pme_long_desc = "The ISU sends this signal when the srq is full.", }, [ PPC970MP_PME_PM_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_LSU0_FLUSH_LRQ", .pme_code = 0x802, .pme_short_desc = "LSU0 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970MP_PME_PM_LSU_SRQ_S0_VALID ] = { .pme_name = "PM_LSU_SRQ_S0_VALID", .pme_code = 0x821, .pme_short_desc = "SRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Store Request Queue slot zero is valid. 
The SRQ is 32 entries long and is allocated round-robin.", }, [ PPC970MP_PME_PM_LARX_LSU0 ] = { .pme_name = "PM_LARX_LSU0", .pme_code = 0x727, .pme_short_desc = "Larx executed on LSU0", .pme_long_desc = "A larx (lwarx or ldarx) was executed on side 0 (there is no corresponding unit 1 event since larx instructions can only execute on unit 0)", }, [ PPC970MP_PME_PM_GCT_EMPTY_CYC ] = { .pme_name = "PM_GCT_EMPTY_CYC", .pme_code = 0x1004, .pme_short_desc = "Cycles GCT empty", .pme_long_desc = "The Global Completion Table is completely empty", }, [ PPC970MP_PME_PM_FPU1_ALL ] = { .pme_name = "PM_FPU1_ALL", .pme_code = 0x107, .pme_short_desc = "FPU1 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo", }, [ PPC970MP_PME_PM_FPU1_FSQRT ] = { .pme_name = "PM_FPU1_FSQRT", .pme_code = 0x106, .pme_short_desc = "FPU1 executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp1 is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970MP_PME_PM_FPU_FIN ] = { .pme_name = "PM_FPU_FIN", .pme_code = 0x4110, .pme_short_desc = "FPU produced a result", .pme_long_desc = "FPU finished, produced a result. This only indicates finish, not completion. 
Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_LSU_SRQ_STFWD ] = { .pme_name = "PM_LSU_SRQ_STFWD", .pme_code = 0x1820, .pme_short_desc = "SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load", }, [ PPC970MP_PME_PM_MRK_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU1", .pme_code = 0x724, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "A marked load, executing on unit 1, missed the dcache", }, [ PPC970MP_PME_PM_FXU0_FIN ] = { .pme_name = "PM_FXU0_FIN", .pme_code = 0x332, .pme_short_desc = "FXU0 produced a result", .pme_long_desc = "The Fixed Point unit 0 finished an instruction and produced a result", }, [ PPC970MP_PME_PM_MRK_FPU_FIN ] = { .pme_name = "PM_MRK_FPU_FIN", .pme_code = 0x7004, .pme_short_desc = "Marked instruction FPU processing finished", .pme_long_desc = "One of the Floating Point Units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ PPC970MP_PME_PM_PMC5_OVERFLOW ] = { .pme_name = "PM_PMC5_OVERFLOW", .pme_code = 0x600a, .pme_short_desc = "PMC5 Overflow", .pme_long_desc = "PMC5 Overflow", }, [ PPC970MP_PME_PM_SNOOP_TLBIE ] = { .pme_name = "PM_SNOOP_TLBIE", .pme_code = 0x703, .pme_short_desc = "Snoop TLBIE", .pme_long_desc = "A TLB miss for a data request occurred. Requests that miss the TLB may be retried until the instruction is in the next to complete group (unless HID4 is set to allow speculative tablewalks). This may result in multiple TLB misses for the same instruction.", }, [ PPC970MP_PME_PM_FPU1_FRSP_FCONV ] = { .pme_name = "PM_FPU1_FRSP_FCONV", .pme_code = 0x115, .pme_short_desc = "FPU1 executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when fp1 is executing frsp or convert kind of instruction. 
This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970MP_PME_PM_FPU0_FDIV ] = { .pme_name = "PM_FPU0_FDIV", .pme_code = 0x100, .pme_short_desc = "FPU0 executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp0 is executing a divide instruction. This could be fdiv, fdivs, fdiv. fdivs.", }, [ PPC970MP_PME_PM_LD_REF_L1_LSU1 ] = { .pme_name = "PM_LD_REF_L1_LSU1", .pme_code = 0x814, .pme_short_desc = "LSU1 L1 D cache load references", .pme_long_desc = "A load executed on unit 1", }, [ PPC970MP_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x3004, .pme_short_desc = "Hypervisor Cycles", .pme_long_desc = "Cycles when the processor is executing in Hypervisor (MSR[HV] = 1 and MSR[PR]=0)", }, [ PPC970MP_PME_PM_LR_CTR_MAP_FULL_CYC ] = { .pme_name = "PM_LR_CTR_MAP_FULL_CYC", .pme_code = 0x306, .pme_short_desc = "Cycles LR/CTR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the lr/ctr mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970MP_PME_PM_FPU_DENORM ] = { .pme_name = "PM_FPU_DENORM", .pme_code = 0x1120, .pme_short_desc = "FPU received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized. 
Combined Unit 0 + Unit 1", }, [ PPC970MP_PME_PM_LSU0_REJECT_SRQ ] = { .pme_name = "PM_LSU0_REJECT_SRQ", .pme_code = 0x920, .pme_short_desc = "LSU0 SRQ rejects", .pme_long_desc = "LSU0 SRQ rejects", }, [ PPC970MP_PME_PM_LSU1_REJECT_SRQ ] = { .pme_name = "PM_LSU1_REJECT_SRQ", .pme_code = 0x924, .pme_short_desc = "LSU1 SRQ rejects", .pme_long_desc = "LSU1 SRQ rejects", }, [ PPC970MP_PME_PM_LSU1_DERAT_MISS ] = { .pme_name = "PM_LSU1_DERAT_MISS", .pme_code = 0x706, .pme_short_desc = "LSU1 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 1 missed the ERAT and resulted in an ERAT reload. Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ PPC970MP_PME_PM_IC_PREF_REQ ] = { .pme_name = "PM_IC_PREF_REQ", .pme_code = 0x426, .pme_short_desc = "Instruction prefetch requests", .pme_long_desc = "Asserted when a non-canceled prefetch is made to the cache interface unit (CIU).", }, [ PPC970MP_PME_PM_MRK_LSU_FIN ] = { .pme_name = "PM_MRK_LSU_FIN", .pme_code = 0x8004, .pme_short_desc = "Marked instruction LSU processing finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ PPC970MP_PME_PM_MRK_DATA_FROM_MEM ] = { .pme_name = "PM_MRK_DATA_FROM_MEM", .pme_code = 0x2937, .pme_short_desc = "Marked data loaded from memory", .pme_long_desc = "Marked data loaded from memory", }, [ PPC970MP_PME_PM_CMPLU_STALL_DCACHE_MISS ] = { .pme_name = "PM_CMPLU_STALL_DCACHE_MISS", .pme_code = 0x50cb, .pme_short_desc = "Completion stall caused by D cache miss", .pme_long_desc = "Completion stall caused by D cache miss", }, [ PPC970MP_PME_PM_LSU0_FLUSH_UST ] = { .pme_name = "PM_LSU0_FLUSH_UST", .pme_code = 0x801, .pme_short_desc = "LSU0 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 0 because it was unaligned (crossed a 4k boundary)", }, [ PPC970MP_PME_PM_LSU_FLUSH_LRQ ] = { .pme_name = "PM_LSU_FLUSH_LRQ", .pme_code = 0x6800, .pme_short_desc = "LRQ flushes", .pme_long_desc = "A load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970MP_PME_PM_LSU_FLUSH_SRQ ] = { .pme_name = "PM_LSU_FLUSH_SRQ", .pme_code = 0x5800, .pme_short_desc = "SRQ flushes", .pme_long_desc = "A store was flushed because a younger load hits an older store that is already in the SRQ or in the same group.", } }; #endif /* papi-5.3.0/src/libpfm4/lib/events/sparc_ultra3i_events.h */ static const sparc_entry_t ultra3i_pe[] = { /* These two must always be first. 
*/ { .name = "Cycle_cnt", .desc = "Accumulated cycles", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x0, }, { .name = "Instr_cnt", .desc = "Number of instructions completed", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x1, }, /* PIC0 events common to all UltraSPARC processors */ { .name = "Dispatch0_IC_miss", .desc = "I-buffer is empty from I-Cache miss", .ctrl = PME_CTRL_S0, .code = 0x2, }, { .name = "IC_ref", .desc = "I-cache references", .ctrl = PME_CTRL_S0, .code = 0x8, }, { .name = "DC_rd", .desc = "D-cache read references (including accesses that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0x9, }, { .name = "DC_wr", .desc = "D-cache store accesses (including cacheable stores that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0xa, }, { .name = "EC_ref", .desc = "E-cache references", .ctrl = PME_CTRL_S0, .code = 0xc, }, { .name = "EC_snoop_inv", .desc = "L2-cache invalidates generated from a snoop by a remote processor", .ctrl = PME_CTRL_S0, .code = 0xe, }, /* PIC1 events common to all UltraSPARC processors */ { .name = "Dispatch0_mispred", .desc = "I-buffer is empty from Branch misprediction", .ctrl = PME_CTRL_S1, .code = 0x2, }, { .name = "EC_wb", .desc = "Dirty sub-blocks that produce writebacks due to L2-cache miss events", .ctrl = PME_CTRL_S1, .code = 0xd, }, { .name = "EC_snoop_cb", .desc = "L2-cache copybacks generated from a snoop by a remote processor", .ctrl = PME_CTRL_S1, .code = 0xe, }, /* PIC0 events common to all UltraSPARC-III/III+/IIIi processors */ { .name = "Dispatch0_br_target", .desc = "I-buffer is empty due to a branch target address calculation", .ctrl = PME_CTRL_S0, .code = 0x3, }, { .name = "Dispatch0_2nd_br", .desc = "Stall cycles due to having two branch instructions line-up in one 4-instruction group causing the second branch in the group to be re-fetched, delaying its entrance into the I-buffer", .ctrl = PME_CTRL_S0, .code = 0x4, }, { .name = "Rstall_storeQ", .desc = "R-stage stall for a store instruction which is the next 
instruction to be executed, but it stalled due to the store queue being full", .ctrl = PME_CTRL_S0, .code = 0x5, }, { .name = "Rstall_IU_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding integer instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S0, .code = 0x6, }, { .name = "EC_write_hit_RTO", .desc = "W-cache exclusive requests that hit L2-cache in S, O, or Os state and thus, do a read-to-own bus transaction", .ctrl = PME_CTRL_S0, .code = 0xd, }, { .name = "EC_rd_miss", .desc = "L2-cache miss events (including atomics) from D-cache events", .ctrl = PME_CTRL_S0, .code = 0xf, }, { .name = "PC_port0_rd", .desc = "P-cache cacheable FP loads to the first port (general purpose load path to D-cache and P-cache via MS pipeline)", .ctrl = PME_CTRL_S0, .code = 0x10, }, { .name = "SI_snoop", .desc = "Counts snoops from remote processor(s) including RTS, RTSR, RTO, RTOR, RS, RSR, RTSM, and WS", .ctrl = PME_CTRL_S0, .code = 0x11, }, { .name = "SI_ciq_flow", .desc = "Counts system clock cycles when the flow control (PauseOut) signal is asserted", .ctrl = PME_CTRL_S0, .code = 0x12, }, { .name = "SI_owned", .desc = "Counts events where owned_in is asserted on bus requests from the local processor", .ctrl = PME_CTRL_S0, .code = 0x13, }, { .name = "SW_count0", .desc = "Counts software-generated occurrences of 'sethi %hi(0xfc000), %g0' instruction", .ctrl = PME_CTRL_S0, .code = 0x14, }, { .name = "IU_Stat_Br_miss_taken", .desc = "Retired branches that were predicted to be taken, but in fact were not taken", .ctrl = PME_CTRL_S0, .code = 0x15, }, { .name = "IU_Stat_Br_Count_taken", .desc = "Retired taken branches", .ctrl = PME_CTRL_S0, .code = 0x16, }, { .name = "Dispatch0_rs_mispred", .desc = "I-buffer is empty due to a Return Address Stack misprediction", .ctrl = PME_CTRL_S0, .code = 0x4, }, { .name = "FA_pipe_completion", .desc = "Instructions that complete execution on the FPG ALU 
pipelines", .ctrl = PME_CTRL_S0, .code = 0x18, }, /* PIC1 events common to all UltraSPARC-III/III+/IIIi processors */ { .name = "IC_miss_cancelled", .desc = "I-cache misses cancelled due to mis-speculation, recycle, or other events", .ctrl = PME_CTRL_S1, .code = 0x3, }, { .name = "Re_FPU_bypass", .desc = "Stall due to recirculation when an FPU bypass condition that does not have a direct bypass path occurs", .ctrl = PME_CTRL_S1, .code = 0x5, }, { .name = "Re_DC_miss", .desc = "Stall due to loads that miss D-cache and get recirculated", .ctrl = PME_CTRL_S1, .code = 0x6, }, { .name = "Re_EC_miss", .desc = "Stall due to loads that miss L2-cache and get recirculated", .ctrl = PME_CTRL_S1, .code = 0x7, }, { .name = "IC_miss", .desc = "I-cache misses, including fetches from mis-speculated execution paths which are later cancelled", .ctrl = PME_CTRL_S1, .code = 0x8, }, { .name = "DC_rd_miss", .desc = "Recirculated loads that miss the D-cache", .ctrl = PME_CTRL_S1, .code = 0x9, }, { .name = "DC_wr_miss", .desc = "D-cache store accesses that miss D-cache", .ctrl = PME_CTRL_S1, .code = 0xa, }, { .name = "Rstall_FP_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding floating-point instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S1, .code = 0xb, }, { .name = "EC_misses", .desc = "E-cache misses", .ctrl = PME_CTRL_S1, .code = 0xc, }, { .name = "EC_ic_miss", .desc = "L2-cache read misses from I-cache requests", .ctrl = PME_CTRL_S1, .code = 0xf, }, { .name = "Re_PC_miss", .desc = "Stall due to recirculation when a prefetch cache miss occurs on a prefetch predicted second load", .ctrl = PME_CTRL_S1, .code = 0x10, }, { .name = "ITLB_miss", .desc = "I-TLB miss traps taken", .ctrl = PME_CTRL_S1, .code = 0x11, }, { .name = "DTLB_miss", .desc = "Memory reference instructions which trap due to D-TLB miss", .ctrl = PME_CTRL_S1, .code = 0x12, }, { .name = "WC_miss", .desc = "W-cache misses", 
.ctrl = PME_CTRL_S1, .code = 0x13, }, { .name = "WC_snoop_cb", .desc = "W-cache copybacks generated by a snoop from a remote processor", .ctrl = PME_CTRL_S1, .code = 0x14, }, { .name = "WC_scrubbed", .desc = "W-cache hits to clean lines", .ctrl = PME_CTRL_S1, .code = 0x15, }, { .name = "WC_wb_wo_read", .desc = "W-cache writebacks not requiring a read", .ctrl = PME_CTRL_S1, .code = 0x16, }, { .name = "PC_soft_hit", .desc = "FP loads that hit a P-cache line that was prefetched by a software-prefetch instruction", .ctrl = PME_CTRL_S1, .code = 0x18, }, { .name = "PC_snoop_inv", .desc = "P-cache invalidates that were generated by a snoop from a remote processor and stores by a local processor", .ctrl = PME_CTRL_S1, .code = 0x19, }, { .name = "PC_hard_hit", .desc = "FP loads that hit a P-cache line that was prefetched by a hardware prefetch", .ctrl = PME_CTRL_S1, .code = 0x1a, }, { .name = "PC_port1_rd", .desc = "P-cache cacheable FP loads to the second port (memory and out-of-pipeline instruction execution loads via the A0 and A1 pipelines)", .ctrl = PME_CTRL_S1, .code = 0x1b, }, { .name = "SW_count1", .desc = "Counts software-generated occurrences of 'sethi %hi(0xfc000), %g0' instruction", .ctrl = PME_CTRL_S1, .code = 0x1c, }, { .name = "IU_Stat_Br_miss_untaken", .desc = "Retired branches that were predicted to be untaken, but in fact were taken", .ctrl = PME_CTRL_S1, .code = 0x1d, }, { .name = "IU_Stat_Br_Count_untaken", .desc = "Retired untaken branches", .ctrl = PME_CTRL_S1, .code = 0x1e, }, { .name = "PC_MS_miss", .desc = "FP loads through the MS pipeline that miss P-cache", .ctrl = PME_CTRL_S1, .code = 0x1f, }, { .name = "Re_RAW_miss", .desc = "Stall due to recirculation when there is a load in the E-stage which has a non-bypassable read-after-write hazard with an earlier store instruction", .ctrl = PME_CTRL_S1, .code = 0x26, }, { .name = "FM_pipe_completion", .desc = "Instructions that complete execution on the FPG Multiply pipelines", .ctrl = PME_CTRL_S0, .code 
= 0x27, }, /* PIC0 memory controller events specific to UltraSPARC-IIIi processors */ { .name = "MC_read_dispatched", .desc = "DDR 64-byte reads dispatched by the MIU", .ctrl = PME_CTRL_S0, .code = 0x20, }, { .name = "MC_write_dispatched", .desc = "DDR 64-byte writes dispatched by the MIU", .ctrl = PME_CTRL_S0, .code = 0x21, }, { .name = "MC_read_returned_to_JBU", .desc = "64-byte reads that return data to JBU", .ctrl = PME_CTRL_S0, .code = 0x22, }, { .name = "MC_msl_busy_stall", .desc = "Stall cycles due to msl_busy", .ctrl = PME_CTRL_S0, .code = 0x23, }, { .name = "MC_mdb_overflow_stall", .desc = "Stall cycles due to potential memory data buffer overflow", .ctrl = PME_CTRL_S0, .code = 0x24, }, { .name = "MC_miu_spec_request", .desc = "Speculative requests accepted by MIU", .ctrl = PME_CTRL_S0, .code = 0x25, }, /* PIC1 memory controller events specific to UltraSPARC-IIIi processors */ { .name = "MC_reads", .desc = "64-byte reads by the MSL", .ctrl = PME_CTRL_S1, .code = 0x20, }, { .name = "MC_writes", .desc = "64-byte writes by the MSL", .ctrl = PME_CTRL_S1, .code = 0x21, }, { .name = "MC_page_close_stall", .desc = "DDR page conflicts", .ctrl = PME_CTRL_S1, .code = 0x22, }, /* PIC1 events specific to UltraSPARC-III+/IIIi */ { .name = "Re_DC_missovhd", .desc = "Used to measure D-cache stall counts separately for L2-cache hits and misses. 
This counter is used with the recirculation and cache access events to separately calculate the D-cache loads that hit and miss the L2-cache", .ctrl = PME_CTRL_S1, .code = 0x4, }, }; #define PME_SPARC_ULTRA3I_EVENT_COUNT (sizeof(ultra3i_pe)/sizeof(sparc_entry_t)) /* papi-5.3.0/src/libpfm4/lib/events/amd64_events_fam15h.h */ /* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. 
* * PMU: amd64_fam15h (AMD64 Fam15h Interlagos) * * Based on libpfm patch by Robert Richter : * Family 15h Microarchitecture performance monitor events * * History: * * Apr 29 2011 -- Robert Richter, robert.richter@amd.com: * Source: BKDG for AMD Family 15h Models 00h-0Fh Processors, * 42301, Rev 1.15, April 18, 2011 * * Dec 09 2010 -- Robert Richter, robert.richter@amd.com: * Source: BIOS and Kernel Developer's Guide for the AMD Family 15h * Processors, Rev 0.90, May 18, 2010 */ #define CORE_SELECT(b) \ { .uname = "CORE_0",\ .udesc = "Measure on Core0",\ .ucode = 0 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "CORE_1",\ .udesc = "Measure on Core1",\ .ucode = 1 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "CORE_2",\ .udesc = "Measure on Core2",\ .ucode = 2 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "CORE_3",\ .udesc = "Measure on Core3",\ .ucode = 3 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "CORE_4",\ .udesc = "Measure on Core4",\ .ucode = 4 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "CORE_5",\ .udesc = "Measure on Core5",\ .ucode = 5 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "CORE_6",\ .udesc = "Measure on Core6",\ .ucode = 6 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "CORE_7",\ .udesc = "Measure on Core7",\ .ucode = 7 << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO,\ },\ { .uname = "ANY_CORE",\ .udesc = "Measure on any core",\ .ucode = 0xf << 4,\ .grpid = b,\ .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL,\ } static const amd64_umask_t amd64_fam15h_dispatched_fpu_ops[]={ { .uname = "OPS_PIPE0", .udesc = "Total number uops assigned to Pipe 0", .ucode = 0x1, }, { .uname = "OPS_PIPE1", .udesc = "Total number uops assigned to Pipe 1", .ucode = 0x2, }, { .uname = "OPS_PIPE2", .udesc = "Total number uops assigned to Pipe 2", .ucode = 0x4, }, { .uname = "OPS_PIPE3", .udesc = "Total number uops assigned to Pipe 3", .ucode = 0x8, }, { .uname = 
"OPS_DUAL_PIPE0", .udesc = "Total number dual-pipe uops assigned to Pipe 0", .ucode = 0x10, }, { .uname = "OPS_DUAL_PIPE1", .udesc = "Total number dual-pipe uops assigned to Pipe 1", .ucode = 0x20, }, { .uname = "OPS_DUAL_PIPE2", .udesc = "Total number dual-pipe uops assigned to Pipe 2", .ucode = 0x40, }, { .uname = "OPS_DUAL_PIPE3", .udesc = "Total number dual-pipe uops assigned to Pipe 3", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_retired_sse_ops[]={ { .uname = "SINGLE_ADD_SUB_OPS", .udesc = "Single-precision add/subtract FLOPS", .ucode = 0x1, }, { .uname = "SINGLE_MUL_OPS", .udesc = "Single-precision multiply FLOPS", .ucode = 0x2, }, { .uname = "SINGLE_DIV_OPS", .udesc = "Single-precision divide/square root FLOPS", .ucode = 0x4, }, { .uname = "SINGLE_MUL_ADD_OPS", .udesc = "Single precision multiply-add FLOPS. Multiply-add counts as 2 FLOPS", .ucode = 0x8, }, { .uname = "DOUBLE_ADD_SUB_OPS", .udesc = "Double precision add/subtract FLOPS", .ucode = 0x10, }, { .uname = "DOUBLE_MUL_OPS", .udesc = "Double precision multiply FLOPS", .ucode = 0x20, }, { .uname = "DOUBLE_DIV_OPS", .udesc = "Double precision divide/square root FLOPS", .ucode = 0x40, }, { .uname = "DOUBLE_MUL_ADD_OPS", .udesc = "Double precision multiply-add FLOPS. 
Multiply-add counts as 2 FLOPS", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_move_scalar_optimization[]={ { .uname = "SSE_MOVE_OPS", .udesc = "Number of SSE Move Ops", .ucode = 0x1, }, { .uname = "SSE_MOVE_OPS_ELIM", .udesc = "Number of SSE Move Ops eliminated", .ucode = 0x2, }, { .uname = "OPT_CAND", .udesc = "Number of Ops that are candidates for optimization (Z-bit set or pass)", .ucode = 0x4, }, { .uname = "SCALAR_OPS_OPTIMIZED", .udesc = "Number of Scalar ops optimized", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_retired_serializing_ops[]={ { .uname = "SSE_RETIRED", .udesc = "SSE bottom-executing uops retired", .ucode = 0x1, }, { .uname = "SSE_MISPREDICTED", .udesc = "SSE control word mispredict traps due to mispredictions", .ucode = 0x2, }, { .uname = "X87_RETIRED", .udesc = "X87 bottom-executing uops retired", .ucode = 0x4, }, { .uname = "X87_MISPREDICTED", .udesc = "X87 control word mispredict traps due to mispredictions", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_segment_register_loads[]={ { .uname = "ES", .udesc = "ES", .ucode = 0x1, }, { .uname = "CS", .udesc = "CS", .ucode = 0x2, }, { .uname = "SS", .udesc = "SS", .ucode = 0x4, }, { .uname = "DS", .udesc = "DS", .ucode = 0x8, }, { .uname = "FS", .udesc = "FS", .ucode = 0x10, }, { .uname = "GS", .udesc = "GS", .ucode = 0x20, }, { .uname = "HS", .udesc = "HS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_load_q_store_q_full[]={ { .uname = "LOAD_QUEUE", .udesc = "The number of cycles that the load 
buffer is full", .ucode = 0x1, }, { .uname = "STORE_QUEUE", .udesc = "The number of cycles that the store buffer is full", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_locked_ops[]={ { .uname = "EXECUTED", .udesc = "Number of locked instructions executed", .ucode = 0x1, }, { .uname = "CYCLES_NON_SPECULATIVE_PHASE", .udesc = "Number of cycles spent in non-speculative phase, excluding cache miss penalty", .ucode = 0x4, }, { .uname = "CYCLES_WAITING", .udesc = "Number of cycles spent in non-speculative phase, including the cache miss penalty", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xd, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_cancelled_store_to_load[]={ { .uname = "SIZE_ADDRESS_MISMATCHES", .udesc = "Store is smaller than load or different starting byte but partial overlap", .ucode = 0x1, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_data_cache_misses[]={ { .uname = "DC_MISS_STREAMING_STORE", .udesc = "First data cache miss or streaming store to a 64B cache line", .ucode = 0x1, }, { .uname = "STREAMING_STORE", .udesc = "First streaming store to a 64B cache line", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_data_cache_refills_from_l2_or_northbridge[]={ { .uname = "GOOD", .udesc = "Fill with good data. 
(Final valid status is valid)", .ucode = 0x1, }, { .uname = "INVALID", .udesc = "Early valid status turned out to be invalid", .ucode = 0x2, }, { .uname = "POISON", .udesc = "Fill with poison data", .ucode = 0x4, }, { .uname = "READ_ERROR", .udesc = "Fill with read data error", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_unified_tlb_hit[]={ { .uname = "4K_DATA", .udesc = "4 KB unified TLB hit for data", .ucode = 0x1, }, { .uname = "2M_DATA", .udesc = "2 MB unified TLB hit for data", .ucode = 0x2, }, { .uname = "1G_DATA", .udesc = "1 GB unified TLB hit for data", .ucode = 0x4, }, { .uname = "4K_INST", .udesc = "4 KB unified TLB hit for instruction", .ucode = 0x10, }, { .uname = "2M_INST", .udesc = "2 MB unified TLB hit for instruction", .ucode = 0x20, }, { .uname = "1G_INST", .udesc = "1 GB unified TLB hit for instruction", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x77, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_unified_tlb_miss[]={ { .uname = "4K_DATA", .udesc = "4 KB unified TLB miss for data", .ucode = 0x1, }, { .uname = "2M_DATA", .udesc = "2 MB unified TLB miss for data", .ucode = 0x2, }, { .uname = "1GB_DATA", .udesc = "1 GB unified TLB miss for data", .ucode = 0x4, }, { .uname = "4K_INST", .udesc = "4 KB unified TLB miss for instruction", .ucode = 0x10, }, { .uname = "2M_INST", .udesc = "2 MB unified TLB miss for instruction", .ucode = 0x20, }, { .uname = "1G_INST", .udesc = "1 GB unified TLB miss for instruction", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x77, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_prefetch_instructions_dispatched[]={ { .uname = "LOAD", .udesc = "Load (Prefetch, PrefetchT0/T1/T2)", .ucode = 0x1, }, { .uname = "STORE", .udesc = "Store (PrefetchW)", 
.ucode = 0x2, }, { .uname = "NTA", .udesc = "NTA (PrefetchNTA)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_ineffective_sw_prefetches[]={ { .uname = "SW_PREFETCH_HIT_IN_L1", .udesc = "Software prefetch hit in the L1", .ucode = 0x1, }, { .uname = "SW_PREFETCH_HIT_IN_L2", .udesc = "Software prefetch hit in the L2", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x9, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_memory_requests[]={ { .uname = "NON_CACHEABLE", .udesc = "Requests to non-cacheable (UC) memory", .ucode = 0x1, }, { .uname = "WRITE_COMBINING", .udesc = "Requests to non-cacheable (WC, but not WC+/SS) memory", .ucode = 0x2, }, { .uname = "STREAMING_STORE", .udesc = "Requests to non-cacheable (WC+/SS, but not WC) memory", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x83, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_data_prefetcher[]={ { .uname = "ATTEMPTED", .udesc = "Prefetch attempts", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x2, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_mab_reqs[]={ { .uname = "BUFFER_BIT_0", .udesc = "Buffer entry index bit 0", .ucode = 0x1, }, { .uname = "BUFFER_BIT_1", .udesc = "Buffer entry index bit 1", .ucode = 0x2, }, { .uname = "BUFFER_BIT_2", .udesc = "Buffer entry index bit 2", .ucode = 0x4, }, { .uname = "BUFFER_BIT_3", .udesc = "Buffer entry index bit 3", .ucode = 0x8, }, { .uname = "BUFFER_BIT_4", .udesc = "Buffer entry index bit 4", .ucode = 0x10, }, { .uname = "BUFFER_BIT_5", .udesc = "Buffer entry index bit 5", .ucode = 0x20, }, { .uname = "BUFFER_BIT_6", .udesc = "Buffer entry index bit 6", .ucode = 0x40, }, { .uname = "BUFFER_BIT_7", .udesc = "Buffer entry index 
bit 7", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_system_read_responses[]={ { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x1, }, { .uname = "MODIFIED", .udesc = "Modified (D18F0x68[ATMModeEn]==0), Modified written (D18F0x68[ATMModeEn]==1)", .ucode = 0x2, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "DATA_ERROR", .udesc = "Data Error", .ucode = 0x10, }, { .uname = "MODIFIED_UNWRITTEN", .udesc = "Modified unwritten", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_octword_write_transfers[]={ { .uname = "OCTWORD_WRITE_TRANSFER", .udesc = "OW write transfer", .ucode = 0x1, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_requests_to_l2[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB fill (page table walks)", .ucode = 0x4, }, { .uname = "SNOOP", .udesc = "NB probe request", .ucode = 0x8, }, { .uname = "CANCELLED", .udesc = "Canceled request", .ucode = 0x10, }, { .uname = "PREFETCHER", .udesc = "L2 cache prefetcher request", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x5f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_l2_cache_miss[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill (includes possible replays, whereas PMCx041 does not)", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB page table walk", .ucode = 0x4, }, { .uname = "PREFETCHER", .udesc = "L2 Cache Prefetcher 
request", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x17, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_l2_cache_fill_writeback[]={ { .uname = "L2_FILLS", .udesc = "L2 fills from system", .ucode = 0x1, }, { .uname = "L2_WRITEBACKS", .udesc = "L2 Writebacks to system (Clean and Dirty)", .ucode = 0x2, }, { .uname = "L2_WRITEBACKS_CLEAN", .udesc = "L2 Clean Writebacks to system", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_page_splintering[]={ { .uname = "GUEST_LARGER", .udesc = "Guest page size is larger than host page size when nested paging is enabled", .ucode = 0x1, }, { .uname = "MTRR_MISMATCH", .udesc = "Splintering due to MTRRs, IORRs, APIC, TOMs or other special address region", .ucode = 0x2, }, { .uname = "HOST_LARGER", .udesc = "Host page size is larger than the guest page size", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_l1_itlb_miss_and_l2_itlb_miss[]={ { .uname = "4K_PAGE_FETCHES", .udesc = "Instruction fetches to a 4 KB page", .ucode = 0x1, }, { .uname = "2M_PAGE_FETCHES", .udesc = "Instruction fetches to a 2 MB page", .ucode = 0x2, }, { .uname = "1G_PAGE_FETCHES", .udesc = "Instruction fetches to a 1 GB page", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_instruction_cache_invalidated[]={ { .uname = "NON_SMC_PROBE_MISS", .udesc = "Non-SMC invalidating probe that missed on in-flight instructions", .ucode = 0x1, }, { .uname = "NON_SMC_PROBE_HIT", .udesc = "Non-SMC invalidating probe that hit on in-flight instructions", .ucode = 0x2, }, { .uname = "SMC_PROBE_MISS", .udesc = "SMC invalidating probe that 
missed on in-flight instructions", .ucode = 0x4, }, { .uname = "SMC_PROBE_HIT", .udesc = "SMC invalidating probe that hit on in-flight instructions", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_retired_mmx_fp_instructions[]={ { .uname = "X87", .udesc = "X87 instructions", .ucode = 0x1, }, { .uname = "MMX", .udesc = "MMX(tm) instructions", .ucode = 0x2, }, { .uname = "SSE", .udesc = "SSE instructions (SSE,SSE2,SSE3,SSSE3,SSE4A,SSE4.1,SSE4.2,AVX,XOP,FMA4)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_fpu_exceptions[]={ { .uname = "TOTAL_FAULTS", .udesc = "Total microfaults", .ucode = 0x1, }, { .uname = "TOTAL_TRAPS", .udesc = "Total microtraps", .ucode = 0x2, }, { .uname = "INT2EXT_FAULTS", .udesc = "Int2Ext faults", .ucode = 0x4, }, { .uname = "EXT2INT_FAULTS", .udesc = "Ext2Int faults", .ucode = 0x8, }, { .uname = "BYPASS_FAULTS", .udesc = "Bypass faults", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_ibs_ops_tagged[]={ { .uname = "TAGGED", .udesc = "Number of ops tagged by IBS", .ucode = 0x1, }, { .uname = "RETIRED", .udesc = "Number of ops tagged by IBS that retired", .ucode = 0x2, }, { .uname = "IGNORED", .udesc = "Number of times an op could not be tagged by IBS because of a previous tagged op that has not retired", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_ls_dispatch[]={ { .uname = "LOADS", .udesc = "Loads", .ucode = 0x1, }, { .uname = "STORES", .udesc = "Stores", .ucode = 0x2, }, { .uname = "LOAD_OP_STORES", .udesc = "Load-op-Stores", .ucode = 0x4, }, { 
.uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_l2_prefetcher_trigger_events[]={ { .uname = "LOAD_L1_MISS_SEEN_BY_PREFETCHER", .udesc = "Load L1 miss seen by prefetcher", .ucode = 0x1, }, { .uname = "STORE_L1_MISS_SEEN_BY_PREFETCHER", .udesc = "Store L1 miss seen by prefetcher", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_dram_accesses[]={ { .uname = "DCT0_PAGE_HIT", .udesc = "DCT0 Page hit", .ucode = 0x1, }, { .uname = "DCT0_PAGE_MISS", .udesc = "DCT0 Page Miss", .ucode = 0x2, }, { .uname = "DCT0_PAGE_CONFLICT", .udesc = "DCT0 Page Conflict", .ucode = 0x4, }, { .uname = "DCT1_PAGE_HIT", .udesc = "DCT1 Page hit", .ucode = 0x8, }, { .uname = "DCT1_PAGE_MISS", .udesc = "DCT1 Page Miss", .ucode = 0x10, }, { .uname = "DCT1_PAGE_CONFLICT", .udesc = "DCT1 Page Conflict", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_dram_controller_page_table_overflows[]={ { .uname = "DCT0_PAGE_TABLE_OVERFLOW", .udesc = "DCT0 Page Table Overflow", .ucode = 0x1, }, { .uname = "DCT1_PAGE_TABLE_OVERFLOW", .udesc = "DCT1 Page Table Overflow", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_memory_controller_dram_command_slots_missed[]={ { .uname = "DCT0_COMMAND_SLOTS_MISSED", .udesc = "DCT0 Command Slots Missed (in MemClks)", .ucode = 0x1, }, { .uname = "DCT1_COMMAND_SLOTS_MISSED", .udesc = "DCT1 Command Slots Missed (in MemClks)", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t 
amd64_fam15h_memory_controller_turnarounds[]={ { .uname = "DCT0_DIMM_TURNAROUND", .udesc = "DCT0 DIMM (chip select) turnaround", .ucode = 0x1, }, { .uname = "DCT0_READ_WRITE_TURNAROUND", .udesc = "DCT0 Read to write turnaround", .ucode = 0x2, }, { .uname = "DCT0_WRITE_READ_TURNAROUND", .udesc = "DCT0 Write to read turnaround", .ucode = 0x4, }, { .uname = "DCT1_DIMM_TURNAROUND", .udesc = "DCT1 DIMM (chip select) turnaround", .ucode = 0x8, }, { .uname = "DCT1_READ_WRITE_TURNAROUND", .udesc = "DCT1 Read to write turnaround", .ucode = 0x10, }, { .uname = "DCT1_WRITE_READ_TURNAROUND", .udesc = "DCT1 Write to read turnaround", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_memory_controller_bypass_counter_saturation[]={ { .uname = "MEMORY_CONTROLLER_HIGH_PRIORITY_BYPASS", .udesc = "Memory controller high priority bypass", .ucode = 0x1, }, { .uname = "MEMORY_CONTROLLER_MEDIUM_PRIORITY_BYPASS", .udesc = "Memory controller medium priority bypass", .ucode = 0x2, }, { .uname = "DCT0_DCQ_BYPASS", .udesc = "DCT0 DCQ bypass", .ucode = 0x4, }, { .uname = "DCT1_DCQ_BYPASS", .udesc = "DCT1 DCQ bypass", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_thermal_status[]={ { .uname = "NUM_HTC_TRIP_POINT_CROSSED", .udesc = "Number of times the HTC trip point is crossed", .ucode = 0x4, }, { .uname = "NUM_CLOCKS_HTC_PSTATE_INACTIVE", .udesc = "Number of clocks HTC P-state is inactive", .ucode = 0x20, }, { .uname = "NUM_CLOCKS_HTC_PSTATE_ACTIVE", .udesc = "Number of clocks HTC P-state is active", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x64, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_cpu_io_requests_to_memory_io[]={ { .uname = "REMOTE_IO_TO_LOCAL_IO", .udesc 
= "Remote IO to Local IO", .ucode = 0x61, .uflags= AMD64_FL_NCOMBO, }, { .uname = "REMOTE_CPU_TO_LOCAL_IO", .udesc = "Remote CPU to Local IO", .ucode = 0x64, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_IO_TO_REMOTE_IO", .udesc = "Local IO to Remote IO", .ucode = 0x91, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_IO_TO_REMOTE_MEM", .udesc = "Local IO to Remote Mem", .ucode = 0x92, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_CPU_TO_REMOTE_IO", .udesc = "Local CPU to Remote IO", .ucode = 0x94, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_CPU_TO_REMOTE_MEM", .udesc = "Local CPU to Remote Mem", .ucode = 0x98, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_IO_TO_LOCAL_IO", .udesc = "Local IO to Local IO", .ucode = 0xa1, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_IO_TO_LOCAL_MEM", .udesc = "Local IO to Local Mem", .ucode = 0xa2, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_CPU_TO_LOCAL_IO", .udesc = "Local CPU to Local IO", .ucode = 0xa4, .uflags= AMD64_FL_NCOMBO, }, { .uname = "LOCAL_CPU_TO_LOCAL_MEM", .udesc = "Local CPU to Local Mem", .ucode = 0xa8, .uflags= AMD64_FL_NCOMBO, }, }; static const amd64_umask_t amd64_fam15h_cache_block_commands[]={ { .uname = "VICTIM_BLOCK", .udesc = "Victim Block (Writeback)", .ucode = 0x1, }, { .uname = "READ_BLOCK", .udesc = "Read Block (Dcache load miss refill)", .ucode = 0x4, }, { .uname = "READ_BLOCK_SHARED", .udesc = "Read Block Shared (Icache refill)", .ucode = 0x8, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read Block Modified (Dcache store miss refill)", .ucode = 0x10, }, { .uname = "CHANGE_TO_DIRTY", .udesc = "Change-to-Dirty (first store to clean block already in cache)", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3d, .uflags = AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_sized_commands[]={ { .uname = "NON-POSTED_SZWR_BYTE", .udesc = "Non-Posted SzWr Byte (1-32 bytes). 
Typical Usage: Legacy or mapped IO, typically 1-4 bytes.", .ucode = 0x1, }, { .uname = "NON-POSTED_SZWR_DW", .udesc = "Non-Posted SzWr DW (1-16 dwords). Typical Usage: Legacy or mapped IO, typically 1", .ucode = 0x2, }, { .uname = "POSTED_SZWR_BYTE", .udesc = "Posted SzWr Byte (1-32 bytes). Typical Usage: Subcache-line DMA writes, size varies; also", .ucode = 0x4, }, { .uname = "POSTED_SZWR_DW", .udesc = "Posted SzWr DW (1-16 dwords). Typical Usage: Block-oriented DMA writes, often cache-line", .ucode = 0x8, }, { .uname = "SZRD_BYTE", .udesc = "SzRd Byte (4 bytes). Typical Usage: Legacy or mapped IO.", .ucode = 0x10, }, { .uname = "SZRD_DW", .udesc = "SzRd DW (1-16 dwords). Typical Usage: Block-oriented DMA reads, typically cache-line size.", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_probe_responses_and_upstream_requests[]={ { .uname = "PROBE_MISS", .udesc = "Probe miss", .ucode = 0x1, }, { .uname = "PROBE_HIT_CLEAN", .udesc = "Probe hit clean", .ucode = 0x2, }, { .uname = "PROBE_HIT_DIRTY_WITHOUT_MEMORY_CANCEL", .udesc = "Probe hit dirty without memory cancel (probed by Sized Write or Change2Dirty)", .ucode = 0x4, }, { .uname = "PROBE_HIT_DIRTY_WITH_MEMORY_CANCEL", .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)", .ucode = 0x8, }, { .uname = "UPSTREAM_DISPLAY_REFRESH_ISOC_READS", .udesc = "Upstream display refresh/ISOC reads", .ucode = 0x10, }, { .uname = "UPSTREAM_NON-DISPLAY_REFRESH_READS", .udesc = "Upstream non-display refresh reads", .ucode = 0x20, }, { .uname = "UPSTREAM_ISOC_WRITES", .udesc = "Upstream ISOC writes", .ucode = 0x40, }, { .uname = "UPSTREAM_NON-ISOC_WRITES", .udesc = "Upstream non-ISOC writes", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t 
amd64_fam15h_gart_events[]={ { .uname = "GART_APERTURE_HIT_ON_ACCESS_FROM_CPU", .udesc = "GART aperture hit on access from CPU", .ucode = 0x1, }, { .uname = "GART_APERTURE_HIT_ON_ACCESS_FROM_IO", .udesc = "GART aperture hit on access from IO", .ucode = 0x2, }, { .uname = "GART_MISS", .udesc = "GART miss", .ucode = 0x4, }, { .uname = "GART_REQUEST_HIT_TABLE_WALK_IN_PROGRESS", .udesc = "GART Request hit table walk in progress", .ucode = 0x8, }, { .uname = "GART_MULTIPLE_TABLE_WALK_IN_PROGRESS", .udesc = "GART multiple table walk in progress", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x8f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_link_transmit_bandwidth[]={ { .uname = "COMMAND_DW_SENT", .udesc = "Command DW sent", .ucode = 0x1, .grpid = 0, }, { .uname = "DATA_DW_SENT", .udesc = "Data DW sent", .ucode = 0x2, .grpid = 0, }, { .uname = "BUFFER_RELEASE_DW_SENT", .udesc = "Buffer release DW sent", .ucode = 0x4, .grpid = 0, }, { .uname = "NOP_DW_SENT", .udesc = "NOP DW sent (idle)", .ucode = 0x8, .grpid = 0, }, { .uname = "ADDRESS_DW_SENT", .udesc = "Address (including extensions) DW sent", .ucode = 0x10, .grpid = 0, }, { .uname = "PER_PACKET_CRC_SENT", .udesc = "Per packet CRC sent", .ucode = 0x20, .grpid = 0, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, { .uname = "SUBLINK_1", .udesc = "When links are unganged, enable this umask to select sublink 1", .ucode = 0x80, .grpid = 1, .uflags= AMD64_FL_NCOMBO, }, { .uname = "SUBLINK_0", .udesc = "When links are unganged, enable this umask to select sublink 0 (default when links ganged)", .ucode = 0x00, .grpid = 1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_cpu_to_dram_requests_to_target_node[]={ { .uname = "LOCAL_TO_NODE_0", .udesc = "From Local node to Node 0", .ucode = 0x1, }, { .uname = "LOCAL_TO_NODE_1", .udesc = 
"From Local node to Node 1", .ucode = 0x2, }, { .uname = "LOCAL_TO_NODE_2", .udesc = "From Local node to Node 2", .ucode = 0x4, }, { .uname = "LOCAL_TO_NODE_3", .udesc = "From Local node to Node 3", .ucode = 0x8, }, { .uname = "LOCAL_TO_NODE_4", .udesc = "From Local node to Node 4", .ucode = 0x10, }, { .uname = "LOCAL_TO_NODE_5", .udesc = "From Local node to Node 5", .ucode = 0x20, }, { .uname = "LOCAL_TO_NODE_6", .udesc = "From Local node to Node 6", .ucode = 0x40, }, { .uname = "LOCAL_TO_NODE_7", .udesc = "From Local node to Node 7", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_io_to_dram_requests_to_target_node[]={ { .uname = "LOCAL_TO_NODE_0", .udesc = "From Local node to Node 0", .ucode = 0x1, }, { .uname = "LOCAL_TO_NODE_1", .udesc = "From Local node to Node 1", .ucode = 0x2, }, { .uname = "LOCAL_TO_NODE_2", .udesc = "From Local node to Node 2", .ucode = 0x4, }, { .uname = "LOCAL_TO_NODE_3", .udesc = "From Local node to Node 3", .ucode = 0x8, }, { .uname = "LOCAL_TO_NODE_4", .udesc = "From Local node to Node 4", .ucode = 0x10, }, { .uname = "LOCAL_TO_NODE_5", .udesc = "From Local node to Node 5", .ucode = 0x20, }, { .uname = "LOCAL_TO_NODE_6", .udesc = "From Local node to Node 6", .ucode = 0x40, }, { .uname = "LOCAL_TO_NODE_7", .udesc = "From Local node to Node 7", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_cpu_read_command_requests_to_target_node_0_3[]={ { .uname = "READ_BLOCK_LOCAL_TO_NODE_0", .udesc = "Read block From Local node to Node 0", .ucode = 0x11, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_0", .udesc = "Read block shared From Local node to Node 0", .ucode = 0x12, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_0", .udesc = "Read block 
modified From Local node to Node 0", .ucode = 0x14, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_0", .udesc = "Change-to-Dirty From Local node to Node 0", .ucode = 0x18, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_LOCAL_TO_NODE_1", .udesc = "Read block From Local node to Node 1", .ucode = 0x21, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_1", .udesc = "Read block shared From Local node to Node 1", .ucode = 0x22, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_1", .udesc = "Read block modified From Local node to Node 1", .ucode = 0x24, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_1", .udesc = "Change-to-Dirty From Local node to Node 1", .ucode = 0x28, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_LOCAL_TO_NODE_2", .udesc = "Read block From Local node to Node 2", .ucode = 0x41, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_2", .udesc = "Read block shared From Local node to Node 2", .ucode = 0x42, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_2", .udesc = "Read block modified From Local node to Node 2", .ucode = 0x44, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_2", .udesc = "Change-to-Dirty From Local node to Node 2", .ucode = 0x48, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_LOCAL_TO_NODE_3", .udesc = "Read block From Local node to Node 3", .ucode = 0x81, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_3", .udesc = "Read block shared From Local node to Node 3", .ucode = 0x82, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_3", .udesc = "Read block modified From Local node to Node 3", .ucode = 0x84, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_3", .udesc = "Change-to-Dirty From Local node to Node 3", .ucode = 0x88, .uflags= AMD64_FL_NCOMBO, }, { .uname = "ALL", .udesc = "All 
sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_cpu_read_command_requests_to_target_node_4_7[]={ { .uname = "READ_BLOCK_LOCAL_TO_NODE_4", .udesc = "Read block From Local node to Node 4", .ucode = 0x11, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_4", .udesc = "Read block shared From Local node to Node 4", .ucode = 0x12, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_4", .udesc = "Read block modified From Local node to Node 4", .ucode = 0x14, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_4", .udesc = "Change-to-Dirty From Local node to Node 4", .ucode = 0x18, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_LOCAL_TO_NODE_5", .udesc = "Read block From Local node to Node 5", .ucode = 0x21, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_5", .udesc = "Read block shared From Local node to Node 5", .ucode = 0x22, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_5", .udesc = "Read block modified From Local node to Node 5", .ucode = 0x24, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_5", .udesc = "Change-to-Dirty From Local node to Node 5", .ucode = 0x28, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_LOCAL_TO_NODE_6", .udesc = "Read block From Local node to Node 6", .ucode = 0x41, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_6", .udesc = "Read block shared From Local node to Node 6", .ucode = 0x42, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_6", .udesc = "Read block modified From Local node to Node 6", .ucode = 0x44, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_6", .udesc = "Change-to-Dirty From Local node to Node 6", .ucode = 0x48, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_LOCAL_TO_NODE_7", .udesc = "Read block From Local node to Node 7", 
.ucode = 0x81, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_SHARED_LOCAL_TO_NODE_7", .udesc = "Read block shared From Local node to Node 7", .ucode = 0x82, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_BLOCK_MODIFIED_LOCAL_TO_NODE_7", .udesc = "Read block modified From Local node to Node 7", .ucode = 0x84, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CHANGE_TO_DIRTY_LOCAL_TO_NODE_7", .udesc = "Change-to-Dirty From Local node to Node 7", .ucode = 0x88, .uflags= AMD64_FL_NCOMBO, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_cpu_command_requests_to_target_node[]={ { .uname = "READ_SIZED_LOCAL_TO_NODE_0", .udesc = "Read Sized From Local node to Node 0", .ucode = 0x11, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_0", .udesc = "Write Sized From Local node to Node 0", .ucode = 0x12, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_0", .udesc = "Victim Block From Local node to Node 0", .ucode = 0x14, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_SIZED_LOCAL_TO_NODE_1", .udesc = "Read Sized From Local node to Node 1", .ucode = 0x21, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_1", .udesc = "Write Sized From Local node to Node 1", .ucode = 0x22, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_1", .udesc = "Victim Block From Local node to Node 1", .ucode = 0x24, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_SIZED_LOCAL_TO_NODE_2", .udesc = "Read Sized From Local node to Node 2", .ucode = 0x41, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_2", .udesc = "Write Sized From Local node to Node 2", .ucode = 0x42, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_2", .udesc = "Victim Block From Local node to Node 2", .ucode = 0x44, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_SIZED_LOCAL_TO_NODE_3", .udesc = "Read Sized From Local node to Node 3", 
.ucode = 0x81, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_3", .udesc = "Write Sized From Local node to Node 3", .ucode = 0x82, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_3", .udesc = "Victim Block From Local node to Node 3", .ucode = 0x84, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_SIZED_LOCAL_TO_NODE_4", .udesc = "Read Sized From Local node to Node 4", .ucode = 0x19, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_4", .udesc = "Write Sized From Local node to Node 4", .ucode = 0x1a, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_4", .udesc = "Victim Block From Local node to Node 4", .ucode = 0x1c, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_SIZED_LOCAL_TO_NODE_5", .udesc = "Read Sized From Local node to Node 5", .ucode = 0x29, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_5", .udesc = "Write Sized From Local node to Node 5", .ucode = 0x2a, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_5", .udesc = "Victim Block From Local node to Node 5", .ucode = 0x2c, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_SIZED_LOCAL_TO_NODE_6", .udesc = "Read Sized From Local node to Node 6", .ucode = 0x49, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_6", .udesc = "Write Sized From Local node to Node 6", .ucode = 0x4a, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_6", .udesc = "Victim Block From Local node to Node 6", .ucode = 0x4c, .uflags= AMD64_FL_NCOMBO, }, { .uname = "READ_SIZED_LOCAL_TO_NODE_7", .udesc = "Read Sized From Local node to Node 7", .ucode = 0x89, .uflags= AMD64_FL_NCOMBO, }, { .uname = "WRITE_SIZED_LOCAL_TO_NODE_7", .udesc = "Write Sized From Local node to Node 7", .ucode = 0x8a, .uflags= AMD64_FL_NCOMBO, }, { .uname = "VICTIM_BLOCK_LOCAL_TO_NODE_7", .udesc = "Victim Block From Local node to Node 7", .ucode = 0x8c, .uflags= AMD64_FL_NCOMBO, }, { .uname = "ALL_LOCAL_TO_NODE_0_3", .udesc = "All 
From Local node to Node 0-3", .ucode = 0xf7, .uflags= AMD64_FL_NCOMBO, }, { .uname = "ALL_LOCAL_TO_NODE_4_7", .udesc = "All From Local node to Node 4-7", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_request_cache_status_0[]={ { .uname = "PROBE_HIT_S", .udesc = "Probe Hit S", .ucode = 0x1, }, { .uname = "PROBE_HIT_E", .udesc = "Probe Hit E", .ucode = 0x2, }, { .uname = "PROBE_HIT_MUW_OR_O", .udesc = "Probe Hit MuW or O", .ucode = 0x4, }, { .uname = "PROBE_HIT_M", .udesc = "Probe Hit M", .ucode = 0x8, }, { .uname = "PROBE_MISS", .udesc = "Probe Miss", .ucode = 0x10, }, { .uname = "DIRECTED_PROBE", .udesc = "Directed Probe", .ucode = 0x20, }, { .uname = "TRACK_CACHE_STAT_FOR_RDBLK", .udesc = "Track Cache Stat for RdBlk", .ucode = 0x40, }, { .uname = "TRACK_CACHE_STAT_FOR_RDBLKS", .udesc = "Track Cache Stat for RdBlkS", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_request_cache_status_1[]={ { .uname = "PROBE_HIT_S", .udesc = "Probe Hit S", .ucode = 0x1, }, { .uname = "PROBE_HIT_E", .udesc = "Probe Hit E", .ucode = 0x2, }, { .uname = "PROBE_HIT_MUW_OR_O", .udesc = "Probe Hit MuW or O", .ucode = 0x4, }, { .uname = "PROBE_HIT_M", .udesc = "Probe Hit M", .ucode = 0x8, }, { .uname = "PROBE_MISS", .udesc = "Probe Miss", .ucode = 0x10, }, { .uname = "DIRECTED_PROBE", .udesc = "Directed Probe", .ucode = 0x20, }, { .uname = "TRACK_CACHE_STAT_FOR_CHGTODIRTY", .udesc = "Track Cache Stat for ChgToDirty", .ucode = 0x40, }, { .uname = "TRACK_CACHE_STAT_FOR_RDBLKM", .udesc = "Track Cache Stat for RdBlkM", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_memory_controller_requests[]={ { .uname = "WRITE_REQUESTS_TO_DCT", .udesc = "Write requests sent to the DCT", .ucode = 0x1, }, 
{ .uname = "READ_REQUESTS_TO_DCT", .udesc = "Read requests (including prefetch requests) sent to the DCT", .ucode = 0x2, }, { .uname = "PREFETCH_REQUESTS_TO_DCT", .udesc = "Prefetch requests sent to the DCT", .ucode = 0x4, }, { .uname = "32_BYTES_SIZED_WRITES", .udesc = "32 Bytes Sized Writes", .ucode = 0x8, }, { .uname = "64_BYTES_SIZED_WRITES", .udesc = "64 Bytes Sized Writes", .ucode = 0x10, }, { .uname = "32_BYTES_SIZED_READS", .udesc = "32 Bytes Sized Reads", .ucode = 0x20, }, { .uname = "64_BYTE_SIZED_READS", .udesc = "64 Byte Sized Reads", .ucode = 0x40, }, { .uname = "READ_REQUESTS_TO_DCT_WHILE_WRITES_PENDING", .udesc = "Read requests sent to the DCT while write requests are pending in the DCT", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_read_request_to_l3_cache[]={ { .uname = "READ_BLOCK_EXCLUSIVE", .udesc = "Read Block Exclusive (Data cache read)", .ucode = 0x1, .grpid = 0, }, { .uname = "READ_BLOCK_SHARED", .udesc = "Read Block Shared (Instruction cache read)", .ucode = 0x2, .grpid = 0, }, { .uname = "READ_BLOCK_MODIFY", .udesc = "Read Block Modify", .ucode = 0x4, .grpid = 0, }, { .uname = "PREFETCH", .udesc = "Count prefetches only", .ucode = 0x8, .grpid = 0, }, { .uname = "READ_BLOCK_ANY", .udesc = "Count any read request", .ucode = 0x7, .grpid = 0, .uflags= AMD64_FL_DFL | AMD64_FL_NCOMBO, }, CORE_SELECT(1), }; static const amd64_umask_t amd64_fam15h_l3_fills_caused_by_l2_evictions[]={ { .uname = "SHARED", .udesc = "Shared", .ucode = 0x1, .grpid = 0, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x2, .grpid = 0, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x4, .grpid = 0, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x8, .grpid = 0, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, CORE_SELECT(1), }; static const 
amd64_umask_t amd64_fam15h_l3_evictions[]={ { .uname = "SHARED", .udesc = "Shared", .ucode = 0x1, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x2, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x4, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam15h_l3_latency[]={ { .uname = "L3_REQUEST_CYCLE", .udesc = "L3 Request cycle count.", .ucode = 0x1, }, { .uname = "L3_REQUEST", .udesc = "L3 request count.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_entry_t amd64_fam15h_pe[]={ { .name = "DISPATCHED_FPU_OPS", .desc = "FPU Pipe Assignment", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_dispatched_fpu_ops), .ngrp = 1, .umasks = amd64_fam15h_dispatched_fpu_ops, }, { .name = "CYCLES_FPU_EMPTY", .desc = "FP Scheduler Empty", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x1, }, { .name = "RETIRED_SSE_OPS", .desc = "Retired SSE/BNI Ops", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_retired_sse_ops), .ngrp = 1, .umasks = amd64_fam15h_retired_sse_ops, }, { .name = "MOVE_SCALAR_OPTIMIZATION", .desc = "Number of Move Elimination and Scalar Op Optimization", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_move_scalar_optimization), .ngrp = 1, .umasks = amd64_fam15h_move_scalar_optimization, }, { .name = "RETIRED_SERIALIZING_OPS", .desc = "Retired Serializing Ops", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_retired_serializing_ops), .ngrp = 1, .umasks = amd64_fam15h_retired_serializing_ops, }, { .name = "BOTTOM_EXECUTE_OP", .desc = "Number of Cycles that a Bottom-Execute uop is in the FP Scheduler", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x6, }, { 
.name = "SEGMENT_REGISTER_LOADS", .desc = "Segment Register Loads", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x20, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_segment_register_loads), .ngrp = 1, .umasks = amd64_fam15h_segment_register_loads, }, { .name = "PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE", .desc = "Pipeline Restart Due to Self-Modifying Code", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x21, }, { .name = "PIPELINE_RESTART_DUE_TO_PROBE_HIT", .desc = "Pipeline Restart Due to Probe Hit", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x22, }, { .name = "LOAD_Q_STORE_Q_FULL", .desc = "Load Queue/Store Queue Full", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x23, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_load_q_store_q_full), .ngrp = 1, .umasks = amd64_fam15h_load_q_store_q_full, }, { .name = "LOCKED_OPS", .desc = "Locked Operations", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_locked_ops), .ngrp = 1, .umasks = amd64_fam15h_locked_ops, }, { .name = "RETIRED_CLFLUSH_INSTRUCTIONS", .desc = "Retired CLFLUSH Instructions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x26, }, { .name = "RETIRED_CPUID_INSTRUCTIONS", .desc = "Retired CPUID Instructions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x27, }, { .name = "CANCELLED_STORE_TO_LOAD", .desc = "Canceled Store to Load Forward Operations", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cancelled_store_to_load), .ngrp = 1, .umasks = amd64_fam15h_cancelled_store_to_load, }, { .name = "SMIS_RECEIVED", .desc = "SMIs Received", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x2b, }, { .name = "DATA_CACHE_ACCESSES", .desc = "Data Cache Accesses", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x40, }, { .name = "DATA_CACHE_MISSES", .desc = "Data Cache Misses", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x41, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_data_cache_misses), .ngrp = 1, .umasks = amd64_fam15h_data_cache_misses, }, { .name = "DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE", .desc = "Data 
Cache Refills from L2 or System", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_data_cache_refills_from_l2_or_northbridge), .ngrp = 1, .umasks = amd64_fam15h_data_cache_refills_from_l2_or_northbridge, }, { .name = "DATA_CACHE_REFILLS_FROM_NORTHBRIDGE", .desc = "Data Cache Refills from System", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x43, }, { .name = "UNIFIED_TLB_HIT", .desc = "Unified TLB Hit", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x45, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_unified_tlb_hit), .ngrp = 1, .umasks = amd64_fam15h_unified_tlb_hit, }, { .name = "UNIFIED_TLB_MISS", .desc = "Unified TLB Miss", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x46, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_unified_tlb_miss), .ngrp = 1, .umasks = amd64_fam15h_unified_tlb_miss, }, { .name = "MISALIGNED_ACCESSES", .desc = "Misaligned Accesses", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x47, }, { .name = "PREFETCH_INSTRUCTIONS_DISPATCHED", .desc = "Prefetch Instructions Dispatched", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_prefetch_instructions_dispatched), .ngrp = 1, .umasks = amd64_fam15h_prefetch_instructions_dispatched, }, { .name = "INEFFECTIVE_SW_PREFETCHES", .desc = "Ineffective Software Prefetches", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x52, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_ineffective_sw_prefetches), .ngrp = 1, .umasks = amd64_fam15h_ineffective_sw_prefetches, }, { .name = "MEMORY_REQUESTS", .desc = "Memory Requests by Type", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_requests), .ngrp = 1, .umasks = amd64_fam15h_memory_requests, }, { .name = "DATA_PREFETCHER", .desc = "Data Prefetcher", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_data_prefetcher), .ngrp = 1, .umasks = amd64_fam15h_data_prefetcher, }, { .name = "MAB_REQS", .desc = "MAB Requests", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x68, 
.numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_mab_reqs), .ngrp = 1, .umasks = amd64_fam15h_mab_reqs, }, { .name = "MAB_WAIT", .desc = "MAB Wait Cycles", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_mab_reqs), .ngrp = 1, .umasks = amd64_fam15h_mab_reqs, /* identical to actual umasks list for this event */ }, { .name = "SYSTEM_READ_RESPONSES", .desc = "Response From System on Cache Refills", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_system_read_responses), .ngrp = 1, .umasks = amd64_fam15h_system_read_responses, }, { .name = "OCTWORD_WRITE_TRANSFERS", .desc = "Octwords Written to System", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_octword_write_transfers), .ngrp = 1, .umasks = amd64_fam15h_octword_write_transfers, }, { .name = "CPU_CLK_UNHALTED", .desc = "CPU Clocks not Halted", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x76, }, { .name = "REQUESTS_TO_L2", .desc = "Requests to L2 Cache", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_requests_to_l2), .ngrp = 1, .umasks = amd64_fam15h_requests_to_l2, }, { .name = "L2_CACHE_MISS", .desc = "L2 Cache Misses", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x7e, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l2_cache_miss), .ngrp = 1, .umasks = amd64_fam15h_l2_cache_miss, }, { .name = "L2_CACHE_FILL_WRITEBACK", .desc = "L2 Fill/Writeback", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l2_cache_fill_writeback), .ngrp = 1, .umasks = amd64_fam15h_l2_cache_fill_writeback, }, { .name = "PAGE_SPLINTERING", .desc = "Page Splintering", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x165, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_page_splintering), .ngrp = 1, .umasks = amd64_fam15h_page_splintering, }, { .name = "INSTRUCTION_CACHE_FETCHES", .desc = "Instruction Cache Fetches", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x80, }, { .name = 
"INSTRUCTION_CACHE_MISSES", .desc = "Instruction Cache Misses", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x81, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_L2", .desc = "Instruction Cache Refills from L2", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x82, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM", .desc = "Instruction Cache Refills from System", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x83, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_HIT", .desc = "L1 ITLB Miss, L2 ITLB Hit", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x84, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_MISS", .desc = "L1 ITLB Miss, L2 ITLB Miss", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l1_itlb_miss_and_l2_itlb_miss), .ngrp = 1, .umasks = amd64_fam15h_l1_itlb_miss_and_l2_itlb_miss, }, { .name = "PIPELINE_RESTART_DUE_TO_INSTRUCTION_STREAM_PROBE", .desc = "Pipeline Restart Due to Instruction Stream Probe", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x86, }, { .name = "INSTRUCTION_FETCH_STALL", .desc = "Instruction Fetch Stall", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x87, }, { .name = "RETURN_STACK_HITS", .desc = "Return Stack Hits", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x88, }, { .name = "RETURN_STACK_OVERFLOWS", .desc = "Return Stack Overflows", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x89, }, { .name = "INSTRUCTION_CACHE_VICTIMS", .desc = "Instruction Cache Victims", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x8b, }, { .name = "INSTRUCTION_CACHE_INVALIDATED", .desc = "Instruction Cache Lines Invalidated", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x8c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_instruction_cache_invalidated), .ngrp = 1, .umasks = amd64_fam15h_instruction_cache_invalidated, }, { .name = "ITLB_RELOADS", .desc = "ITLB Reloads", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x99, }, { .name = "ITLB_RELOADS_ABORTED", .desc = "ITLB Reloads Aborted", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x9a, }, { .name = "RETIRED_INSTRUCTIONS", .desc = "Retired Instructions", .modmsk = 
AMD64_FAM15H_ATTRS, .code = 0xc0, }, { .name = "RETIRED_UOPS", .desc = "Retired uops", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc1, }, { .name = "RETIRED_BRANCH_INSTRUCTIONS", .desc = "Retired Branch Instructions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc2, }, { .name = "RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", .desc = "Retired Mispredicted Branch Instructions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc3, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS", .desc = "Retired Taken Branch Instructions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc4, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED", .desc = "Retired Taken Branch Instructions Mispredicted", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc5, }, { .name = "RETIRED_FAR_CONTROL_TRANSFERS", .desc = "Retired Far Control Transfers", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc6, }, { .name = "RETIRED_BRANCH_RESYNCS", .desc = "Retired Branch Resyncs", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc7, }, { .name = "RETIRED_NEAR_RETURNS", .desc = "Retired Near Returns", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc8, }, { .name = "RETIRED_NEAR_RETURNS_MISPREDICTED", .desc = "Retired Near Returns Mispredicted", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xc9, }, { .name = "RETIRED_INDIRECT_BRANCHES_MISPREDICTED", .desc = "Retired Indirect Branches Mispredicted", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xca, }, { .name = "RETIRED_MMX_FP_INSTRUCTIONS", .desc = "Retired MMX/FP Instructions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_retired_mmx_fp_instructions), .ngrp = 1, .umasks = amd64_fam15h_retired_mmx_fp_instructions, }, { .name = "INTERRUPTS_MASKED_CYCLES", .desc = "Interrupts-Masked Cycles", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xcd, }, { .name = "INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING", .desc = "Interrupts-Masked Cycles with Interrupt Pending", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xce, }, { .name = "INTERRUPTS_TAKEN", .desc = "Interrupts Taken", .modmsk = 
AMD64_FAM15H_ATTRS, .code = 0xcf, }, { .name = "DECODER_EMPTY", .desc = "Decoder Empty", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd0, }, { .name = "DISPATCH_STALLS", .desc = "Dispatch Stalls", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd1, }, { .name = "DISPATCH_STALL_FOR_SERIALIZATION", .desc = "Microsequencer Stall due to Serialization", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd3, }, { .name = "DISPATCH_STALL_FOR_RETIRE_QUEUE_FULL", .desc = "Dispatch Stall for Instruction Retire Q Full", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd5, }, { .name = "DISPATCH_STALL_FOR_INT_SCHED_QUEUE_FULL", .desc = "Dispatch Stall for Integer Scheduler Queue Full", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd6, }, { .name = "DISPATCH_STALL_FOR_FPU_FULL", .desc = "Dispatch Stall for FP Scheduler Queue Full", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd7, }, { .name = "DISPATCH_STALL_FOR_LDQ_FULL", .desc = "Dispatch Stall for LDQ Full", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd8, }, { .name = "MICROSEQ_STALL_WAITING_FOR_ALL_QUIET", .desc = "Microsequencer Stall Waiting for All Quiet", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xd9, }, { .name = "FPU_EXCEPTIONS", .desc = "FPU Exceptions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xdb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_fpu_exceptions), .ngrp = 1, .umasks = amd64_fam15h_fpu_exceptions, }, { .name = "DR0_BREAKPOINTS", .desc = "DR0 Breakpoint Match", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xdc, }, { .name = "DR1_BREAKPOINTS", .desc = "DR1 Breakpoint Match", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xdd, }, { .name = "DR2_BREAKPOINTS", .desc = "DR2 Breakpoint Match", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xde, }, { .name = "DR3_BREAKPOINTS", .desc = "DR3 Breakpoint Match", .modmsk = AMD64_FAM15H_ATTRS, .code = 0xdf, }, { .name = "IBS_OPS_TAGGED", .desc = "Tagged IBS Ops", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x1cf, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_ibs_ops_tagged), .ngrp = 1, .umasks = amd64_fam15h_ibs_ops_tagged, }, { .name = "LS_DISPATCH", 
.desc = "LS Dispatch", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_ls_dispatch), .ngrp = 1, .umasks = amd64_fam15h_ls_dispatch, }, { .name = "EXECUTED_CLFLUSH_INSTRUCTIONS", .desc = "Executed CLFLUSH Instructions", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x30, }, { .name = "L2_PREFETCHER_TRIGGER_EVENTS", .desc = "L2 Prefetcher Trigger Events", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x16c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l2_prefetcher_trigger_events), .ngrp = 1, .umasks = amd64_fam15h_l2_prefetcher_trigger_events, }, { .name = "DISPATCH_STALL_FOR_STQ_FULL", .desc = "Dispatch Stall for STQ Full", .modmsk = AMD64_FAM15H_ATTRS, .code = 0x1d8, }, { .name = "DRAM_ACCESSES", .desc = "DRAM Accesses", .code = 0xe0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_dram_accesses), .ngrp = 1, .umasks = amd64_fam15h_dram_accesses, }, { .name = "DRAM_CONTROLLER_PAGE_TABLE_OVERFLOWS", .desc = "DRAM Controller Page Table Overflows", .code = 0xe1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_dram_controller_page_table_overflows), .ngrp = 1, .umasks = amd64_fam15h_dram_controller_page_table_overflows, }, { .name = "MEMORY_CONTROLLER_DRAM_COMMAND_SLOTS_MISSED", .desc = "Memory Controller DRAM Command Slots Missed", .code = 0xe2, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_dram_command_slots_missed), .ngrp = 1, .umasks = amd64_fam15h_memory_controller_dram_command_slots_missed, }, { .name = "MEMORY_CONTROLLER_TURNAROUNDS", .desc = "Memory Controller Turnarounds", .code = 0xe3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_turnarounds), .ngrp = 1, .umasks = amd64_fam15h_memory_controller_turnarounds, }, { .name = "MEMORY_CONTROLLER_BYPASS_COUNTER_SATURATION", .desc = "Memory Controller Bypass Counter Saturation", .code = 0xe4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_bypass_counter_saturation), .ngrp = 1, .umasks = amd64_fam15h_memory_controller_bypass_counter_saturation, }, { .name = 
"THERMAL_STATUS", .desc = "Thermal Status", .code = 0xe8, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_thermal_status), .ngrp = 1, .umasks = amd64_fam15h_thermal_status, }, { .name = "CPU_IO_REQUESTS_TO_MEMORY_IO", .desc = "CPU/IO Requests to Memory/IO", .code = 0xe9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_io_requests_to_memory_io), .ngrp = 1, .umasks = amd64_fam15h_cpu_io_requests_to_memory_io, }, { .name = "CACHE_BLOCK_COMMANDS", .desc = "Cache Block Commands", .code = 0xea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cache_block_commands), .ngrp = 1, .umasks = amd64_fam15h_cache_block_commands, }, { .name = "SIZED_COMMANDS", .desc = "Sized Commands", .code = 0xeb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_sized_commands), .ngrp = 1, .umasks = amd64_fam15h_sized_commands, }, { .name = "PROBE_RESPONSES_AND_UPSTREAM_REQUESTS", .desc = "Probe Responses and Upstream Requests", .code = 0xec, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_probe_responses_and_upstream_requests), .ngrp = 1, .umasks = amd64_fam15h_probe_responses_and_upstream_requests, }, { .name = "GART_EVENTS", .desc = "GART Events", .code = 0xee, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_gart_events), .ngrp = 1, .umasks = amd64_fam15h_gart_events, }, { .name = "LINK_TRANSMIT_BANDWIDTH_LINK_0", .desc = "Link Transmit Bandwidth Link 0", .code = 0xf6, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth), .ngrp = 2, .umasks = amd64_fam15h_link_transmit_bandwidth, }, { .name = "LINK_TRANSMIT_BANDWIDTH_LINK_1", .desc = "Link Transmit Bandwidth Link 1", .code = 0xf7, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth), .ngrp = 2, .umasks = amd64_fam15h_link_transmit_bandwidth, }, { .name = "LINK_TRANSMIT_BANDWIDTH_LINK_2", .desc = "Link Transmit Bandwidth Link 2", .code = 0xf8, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth), .ngrp = 2, .umasks = amd64_fam15h_link_transmit_bandwidth, }, { .name = "LINK_TRANSMIT_BANDWIDTH_LINK_3", .desc = "Link Transmit Bandwidth 
Link 3", .code = 0x1f9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_link_transmit_bandwidth), .ngrp = 2, .umasks = amd64_fam15h_link_transmit_bandwidth, }, { .name = "CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE", .desc = "CPU to DRAM Requests to Target Node", .code = 0x1e0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_to_dram_requests_to_target_node), .ngrp = 1, .umasks = amd64_fam15h_cpu_to_dram_requests_to_target_node, }, { .name = "IO_TO_DRAM_REQUESTS_TO_TARGET_NODE", .desc = "IO to DRAM Requests to Target Node", .code = 0x1e1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_io_to_dram_requests_to_target_node), .ngrp = 1, .umasks = amd64_fam15h_io_to_dram_requests_to_target_node, }, { .name = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_0_3", .desc = "CPU Read Command Latency to Target Node 0-3", .code = 0x1e2, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_0_3), .ngrp = 1, .umasks = amd64_fam15h_cpu_read_command_requests_to_target_node_0_3, }, { .name = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_0_3", .desc = "CPU Read Command Requests to Target Node 0-3", .code = 0x1e3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_0_3), .ngrp = 1, .umasks = amd64_fam15h_cpu_read_command_requests_to_target_node_0_3, }, { .name = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_4_7", .desc = "CPU Read Command Latency to Target Node 4-7", .code = 0x1e4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_4_7), .ngrp = 1, .umasks = amd64_fam15h_cpu_read_command_requests_to_target_node_4_7, }, { .name = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_4_7", .desc = "CPU Read Command Requests to Target Node 4-7", .code = 0x1e5, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_read_command_requests_to_target_node_4_7), .ngrp = 1, .umasks = amd64_fam15h_cpu_read_command_requests_to_target_node_4_7, }, { .name = "CPU_COMMAND_LATENCY_TO_TARGET_NODE", .desc = "CPU Command Latency to Target Node", .code = 0x1e6, .numasks = 
LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_command_requests_to_target_node), .ngrp = 1, .umasks = amd64_fam15h_cpu_command_requests_to_target_node, }, { .name = "CPU_REQUESTS_TO_TARGET_NODE", .desc = "CPU Requests to Target Node", .code = 0x1e7, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_cpu_command_requests_to_target_node), .ngrp = 1, .umasks = amd64_fam15h_cpu_command_requests_to_target_node, }, { .name = "REQUEST_CACHE_STATUS_0", .desc = "Request Cache Status 0", .code = 0x1ea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_request_cache_status_0), .ngrp = 1, .umasks = amd64_fam15h_request_cache_status_0, }, { .name = "REQUEST_CACHE_STATUS_1", .desc = "Request Cache Status 1", .code = 0x1eb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_request_cache_status_1), .ngrp = 1, .umasks = amd64_fam15h_request_cache_status_1, }, { .name = "MEMORY_CONTROLLER_REQUESTS", .desc = "Memory Controller Requests", .code = 0x1f0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_memory_controller_requests), .ngrp = 1, .umasks = amd64_fam15h_memory_controller_requests, }, { .name = "READ_REQUEST_TO_L3_CACHE", .desc = "Read Request to L3 Cache", .code = 0x4e0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache), .ngrp = 2, .umasks = amd64_fam15h_read_request_to_l3_cache, }, { .name = "L3_CACHE_MISSES", .desc = "L3 Cache Misses", .code = 0x4e1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache), .ngrp = 2, .umasks = amd64_fam15h_read_request_to_l3_cache, }, { .name = "L3_FILLS_CAUSED_BY_L2_EVICTIONS", .desc = "L3 Fills caused by L2 Evictions", .code = 0x4e2, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_fills_caused_by_l2_evictions), .ngrp = 2, .umasks = amd64_fam15h_l3_fills_caused_by_l2_evictions, }, { .name = "L3_EVICTIONS", .desc = "L3 Evictions", .code = 0x4e3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_evictions), .ngrp = 1, .umasks = amd64_fam15h_l3_evictions, }, { .name = "NON_CANCELED_L3_READ_REQUESTS", .desc = "Non-canceled L3 Read Requests", .code = 0x4ed, 
.numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_read_request_to_l3_cache), .ngrp = 2, .umasks = amd64_fam15h_read_request_to_l3_cache, }, { .name = "L3_LATENCY", .desc = "L3 Latency", .code = 0x4ef, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam15h_l3_latency), .ngrp = 1, .umasks = amd64_fam15h_l3_latency, }, };
papi-5.3.0/src/libpfm4/lib/events/torrent_events.h
/* Power Torrent PMU event codes */
#ifndef __POWER_TORRENT_EVENTS_H__
#define __POWER_TORRENT_EVENTS_H__
/* PRELIMINARY EVENT ENCODING
 * 0x0000_0000 - 0x00FF_FFFF = PowerPC core events
 * 0x0100_0000 - 0x01FF_FFFF = Torrent events
 * 0x0200_0000 - 0xFFFF_FFFF = reserved
 * Encodings 0x0..0x00FF_FFFF are reserved for core PowerPC events.
 * For Torrent events:
 * 0x00F0_0000 = Torrent PMU id
 * 0x000F_0000 = PMU unit number (e.g. 0 for MCD0, 1 for MCD1)
 * 0x0000_FF00 = virtual counter number (unused on MCD)
 * 0x0000_00FF = PMC mux value (unused on Util, MMU, CAU)
 * (Note that some of these fields are wider than necessary)
 *
 * The upper bits 0xFFFF_FFFF_0000_0000 are reserved for attribute
 * fields.
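 *
 * Worked example (a sketch derived only from the macros and table in
 * this file, not from separate hardware documentation):
 * PM_PBUS_LL2_IN_DATA is defined as TORRENT_PBUS_LL | COUNTER_LL2 | 0x3,
 * which evaluates to 0x0110_0203 and splits into the fields above as
 *   0x0100_0000 = Torrent event space (TORRENT_SPACE)
 *   0x0010_0000 = Torrent PMU id 1 (TORRENT_PBUS_LL_ID)
 *   0x0000_0000 = PMU unit number 0
 *   0x0000_0200 = virtual counter number 2 (COUNTER_LL2)
 *   0x0000_0003 = PMC mux value 0x3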
 */
#define PMU_SPACE_MASK 0xFF000000
#define POWERPC_CORE_SPACE 0x00000000
#define TORRENT_SPACE 0x01000000
/* Macro arguments are parenthesized so that compound expressions expand safely. */
#define IS_CORE_EVENT(x) (((x) & PMU_SPACE_MASK) == POWERPC_CORE_SPACE)
#define IS_TORRENT_EVENT(x) (((x) & PMU_SPACE_MASK) == TORRENT_SPACE)
#define TORRENT_PMU_SHIFT 20
#define TORRENT_PMU_MASK (0xF << TORRENT_PMU_SHIFT)
#define TORRENT_PMU_GET(x) (((x) & TORRENT_PMU_MASK) >> TORRENT_PMU_SHIFT)
#define TORRENT_UNIT_SHIFT 16
#define TORRENT_UNIT_MASK (0xF << TORRENT_UNIT_SHIFT)
#define TORRENT_UNIT_GET(x) (((x) & TORRENT_UNIT_MASK) >> TORRENT_UNIT_SHIFT)
#define TORRENT_VIRT_CTR_SHIFT 8
#define TORRENT_VIRT_CTR_MASK (0xFF << TORRENT_VIRT_CTR_SHIFT)
#define TORRENT_VIRT_CTR_GET(x) (((x) & TORRENT_VIRT_CTR_MASK) >> TORRENT_VIRT_CTR_SHIFT)
#define TORRENT_MUX_SHIFT 0
#define TORRENT_MUX_MASK 0xFF
#define TORRENT_MUX_GET(x) (((x) & TORRENT_MUX_MASK) >> TORRENT_MUX_SHIFT)
#define TORRENT_PBUS_WXYZ_ID 0x0
#define TORRENT_PBUS_LL_ID 0x1
#define TORRENT_PBUS_MCD_ID 0x2
#define TORRENT_PBUS_UTIL_ID 0x3
#define TORRENT_MMU_ID 0x4
#define TORRENT_CAU_ID 0x5
#define TORRENT_LAST_ID (TORRENT_CAU_ID)
#define TORRENT_NUM_PMU_TYPES (TORRENT_LAST_ID + 1)
/* TORRENT_DEVEL_NUM_PMU_TYPES exists so that we do not call functions in
 * PMUs which are not currently supported. When all Torrent PMUs are
 * supported, we NEED to remove this definition and replace its usages
 * with TORRENT_NUM_PMU_TYPES.
 */
#define TORRENT_DEVEL_NUM_PMU_TYPES (TORRENT_PBUS_WXYZ_ID + 1)
#define TORRENT_PMU(pmu) (TORRENT_SPACE | \
	(TORRENT_##pmu##_ID << TORRENT_PMU_SHIFT))
#define TORRENT_PBUS_WXYZ TORRENT_PMU(PBUS_WXYZ)
#define TORRENT_PBUS_LL TORRENT_PMU(PBUS_LL)
#define TORRENT_PBUS_MCD TORRENT_PMU(PBUS_MCD)
#define TORRENT_PBUS_UTIL TORRENT_PMU(PBUS_UTIL)
#define TORRENT_MMU TORRENT_PMU(MMU)
#define TORRENT_CAU TORRENT_PMU(CAU)
#define COUNTER_W (0 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_X (1 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_Y (2 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_Z (3 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_LL0 (0 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_LL1 (1 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_LL2 (2 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_LL3 (3 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_LL4 (4 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_LL5 (5 << TORRENT_VIRT_CTR_SHIFT)
#define COUNTER_LL6 (6 << TORRENT_VIRT_CTR_SHIFT)
/* Attributes */
#define TORRENT_ATTR_MCD_TYPE_SHIFT 32
#define TORRENT_ATTR_MCD_TYPE_MASK (0x3ULL << TORRENT_ATTR_MCD_TYPE_SHIFT)
#define TORRENT_ATTR_UTIL_SEL_SHIFT 32
#define TORRENT_ATTR_UTIL_SEL_MASK (0x3ULL << TORRENT_ATTR_UTIL_SEL_SHIFT)
#define TORRENT_ATTR_UTIL_CMP_SHIFT 34
#define TORRENT_ATTR_UTIL_CMP_MASK (0x1FULL << TORRENT_ATTR_UTIL_CMP_SHIFT)
static const pme_torrent_entry_t torrent_pe[] = {
{ .pme_name = "PM_PBUS_W_DISABLED", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_W | 0x0, .pme_desc = "The W Link event counter is disabled" },
{ .pme_name = "PM_PBUS_W_IN_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_W | 0x1, .pme_desc = "Bus cycles that the W Link \"in\" channel is idle" },
{ .pme_name = "PM_PBUS_W_IN_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_W | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the W Link \"in\" channel (Note: multiple events can occur in one cycle)" },
{ .pme_name = "PM_PBUS_W_IN_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_W | 0x3,
.pme_desc = "Bus cycles that the W Link \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_W_OUT_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_W | 0x5, .pme_desc = "Bus cycles that the W Link \"out\" channel is idle" }, { .pme_name = "PM_PBUS_W_OUT_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_W | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the W Link \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_W_OUT_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_W | 0x7, .pme_desc = "Bus cycles that the W Link \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_X_DISABLED", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_X | 0x0, .pme_desc = "The X Link event counter is disabled" }, { .pme_name = "PM_PBUS_X_IN_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_X | 0x1, .pme_desc = "Bus cycles that the X Link \"in\" channel is idle" }, { .pme_name = "PM_PBUS_X_IN_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_X | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the X Link \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_X_IN_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_X | 0x3, .pme_desc = "Bus cycles that the X Link \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_X_OUT_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_X | 0x5, .pme_desc = "Bus cycles that the X Link \"out\" channel is idle" }, { .pme_name = "PM_PBUS_X_OUT_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_X | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the X Link \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_X_OUT_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_X | 0x7, .pme_desc = "Bus cycles that the X Link \"out\" channel is sending data or a data header" }, { .pme_name = 
"PM_PBUS_Y_DISABLED", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Y | 0x0, .pme_desc = "The Y Link event counter is disabled" }, { .pme_name = "PM_PBUS_Y_IN_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Y | 0x1, .pme_desc = "Bus cycles that the Y Link \"in\" channel is idle" }, { .pme_name = "PM_PBUS_Y_IN_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Y | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Y Link \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_Y_IN_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Y | 0x3, .pme_desc = "Bus cycles that the Y Link \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_Y_OUT_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Y | 0x5, .pme_desc = "Bus cycles that the Y Link \"out\" channel is idle", }, { .pme_name = "PM_PBUS_Y_OUT_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Y | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Y Link \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_Y_OUT_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Y | 0x7, .pme_desc = "Bus cycles that the W Link \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_Z_DISABLED", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Z | 0x0, .pme_desc = "The Z Link event counter is disabled" }, { .pme_name = "PM_PBUS_Z_IN_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Z | 0x1, .pme_desc = "Bus cycles that the Z Link \"in\" channel is idle" }, { .pme_name = "PM_PBUS_Z_IN_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Z | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Z Link \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_Z_IN_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Z | 0x3, .pme_desc = "Bus cycles that the Z Link \"in\" channel is receiving data or a data header" 
}, { .pme_name = "PM_PBUS_Z_OUT_IDLE", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Z | 0x5, .pme_desc = "Bus cycles that the Z Link \"out\" channel is idle" }, { .pme_name = "PM_PBUS_Z_OUT_CMDRSP", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Z | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Z Link \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_Z_OUT_DATA", .pme_code = TORRENT_PBUS_WXYZ | COUNTER_Z | 0x7, .pme_desc = "Bus cycles that the Z Link \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL0_DISABLED", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0x0, .pme_desc = "The Local Link 0 event counter is disabled" }, { .pme_name = "PM_PBUS_LL0_IN_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0x1, .pme_desc = "Bus cycles that the Local Link 0 \"in\" channel is idle" }, { .pme_name = "PM_PBUS_LL0_IN_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Local Link 0 \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL0_IN_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0x3, .pme_desc = "Bus cycles that the Local Link 0 \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_LL0_OUT_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0x5, .pme_desc = "Bus cycles that the Local Link 0 \"out\" channel is idle" }, { .pme_name = "PM_PBUS_LL0_OUT_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Local Link 0 \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL0_OUT_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0x7, .pme_desc = "Bus cycles that the Local Link 0 \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL0_IN_ISR", .pme_code = TORRENT_PBUS_LL 
| COUNTER_LL0 | 0x9, .pme_desc = "Bus cycles that the Local Link 0 \"in\" channel is receiving ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL0_OUT_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL0 | 0xd, .pme_desc = "Bus cycles that the Local Link 0 \"out\" channel is sending ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL1_DISABLED", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x0, .pme_desc = "The Local Link 1 event counter is disabled" }, { .pme_name = "PM_PBUS_LL1_IN_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x1, .pme_desc = "Bus cycles that the Local Link 1 \"in\" channel is idle" }, { .pme_name = "PM_PBUS_LL1_IN_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Local Link 1 \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL1_IN_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x3, .pme_desc = "Bus cycles that the Local Link 1 \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_LL1_OUT_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x5, .pme_desc = "Bus cycles that the Local Link 1 \"out\" channel is idle" }, { .pme_name = "PM_PBUS_LL1_OUT_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Local Link 1 \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL1_OUT_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x7, .pme_desc = "Bus cycles that the Local Link 1 \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL1_IN_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0x9, .pme_desc = "Bus cycles that the Local Link 1 \"in\" channel is receiving ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL1_OUT_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL1 | 0xd, .pme_desc = "Bus cycles that the Local 
Link 1 \"out\" channel is sending ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL2_DISABLED", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x0, .pme_desc = "The Local Link 2 event counter is disabled" }, { .pme_name = "PM_PBUS_LL2_IN_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x1, .pme_desc = "Bus cycles that the Local Link 2 \"in\" channel is idle" }, { .pme_name = "PM_PBUS_LL2_IN_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Local Link 2 \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL2_IN_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x3, .pme_desc = "Bus cycles that the Local Link 2 \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_LL2_OUT_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x5, .pme_desc = "Bus cycles that the Local Link 2 \"out\" channel is idle" }, { .pme_name = "PM_PBUS_LL2_OUT_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Local Link 2 \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL2_OUT_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x7, .pme_desc = "Bus cycles that the Local Link 2 \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL2_IN_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0x9, .pme_desc = "Bus cycles that the Local Link 2 \"in\" channel is receiving ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL2_OUT_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL2 | 0xd, .pme_desc = "Bus cycles that the Local Link 2 \"out\" channel is sending ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL3_DISABLED", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x0, .pme_desc = "The Local Link 3 event counter is disabled" }, { .pme_name = "PM_PBUS_LL3_IN_IDLE", 
.pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x1, .pme_desc = "Bus cycles that the Local Link 3 \"in\" channel is idle" }, { .pme_name = "PM_PBUS_LL3_IN_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Local Link 3 \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL3_IN_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x3, .pme_desc = "Bus cycles that the Local Link 3 \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_LL3_OUT_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x5, .pme_desc = "Bus cycles that the Local Link 3 \"out\" channel is idle" }, { .pme_name = "PM_PBUS_LL3_OUT_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Local Link 3 \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL3_OUT_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x7, .pme_desc = "Bus cycles that the Local Link 3 \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL3_IN_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0x9, .pme_desc = "Bus cycles that the Local Link 3 \"in\" channel is receiving ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL3_OUT_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL3 | 0xd, .pme_desc = "Bus cycles that the Local Link 3 \"out\" channel is sending ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL4_DISABLED", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x0, .pme_desc = "The Local Link 4 event counter is disabled" }, { .pme_name = "PM_PBUS_LL4_IN_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x1, .pme_desc = "Bus cycles that the Local Link 4 \"in\" channel is idle" }, { .pme_name = "PM_PBUS_LL4_IN_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x2, .pme_desc = "Number of commands, partial 
responses, and combined responses received on the Local Link 4 \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL4_IN_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x3, .pme_desc = "Bus cycles that the Local Link 4 \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_LL4_OUT_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x5, .pme_desc = "Bus cycles that the Local Link 4 \"out\" channel is idle" }, { .pme_name = "PM_PBUS_LL4_OUT_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Local Link 4 \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL4_OUT_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x7, .pme_desc = "Bus cycles that the Local Link 4 \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL4_IN_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0x9, .pme_desc = "Bus cycles that the Local Link 4 \"in\" channel is receiving ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL4_OUT_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL4 | 0xd, .pme_desc = "Bus cycles that the Local Link 4 \"out\" channel is sending ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL5_DISABLED", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x0, .pme_desc = "The Local Link 5 event counter is disabled" }, { .pme_name = "PM_PBUS_LL5_IN_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x1, .pme_desc = "Bus cycles that the Local Link 5 \"in\" channel is idle" }, { .pme_name = "PM_PBUS_LL5_IN_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Local Link 5 \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL5_IN_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x3, .pme_desc = "Bus cycles that the Local Link 
5 \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_LL5_OUT_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x5, .pme_desc = "Bus cycles that the Local Link 5 \"out\" channel is idle" }, { .pme_name = "PM_PBUS_LL5_OUT_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Local Link 5 \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL5_OUT_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x7, .pme_desc = "Bus cycles that the Local Link 5 \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL5_IN_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0x9, .pme_desc = "Bus cycles that the Local Link 5 \"in\" channel is receiving ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL5_OUT_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL5 | 0xd, .pme_desc = "Bus cycles that the Local Link 5 \"out\" channel is sending ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL6_DISABLED", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x0, .pme_desc = "The Local Link 6 event counter is disabled" }, { .pme_name = "PM_PBUS_LL6_IN_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x1, .pme_desc = "Bus cycles that the Local Link 6 \"in\" channel is idle" }, { .pme_name = "PM_PBUS_LL6_IN_CMDRSP", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x2, .pme_desc = "Number of commands, partial responses, and combined responses received on the Local Link 6 \"in\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL6_IN_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x3, .pme_desc = "Bus cycles that the Local Link 6 \"in\" channel is receiving data or a data header" }, { .pme_name = "PM_PBUS_LL6_OUT_IDLE", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x5, .pme_desc = "Bus cycles that the Local Link 6 \"out\" channel is idle" }, { .pme_name = "PM_PBUS_LL6_OUT_CMDRSP", 
.pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x6, .pme_desc = "Number of commands, partial responses, and combined responses sent on the Local Link 6 \"out\" channel (Note: multiple events can occur in one cycle)" }, { .pme_name = "PM_PBUS_LL6_OUT_DATA", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x7, .pme_desc = "Bus cycles that the Local Link 6 \"out\" channel is sending data or a data header" }, { .pme_name = "PM_PBUS_LL6_IN_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0x9, .pme_desc = "Bus cycles that the Local Link 6 \"in\" channel is receiving ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_LL6_OUT_ISR", .pme_code = TORRENT_PBUS_LL | COUNTER_LL6 | 0xd, .pme_desc = "Bus cycles that the Local Link 6 \"out\" channel is sending ISR data or an ISR data header" }, { .pme_name = "PM_PBUS_MCD0_PROBE_ISSUED", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x00, .pme_desc = "cl_probe command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_PROBE_CRESP_GOOD", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x01, .pme_desc = "cResp for a cl_probe was addr_ack_done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_PROBE_CRESP_RETRY", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x02, .pme_desc = "cResp for a cl_probe was rty_sp or addr_error or unexpected cResp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_FLUSH1_ISSUED", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x03, .pme_desc = "dcbfk command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_FLUSH0_ISSUED", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x04, .pme_desc = "dcbf command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_BKILL_ISSUED", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x05, .pme_desc = "bkill command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_FLUSH1_GOOD_COMP", .pme_code = 
TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x06, .pme_desc = "cResp for a dcbfk was addr_ack_done and no collision", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_FLUSH1_COLLISION", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x07, .pme_desc = "dcbfk had a collision", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_FLUSH1_BAD_CRESP", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x08, .pme_desc = "cResp for a dcbfk was rty_sp or fl_addr_ack_bk_sp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_FLUSH0_CRESP_RETRY", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x09, .pme_desc = "cResp for a dcbf was rty_sp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_BKILL_CRESP_RETRY", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x0A, .pme_desc = "cResp for a bkill was rty_sp or fl_addr_ack_bk_sp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RCMD_HIT", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x0B, .pme_desc = "a reflected command got a hit", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RCMD_MISS", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x0C, .pme_desc = "a reflected command got a miss", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RCMD_HIT_MD", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x0D, .pme_desc = "a reflected command got a hit in the main directory", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RCMD_HIT_NE", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x0E, .pme_desc = "a reflected command got a hit in the new entry buffer", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RCMD_HIT_CO", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x0F, .pme_desc = "a reflected command got a hit in the castout buffer", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RCMD_MISS_CREATE", .pme_code = 
TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x10, .pme_desc = "a reflected command with a miss should create an entry", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RCMD_MISS_CREATED", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x11, .pme_desc = "a new entry was created", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RTY_DINC", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x12, .pme_desc = "MCD responded rty_dinc", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_RTY_FULL", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x13, .pme_desc = "MCD responded rty_lpc", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_BK_RTY", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x14, .pme_desc = "MCD responded with a master retry (rty_other or rty_lost_claim)", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_NE_FULL", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x15, .pme_desc = "The new entry buffer is full", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_DEMAND_CASTOUT", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x16, .pme_desc = "A demand castout was done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_OTHER_CASTOUT", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x17, .pme_desc = "A non-demand castout was done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_CASTOUT", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x18, .pme_desc = "A castout was done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_CO_MOVE", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x19, .pme_desc = "A castout entry was moved to the main directory", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_NE_MOVE", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x1A, .pme_desc = "A new entry movement was processed", .pme_modmsk = 
_TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_PAGE_CREATE", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x1B, .pme_desc = "A new entry movement created a page (got a miss)", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_NE_MOVE_MERGE", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x1C, .pme_desc = "A new entry movement merged with an existing page (got a hit)", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_NE_MOVE_ABORT_FLUSH", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x1D, .pme_desc = "A new entry movement was aborted due to flush in progress", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_NE_MOVE_ABORT_COQ", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x1E, .pme_desc = "A new entry movement was aborted due to castout buffer full", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_EM_HOLDOFF", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x1F, .pme_desc = "An entry movement was held off", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD0_EMQ_NOT_MT", .pme_code = TORRENT_PBUS_MCD | 0 << TORRENT_UNIT_SHIFT | 0x21, .pme_desc = "The entry movement queue is not empty", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_PROBE_ISSUED", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x00, .pme_desc = "cl_probe command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_PROBE_CRESP_GOOD", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x01, .pme_desc = "cResp for a cl_probe was addr_ack_done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_PROBE_CRESP_RETRY", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x02, .pme_desc = "cResp for a cl_probe was rty_sp or addr_error or unexpected cResp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_FLUSH1_ISSUED", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x03, .pme_desc = "dcbfk 
command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_FLUSH0_ISSUED", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x04, .pme_desc = "dcbf command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_BKILL_ISSUED", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x05, .pme_desc = "bkill command issued", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_FLUSH1_GOOD_COMP", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x06, .pme_desc = "cResp for a dcbfk was addr_ack_done and no collision", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_FLUSH1_COLLISION", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x07, .pme_desc = "dcbfk had a collision", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_FLUSH1_BAD_CRESP", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x08, .pme_desc = "cResp for a dcbfk was rty_sp or fl_addr_ack_bk_sp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_FLUSH0_CRESP_RETRY", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x09, .pme_desc = "cResp for a dcbf was rty_sp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_BKILL_CRESP_RETRY", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x0A, .pme_desc = "cResp for a bkill was rty_sp or fl_addr_ack_bk_sp", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RCMD_HIT", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x0B, .pme_desc = "a reflected command got a hit", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RCMD_MISS", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x0C, .pme_desc = "a reflected command got a miss", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RCMD_HIT_MD", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x0D, .pme_desc = "a reflected command got a hit in the main directory", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = 
"PM_PBUS_MCD1_RCMD_HIT_NE", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x0E, .pme_desc = "a reflected command got a hit in the new entry buffer", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RCMD_HIT_CO", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x0F, .pme_desc = "a reflected command got a hit in the castout buffer", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RCMD_MISS_CREATE", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x10, .pme_desc = "a reflected command with a miss should create an entry", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RCMD_MISS_CREATED", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x11, .pme_desc = "a new entry was created", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RTY_DINC", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x12, .pme_desc = "MCD responded rty_dinc", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_RTY_FULL", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x13, .pme_desc = "MCD responded rty_lpc", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_BK_RTY", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x14, .pme_desc = "MCD responded with a master retry (rty_other or rty_lost_claim)", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_NE_FULL", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x15, .pme_desc = "The new entry buffer is full", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_DEMAND_CASTOUT", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x16, .pme_desc = "A demand castout was done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_OTHER_CASTOUT", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x17, .pme_desc = "A non-demand castout was done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_CASTOUT", .pme_code = TORRENT_PBUS_MCD | 1 << 
TORRENT_UNIT_SHIFT | 0x18, .pme_desc = "A castout was done", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_CO_MOVE", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x19, .pme_desc = "A castout entry was moved to the main directory", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_NE_MOVE", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x1A, .pme_desc = "A new entry movement was processed", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_PAGE_CREATE", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x1B, .pme_desc = "A new entry movement created a page (got a miss)", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_NE_MOVE_MERGE", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x1C, .pme_desc = "A new entry movement merged with an existing page (got a hit)", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_NE_MOVE_ABORT_FLUSH", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x1D, .pme_desc = "A new entry movement was aborted due to flush in progress", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_NE_MOVE_ABORT_COQ", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x1E, .pme_desc = "A new entry movement was aborted due to castout buffer full", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_EM_HOLDOFF", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x1F, .pme_desc = "An entry movement was held off", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_MCD1_EMQ_NOT_MT", .pme_code = TORRENT_PBUS_MCD | 1 << TORRENT_UNIT_SHIFT | 0x21, .pme_desc = "The entry movement queue is not empty", .pme_modmsk = _TORRENT_ATTR_MCD }, { .pme_name = "PM_PBUS_UTIL_PB_APM_NM_HI_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x0 << TORRENT_VIRT_CTR_SHIFT | 0x0, .pme_desc = "Node Master High Threshold Counter", .pme_modmsk = _TORRENT_ATTR_UTIL_HI }, { .pme_name = "PM_PBUS_UTIL_PB_APM_NM_LO_CNT", .pme_code = 
TORRENT_PBUS_UTIL | 0x1 << TORRENT_VIRT_CTR_SHIFT | 0x0, .pme_desc = "Node Master Low Threshold Counter", .pme_modmsk = _TORRENT_ATTR_UTIL_LO }, { .pme_name = "PM_PBUS_UTIL_PB_APM_LM_HI_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x2 << TORRENT_VIRT_CTR_SHIFT | 0x0, .pme_desc = "Local Master High Threshold Counter", .pme_modmsk = _TORRENT_ATTR_UTIL_HI }, { .pme_name = "PM_PBUS_UTIL_PB_APM_LM_LO_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x3 << TORRENT_VIRT_CTR_SHIFT | 0x0, .pme_desc = "Local Master Low Threshold Counter", .pme_modmsk = _TORRENT_ATTR_UTIL_LO }, { .pme_name = "PM_PBUS_UTIL_NODE_MASTER_PUMPS", .pme_code = TORRENT_PBUS_UTIL | 0x0 << TORRENT_VIRT_CTR_SHIFT | 0x1, .pme_desc = "Node Master Pumps" }, { .pme_name = "PM_PBUS_UTIL_LOCAL_MASTER_PUMPS", .pme_code = TORRENT_PBUS_UTIL | 0x1 << TORRENT_VIRT_CTR_SHIFT | 0x1, .pme_desc = "Local Master Pumps" }, { .pme_name = "PM_PBUS_UTIL_RETRY_NODE_MASTER_PUMPS", .pme_code = TORRENT_PBUS_UTIL | 0x2 << TORRENT_VIRT_CTR_SHIFT | 0x1, .pme_desc = "Retry Node Master Pumps" }, { .pme_name = "PM_PBUS_UTIL_RETRY_LOCAL_MASTER_PUMPS", .pme_code = TORRENT_PBUS_UTIL | 0x3 << TORRENT_VIRT_CTR_SHIFT | 0x1, .pme_desc = "Retry Local Master Pumps" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_RCMD_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x4 << TORRENT_VIRT_CTR_SHIFT, .pme_desc = "rCmd Activity Counter" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_INTDATA_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x5 << TORRENT_VIRT_CTR_SHIFT, .pme_desc = "Internal Data Counter" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_EXTDATSND_W_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x6 << TORRENT_VIRT_CTR_SHIFT, .pme_desc = "External Data Send Activity Counter for WXYZ links" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_EXTDATRCV_W_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x7 << TORRENT_VIRT_CTR_SHIFT, .pme_desc = "External Data Receive Activity Counter for WXYZ links" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_EXTDATSND_LL_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x8 << TORRENT_VIRT_CTR_SHIFT, .pme_desc = 
"External Data Send Activity Counter for LL links" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_EXTDATRCV_LL_CNT", .pme_code = TORRENT_PBUS_UTIL | 0x9 << TORRENT_VIRT_CTR_SHIFT, .pme_desc = "External Data Receive Activity Counter for LL links" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_EXTDAT_W_LL_CNT", .pme_code = TORRENT_PBUS_UTIL | 0xA << TORRENT_VIRT_CTR_SHIFT, .pme_desc = "External Data Activity Counter from WXYZ to LL links" }, { .pme_name = "PM_PBUS_UTIL_PB_APM_EXTDAT_LL_W_CNT", .pme_code = TORRENT_PBUS_UTIL | 0xB << TORRENT_VIRT_CTR_SHIFT, .pme_desc = "External Data Activity Counter from LL to WXYZ links" }, { .pme_name = "PM_MMU_G_MMCHIT", .pme_code = TORRENT_MMU | (0 << TORRENT_VIRT_CTR_SHIFT), .pme_desc = "Memory Management Cache Hit Counter Register" }, { .pme_name = "PM_MMU_G_MMCMIS", .pme_code = TORRENT_MMU | (1 << TORRENT_VIRT_CTR_SHIFT), .pme_desc = "Memory Management Cache Miss Counter Register" }, { .pme_name = "PM_MMU_G_MMATHIT", .pme_code = TORRENT_MMU | (2 << TORRENT_VIRT_CTR_SHIFT), .pme_desc = "Memory Management AT Cache Hit Counter Register" }, { .pme_name = "PM_MMU_G_MMATMIS", .pme_code = TORRENT_MMU | (3 << TORRENT_VIRT_CTR_SHIFT), .pme_desc = "Memory Management AT Cache Miss Counter Register" }, { .pme_name = "PM_CAU_CYCLES_WAITING_ON_A_CREDIT", .pme_code = TORRENT_CAU | 0, .pme_desc = "Count of cycles spent waiting on a credit. Increments whenever any index has a packet to send, but nothing (from any index) can be sent." 
},
};

#define PME_TORRENT_EVENT_COUNT (sizeof(torrent_pe) / sizeof(pme_torrent_entry_t))

#endif
papi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_r3qpi_events.h
/*
 * Copyright (c) 2012 Google, Inc
 * Contributed by Stephane Eranian
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * This file is part of libpfm, a performance monitoring support library for
 * applications on Linux.
 *
 * This file has been automatically generated.
 *
 * PMU: snbep_unc_r3qpi (Intel SandyBridge-EP R3QPI uncore)
 */

static const intel_x86_umask_t snbep_unc_r3_iio_credits_acquired[]={
  { .uname = "DRS",
    .udesc = "DRS",
    .ucode = 0x800,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NCB",
    .udesc = "NCB",
    .ucode = 0x1000,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NCS",
    .udesc = "NCS",
    .ucode = 0x2000,
    .uflags = INTEL_X86_NCOMBO,
  },
};

static const intel_x86_umask_t snbep_unc_r3_ring_ad_used[]={
  { .uname = "CCW_EVEN",
    .udesc = "Counter-Clockwise and even ring polarity",
    .ucode = 0x400,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "CCW_ODD",
    .udesc = "Counter-Clockwise and odd ring polarity",
    .ucode = 0x800,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "CW_EVEN",
    .udesc = "Clockwise and even ring polarity",
    .ucode = 0x100,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "CW_ODD",
    .udesc = "Clockwise and odd ring polarity",
    .ucode = 0x200,
    .uflags = INTEL_X86_NCOMBO,
  },
};

static const intel_x86_umask_t snbep_unc_r3_ring_iv_used[]={
  { .uname = "ANY",
    .udesc = "Any polarity",
    .ucode = 0xf00,
    .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL,
  },
};

static const intel_x86_umask_t snbep_unc_r3_rxr_bypassed[]={
  { .uname = "AD",
    .udesc = "Ingress Bypassed",
    .ucode = 0x100,
    .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL,
  },
};

static const intel_x86_umask_t snbep_unc_r3_rxr_cycles_ne[]={
  { .uname = "DRS",
    .udesc = "DRS Ingress queue",
    .ucode = 0x800,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "HOM",
    .udesc = "HOM Ingress queue",
    .ucode = 0x100,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NCB",
    .udesc = "NCB Ingress queue",
    .ucode = 0x1000,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NCS",
    .udesc = "NCS Ingress queue",
    .ucode = 0x2000,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NDR",
    .udesc = "NDR Ingress queue",
    .ucode = 0x400,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "SNP",
    .udesc = "SNP Ingress queue",
    .ucode = 0x200,
    .uflags = INTEL_X86_NCOMBO,
  },
};

static const intel_x86_umask_t snbep_unc_r3_vn0_credits_reject[]={
  { .uname = "DRS",
    .udesc = "Filter DRS message class",
    .ucode = 0x800,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "HOM",
    .udesc = "Filter HOM message class",
    .ucode = 0x100,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NCB",
    .udesc = "Filter NCB message class",
    .ucode = 0x1000,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NCS",
    .udesc = "Filter NCS message class",
    .ucode = 0x2000,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "NDR",
    .udesc = "Filter NDR message class",
    .ucode = 0x400,
    .uflags = INTEL_X86_NCOMBO,
  },
  { .uname = "SNP",
    .udesc = "Filter SNP message class",
    .ucode = 0x200,
    .uflags = INTEL_X86_NCOMBO,
  },
};

static const intel_x86_entry_t intel_snbep_unc_r3_pe[]={
  { .name = "UNC_R3_CLOCKTICKS",
    .desc = "Number of uclks in domain",
    .code = 0x1,
    .cntmsk = 0x7,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
  },
  { .name = "UNC_R3_IIO_CREDITS_ACQUIRED",
    .desc = "to IIO BL Credit Acquired",
    .code = 0x20,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_iio_credits_acquired),
    .umasks = snbep_unc_r3_iio_credits_acquired
  },
  { .name = "UNC_R3_IIO_CREDITS_REJECT",
    .desc = "to IIO BL Credit Rejected",
    .code = 0x21,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_iio_credits_acquired),
    .umasks = snbep_unc_r3_iio_credits_acquired /* shared */
  },
  { .name = "UNC_R3_IIO_CREDITS_USED",
    .desc = "to IIO BL Credit In Use",
    .code = 0x22,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_iio_credits_acquired),
    .umasks = snbep_unc_r3_iio_credits_acquired /* shared */
  },
  { .name = "UNC_R3_RING_AD_USED",
    .desc = "R3 AD Ring in Use",
    .code = 0x7,
    .cntmsk = 0x7,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_ring_ad_used),
    .umasks = snbep_unc_r3_ring_ad_used
  },
  { .name = "UNC_R3_RING_AK_USED",
    .desc = "R3 AK Ring in Use",
    .code = 0x8,
    .cntmsk = 0x7,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_ring_ad_used),
    .umasks = snbep_unc_r3_ring_ad_used /* shared */
  },
  { .name = "UNC_R3_RING_BL_USED",
    .desc = "R3 BL Ring in Use",
    .code = 0x9,
    .cntmsk = 0x7,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_ring_ad_used),
    .umasks = snbep_unc_r3_ring_ad_used /* shared */
  },
  { .name = "UNC_R3_RING_IV_USED",
    .desc = "R3 IV Ring in Use",
    .code = 0xa,
    .cntmsk = 0x7,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_ring_iv_used),
    .umasks = snbep_unc_r3_ring_iv_used
  },
  { .name = "UNC_R3_RXR_BYPASSED",
    .desc = "Ingress Bypassed",
    .code = 0x12,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_rxr_bypassed),
    .umasks = snbep_unc_r3_rxr_bypassed
  },
  { .name = "UNC_R3_RXR_CYCLES_NE",
    .desc = "Ingress Cycles Not Empty",
    .code = 0x10,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_rxr_cycles_ne),
    .umasks = snbep_unc_r3_rxr_cycles_ne
  },
  { .name = "UNC_R3_RXR_INSERTS",
    .desc = "Ingress Allocations",
    .code = 0x11,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_rxr_cycles_ne),
    .umasks = snbep_unc_r3_rxr_cycles_ne /* shared */
  },
  { .name = "UNC_R3_RXR_OCCUPANCY",
    .desc = "Ingress Occupancy Accumulator",
    .code = 0x13,
    .cntmsk = 0x1,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_rxr_cycles_ne),
    .umasks = snbep_unc_r3_rxr_cycles_ne /* shared */
  },
  { .name = "UNC_R3_TXR_CYCLES_FULL",
    .desc = "Egress cycles full",
    .code = 0x25,
    .cntmsk = 0x3,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
  },
  { .name = "UNC_R3_TXR_INSERTS",
    .desc = "Egress allocations",
    .code = 0x24,
    .cntmsk = 0x3,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
  },
  { .name = "UNC_R3_TXR_NACK",
    .desc = "Egress Nack",
    .code = 0x26,
    .cntmsk = 0x3,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
  },
  { .name = "UNC_R3_VN0_CREDITS_REJECT",
    .desc = "VN0 Credit Acquisition Failed on DRS",
    .code = 0x37,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_vn0_credits_reject),
    .umasks = snbep_unc_r3_vn0_credits_reject
  },
  { .name = "UNC_R3_VN0_CREDITS_USED",
    .desc = "VN0 Credit Used",
    .code = 0x36,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_vn0_credits_reject),
    .umasks = snbep_unc_r3_vn0_credits_reject /* shared */
  },
  { .name = "UNC_R3_VNA_CREDITS_ACQUIRED",
    .desc = "VNA credit Acquisitions",
    .code = 0x33,
    .cntmsk = 0x3,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
  },
  { .name = "UNC_R3_VNA_CREDITS_REJECT",
    .desc = "VNA Credit Reject",
    .code = 0x34,
    .cntmsk = 0x3,
    .ngrp = 1,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
    .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r3_vn0_credits_reject),
    .umasks = snbep_unc_r3_vn0_credits_reject /* shared */
  },
  { .name = "UNC_R3_VNA_CREDIT_CYCLES_OUT",
    .desc = "Cycles with no VNA credits available",
    .code = 0x31,
    .cntmsk = 0x3,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
  },
  { .name = "UNC_R3_VNA_CREDIT_CYCLES_USED",
    .desc = "Cycles with 1 or more VNA credits in use",
    .code = 0x32,
    .cntmsk = 0x3,
    .modmsk = SNBEP_UNC_R3QPI_ATTRS,
  },
};
papi-5.3.0/src/libpfm4/lib/events/power7_events.h
/****************************/
/* THIS IS OPEN SOURCE CODE */
/****************************/

#ifndef __POWER7_EVENTS_H__
#define __POWER7_EVENTS_H__

/*
 * File: power7_events.h
 * CVS:
 * Author: Corey Ashford
 * cjashfor@us.ibm.com
 * Mods:
 *
 *
 * (C) Copyright IBM Corporation, 2009. All Rights Reserved.
 * Contributed by Corey Ashford
 *
 * Note: This code was automatically generated and should not be modified by
 * hand.
 *
 * Documentation on the PMU events can be found at:
 * http://www.power.org/documentation/comprehensive-pmu-event-reference-power7
 */

#define POWER7_PME_PM_IC_DEMAND_L2_BR_ALL 0
#define POWER7_PME_PM_GCT_UTIL_7_TO_10_SLOTS 1
#define POWER7_PME_PM_PMC2_SAVED 2
#define POWER7_PME_PM_CMPLU_STALL_DFU 3
#define POWER7_PME_PM_VSU0_16FLOP 4
#define POWER7_PME_PM_MRK_LSU_DERAT_MISS 5
#define POWER7_PME_PM_MRK_ST_CMPL 6
#define POWER7_PME_PM_NEST_PAIR3_ADD 7
#define POWER7_PME_PM_L2_ST_DISP 8
#define POWER7_PME_PM_L2_CASTOUT_MOD 9
#define POWER7_PME_PM_ISEG 10
#define POWER7_PME_PM_MRK_INST_TIMEO 11
#define POWER7_PME_PM_L2_RCST_DISP_FAIL_ADDR 12
#define POWER7_PME_PM_LSU1_DC_PREF_STREAM_CONFIRM 13
#define POWER7_PME_PM_IERAT_WR_64K 14
#define POWER7_PME_PM_MRK_DTLB_MISS_16M 15
#define POWER7_PME_PM_IERAT_MISS 16
#define POWER7_PME_PM_MRK_PTEG_FROM_LMEM 17
#define POWER7_PME_PM_FLOP 18
#define POWER7_PME_PM_THRD_PRIO_4_5_CYC 19
#define POWER7_PME_PM_BR_PRED_TA 20
#define POWER7_PME_PM_CMPLU_STALL_FXU 21
#define POWER7_PME_PM_EXT_INT 22
#define POWER7_PME_PM_VSU_FSQRT_FDIV 23
#define POWER7_PME_PM_MRK_LD_MISS_EXPOSED_CYC 24
#define POWER7_PME_PM_LSU1_LDF 25
#define POWER7_PME_PM_IC_WRITE_ALL 26
#define POWER7_PME_PM_LSU0_SRQ_STFWD 27
#define POWER7_PME_PM_PTEG_FROM_RL2L3_MOD 28
#define POWER7_PME_PM_MRK_DATA_FROM_L31_SHR 29
#define POWER7_PME_PM_DATA_FROM_L21_MOD 30
#define POWER7_PME_PM_VSU1_SCAL_DOUBLE_ISSUED 31
#define POWER7_PME_PM_VSU0_8FLOP 32
#define POWER7_PME_PM_POWER_EVENT1 33
#define POWER7_PME_PM_DISP_CLB_HELD_BAL 34
#define POWER7_PME_PM_VSU1_2FLOP 35
#define POWER7_PME_PM_LWSYNC_HELD 36
#define POWER7_PME_PM_PTEG_FROM_DL2L3_SHR 37
#define POWER7_PME_PM_INST_FROM_L21_MOD 38
#define POWER7_PME_PM_IERAT_XLATE_WR_16MPLUS 39
#define POWER7_PME_PM_IC_REQ_ALL 40
#define POWER7_PME_PM_DSLB_MISS 41
#define POWER7_PME_PM_L3_MISS 42
#define POWER7_PME_PM_LSU0_L1_PREF 43
#define POWER7_PME_PM_VSU_SCALAR_SINGLE_ISSUED 44
#define POWER7_PME_PM_LSU1_DC_PREF_STREAM_CONFIRM_STRIDE 45
#define POWER7_PME_PM_L2_INST 46
#define POWER7_PME_PM_VSU0_FRSP 47
#define POWER7_PME_PM_FLUSH_DISP 48
#define POWER7_PME_PM_PTEG_FROM_L2MISS 49
#define POWER7_PME_PM_VSU1_DQ_ISSUED 50
#define POWER7_PME_PM_CMPLU_STALL_LSU 51
#define POWER7_PME_PM_MRK_DATA_FROM_DMEM 52
#define POWER7_PME_PM_LSU_FLUSH_ULD 53
#define POWER7_PME_PM_PTEG_FROM_LMEM 54
#define POWER7_PME_PM_MRK_DERAT_MISS_16M 55
#define POWER7_PME_PM_THRD_ALL_RUN_CYC 56
#define POWER7_PME_PM_MEM0_PREFETCH_DISP 57
#define POWER7_PME_PM_MRK_STALL_CMPLU_CYC_COUNT 58
#define POWER7_PME_PM_DATA_FROM_DL2L3_MOD 59
#define POWER7_PME_PM_VSU_FRSP 60
#define POWER7_PME_PM_MRK_DATA_FROM_L21_MOD 61
#define POWER7_PME_PM_PMC1_OVERFLOW 62
#define POWER7_PME_PM_VSU0_SINGLE 63
#define POWER7_PME_PM_MRK_PTEG_FROM_L3MISS 64
#define POWER7_PME_PM_MRK_PTEG_FROM_L31_SHR 65
#define POWER7_PME_PM_VSU0_VECTOR_SP_ISSUED 66
#define POWER7_PME_PM_VSU1_FEST 67
#define POWER7_PME_PM_MRK_INST_DISP 68
#define POWER7_PME_PM_VSU0_COMPLEX_ISSUED 69
#define POWER7_PME_PM_LSU1_FLUSH_UST 70
#define POWER7_PME_PM_INST_CMPL 71
#define POWER7_PME_PM_FXU_IDLE 72
#define POWER7_PME_PM_LSU0_FLUSH_ULD 73
#define POWER7_PME_PM_MRK_DATA_FROM_DL2L3_MOD 74
#define POWER7_PME_PM_LSU_LMQ_SRQ_EMPTY_ALL_CYC 75
#define POWER7_PME_PM_LSU1_REJECT_LMQ_FULL 76
#define POWER7_PME_PM_INST_PTEG_FROM_L21_MOD 77
#define POWER7_PME_PM_INST_FROM_RL2L3_MOD 78
#define POWER7_PME_PM_SHL_CREATED 79
#define POWER7_PME_PM_L2_ST_HIT 80
#define POWER7_PME_PM_DATA_FROM_DMEM 81
#define POWER7_PME_PM_L3_LD_MISS 82
#define POWER7_PME_PM_FXU1_BUSY_FXU0_IDLE 83
#define POWER7_PME_PM_DISP_CLB_HELD_RES 84
#define POWER7_PME_PM_L2_SN_SX_I_DONE 85
#define POWER7_PME_PM_GRP_CMPL 86
#define POWER7_PME_PM_STCX_CMPL 87
#define POWER7_PME_PM_VSU0_2FLOP 88
#define POWER7_PME_PM_L3_PREF_MISS 89
#define POWER7_PME_PM_LSU_SRQ_SYNC_CYC 90
#define POWER7_PME_PM_LSU_REJECT_ERAT_MISS 91
#define POWER7_PME_PM_L1_ICACHE_MISS 92
#define POWER7_PME_PM_LSU1_FLUSH_SRQ 93
#define POWER7_PME_PM_LD_REF_L1_LSU0 94
#define POWER7_PME_PM_VSU0_FEST 95
#define POWER7_PME_PM_VSU_VECTOR_SINGLE_ISSUED 96
#define POWER7_PME_PM_FREQ_UP 97
#define POWER7_PME_PM_DATA_FROM_LMEM 98
#define POWER7_PME_PM_LSU1_LDX 99
#define POWER7_PME_PM_PMC3_OVERFLOW 100
#define POWER7_PME_PM_MRK_BR_MPRED 101
#define POWER7_PME_PM_SHL_MATCH 102
#define POWER7_PME_PM_MRK_BR_TAKEN 103
#define POWER7_PME_PM_CMPLU_STALL_BRU 104
#define POWER7_PME_PM_ISLB_MISS 105
#define POWER7_PME_PM_CYC 106
#define POWER7_PME_PM_DISP_HELD_THERMAL 107
#define POWER7_PME_PM_INST_PTEG_FROM_RL2L3_SHR 108
#define POWER7_PME_PM_LSU1_SRQ_STFWD 109
#define POWER7_PME_PM_GCT_NOSLOT_BR_MPRED 110
#define POWER7_PME_PM_1PLUS_PPC_CMPL 111
#define POWER7_PME_PM_PTEG_FROM_DMEM 112
#define POWER7_PME_PM_VSU_2FLOP 113
#define POWER7_PME_PM_GCT_FULL_CYC 114
#define POWER7_PME_PM_MRK_DATA_FROM_L3_CYC 115
#define POWER7_PME_PM_LSU_SRQ_S0_ALLOC 116
#define POWER7_PME_PM_MRK_DERAT_MISS_4K 117
#define POWER7_PME_PM_BR_MPRED_TA 118
#define POWER7_PME_PM_INST_PTEG_FROM_L2MISS 119
#define POWER7_PME_PM_DPU_HELD_POWER 120
#define POWER7_PME_PM_RUN_INST_CMPL 121
#define POWER7_PME_PM_MRK_VSU_FIN 122
#define POWER7_PME_PM_LSU_SRQ_S0_VALID 123
#define POWER7_PME_PM_GCT_EMPTY_CYC 124
#define POWER7_PME_PM_IOPS_DISP 125
#define POWER7_PME_PM_RUN_SPURR 126
#define POWER7_PME_PM_PTEG_FROM_L21_MOD 127
#define POWER7_PME_PM_VSU0_1FLOP 128
#define POWER7_PME_PM_SNOOP_TLBIE 129
#define POWER7_PME_PM_DATA_FROM_L3MISS 130
#define POWER7_PME_PM_VSU_SINGLE 131
#define POWER7_PME_PM_DTLB_MISS_16G 132
#define POWER7_PME_PM_CMPLU_STALL_VECTOR 133
#define POWER7_PME_PM_FLUSH 134
#define POWER7_PME_PM_L2_LD_HIT 135
#define POWER7_PME_PM_NEST_PAIR2_AND 136
#define POWER7_PME_PM_VSU1_1FLOP 137
#define POWER7_PME_PM_IC_PREF_REQ 138
#define POWER7_PME_PM_L3_LD_HIT 139
#define POWER7_PME_PM_GCT_NOSLOT_IC_MISS 140
#define POWER7_PME_PM_DISP_HELD 141
#define POWER7_PME_PM_L2_LD 142
#define
POWER7_PME_PM_LSU_FLUSH_SRQ 143 #define POWER7_PME_PM_BC_PLUS_8_CONV 144 #define POWER7_PME_PM_MRK_DATA_FROM_L31_MOD_CYC 145 #define POWER7_PME_PM_CMPLU_STALL_VECTOR_LONG 146 #define POWER7_PME_PM_L2_RCST_BUSY_RC_FULL 147 #define POWER7_PME_PM_TB_BIT_TRANS 148 #define POWER7_PME_PM_THERMAL_MAX 149 #define POWER7_PME_PM_LSU1_FLUSH_ULD 150 #define POWER7_PME_PM_LSU1_REJECT_LHS 151 #define POWER7_PME_PM_LSU_LRQ_S0_ALLOC 152 #define POWER7_PME_PM_L3_CO_L31 153 #define POWER7_PME_PM_POWER_EVENT4 154 #define POWER7_PME_PM_DATA_FROM_L31_SHR 155 #define POWER7_PME_PM_BR_UNCOND 156 #define POWER7_PME_PM_LSU1_DC_PREF_STREAM_ALLOC 157 #define POWER7_PME_PM_PMC4_REWIND 158 #define POWER7_PME_PM_L2_RCLD_DISP 159 #define POWER7_PME_PM_THRD_PRIO_2_3_CYC 160 #define POWER7_PME_PM_MRK_PTEG_FROM_L2MISS 161 #define POWER7_PME_PM_IC_DEMAND_L2_BHT_REDIRECT 162 #define POWER7_PME_PM_LSU_DERAT_MISS 163 #define POWER7_PME_PM_IC_PREF_CANCEL_L2 164 #define POWER7_PME_PM_MRK_FIN_STALL_CYC_COUNT 165 #define POWER7_PME_PM_BR_PRED_CCACHE 166 #define POWER7_PME_PM_GCT_UTIL_1_TO_2_SLOTS 167 #define POWER7_PME_PM_MRK_ST_CMPL_INT 168 #define POWER7_PME_PM_LSU_TWO_TABLEWALK_CYC 169 #define POWER7_PME_PM_MRK_DATA_FROM_L3MISS 170 #define POWER7_PME_PM_GCT_NOSLOT_CYC 171 #define POWER7_PME_PM_LSU_SET_MPRED 172 #define POWER7_PME_PM_FLUSH_DISP_TLBIE 173 #define POWER7_PME_PM_VSU1_FCONV 174 #define POWER7_PME_PM_DERAT_MISS_16G 175 #define POWER7_PME_PM_INST_FROM_LMEM 176 #define POWER7_PME_PM_IC_DEMAND_L2_BR_REDIRECT 177 #define POWER7_PME_PM_CMPLU_STALL_SCALAR_LONG 178 #define POWER7_PME_PM_INST_PTEG_FROM_L2 179 #define POWER7_PME_PM_PTEG_FROM_L2 180 #define POWER7_PME_PM_MRK_DATA_FROM_L21_SHR_CYC 181 #define POWER7_PME_PM_MRK_DTLB_MISS_4K 182 #define POWER7_PME_PM_VSU0_FPSCR 183 #define POWER7_PME_PM_VSU1_VECT_DOUBLE_ISSUED 184 #define POWER7_PME_PM_MRK_PTEG_FROM_RL2L3_MOD 185 #define POWER7_PME_PM_MEM0_RQ_DISP 186 #define POWER7_PME_PM_L2_LD_MISS 187 #define POWER7_PME_PM_VMX_RESULT_SAT_1 188 #define 
POWER7_PME_PM_L1_PREF 189 #define POWER7_PME_PM_MRK_DATA_FROM_LMEM_CYC 190 #define POWER7_PME_PM_GRP_IC_MISS_NONSPEC 191 #define POWER7_PME_PM_PB_NODE_PUMP 192 #define POWER7_PME_PM_SHL_MERGED 193 #define POWER7_PME_PM_NEST_PAIR1_ADD 194 #define POWER7_PME_PM_DATA_FROM_L3 195 #define POWER7_PME_PM_LSU_FLUSH 196 #define POWER7_PME_PM_LSU_SRQ_SYNC_COUNT 197 #define POWER7_PME_PM_PMC2_OVERFLOW 198 #define POWER7_PME_PM_LSU_LDF 199 #define POWER7_PME_PM_POWER_EVENT3 200 #define POWER7_PME_PM_DISP_WT 201 #define POWER7_PME_PM_CMPLU_STALL_REJECT 202 #define POWER7_PME_PM_IC_BANK_CONFLICT 203 #define POWER7_PME_PM_BR_MPRED_CR_TA 204 #define POWER7_PME_PM_L2_INST_MISS 205 #define POWER7_PME_PM_CMPLU_STALL_ERAT_MISS 206 #define POWER7_PME_PM_NEST_PAIR2_ADD 207 #define POWER7_PME_PM_MRK_LSU_FLUSH 208 #define POWER7_PME_PM_L2_LDST 209 #define POWER7_PME_PM_INST_FROM_L31_SHR 210 #define POWER7_PME_PM_VSU0_FIN 211 #define POWER7_PME_PM_LARX_LSU 212 #define POWER7_PME_PM_INST_FROM_RMEM 213 #define POWER7_PME_PM_DISP_CLB_HELD_TLBIE 214 #define POWER7_PME_PM_MRK_DATA_FROM_DMEM_CYC 215 #define POWER7_PME_PM_BR_PRED_CR 216 #define POWER7_PME_PM_LSU_REJECT 217 #define POWER7_PME_PM_GCT_UTIL_3_TO_6_SLOTS 218 #define POWER7_PME_PM_CMPLU_STALL_END_GCT_NOSLOT 219 #define POWER7_PME_PM_LSU0_REJECT_LMQ_FULL 220 #define POWER7_PME_PM_VSU_FEST 221 #define POWER7_PME_PM_NEST_PAIR0_AND 222 #define POWER7_PME_PM_PTEG_FROM_L3 223 #define POWER7_PME_PM_POWER_EVENT2 224 #define POWER7_PME_PM_IC_PREF_CANCEL_PAGE 225 #define POWER7_PME_PM_VSU0_FSQRT_FDIV 226 #define POWER7_PME_PM_MRK_GRP_CMPL 227 #define POWER7_PME_PM_VSU0_SCAL_DOUBLE_ISSUED 228 #define POWER7_PME_PM_GRP_DISP 229 #define POWER7_PME_PM_LSU0_LDX 230 #define POWER7_PME_PM_DATA_FROM_L2 231 #define POWER7_PME_PM_MRK_DATA_FROM_RL2L3_MOD 232 #define POWER7_PME_PM_LD_REF_L1 233 #define POWER7_PME_PM_VSU0_VECT_DOUBLE_ISSUED 234 #define POWER7_PME_PM_VSU1_2FLOP_DOUBLE 235 #define POWER7_PME_PM_THRD_PRIO_6_7_CYC 236 #define 
POWER7_PME_PM_BC_PLUS_8_RSLV_TAKEN 237 #define POWER7_PME_PM_BR_MPRED_CR 238 #define POWER7_PME_PM_L3_CO_MEM 239 #define POWER7_PME_PM_LD_MISS_L1 240 #define POWER7_PME_PM_DATA_FROM_RL2L3_MOD 241 #define POWER7_PME_PM_LSU_SRQ_FULL_CYC 242 #define POWER7_PME_PM_TABLEWALK_CYC 243 #define POWER7_PME_PM_MRK_PTEG_FROM_RMEM 244 #define POWER7_PME_PM_LSU_SRQ_STFWD 245 #define POWER7_PME_PM_INST_PTEG_FROM_RMEM 246 #define POWER7_PME_PM_FXU0_FIN 247 #define POWER7_PME_PM_LSU1_L1_SW_PREF 248 #define POWER7_PME_PM_PTEG_FROM_L31_MOD 249 #define POWER7_PME_PM_PMC5_OVERFLOW 250 #define POWER7_PME_PM_LD_REF_L1_LSU1 251 #define POWER7_PME_PM_INST_PTEG_FROM_L21_SHR 252 #define POWER7_PME_PM_CMPLU_STALL_THRD 253 #define POWER7_PME_PM_DATA_FROM_RMEM 254 #define POWER7_PME_PM_VSU0_SCAL_SINGLE_ISSUED 255 #define POWER7_PME_PM_BR_MPRED_LSTACK 256 #define POWER7_PME_PM_MRK_DATA_FROM_RL2L3_MOD_CYC 257 #define POWER7_PME_PM_LSU0_FLUSH_UST 258 #define POWER7_PME_PM_LSU_NCST 259 #define POWER7_PME_PM_BR_TAKEN 260 #define POWER7_PME_PM_INST_PTEG_FROM_LMEM 261 #define POWER7_PME_PM_GCT_NOSLOT_BR_MPRED_IC_MISS 262 #define POWER7_PME_PM_DTLB_MISS_4K 263 #define POWER7_PME_PM_PMC4_SAVED 264 #define POWER7_PME_PM_VSU1_PERMUTE_ISSUED 265 #define POWER7_PME_PM_SLB_MISS 266 #define POWER7_PME_PM_LSU1_FLUSH_LRQ 267 #define POWER7_PME_PM_DTLB_MISS 268 #define POWER7_PME_PM_VSU1_FRSP 269 #define POWER7_PME_PM_VSU_VECTOR_DOUBLE_ISSUED 270 #define POWER7_PME_PM_L2_CASTOUT_SHR 271 #define POWER7_PME_PM_DATA_FROM_DL2L3_SHR 272 #define POWER7_PME_PM_VSU1_STF 273 #define POWER7_PME_PM_ST_FIN 274 #define POWER7_PME_PM_PTEG_FROM_L21_SHR 275 #define POWER7_PME_PM_L2_LOC_GUESS_WRONG 276 #define POWER7_PME_PM_MRK_STCX_FAIL 277 #define POWER7_PME_PM_LSU0_REJECT_LHS 278 #define POWER7_PME_PM_IC_PREF_CANCEL_HIT 279 #define POWER7_PME_PM_L3_PREF_BUSY 280 #define POWER7_PME_PM_MRK_BRU_FIN 281 #define POWER7_PME_PM_LSU1_NCLD 282 #define POWER7_PME_PM_INST_PTEG_FROM_L31_MOD 283 #define POWER7_PME_PM_LSU_NCLD 284 #define 
POWER7_PME_PM_LSU_LDX 285 #define POWER7_PME_PM_L2_LOC_GUESS_CORRECT 286 #define POWER7_PME_PM_THRESH_TIMEO 287 #define POWER7_PME_PM_L3_PREF_ST 288 #define POWER7_PME_PM_DISP_CLB_HELD_SYNC 289 #define POWER7_PME_PM_VSU_SIMPLE_ISSUED 290 #define POWER7_PME_PM_VSU1_SINGLE 291 #define POWER7_PME_PM_DATA_TABLEWALK_CYC 292 #define POWER7_PME_PM_L2_RC_ST_DONE 293 #define POWER7_PME_PM_MRK_PTEG_FROM_L21_MOD 294 #define POWER7_PME_PM_LARX_LSU1 295 #define POWER7_PME_PM_MRK_DATA_FROM_RMEM 296 #define POWER7_PME_PM_DISP_CLB_HELD 297 #define POWER7_PME_PM_DERAT_MISS_4K 298 #define POWER7_PME_PM_L2_RCLD_DISP_FAIL_ADDR 299 #define POWER7_PME_PM_SEG_EXCEPTION 300 #define POWER7_PME_PM_FLUSH_DISP_SB 301 #define POWER7_PME_PM_L2_DC_INV 302 #define POWER7_PME_PM_PTEG_FROM_DL2L3_MOD 303 #define POWER7_PME_PM_DSEG 304 #define POWER7_PME_PM_BR_PRED_LSTACK 305 #define POWER7_PME_PM_VSU0_STF 306 #define POWER7_PME_PM_LSU_FX_FIN 307 #define POWER7_PME_PM_DERAT_MISS_16M 308 #define POWER7_PME_PM_MRK_PTEG_FROM_DL2L3_MOD 309 #define POWER7_PME_PM_GCT_UTIL_11_PLUS_SLOTS 310 #define POWER7_PME_PM_INST_FROM_L3 311 #define POWER7_PME_PM_MRK_IFU_FIN 312 #define POWER7_PME_PM_ITLB_MISS 313 #define POWER7_PME_PM_VSU_STF 314 #define POWER7_PME_PM_LSU_FLUSH_UST 315 #define POWER7_PME_PM_L2_LDST_MISS 316 #define POWER7_PME_PM_FXU1_FIN 317 #define POWER7_PME_PM_SHL_DEALLOCATED 318 #define POWER7_PME_PM_L2_SN_M_WR_DONE 319 #define POWER7_PME_PM_LSU_REJECT_SET_MPRED 320 #define POWER7_PME_PM_L3_PREF_LD 321 #define POWER7_PME_PM_L2_SN_M_RD_DONE 322 #define POWER7_PME_PM_MRK_DERAT_MISS_16G 323 #define POWER7_PME_PM_VSU_FCONV 324 #define POWER7_PME_PM_ANY_THRD_RUN_CYC 325 #define POWER7_PME_PM_LSU_LMQ_FULL_CYC 326 #define POWER7_PME_PM_MRK_LSU_REJECT_LHS 327 #define POWER7_PME_PM_MRK_LD_MISS_L1_CYC 328 #define POWER7_PME_PM_MRK_DATA_FROM_L2_CYC 329 #define POWER7_PME_PM_INST_IMC_MATCH_DISP 330 #define POWER7_PME_PM_MRK_DATA_FROM_RMEM_CYC 331 #define POWER7_PME_PM_VSU0_SIMPLE_ISSUED 332 #define 
POWER7_PME_PM_CMPLU_STALL_DIV 333 #define POWER7_PME_PM_MRK_PTEG_FROM_RL2L3_SHR 334 #define POWER7_PME_PM_VSU_FMA_DOUBLE 335 #define POWER7_PME_PM_VSU_4FLOP 336 #define POWER7_PME_PM_VSU1_FIN 337 #define POWER7_PME_PM_NEST_PAIR1_AND 338 #define POWER7_PME_PM_INST_PTEG_FROM_RL2L3_MOD 339 #define POWER7_PME_PM_RUN_CYC 340 #define POWER7_PME_PM_PTEG_FROM_RMEM 341 #define POWER7_PME_PM_LSU_LRQ_S0_VALID 342 #define POWER7_PME_PM_LSU0_LDF 343 #define POWER7_PME_PM_FLUSH_COMPLETION 344 #define POWER7_PME_PM_ST_MISS_L1 345 #define POWER7_PME_PM_L2_NODE_PUMP 346 #define POWER7_PME_PM_INST_FROM_DL2L3_SHR 347 #define POWER7_PME_PM_MRK_STALL_CMPLU_CYC 348 #define POWER7_PME_PM_VSU1_DENORM 349 #define POWER7_PME_PM_MRK_DATA_FROM_L31_SHR_CYC 350 #define POWER7_PME_PM_NEST_PAIR0_ADD 351 #define POWER7_PME_PM_INST_FROM_L3MISS 352 #define POWER7_PME_PM_EE_OFF_EXT_INT 353 #define POWER7_PME_PM_INST_PTEG_FROM_DMEM 354 #define POWER7_PME_PM_INST_FROM_DL2L3_MOD 355 #define POWER7_PME_PM_PMC6_OVERFLOW 356 #define POWER7_PME_PM_VSU_2FLOP_DOUBLE 357 #define POWER7_PME_PM_TLB_MISS 358 #define POWER7_PME_PM_FXU_BUSY 359 #define POWER7_PME_PM_L2_RCLD_DISP_FAIL_OTHER 360 #define POWER7_PME_PM_LSU_REJECT_LMQ_FULL 361 #define POWER7_PME_PM_IC_RELOAD_SHR 362 #define POWER7_PME_PM_GRP_MRK 363 #define POWER7_PME_PM_MRK_ST_NEST 364 #define POWER7_PME_PM_VSU1_FSQRT_FDIV 365 #define POWER7_PME_PM_LSU0_FLUSH_LRQ 366 #define POWER7_PME_PM_LARX_LSU0 367 #define POWER7_PME_PM_IBUF_FULL_CYC 368 #define POWER7_PME_PM_MRK_DATA_FROM_DL2L3_SHR_CYC 369 #define POWER7_PME_PM_LSU_DC_PREF_STREAM_ALLOC 370 #define POWER7_PME_PM_GRP_MRK_CYC 371 #define POWER7_PME_PM_MRK_DATA_FROM_RL2L3_SHR_CYC 372 #define POWER7_PME_PM_L2_GLOB_GUESS_CORRECT 373 #define POWER7_PME_PM_LSU_REJECT_LHS 374 #define POWER7_PME_PM_MRK_DATA_FROM_LMEM 375 #define POWER7_PME_PM_INST_PTEG_FROM_L3 376 #define POWER7_PME_PM_FREQ_DOWN 377 #define POWER7_PME_PM_PB_RETRY_NODE_PUMP 378 #define POWER7_PME_PM_INST_FROM_RL2L3_SHR 379 #define 
POWER7_PME_PM_MRK_INST_ISSUED 380 #define POWER7_PME_PM_PTEG_FROM_L3MISS 381 #define POWER7_PME_PM_RUN_PURR 382 #define POWER7_PME_PM_MRK_GRP_IC_MISS 383 #define POWER7_PME_PM_MRK_DATA_FROM_L3 384 #define POWER7_PME_PM_CMPLU_STALL_DCACHE_MISS 385 #define POWER7_PME_PM_PTEG_FROM_RL2L3_SHR 386 #define POWER7_PME_PM_LSU_FLUSH_LRQ 387 #define POWER7_PME_PM_MRK_DERAT_MISS_64K 388 #define POWER7_PME_PM_INST_PTEG_FROM_DL2L3_MOD 389 #define POWER7_PME_PM_L2_ST_MISS 390 #define POWER7_PME_PM_MRK_PTEG_FROM_L21_SHR 391 #define POWER7_PME_PM_LWSYNC 392 #define POWER7_PME_PM_LSU0_DC_PREF_STREAM_CONFIRM_STRIDE 393 #define POWER7_PME_PM_MRK_LSU_FLUSH_LRQ 394 #define POWER7_PME_PM_INST_IMC_MATCH_CMPL 395 #define POWER7_PME_PM_NEST_PAIR3_AND 396 #define POWER7_PME_PM_PB_RETRY_SYS_PUMP 397 #define POWER7_PME_PM_MRK_INST_FIN 398 #define POWER7_PME_PM_MRK_PTEG_FROM_DL2L3_SHR 399 #define POWER7_PME_PM_INST_FROM_L31_MOD 400 #define POWER7_PME_PM_MRK_DTLB_MISS_64K 401 #define POWER7_PME_PM_LSU_FIN 402 #define POWER7_PME_PM_MRK_LSU_REJECT 403 #define POWER7_PME_PM_L2_CO_FAIL_BUSY 404 #define POWER7_PME_PM_MEM0_WQ_DISP 405 #define POWER7_PME_PM_DATA_FROM_L31_MOD 406 #define POWER7_PME_PM_THERMAL_WARN 407 #define POWER7_PME_PM_VSU0_4FLOP 408 #define POWER7_PME_PM_BR_MPRED_CCACHE 409 #define POWER7_PME_PM_CMPLU_STALL_IFU 410 #define POWER7_PME_PM_L1_DEMAND_WRITE 411 #define POWER7_PME_PM_FLUSH_BR_MPRED 412 #define POWER7_PME_PM_MRK_DTLB_MISS_16G 413 #define POWER7_PME_PM_MRK_PTEG_FROM_DMEM 414 #define POWER7_PME_PM_L2_RCST_DISP 415 #define POWER7_PME_PM_CMPLU_STALL 416 #define POWER7_PME_PM_LSU_PARTIAL_CDF 417 #define POWER7_PME_PM_DISP_CLB_HELD_SB 418 #define POWER7_PME_PM_VSU0_FMA_DOUBLE 419 #define POWER7_PME_PM_FXU0_BUSY_FXU1_IDLE 420 #define POWER7_PME_PM_IC_DEMAND_CYC 421 #define POWER7_PME_PM_MRK_DATA_FROM_L21_SHR 422 #define POWER7_PME_PM_MRK_LSU_FLUSH_UST 423 #define POWER7_PME_PM_INST_PTEG_FROM_L3MISS 424 #define POWER7_PME_PM_VSU_DENORM 425 #define 
POWER7_PME_PM_MRK_LSU_PARTIAL_CDF 426 #define POWER7_PME_PM_INST_FROM_L21_SHR 427 #define POWER7_PME_PM_IC_PREF_WRITE 428 #define POWER7_PME_PM_BR_PRED 429 #define POWER7_PME_PM_INST_FROM_DMEM 430 #define POWER7_PME_PM_IC_PREF_CANCEL_ALL 431 #define POWER7_PME_PM_LSU_DC_PREF_STREAM_CONFIRM 432 #define POWER7_PME_PM_MRK_LSU_FLUSH_SRQ 433 #define POWER7_PME_PM_MRK_FIN_STALL_CYC 434 #define POWER7_PME_PM_L2_RCST_DISP_FAIL_OTHER 435 #define POWER7_PME_PM_VSU1_DD_ISSUED 436 #define POWER7_PME_PM_PTEG_FROM_L31_SHR 437 #define POWER7_PME_PM_DATA_FROM_L21_SHR 438 #define POWER7_PME_PM_LSU0_NCLD 439 #define POWER7_PME_PM_VSU1_4FLOP 440 #define POWER7_PME_PM_VSU1_8FLOP 441 #define POWER7_PME_PM_VSU_8FLOP 442 #define POWER7_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC 443 #define POWER7_PME_PM_DTLB_MISS_64K 444 #define POWER7_PME_PM_THRD_CONC_RUN_INST 445 #define POWER7_PME_PM_MRK_PTEG_FROM_L2 446 #define POWER7_PME_PM_PB_SYS_PUMP 447 #define POWER7_PME_PM_VSU_FIN 448 #define POWER7_PME_PM_MRK_DATA_FROM_L31_MOD 449 #define POWER7_PME_PM_THRD_PRIO_0_1_CYC 450 #define POWER7_PME_PM_DERAT_MISS_64K 451 #define POWER7_PME_PM_PMC2_REWIND 452 #define POWER7_PME_PM_INST_FROM_L2 453 #define POWER7_PME_PM_GRP_BR_MPRED_NONSPEC 454 #define POWER7_PME_PM_INST_DISP 455 #define POWER7_PME_PM_MEM0_RD_CANCEL_TOTAL 456 #define POWER7_PME_PM_LSU0_DC_PREF_STREAM_CONFIRM 457 #define POWER7_PME_PM_L1_DCACHE_RELOAD_VALID 458 #define POWER7_PME_PM_VSU_SCALAR_DOUBLE_ISSUED 459 #define POWER7_PME_PM_L3_PREF_HIT 460 #define POWER7_PME_PM_MRK_PTEG_FROM_L31_MOD 461 #define POWER7_PME_PM_CMPLU_STALL_STORE 462 #define POWER7_PME_PM_MRK_FXU_FIN 463 #define POWER7_PME_PM_PMC4_OVERFLOW 464 #define POWER7_PME_PM_MRK_PTEG_FROM_L3 465 #define POWER7_PME_PM_LSU0_LMQ_LHR_MERGE 466 #define POWER7_PME_PM_BTAC_HIT 467 #define POWER7_PME_PM_L3_RD_BUSY 468 #define POWER7_PME_PM_LSU0_L1_SW_PREF 469 #define POWER7_PME_PM_INST_FROM_L2MISS 470 #define POWER7_PME_PM_LSU0_DC_PREF_STREAM_ALLOC 471 #define POWER7_PME_PM_L2_ST 472 #define 
POWER7_PME_PM_VSU0_DENORM 473 #define POWER7_PME_PM_MRK_DATA_FROM_DL2L3_SHR 474 #define POWER7_PME_PM_BR_PRED_CR_TA 475 #define POWER7_PME_PM_VSU0_FCONV 476 #define POWER7_PME_PM_MRK_LSU_FLUSH_ULD 477 #define POWER7_PME_PM_BTAC_MISS 478 #define POWER7_PME_PM_MRK_LD_MISS_EXPOSED_CYC_COUNT 479 #define POWER7_PME_PM_MRK_DATA_FROM_L2 480 #define POWER7_PME_PM_LSU_DCACHE_RELOAD_VALID 481 #define POWER7_PME_PM_VSU_FMA 482 #define POWER7_PME_PM_LSU0_FLUSH_SRQ 483 #define POWER7_PME_PM_LSU1_L1_PREF 484 #define POWER7_PME_PM_IOPS_CMPL 485 #define POWER7_PME_PM_L2_SYS_PUMP 486 #define POWER7_PME_PM_L2_RCLD_BUSY_RC_FULL 487 #define POWER7_PME_PM_LSU_LMQ_S0_ALLOC 488 #define POWER7_PME_PM_FLUSH_DISP_SYNC 489 #define POWER7_PME_PM_MRK_DATA_FROM_DL2L3_MOD_CYC 490 #define POWER7_PME_PM_L2_IC_INV 491 #define POWER7_PME_PM_MRK_DATA_FROM_L21_MOD_CYC 492 #define POWER7_PME_PM_L3_PREF_LDST 493 #define POWER7_PME_PM_LSU_SRQ_EMPTY_CYC 494 #define POWER7_PME_PM_LSU_LMQ_S0_VALID 495 #define POWER7_PME_PM_FLUSH_PARTIAL 496 #define POWER7_PME_PM_VSU1_FMA_DOUBLE 497 #define POWER7_PME_PM_1PLUS_PPC_DISP 498 #define POWER7_PME_PM_DATA_FROM_L2MISS 499 #define POWER7_PME_PM_SUSPENDED 500 #define POWER7_PME_PM_VSU0_FMA 501 #define POWER7_PME_PM_CMPLU_STALL_SCALAR 502 #define POWER7_PME_PM_STCX_FAIL 503 #define POWER7_PME_PM_VSU0_FSQRT_FDIV_DOUBLE 504 #define POWER7_PME_PM_DC_PREF_DST 505 #define POWER7_PME_PM_VSU1_SCAL_SINGLE_ISSUED 506 #define POWER7_PME_PM_L3_HIT 507 #define POWER7_PME_PM_L2_GLOB_GUESS_WRONG 508 #define POWER7_PME_PM_MRK_DFU_FIN 509 #define POWER7_PME_PM_INST_FROM_L1 510 #define POWER7_PME_PM_BRU_FIN 511 #define POWER7_PME_PM_IC_DEMAND_REQ 512 #define POWER7_PME_PM_VSU1_FSQRT_FDIV_DOUBLE 513 #define POWER7_PME_PM_VSU1_FMA 514 #define POWER7_PME_PM_MRK_LD_MISS_L1 515 #define POWER7_PME_PM_VSU0_2FLOP_DOUBLE 516 #define POWER7_PME_PM_LSU_DC_PREF_STRIDED_STREAM_CONFIRM 517 #define POWER7_PME_PM_INST_PTEG_FROM_L31_SHR 518 #define POWER7_PME_PM_MRK_LSU_REJECT_ERAT_MISS 519 #define 
POWER7_PME_PM_MRK_DATA_FROM_L2MISS 520 #define POWER7_PME_PM_DATA_FROM_RL2L3_SHR 521 #define POWER7_PME_PM_INST_FROM_PREF 522 #define POWER7_PME_PM_VSU1_SQ 523 #define POWER7_PME_PM_L2_LD_DISP 524 #define POWER7_PME_PM_L2_DISP_ALL 525 #define POWER7_PME_PM_THRD_GRP_CMPL_BOTH_CYC 526 #define POWER7_PME_PM_VSU_FSQRT_FDIV_DOUBLE 527 #define POWER7_PME_PM_BR_MPRED 528 #define POWER7_PME_PM_INST_PTEG_FROM_DL2L3_SHR 529 #define POWER7_PME_PM_VSU_1FLOP 530 #define POWER7_PME_PM_HV_CYC 531 #define POWER7_PME_PM_MRK_LSU_FIN 532 #define POWER7_PME_PM_MRK_DATA_FROM_RL2L3_SHR 533 #define POWER7_PME_PM_DTLB_MISS_16M 534 #define POWER7_PME_PM_LSU1_LMQ_LHR_MERGE 535 #define POWER7_PME_PM_IFU_FIN 536 #define POWER7_PME_PM_1THRD_CON_RUN_INSTR 537 #define POWER7_PME_PM_CMPLU_STALL_COUNT 538 #define POWER7_PME_PM_MEM0_PB_RD_CL 539 #define POWER7_PME_PM_THRD_1_RUN_CYC 540 #define POWER7_PME_PM_THRD_2_CONC_RUN_INSTR 541 #define POWER7_PME_PM_THRD_2_RUN_CYC 542 #define POWER7_PME_PM_THRD_3_CONC_RUN_INST 543 #define POWER7_PME_PM_THRD_3_RUN_CYC 544 #define POWER7_PME_PM_THRD_4_CONC_RUN_INST 545 #define POWER7_PME_PM_THRD_4_RUN_CYC 546 static const pme_power_entry_t power7_pe[] = { [ POWER7_PME_PM_IC_DEMAND_L2_BR_ALL ] = { .pme_name = "PM_IC_DEMAND_L2_BR_ALL", .pme_code = 0x4898, .pme_short_desc = " L2 I cache demand request due to BHT or redirect", .pme_long_desc = " L2 I cache demand request due to BHT or redirect", }, [ POWER7_PME_PM_GCT_UTIL_7_TO_10_SLOTS ] = { .pme_name = "PM_GCT_UTIL_7_TO_10_SLOTS", .pme_code = 0x20a0, .pme_short_desc = "GCT Utilization 7-10 entries", .pme_long_desc = "GCT Utilization 7-10 entries", }, [ POWER7_PME_PM_PMC2_SAVED ] = { .pme_name = "PM_PMC2_SAVED", .pme_code = 0x10022, .pme_short_desc = "PMC2 Rewind Value saved", .pme_long_desc = "PMC2 was counting speculatively. 
The speculative condition was met and the counter value was committed by copying it to the backup register.", }, [ POWER7_PME_PM_CMPLU_STALL_DFU ] = { .pme_name = "PM_CMPLU_STALL_DFU", .pme_code = 0x2003c, .pme_short_desc = "Completion stall caused by Decimal Floating Point Unit", .pme_long_desc = "Completion stall caused by Decimal Floating Point Unit", }, [ POWER7_PME_PM_VSU0_16FLOP ] = { .pme_name = "PM_VSU0_16FLOP", .pme_code = 0xa0a4, .pme_short_desc = "Sixteen flops operation (SP vector versions of fdiv,fsqrt) ", .pme_long_desc = "Sixteen flops operation (SP vector versions of fdiv,fsqrt) ", }, [ POWER7_PME_PM_MRK_LSU_DERAT_MISS ] = { .pme_name = "PM_MRK_LSU_DERAT_MISS", .pme_code = 0x3d05a, .pme_short_desc = "Marked DERAT Miss", .pme_long_desc = "Marked DERAT Miss", }, [ POWER7_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x10034, .pme_short_desc = "marked store finished (was complete)", .pme_long_desc = "A sampled store has completed (data home)", }, [ POWER7_PME_PM_NEST_PAIR3_ADD ] = { .pme_name = "PM_NEST_PAIR3_ADD", .pme_code = 0x40881, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair3 ADD", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair3 ADD", }, [ POWER7_PME_PM_L2_ST_DISP ] = { .pme_name = "PM_L2_ST_DISP", .pme_code = 0x46180, .pme_short_desc = "All successful store dispatches", .pme_long_desc = "All successful store dispatches", }, [ POWER7_PME_PM_L2_CASTOUT_MOD ] = { .pme_name = "PM_L2_CASTOUT_MOD", .pme_code = 0x16180, .pme_short_desc = "L2 Castouts - Modified (M, Mu, Me)", .pme_long_desc = "An L2 line in the Modified state was castout. 
Total for all slices.", }, [ POWER7_PME_PM_ISEG ] = { .pme_name = "PM_ISEG", .pme_code = 0x20a4, .pme_short_desc = "ISEG Exception", .pme_long_desc = "ISEG Exception", }, [ POWER7_PME_PM_MRK_INST_TIMEO ] = { .pme_name = "PM_MRK_INST_TIMEO", .pme_code = 0x40034, .pme_short_desc = "marked Instruction finish timeout ", .pme_long_desc = "The number of instructions finished since the last progress indicator from a marked instruction exceeded the threshold. The marked instruction was flushed.", }, [ POWER7_PME_PM_L2_RCST_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2_RCST_DISP_FAIL_ADDR", .pme_code = 0x36282, .pme_short_desc = " L2 RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = " L2 RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", }, [ POWER7_PME_PM_LSU1_DC_PREF_STREAM_CONFIRM ] = { .pme_name = "PM_LSU1_DC_PREF_STREAM_CONFIRM", .pme_code = 0xd0b6, .pme_short_desc = "LS1 'Dcache prefetch stream confirmed", .pme_long_desc = "LS1 'Dcache prefetch stream confirmed", }, [ POWER7_PME_PM_IERAT_WR_64K ] = { .pme_name = "PM_IERAT_WR_64K", .pme_code = 0x40be, .pme_short_desc = "large page 64k ", .pme_long_desc = "large page 64k ", }, [ POWER7_PME_PM_MRK_DTLB_MISS_16M ] = { .pme_name = "PM_MRK_DTLB_MISS_16M", .pme_code = 0x4d05e, .pme_short_desc = "Marked Data TLB misses for 16M page", .pme_long_desc = "Data TLB references to 16M pages by a marked instruction that missed the TLB. 
Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_IERAT_MISS ] = { .pme_name = "PM_IERAT_MISS", .pme_code = 0x100f6, .pme_short_desc = "IERAT Miss (Not implemented as DI on POWER6)", .pme_long_desc = "A translation request missed the Instruction Effective to Real Address Translation (ERAT) table", }, [ POWER7_PME_PM_MRK_PTEG_FROM_LMEM ] = { .pme_name = "PM_MRK_PTEG_FROM_LMEM", .pme_code = 0x4d052, .pme_short_desc = "Marked PTEG loaded from local memory", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from memory attached to the same module this processor is located on due to a marked load or store.", }, [ POWER7_PME_PM_FLOP ] = { .pme_name = "PM_FLOP", .pme_code = 0x100f4, .pme_short_desc = "Floating Point Operation Finished", .pme_long_desc = "A floating point operation has completed", }, [ POWER7_PME_PM_THRD_PRIO_4_5_CYC ] = { .pme_name = "PM_THRD_PRIO_4_5_CYC", .pme_code = 0x40b4, .pme_short_desc = " Cycles thread running at priority level 4 or 5", .pme_long_desc = " Cycles thread running at priority level 4 or 5", }, [ POWER7_PME_PM_BR_PRED_TA ] = { .pme_name = "PM_BR_PRED_TA", .pme_code = 0x40aa, .pme_short_desc = "Branch predict - target address", .pme_long_desc = "The target address of a branch instruction was predicted.", }, [ POWER7_PME_PM_CMPLU_STALL_FXU ] = { .pme_name = "PM_CMPLU_STALL_FXU", .pme_code = 0x20014, .pme_short_desc = "Completion stall caused by FXU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a fixed point instruction.", }, [ POWER7_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x200f8, .pme_short_desc = "external interrupt", .pme_long_desc = "An interrupt due to an external exception occurred", }, [ POWER7_PME_PM_VSU_FSQRT_FDIV ] = { .pme_name = "PM_VSU_FSQRT_FDIV", .pme_code = 0xa888, .pme_short_desc = "four flops operation (fdiv,fsqrt) Scalar Instructions only!", .pme_long_desc = 
"DP vector versions of fdiv,fsqrt ", }, [ POWER7_PME_PM_MRK_LD_MISS_EXPOSED_CYC ] = { .pme_name = "PM_MRK_LD_MISS_EXPOSED_CYC", .pme_code = 0x1003e, .pme_short_desc = "Marked Load exposed Miss ", .pme_long_desc = "Marked Load exposed Miss ", }, [ POWER7_PME_PM_LSU1_LDF ] = { .pme_name = "PM_LSU1_LDF", .pme_code = 0xc086, .pme_short_desc = "LS1 Scalar Loads ", .pme_long_desc = "A floating point load was executed by LSU1", }, [ POWER7_PME_PM_IC_WRITE_ALL ] = { .pme_name = "PM_IC_WRITE_ALL", .pme_code = 0x488c, .pme_short_desc = "Icache sectors written, prefetch + demand", .pme_long_desc = "Icache sectors written, prefetch + demand", }, [ POWER7_PME_PM_LSU0_SRQ_STFWD ] = { .pme_name = "PM_LSU0_SRQ_STFWD", .pme_code = 0xc0a0, .pme_short_desc = "LS0 SRQ forwarded data to a load", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 0. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. 
If a load hits L1 but becomes a store forward, it is not treated as a load miss.", }, [ POWER7_PME_PM_PTEG_FROM_RL2L3_MOD ] = { .pme_name = "PM_PTEG_FROM_RL2L3_MOD", .pme_code = 0x1c052, .pme_short_desc = "PTEG loaded from remote L2 or L3 modified", .pme_long_desc = "A Page Table Entry was loaded into the ERAT with modified (M) data from an L2 or L3 on a remote module due to a demand load or store.", }, [ POWER7_PME_PM_MRK_DATA_FROM_L31_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L31_SHR", .pme_code = 0x1d04e, .pme_short_desc = "Marked data loaded from another L3 on same chip shared", .pme_long_desc = "Marked data loaded from another L3 on same chip shared", }, [ POWER7_PME_PM_DATA_FROM_L21_MOD ] = { .pme_name = "PM_DATA_FROM_L21_MOD", .pme_code = 0x3c046, .pme_short_desc = "Data loaded from another L2 on same chip modified", .pme_long_desc = "Data loaded from another L2 on same chip modified", }, [ POWER7_PME_PM_VSU1_SCAL_DOUBLE_ISSUED ] = { .pme_name = "PM_VSU1_SCAL_DOUBLE_ISSUED", .pme_code = 0xb08a, .pme_short_desc = "Double Precision scalar instruction issued on Pipe1", .pme_long_desc = "Double Precision scalar instruction issued on Pipe1", }, [ POWER7_PME_PM_VSU0_8FLOP ] = { .pme_name = "PM_VSU0_8FLOP", .pme_code = 0xa0a0, .pme_short_desc = "eight flops operation (DP vector versions of fdiv,fsqrt and SP vector versions of fmadd,fnmadd,fmsub,fnmsub) ", .pme_long_desc = "eight flops operation (DP vector versions of fdiv,fsqrt and SP vector versions of fmadd,fnmadd,fmsub,fnmsub) ", }, [ POWER7_PME_PM_POWER_EVENT1 ] = { .pme_name = "PM_POWER_EVENT1", .pme_code = 0x1006e, .pme_short_desc = "Power Management Event 1", .pme_long_desc = "Power Management Event 1", }, [ POWER7_PME_PM_DISP_CLB_HELD_BAL ] = { .pme_name = "PM_DISP_CLB_HELD_BAL", .pme_code = 0x2092, .pme_short_desc = "Dispatch/CLB Hold: Balance", .pme_long_desc = "Dispatch/CLB Hold: Balance", }, [ POWER7_PME_PM_VSU1_2FLOP ] = { .pme_name = "PM_VSU1_2FLOP", .pme_code = 0xa09a, .pme_short_desc = "two 
flops operation (scalar fmadd, fnmadd, fmsub, fnmsub and DP vector versions of single flop instructions)", .pme_long_desc = "two flops operation (scalar fmadd, fnmadd, fmsub, fnmsub and DP vector versions of single flop instructions)", }, [ POWER7_PME_PM_LWSYNC_HELD ] = { .pme_name = "PM_LWSYNC_HELD", .pme_code = 0x209a, .pme_short_desc = "LWSYNC held at dispatch", .pme_long_desc = "Cycles a LWSYNC instruction was held at dispatch. LWSYNC instructions are held at dispatch until all previous loads are done and all previous stores have issued. LWSYNC enters the Store Request Queue and is sent to the storage subsystem but does not wait for a response.", }, [ POWER7_PME_PM_PTEG_FROM_DL2L3_SHR ] = { .pme_name = "PM_PTEG_FROM_DL2L3_SHR", .pme_code = 0x3c054, .pme_short_desc = "PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "A Page Table Entry was loaded into the ERAT with shared (T or SL) data from an L2 or L3 on a remote module due to a demand load or store.", }, [ POWER7_PME_PM_INST_FROM_L21_MOD ] = { .pme_name = "PM_INST_FROM_L21_MOD", .pme_code = 0x34046, .pme_short_desc = "Instruction fetched from another L2 on same chip modified", .pme_long_desc = "Instruction fetched from another L2 on same chip modified", }, [ POWER7_PME_PM_IERAT_XLATE_WR_16MPLUS ] = { .pme_name = "PM_IERAT_XLATE_WR_16MPLUS", .pme_code = 0x40bc, .pme_short_desc = "large page 16M+", .pme_long_desc = "large page 16M+", }, [ POWER7_PME_PM_IC_REQ_ALL ] = { .pme_name = "PM_IC_REQ_ALL", .pme_code = 0x4888, .pme_short_desc = "Icache requests, prefetch + demand", .pme_long_desc = "Icache requests, prefetch + demand", }, [ POWER7_PME_PM_DSLB_MISS ] = { .pme_name = "PM_DSLB_MISS", .pme_code = 0xd090, .pme_short_desc = "Data SLB Miss - Total of all segment sizes", .pme_long_desc = "A SLB miss for a data request occurred. 
SLB misses trap to the operating system to resolve.", }, [ POWER7_PME_PM_L3_MISS ] = { .pme_name = "PM_L3_MISS", .pme_code = 0x1f082, .pme_short_desc = "L3 Misses ", .pme_long_desc = "L3 Misses ", }, [ POWER7_PME_PM_LSU0_L1_PREF ] = { .pme_name = "PM_LSU0_L1_PREF", .pme_code = 0xd0b8, .pme_short_desc = " LS0 L1 cache data prefetches", .pme_long_desc = " LS0 L1 cache data prefetches", }, [ POWER7_PME_PM_VSU_SCALAR_SINGLE_ISSUED ] = { .pme_name = "PM_VSU_SCALAR_SINGLE_ISSUED", .pme_code = 0xb884, .pme_short_desc = "Single Precision scalar instruction issued on Pipe0", .pme_long_desc = "Single Precision scalar instruction issued on Pipe0", }, [ POWER7_PME_PM_LSU1_DC_PREF_STREAM_CONFIRM_STRIDE ] = { .pme_name = "PM_LSU1_DC_PREF_STREAM_CONFIRM_STRIDE", .pme_code = 0xd0be, .pme_short_desc = "LS1 Dcache Strided prefetch stream confirmed", .pme_long_desc = "LS1 Dcache Strided prefetch stream confirmed", }, [ POWER7_PME_PM_L2_INST ] = { .pme_name = "PM_L2_INST", .pme_code = 0x36080, .pme_short_desc = "Instruction Load Count", .pme_long_desc = "Instruction Load Count", }, [ POWER7_PME_PM_VSU0_FRSP ] = { .pme_name = "PM_VSU0_FRSP", .pme_code = 0xa0b4, .pme_short_desc = "Round to single precision instruction executed", .pme_long_desc = "Round to single precision instruction executed", }, [ POWER7_PME_PM_FLUSH_DISP ] = { .pme_name = "PM_FLUSH_DISP", .pme_code = 0x2082, .pme_short_desc = "Dispatch flush", .pme_long_desc = "Dispatch flush", }, [ POWER7_PME_PM_PTEG_FROM_L2MISS ] = { .pme_name = "PM_PTEG_FROM_L2MISS", .pme_code = 0x4c058, .pme_short_desc = "PTEG loaded from L2 miss", .pme_long_desc = "A Page Table Entry was loaded into the TLB but not from the local L2.", }, [ POWER7_PME_PM_VSU1_DQ_ISSUED ] = { .pme_name = "PM_VSU1_DQ_ISSUED", .pme_code = 0xb09a, .pme_short_desc = "128BIT Decimal Issued on Pipe1", .pme_long_desc = "128BIT Decimal Issued on Pipe1", }, [ POWER7_PME_PM_CMPLU_STALL_LSU ] = { .pme_name = "PM_CMPLU_STALL_LSU", .pme_code = 0x20012, .pme_short_desc = 
"Completion stall caused by LSU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a load/store instruction.", }, [ POWER7_PME_PM_MRK_DATA_FROM_DMEM ] = { .pme_name = "PM_MRK_DATA_FROM_DMEM", .pme_code = 0x1d04a, .pme_short_desc = "Marked data loaded from distant memory", .pme_long_desc = "The processor's Data Cache was reloaded with data from memory attached to a distant module due to a marked load.", }, [ POWER7_PME_PM_LSU_FLUSH_ULD ] = { .pme_name = "PM_LSU_FLUSH_ULD", .pme_code = 0xc8b0, .pme_short_desc = "Flush: Unaligned Load", .pme_long_desc = "A load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1). Combined Unit 0 + 1.", }, [ POWER7_PME_PM_PTEG_FROM_LMEM ] = { .pme_name = "PM_PTEG_FROM_LMEM", .pme_code = 0x4c052, .pme_short_desc = "PTEG loaded from local memory", .pme_long_desc = "A Page Table Entry was loaded into the TLB from memory attached to the same module this processor is located on.", }, [ POWER7_PME_PM_MRK_DERAT_MISS_16M ] = { .pme_name = "PM_MRK_DERAT_MISS_16M", .pme_code = 0x3d05c, .pme_short_desc = "Marked DERAT misses for 16M page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 16M page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_THRD_ALL_RUN_CYC ] = { .pme_name = "PM_THRD_ALL_RUN_CYC", .pme_code = 0x2000c, .pme_short_desc = "All Threads in run_cycles", .pme_long_desc = "Cycles when all threads had their run latches set. 
Operating systems use the run latch to indicate when they are doing useful work.", }, [ POWER7_PME_PM_MEM0_PREFETCH_DISP ] = { .pme_name = "PM_MEM0_PREFETCH_DISP", .pme_code = 0x20083, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair1 Bit1", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair1 Bit1", }, [ POWER7_PME_PM_MRK_STALL_CMPLU_CYC_COUNT ] = { .pme_name = "PM_MRK_STALL_CMPLU_CYC_COUNT", .pme_code = 0x3003f, .pme_short_desc = "Marked Group Completion Stall cycles (use edge detect to count #)", .pme_long_desc = "Marked Group Completion Stall cycles (use edge detect to count #)", }, [ POWER7_PME_PM_DATA_FROM_DL2L3_MOD ] = { .pme_name = "PM_DATA_FROM_DL2L3_MOD", .pme_code = 0x3c04c, .pme_short_desc = "Data loaded from distant L2 or L3 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from an L2 or L3 on a distant module due to a demand load", }, [ POWER7_PME_PM_VSU_FRSP ] = { .pme_name = "PM_VSU_FRSP", .pme_code = 0xa8b4, .pme_short_desc = "Round to single precision instruction executed", .pme_long_desc = "Round to single precision instruction executed", }, [ POWER7_PME_PM_MRK_DATA_FROM_L21_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L21_MOD", .pme_code = 0x3d046, .pme_short_desc = "Marked data loaded from another L2 on same chip modified", .pme_long_desc = "Marked data loaded from another L2 on same chip modified", }, [ POWER7_PME_PM_PMC1_OVERFLOW ] = { .pme_name = "PM_PMC1_OVERFLOW", .pme_code = 0x20010, .pme_short_desc = "Overflow from counter 1", .pme_long_desc = "Overflows from PMC1 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER7_PME_PM_VSU0_SINGLE ] = { .pme_name = "PM_VSU0_SINGLE", .pme_code = 0xa0a8, .pme_short_desc = "FPU single precision", .pme_long_desc = "VSU0 executed single precision instruction", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L3MISS ] = { .pme_name = "PM_MRK_PTEG_FROM_L3MISS", .pme_code = 0x2d058, .pme_short_desc = "Marked PTEG loaded from L3 miss", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from beyond the L3 due to a marked load or store", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L31_SHR ] = { .pme_name = "PM_MRK_PTEG_FROM_L31_SHR", .pme_code = 0x2d056, .pme_short_desc = "Marked PTEG loaded from another L3 on same chip shared", .pme_long_desc = "Marked PTEG loaded from another L3 on same chip shared", }, [ POWER7_PME_PM_VSU0_VECTOR_SP_ISSUED ] = { .pme_name = "PM_VSU0_VECTOR_SP_ISSUED", .pme_code = 0xb090, .pme_short_desc = "Single Precision vector instruction issued (executed)", .pme_long_desc = "Single Precision vector instruction issued (executed)", }, [ POWER7_PME_PM_VSU1_FEST ] = { .pme_name = "PM_VSU1_FEST", .pme_code = 0xa0ba, .pme_short_desc = "Estimate instruction executed", .pme_long_desc = "Estimate instruction executed", }, [ POWER7_PME_PM_MRK_INST_DISP ] = { .pme_name = "PM_MRK_INST_DISP", .pme_code = 0x20030, .pme_short_desc = "marked instruction dispatch", .pme_long_desc = "A marked instruction was dispatched", }, [ POWER7_PME_PM_VSU0_COMPLEX_ISSUED ] = { .pme_name = "PM_VSU0_COMPLEX_ISSUED", .pme_code = 0xb096, .pme_short_desc = "Complex VMX instruction issued", .pme_long_desc = "Complex VMX instruction issued", }, [ POWER7_PME_PM_LSU1_FLUSH_UST ] = { .pme_name = "PM_LSU1_FLUSH_UST", .pme_code = 0xc0b6, .pme_short_desc = "LS1 Flush: Unaligned Store", .pme_long_desc = "A store was flushed from unit 1 because it was unaligned (crossed a 4K boundary)", }, [ POWER7_PME_PM_INST_CMPL ] = { .pme_name = 
"PM_INST_CMPL", .pme_code = 0x2, .pme_short_desc = "# PPC Instructions Finished", .pme_long_desc = "Number of PowerPC Instructions that completed.", }, [ POWER7_PME_PM_FXU_IDLE ] = { .pme_name = "PM_FXU_IDLE", .pme_code = 0x1000e, .pme_short_desc = "fxu0 idle and fxu1 idle", .pme_long_desc = "FXU0 and FXU1 are both idle.", }, [ POWER7_PME_PM_LSU0_FLUSH_ULD ] = { .pme_name = "PM_LSU0_FLUSH_ULD", .pme_code = 0xc0b0, .pme_short_desc = "LS0 Flush: Unaligned Load", .pme_long_desc = "A load was flushed from unit 0 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1)", }, [ POWER7_PME_PM_MRK_DATA_FROM_DL2L3_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_DL2L3_MOD", .pme_code = 0x3d04c, .pme_short_desc = "Marked data loaded from distant L2 or L3 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from an L2 or L3 on a distant module due to a marked load.", }, [ POWER7_PME_PM_LSU_LMQ_SRQ_EMPTY_ALL_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_ALL_CYC", .pme_code = 0x3001c, .pme_short_desc = "ALL threads lsu empty (lmq and srq empty)", .pme_long_desc = "ALL threads lsu empty (lmq and srq empty)", }, [ POWER7_PME_PM_LSU1_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU1_REJECT_LMQ_FULL", .pme_code = 0xc0a6, .pme_short_desc = "LS1 Reject: LMQ Full (LHR)", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. 
If all eight entries are full, subsequent load instructions are rejected.", }, [ POWER7_PME_PM_INST_PTEG_FROM_L21_MOD ] = { .pme_name = "PM_INST_PTEG_FROM_L21_MOD", .pme_code = 0x3e056, .pme_short_desc = "Instruction PTEG loaded from another L2 on same chip modified", .pme_long_desc = "Instruction PTEG loaded from another L2 on same chip modified", }, [ POWER7_PME_PM_INST_FROM_RL2L3_MOD ] = { .pme_name = "PM_INST_FROM_RL2L3_MOD", .pme_code = 0x14042, .pme_short_desc = "Instruction fetched from remote L2 or L3 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from an L2 or L3 on a remote module. Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_SHL_CREATED ] = { .pme_name = "PM_SHL_CREATED", .pme_code = 0x5082, .pme_short_desc = "SHL table entry Created", .pme_long_desc = "SHL table entry Created", }, [ POWER7_PME_PM_L2_ST_HIT ] = { .pme_name = "PM_L2_ST_HIT", .pme_code = 0x46182, .pme_short_desc = "All successful store dispatches that were L2Hits", .pme_long_desc = "A store request hit in the L2 directory. This event includes all requests to this L2 from all sources. Total for all slices.", }, [ POWER7_PME_PM_DATA_FROM_DMEM ] = { .pme_name = "PM_DATA_FROM_DMEM", .pme_code = 0x1c04a, .pme_short_desc = "Data loaded from distant memory", .pme_long_desc = "The processor's Data Cache was reloaded with data from memory attached to a distant module due to a demand load", }, [ POWER7_PME_PM_L3_LD_MISS ] = { .pme_name = "PM_L3_LD_MISS", .pme_code = 0x2f082, .pme_short_desc = "L3 demand LD Miss", .pme_long_desc = "L3 demand LD Miss", }, [ POWER7_PME_PM_FXU1_BUSY_FXU0_IDLE ] = { .pme_name = "PM_FXU1_BUSY_FXU0_IDLE", .pme_code = 0x4000e, .pme_short_desc = "fxu0 idle and fxu1 busy. 
", .pme_long_desc = "FXU0 was idle while FXU1 was busy", }, [ POWER7_PME_PM_DISP_CLB_HELD_RES ] = { .pme_name = "PM_DISP_CLB_HELD_RES", .pme_code = 0x2094, .pme_short_desc = "Dispatch/CLB Hold: Resource", .pme_long_desc = "Dispatch/CLB Hold: Resource", }, [ POWER7_PME_PM_L2_SN_SX_I_DONE ] = { .pme_name = "PM_L2_SN_SX_I_DONE", .pme_code = 0x36382, .pme_short_desc = "SNP dispatched and went from Sx or Tx to Ix", .pme_long_desc = "SNP dispatched and went from Sx or Tx to Ix", }, [ POWER7_PME_PM_GRP_CMPL ] = { .pme_name = "PM_GRP_CMPL", .pme_code = 0x30004, .pme_short_desc = "group completed", .pme_long_desc = "A group completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER7_PME_PM_STCX_CMPL ] = { .pme_name = "PM_STCX_CMPL", .pme_code = 0xc098, .pme_short_desc = "STCX executed", .pme_long_desc = "Conditional stores with reservation completed", }, [ POWER7_PME_PM_VSU0_2FLOP ] = { .pme_name = "PM_VSU0_2FLOP", .pme_code = 0xa098, .pme_short_desc = "two flops operation (scalar fmadd, fnmadd, fmsub, fnmsub and DP vector versions of single flop instructions)", .pme_long_desc = "two flops operation (scalar fmadd, fnmadd, fmsub, fnmsub and DP vector versions of single flop instructions)", }, [ POWER7_PME_PM_L3_PREF_MISS ] = { .pme_name = "PM_L3_PREF_MISS", .pme_code = 0x3f082, .pme_short_desc = "L3 Prefetch Directory Miss", .pme_long_desc = "L3 Prefetch Directory Miss", }, [ POWER7_PME_PM_LSU_SRQ_SYNC_CYC ] = { .pme_name = "PM_LSU_SRQ_SYNC_CYC", .pme_code = 0xd096, .pme_short_desc = "A sync is in the SRQ", .pme_long_desc = "Cycles that a sync instruction is active in the Store Request Queue.", }, [ POWER7_PME_PM_LSU_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU_REJECT_ERAT_MISS", .pme_code = 0x20064, .pme_short_desc = "LSU Reject due to ERAT (up to 2 per cycles)", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions due to an ERAT miss. Combined unit 0 + 1. 
Requests that miss the Derat are rejected and retried until the request hits in the Erat.", }, [ POWER7_PME_PM_L1_ICACHE_MISS ] = { .pme_name = "PM_L1_ICACHE_MISS", .pme_code = 0x200fc, .pme_short_desc = "Demand iCache Miss", .pme_long_desc = "An instruction fetch request missed the L1 cache.", }, [ POWER7_PME_PM_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_LSU1_FLUSH_SRQ", .pme_code = 0xc0be, .pme_short_desc = "LS1 Flush: SRQ", .pme_long_desc = "Load Hit Store flush. A younger load was flushed from unit 1 because it hits (overlaps) an older store that is already in the SRQ or in the same group. If the real addresses match but the effective addresses do not, an alias condition exists that prevents store forwarding. If the load and store are in the same group the load must be flushed to separate the two instructions. ", }, [ POWER7_PME_PM_LD_REF_L1_LSU0 ] = { .pme_name = "PM_LD_REF_L1_LSU0", .pme_code = 0xc080, .pme_short_desc = "LS0 L1 D cache load references counted at finish", .pme_long_desc = "Load references to Level 1 Data Cache, by unit 0.", }, [ POWER7_PME_PM_VSU0_FEST ] = { .pme_name = "PM_VSU0_FEST", .pme_code = 0xa0b8, .pme_short_desc = "Estimate instruction executed", .pme_long_desc = "Estimate instruction executed", }, [ POWER7_PME_PM_VSU_VECTOR_SINGLE_ISSUED ] = { .pme_name = "PM_VSU_VECTOR_SINGLE_ISSUED", .pme_code = 0xb890, .pme_short_desc = "Single Precision vector instruction issued (executed)", .pme_long_desc = "Single Precision vector instruction issued (executed)", }, [ POWER7_PME_PM_FREQ_UP ] = { .pme_name = "PM_FREQ_UP", .pme_code = 0x4000c, .pme_short_desc = "Power Management: Above Threshold A", .pme_long_desc = "Processor frequency was sped up due to power management", }, [ POWER7_PME_PM_DATA_FROM_LMEM ] = { .pme_name = "PM_DATA_FROM_LMEM", .pme_code = 0x3c04a, .pme_short_desc = "Data loaded from local memory", .pme_long_desc = "The processor's Data Cache was reloaded from memory attached to the same module this processor is located on.", }, [ 
POWER7_PME_PM_LSU1_LDX ] = { .pme_name = "PM_LSU1_LDX", .pme_code = 0xc08a, .pme_short_desc = "LS1 Vector Loads", .pme_long_desc = "LS1 Vector Loads", }, [ POWER7_PME_PM_PMC3_OVERFLOW ] = { .pme_name = "PM_PMC3_OVERFLOW", .pme_code = 0x40010, .pme_short_desc = "Overflow from counter 3", .pme_long_desc = "Overflows from PMC3 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER7_PME_PM_MRK_BR_MPRED ] = { .pme_name = "PM_MRK_BR_MPRED", .pme_code = 0x30036, .pme_short_desc = "Marked Branch Mispredicted", .pme_long_desc = "A marked branch was mispredicted", }, [ POWER7_PME_PM_SHL_MATCH ] = { .pme_name = "PM_SHL_MATCH", .pme_code = 0x5086, .pme_short_desc = "SHL Table Match", .pme_long_desc = "SHL Table Match", }, [ POWER7_PME_PM_MRK_BR_TAKEN ] = { .pme_name = "PM_MRK_BR_TAKEN", .pme_code = 0x10036, .pme_short_desc = "Marked Branch Taken", .pme_long_desc = "A marked branch was taken", }, [ POWER7_PME_PM_CMPLU_STALL_BRU ] = { .pme_name = "PM_CMPLU_STALL_BRU", .pme_code = 0x4004e, .pme_short_desc = "Completion stall due to BRU", .pme_long_desc = "Completion stall due to BRU", }, [ POWER7_PME_PM_ISLB_MISS ] = { .pme_name = "PM_ISLB_MISS", .pme_code = 0xd092, .pme_short_desc = "Instruction SLB Miss - Total of all segment sizes", .pme_long_desc = "A SLB miss for an instruction fetch has occurred", }, [ POWER7_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0x1e, .pme_short_desc = "Cycles", .pme_long_desc = "Processor Cycles", }, [ POWER7_PME_PM_DISP_HELD_THERMAL ] = { .pme_name = "PM_DISP_HELD_THERMAL", .pme_code = 0x30006, .pme_short_desc = "Dispatch Held due to Thermal", .pme_long_desc = "Dispatch Held due to Thermal", }, [ POWER7_PME_PM_INST_PTEG_FROM_RL2L3_SHR ] = { .pme_name = "PM_INST_PTEG_FROM_RL2L3_SHR", .pme_code = 0x2e054, .pme_short_desc = "Instruction PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "Instruction PTEG 
loaded from remote L2 or L3 shared", }, [ POWER7_PME_PM_LSU1_SRQ_STFWD ] = { .pme_name = "PM_LSU1_SRQ_STFWD", .pme_code = 0xc0a2, .pme_short_desc = "LS1 SRQ forwarded data to a load", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 1. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. If a load that hits L1 but becomes a store forward, then it's not treated as a load miss.", }, [ POWER7_PME_PM_GCT_NOSLOT_BR_MPRED ] = { .pme_name = "PM_GCT_NOSLOT_BR_MPRED", .pme_code = 0x4001a, .pme_short_desc = "GCT empty by branch mispredict", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because of a branch misprediction.", }, [ POWER7_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x100f2, .pme_short_desc = "1 or more ppc insts finished", .pme_long_desc = "A group containing at least one PPC instruction completed. 
For microcoded instructions that span multiple groups, this will only occur once.", }, [ POWER7_PME_PM_PTEG_FROM_DMEM ] = { .pme_name = "PM_PTEG_FROM_DMEM", .pme_code = 0x2c052, .pme_short_desc = "PTEG loaded from distant memory", .pme_long_desc = "A Page Table Entry was loaded into the ERAT with data from memory attached to a distant module due to a demand load or store.", }, [ POWER7_PME_PM_VSU_2FLOP ] = { .pme_name = "PM_VSU_2FLOP", .pme_code = 0xa898, .pme_short_desc = "two flops operation (scalar fmadd, fnmadd, fmsub, fnmsub and DP vector versions of single flop instructions)", .pme_long_desc = "two flops operation (scalar fmadd, fnmadd, fmsub, fnmsub and DP vector versions of single flop instructions)", }, [ POWER7_PME_PM_GCT_FULL_CYC ] = { .pme_name = "PM_GCT_FULL_CYC", .pme_code = 0x4086, .pme_short_desc = "Cycles No room in EAT", .pme_long_desc = "The Global Completion Table is completely full.", }, [ POWER7_PME_PM_MRK_DATA_FROM_L3_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L3_CYC", .pme_code = 0x40020, .pme_short_desc = "Marked ld latency Data source 0001 (L3)", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER7_PME_PM_LSU_SRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_SRQ_S0_ALLOC", .pme_code = 0xd09d, .pme_short_desc = "Slot 0 of SRQ valid", .pme_long_desc = "Slot 0 of SRQ valid", }, [ POWER7_PME_PM_MRK_DERAT_MISS_4K ] = { .pme_name = "PM_MRK_DERAT_MISS_4K", .pme_code = 0x1d05c, .pme_short_desc = "Marked DERAT misses for 4K page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 4K page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_BR_MPRED_TA ] = { .pme_name = "PM_BR_MPRED_TA", .pme_code = 0x40ae, .pme_short_desc = "Branch mispredict - target address", .pme_long_desc = "A branch instruction target was incorrectly predicted. This will result in a branch mispredict flush unless a flush is detected from an older instruction.", }, [ POWER7_PME_PM_INST_PTEG_FROM_L2MISS ] = { .pme_name = "PM_INST_PTEG_FROM_L2MISS", .pme_code = 0x4e058, .pme_short_desc = "Instruction PTEG loaded from L2 miss", .pme_long_desc = "Instruction PTEG loaded from L2 miss", }, [ POWER7_PME_PM_DPU_HELD_POWER ] = { .pme_name = "PM_DPU_HELD_POWER", .pme_code = 0x20006, .pme_short_desc = "Dispatch Held due to Power Management", .pme_long_desc = "Cycles that Instruction Dispatch was held due to power management. More than one hold condition can exist at the same time", }, [ POWER7_PME_PM_RUN_INST_CMPL ] = { .pme_name = "PM_RUN_INST_CMPL", .pme_code = 0x400fa, .pme_short_desc = "Run_Instructions", .pme_long_desc = "Number of run instructions completed. 
", }, [ POWER7_PME_PM_MRK_VSU_FIN ] = { .pme_name = "PM_MRK_VSU_FIN", .pme_code = 0x30032, .pme_short_desc = "vsu (fpu) marked instr finish", .pme_long_desc = "vsu (fpu) marked instr finish", }, [ POWER7_PME_PM_LSU_SRQ_S0_VALID ] = { .pme_name = "PM_LSU_SRQ_S0_VALID", .pme_code = 0xd09c, .pme_short_desc = "Slot 0 of SRQ valid", .pme_long_desc = "This signal is asserted every cycle that the Store Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin. In SMT mode the SRQ is split between the two threads (16 entries each).", }, [ POWER7_PME_PM_GCT_EMPTY_CYC ] = { .pme_name = "PM_GCT_EMPTY_CYC", .pme_code = 0x20008, .pme_short_desc = "GCT empty, all threads", .pme_long_desc = "Cycles when the Global Completion Table was completely empty. No thread had an entry allocated.", }, [ POWER7_PME_PM_IOPS_DISP ] = { .pme_name = "PM_IOPS_DISP", .pme_code = 0x30014, .pme_short_desc = "IOPS dispatched", .pme_long_desc = "IOPS dispatched", }, [ POWER7_PME_PM_RUN_SPURR ] = { .pme_name = "PM_RUN_SPURR", .pme_code = 0x10008, .pme_short_desc = "Run SPURR", .pme_long_desc = "Run SPURR", }, [ POWER7_PME_PM_PTEG_FROM_L21_MOD ] = { .pme_name = "PM_PTEG_FROM_L21_MOD", .pme_code = 0x3c056, .pme_short_desc = "PTEG loaded from another L2 on same chip modified", .pme_long_desc = "PTEG loaded from another L2 on same chip modified", }, [ POWER7_PME_PM_VSU0_1FLOP ] = { .pme_name = "PM_VSU0_1FLOP", .pme_code = 0xa080, .pme_short_desc = "one flop (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg, xsadd, xsmul, xssub, xscmp, xssel, xsabs, xsnabs, xsre, xssqrte, xsneg) operation finished", .pme_long_desc = "one flop (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg, xsadd, xsmul, xssub, xscmp, xssel, xsabs, xsnabs, xsre, xssqrte, xsneg) operation finished", }, [ POWER7_PME_PM_SNOOP_TLBIE ] = { .pme_name = "PM_SNOOP_TLBIE", .pme_code = 0xd0b2, .pme_short_desc = "TLBIE snoop", .pme_long_desc = "A tlbie was snooped from another processor.", }, 
[ POWER7_PME_PM_DATA_FROM_L3MISS ] = { .pme_name = "PM_DATA_FROM_L3MISS", .pme_code = 0x2c048, .pme_short_desc = "Demand LD - L3 Miss (not L2 hit and not L3 hit)", .pme_long_desc = "The processor's Data Cache was reloaded from beyond L3 due to a demand load", }, [ POWER7_PME_PM_VSU_SINGLE ] = { .pme_name = "PM_VSU_SINGLE", .pme_code = 0xa8a8, .pme_short_desc = "Vector or Scalar single precision", .pme_long_desc = "Vector or Scalar single precision", }, [ POWER7_PME_PM_DTLB_MISS_16G ] = { .pme_name = "PM_DTLB_MISS_16G", .pme_code = 0x1c05e, .pme_short_desc = "Data TLB miss for 16G page", .pme_long_desc = "Data TLB references to 16GB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_CMPLU_STALL_VECTOR ] = { .pme_name = "PM_CMPLU_STALL_VECTOR", .pme_code = 0x2001c, .pme_short_desc = "Completion stall caused by Vector instruction", .pme_long_desc = "Completion stall caused by Vector instruction", }, [ POWER7_PME_PM_FLUSH ] = { .pme_name = "PM_FLUSH", .pme_code = 0x400f8, .pme_short_desc = "Flush (any type)", .pme_long_desc = "Flushes occurred including LSU and Branch flushes.", }, [ POWER7_PME_PM_L2_LD_HIT ] = { .pme_name = "PM_L2_LD_HIT", .pme_code = 0x36182, .pme_short_desc = "All successful load dispatches that were L2 hits", .pme_long_desc = "A load request (data or instruction) hit in the L2 directory. Includes speculative, prefetched, and demand requests. This event includes all requests to this L2 from all sources. 
Total for all slices", }, [ POWER7_PME_PM_NEST_PAIR2_AND ] = { .pme_name = "PM_NEST_PAIR2_AND", .pme_code = 0x30883, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair2 AND", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair2 AND", }, [ POWER7_PME_PM_VSU1_1FLOP ] = { .pme_name = "PM_VSU1_1FLOP", .pme_code = 0xa082, .pme_short_desc = "one flop (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg, xsadd, xsmul, xssub, xscmp, xssel, xsabs, xsnabs, xsre, xssqrte, xsneg) operation finished", .pme_long_desc = "one flop (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg, xsadd, xsmul, xssub, xscmp, xssel, xsabs, xsnabs, xsre, xssqrte, xsneg) operation finished", }, [ POWER7_PME_PM_IC_PREF_REQ ] = { .pme_name = "PM_IC_PREF_REQ", .pme_code = 0x408a, .pme_short_desc = "Instruction prefetch requests", .pme_long_desc = "An instruction prefetch request has been made.", }, [ POWER7_PME_PM_L3_LD_HIT ] = { .pme_name = "PM_L3_LD_HIT", .pme_code = 0x2f080, .pme_short_desc = "L3 demand LD Hits", .pme_long_desc = "L3 demand LD Hits", }, [ POWER7_PME_PM_GCT_NOSLOT_IC_MISS ] = { .pme_name = "PM_GCT_NOSLOT_IC_MISS", .pme_code = 0x2001a, .pme_short_desc = "GCT empty by I cache miss", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because of an Instruction Cache miss.", }, [ POWER7_PME_PM_DISP_HELD ] = { .pme_name = "PM_DISP_HELD", .pme_code = 0x10006, .pme_short_desc = "Dispatch Held", .pme_long_desc = "Dispatch Held", }, [ POWER7_PME_PM_L2_LD ] = { .pme_name = "PM_L2_LD", .pme_code = 0x16080, .pme_short_desc = "Data Load Count", .pme_long_desc = "Data Load Count", }, [ POWER7_PME_PM_LSU_FLUSH_SRQ ] = { .pme_name = "PM_LSU_FLUSH_SRQ", .pme_code = 0xc8bc, .pme_short_desc = "Flush: SRQ", .pme_long_desc = "Load Hit Store flush. A younger load was flushed because it hits (overlaps) an older store that is already in the SRQ or in the same group. 
If the real addresses match but the effective addresses do not, an alias condition exists that prevents store forwarding. If the load and store are in the same group the load must be flushed to separate the two instructions. Combined Unit 0 + 1.", }, [ POWER7_PME_PM_BC_PLUS_8_CONV ] = { .pme_name = "PM_BC_PLUS_8_CONV", .pme_code = 0x40b8, .pme_short_desc = "BC+8 Converted", .pme_long_desc = "BC+8 Converted", }, [ POWER7_PME_PM_MRK_DATA_FROM_L31_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L31_MOD_CYC", .pme_code = 0x40026, .pme_short_desc = "Marked ld latency Data source 0111 (L3.1 M same chip)", .pme_long_desc = "Marked ld latency Data source 0111 (L3.1 M same chip)", }, [ POWER7_PME_PM_CMPLU_STALL_VECTOR_LONG ] = { .pme_name = "PM_CMPLU_STALL_VECTOR_LONG", .pme_code = 0x4004a, .pme_short_desc = "completion stall due to long latency vector instruction", .pme_long_desc = "completion stall due to long latency vector instruction", }, [ POWER7_PME_PM_L2_RCST_BUSY_RC_FULL ] = { .pme_name = "PM_L2_RCST_BUSY_RC_FULL", .pme_code = 0x26282, .pme_short_desc = " L2 activated Busy to the core for stores due to all RC full", .pme_long_desc = " L2 activated Busy to the core for stores due to all RC full", }, [ POWER7_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x300f8, .pme_short_desc = "Time Base bit transition", .pme_long_desc = "When the selected time base bit (as specified in MMCR0[TBSEL]) transitions from 0 to 1 ", }, [ POWER7_PME_PM_THERMAL_MAX ] = { .pme_name = "PM_THERMAL_MAX", .pme_code = 0x40006, .pme_short_desc = "Processor In Thermal MAX", .pme_long_desc = "The processor experienced a thermal overload condition. 
This bit is sticky, it remains set until cleared by software.", }, [ POWER7_PME_PM_LSU1_FLUSH_ULD ] = { .pme_name = "PM_LSU1_FLUSH_ULD", .pme_code = 0xc0b2, .pme_short_desc = "LS 1 Flush: Unaligned Load", .pme_long_desc = "A load was flushed from unit 1 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1).", }, [ POWER7_PME_PM_LSU1_REJECT_LHS ] = { .pme_name = "PM_LSU1_REJECT_LHS", .pme_code = 0xc0ae, .pme_short_desc = "LS1 Reject: Load Hit Store", .pme_long_desc = "Load Store Unit 1 rejected a load instruction that had an address overlap with an older store in the store queue. The store must be committed and de-allocated from the Store Queue before the load can execute successfully.", }, [ POWER7_PME_PM_LSU_LRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LRQ_S0_ALLOC", .pme_code = 0xd09f, .pme_short_desc = "Slot 0 of LRQ valid", .pme_long_desc = "Slot 0 of LRQ valid", }, [ POWER7_PME_PM_L3_CO_L31 ] = { .pme_name = "PM_L3_CO_L31", .pme_code = 0x4f080, .pme_short_desc = "L3 Castouts to Memory", .pme_long_desc = "L3 Castouts to Memory", }, [ POWER7_PME_PM_POWER_EVENT4 ] = { .pme_name = "PM_POWER_EVENT4", .pme_code = 0x4006e, .pme_short_desc = "Power Management Event 4", .pme_long_desc = "Power Management Event 4", }, [ POWER7_PME_PM_DATA_FROM_L31_SHR ] = { .pme_name = "PM_DATA_FROM_L31_SHR", .pme_code = 0x1c04e, .pme_short_desc = "Data loaded from another L3 on same chip shared", .pme_long_desc = "Data loaded from another L3 on same chip shared", }, [ POWER7_PME_PM_BR_UNCOND ] = { .pme_name = "PM_BR_UNCOND", .pme_code = 0x409e, .pme_short_desc = "Unconditional Branch", .pme_long_desc = "An unconditional branch was executed.", }, [ POWER7_PME_PM_LSU1_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_LSU1_DC_PREF_STREAM_ALLOC", .pme_code = 0xd0aa, .pme_short_desc = "LS 1 D cache new prefetch stream allocated", .pme_long_desc = "LS 1 D cache new prefetch stream allocated", }, [ POWER7_PME_PM_PMC4_REWIND ] = { .pme_name = "PM_PMC4_REWIND", .pme_code = 
0x10020, .pme_short_desc = "PMC4 Rewind Event", .pme_long_desc = "PMC4 was counting speculatively. The speculative condition was not met and the counter was restored to its previous value.", }, [ POWER7_PME_PM_L2_RCLD_DISP ] = { .pme_name = "PM_L2_RCLD_DISP", .pme_code = 0x16280, .pme_short_desc = " L2 RC load dispatch attempt", .pme_long_desc = " L2 RC load dispatch attempt", }, [ POWER7_PME_PM_THRD_PRIO_2_3_CYC ] = { .pme_name = "PM_THRD_PRIO_2_3_CYC", .pme_code = 0x40b2, .pme_short_desc = " Cycles thread running at priority level 2 or 3", .pme_long_desc = " Cycles thread running at priority level 2 or 3", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L2MISS ] = { .pme_name = "PM_MRK_PTEG_FROM_L2MISS", .pme_code = 0x4d058, .pme_short_desc = "Marked PTEG loaded from L2 miss", .pme_long_desc = "A Page Table Entry was loaded into the ERAT but not from the local L2 due to a marked load or store.", }, [ POWER7_PME_PM_IC_DEMAND_L2_BHT_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BHT_REDIRECT", .pme_code = 0x4098, .pme_short_desc = " L2 I cache demand request due to BHT redirect", .pme_long_desc = "A demand (not prefetch) miss to the instruction cache was sent to the L2 as a result of a branch prediction redirect (CR mispredict).", }, [ POWER7_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x200f6, .pme_short_desc = "DERAT Reloaded due to a DERAT miss", .pme_long_desc = "Total D-ERAT Misses. Requests that miss the Derat are rejected and retried until the request hits in the Erat. This may result in multiple erat misses for the same instruction. 
Combined Unit 0 + 1.", }, [ POWER7_PME_PM_IC_PREF_CANCEL_L2 ] = { .pme_name = "PM_IC_PREF_CANCEL_L2", .pme_code = 0x4094, .pme_short_desc = "L2 Squashed request", .pme_long_desc = "L2 Squashed request", }, [ POWER7_PME_PM_MRK_FIN_STALL_CYC_COUNT ] = { .pme_name = "PM_MRK_FIN_STALL_CYC_COUNT", .pme_code = 0x1003d, .pme_short_desc = "Marked instruction Finish Stall cycles (marked finish after NTC) (use edge detect to count #)", .pme_long_desc = "Marked instruction Finish Stall cycles (marked finish after NTC) (use edge detect to count #)", }, [ POWER7_PME_PM_BR_PRED_CCACHE ] = { .pme_name = "PM_BR_PRED_CCACHE", .pme_code = 0x40a0, .pme_short_desc = "Count Cache Predictions", .pme_long_desc = "The count value of a Branch and Count instruction was predicted", }, [ POWER7_PME_PM_GCT_UTIL_1_TO_2_SLOTS ] = { .pme_name = "PM_GCT_UTIL_1_TO_2_SLOTS", .pme_code = 0x209c, .pme_short_desc = "GCT Utilization 1-2 entries", .pme_long_desc = "GCT Utilization 1-2 entries", }, [ POWER7_PME_PM_MRK_ST_CMPL_INT ] = { .pme_name = "PM_MRK_ST_CMPL_INT", .pme_code = 0x30034, .pme_short_desc = "marked store complete (data home) with intervention", .pme_long_desc = "A marked store previously sent to the memory subsystem completed (data home) after requiring intervention", }, [ POWER7_PME_PM_LSU_TWO_TABLEWALK_CYC ] = { .pme_name = "PM_LSU_TWO_TABLEWALK_CYC", .pme_code = 0xd0a6, .pme_short_desc = "Cycles when two tablewalks pending on this thread", .pme_long_desc = "Cycles when two tablewalks pending on this thread", }, [ POWER7_PME_PM_MRK_DATA_FROM_L3MISS ] = { .pme_name = "PM_MRK_DATA_FROM_L3MISS", .pme_code = 0x2d048, .pme_short_desc = "Marked data loaded from L3 miss", .pme_long_desc = "DL1 was reloaded from beyond L3 due to a marked load.", }, [ POWER7_PME_PM_GCT_NOSLOT_CYC ] = { .pme_name = "PM_GCT_NOSLOT_CYC", .pme_code = 0x100f8, .pme_short_desc = "No itags assigned ", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread.", }, [ 
POWER7_PME_PM_LSU_SET_MPRED ] = { .pme_name = "PM_LSU_SET_MPRED", .pme_code = 0xc0a8, .pme_short_desc = "Line already in cache at reload time", .pme_long_desc = "Line already in cache at reload time", }, [ POWER7_PME_PM_FLUSH_DISP_TLBIE ] = { .pme_name = "PM_FLUSH_DISP_TLBIE", .pme_code = 0x208a, .pme_short_desc = "Dispatch Flush: TLBIE", .pme_long_desc = "Dispatch Flush: TLBIE", }, [ POWER7_PME_PM_VSU1_FCONV ] = { .pme_name = "PM_VSU1_FCONV", .pme_code = 0xa0b2, .pme_short_desc = "Convert instruction executed", .pme_long_desc = "Convert instruction executed", }, [ POWER7_PME_PM_DERAT_MISS_16G ] = { .pme_name = "PM_DERAT_MISS_16G", .pme_code = 0x4c05c, .pme_short_desc = "DERAT misses for 16G page", .pme_long_desc = "A data request (load or store) missed the ERAT for 16G page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_INST_FROM_LMEM ] = { .pme_name = "PM_INST_FROM_LMEM", .pme_code = 0x3404a, .pme_short_desc = "Instruction fetched from local memory", .pme_long_desc = "An instruction fetch group was fetched from memory attached to the same module this processor is located on. 
Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_IC_DEMAND_L2_BR_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BR_REDIRECT", .pme_code = 0x409a, .pme_short_desc = " L2 I cache demand request due to branch redirect", .pme_long_desc = "A demand (not prefetch) miss to the instruction cache was sent to the L2 as a result of a branch prediction redirect (either ALL mispredicted or Target).", }, [ POWER7_PME_PM_CMPLU_STALL_SCALAR_LONG ] = { .pme_name = "PM_CMPLU_STALL_SCALAR_LONG", .pme_code = 0x20018, .pme_short_desc = "Completion stall caused by long latency scalar instruction", .pme_long_desc = "Completion stall caused by long latency scalar instruction", }, [ POWER7_PME_PM_INST_PTEG_FROM_L2 ] = { .pme_name = "PM_INST_PTEG_FROM_L2", .pme_code = 0x1e050, .pme_short_desc = "Instruction PTEG loaded from L2", .pme_long_desc = "Instruction PTEG loaded from L2", }, [ POWER7_PME_PM_PTEG_FROM_L2 ] = { .pme_name = "PM_PTEG_FROM_L2", .pme_code = 0x1c050, .pme_short_desc = "PTEG loaded from L2", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from the local L2 due to a demand load or store.", }, [ POWER7_PME_PM_MRK_DATA_FROM_L21_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L21_SHR_CYC", .pme_code = 0x20024, .pme_short_desc = "Marked ld latency Data source 0100 (L2.1 S)", .pme_long_desc = "Marked load latency Data source 0100 (L2.1 S)", }, [ POWER7_PME_PM_MRK_DTLB_MISS_4K ] = { .pme_name = "PM_MRK_DTLB_MISS_4K", .pme_code = 0x2d05a, .pme_short_desc = "Marked Data TLB misses for 4K page", .pme_long_desc = "Data TLB references to 4KB pages by a marked instruction that missed the TLB. 
Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_VSU0_FPSCR ] = { .pme_name = "PM_VSU0_FPSCR", .pme_code = 0xb09c, .pme_short_desc = "Move to/from FPSCR type instruction issued on Pipe 0", .pme_long_desc = "Move to/from FPSCR type instruction issued on Pipe 0", }, [ POWER7_PME_PM_VSU1_VECT_DOUBLE_ISSUED ] = { .pme_name = "PM_VSU1_VECT_DOUBLE_ISSUED", .pme_code = 0xb082, .pme_short_desc = "Double Precision vector instruction issued on Pipe1", .pme_long_desc = "Double Precision vector instruction issued on Pipe1", }, [ POWER7_PME_PM_MRK_PTEG_FROM_RL2L3_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_RL2L3_MOD", .pme_code = 0x1d052, .pme_short_desc = "Marked PTEG loaded from remote L2 or L3 modified", .pme_long_desc = "A Page Table Entry was loaded into the ERAT with shared (T or SL) data from an L2 or L3 on a remote module due to a marked load or store.", }, [ POWER7_PME_PM_MEM0_RQ_DISP ] = { .pme_name = "PM_MEM0_RQ_DISP", .pme_code = 0x10083, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair0 Bit1", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair0 Bit1", }, [ POWER7_PME_PM_L2_LD_MISS ] = { .pme_name = "PM_L2_LD_MISS", .pme_code = 0x26080, .pme_short_desc = "Data Load Miss", .pme_long_desc = "Data Load Miss", }, [ POWER7_PME_PM_VMX_RESULT_SAT_1 ] = { .pme_name = "PM_VMX_RESULT_SAT_1", .pme_code = 0xb0a0, .pme_short_desc = "Valid result with sat=1", .pme_long_desc = "Valid result with sat=1", }, [ POWER7_PME_PM_L1_PREF ] = { .pme_name = "PM_L1_PREF", .pme_code = 0xd8b8, .pme_short_desc = "L1 Prefetches", .pme_long_desc = "A request to prefetch data into the L1 was made", }, [ POWER7_PME_PM_MRK_DATA_FROM_LMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM_CYC", .pme_code = 0x2002c, .pme_short_desc = "Marked ld latency Data Source 1100 (Local Memory)", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER7_PME_PM_GRP_IC_MISS_NONSPEC ] = { .pme_name = "PM_GRP_IC_MISS_NONSPEC", .pme_code = 0x1000c, .pme_short_desc = "Group experienced non-speculative I cache miss", .pme_long_desc = "Number of groups, counted at completion, that have encountered an instruction cache miss.", }, [ POWER7_PME_PM_PB_NODE_PUMP ] = { .pme_name = "PM_PB_NODE_PUMP", .pme_code = 0x10081, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair0 Bit0", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair0 Bit0", }, [ POWER7_PME_PM_SHL_MERGED ] = { .pme_name = "PM_SHL_MERGED", .pme_code = 0x5084, .pme_short_desc = "SHL table entry merged with existing", .pme_long_desc = "SHL table entry merged with existing", }, [ POWER7_PME_PM_NEST_PAIR1_ADD ] = { .pme_name = "PM_NEST_PAIR1_ADD", .pme_code = 0x20881, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair1 ADD", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair1 ADD", }, [ POWER7_PME_PM_DATA_FROM_L3 ] = { .pme_name = "PM_DATA_FROM_L3", .pme_code = 0x1c048, .pme_short_desc = "Data loaded from L3", .pme_long_desc = "The processor's Data Cache was reloaded from the local L3 due to a demand load.", }, [ POWER7_PME_PM_LSU_FLUSH ] = { .pme_name = "PM_LSU_FLUSH", .pme_code = 0x208e, .pme_short_desc = "Flush initiated by LSU", .pme_long_desc = "A flush was initiated by the Load Store Unit.", }, [ POWER7_PME_PM_LSU_SRQ_SYNC_COUNT ] = { .pme_name = "PM_LSU_SRQ_SYNC_COUNT", .pme_code = 0xd097, .pme_short_desc = "SRQ sync count (edge of PM_LSU_SRQ_SYNC_CYC)", .pme_long_desc = "SRQ sync count (edge of PM_LSU_SRQ_SYNC_CYC)", }, [ POWER7_PME_PM_PMC2_OVERFLOW ] = { .pme_name = "PM_PMC2_OVERFLOW", .pme_code = 0x30010, .pme_short_desc = "Overflow from counter 2", .pme_long_desc = "Overflows from PMC2 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER7_PME_PM_LSU_LDF ] = { .pme_name = "PM_LSU_LDF", .pme_code = 0xc884, .pme_short_desc = "All Scalar Loads", .pme_long_desc = "LSU executed Floating Point load instruction. Combined Unit 0 + 1.", }, [ POWER7_PME_PM_POWER_EVENT3 ] = { .pme_name = "PM_POWER_EVENT3", .pme_code = 0x3006e, .pme_short_desc = "Power Management Event 3", .pme_long_desc = "Power Management Event 3", }, [ POWER7_PME_PM_DISP_WT ] = { .pme_name = "PM_DISP_WT", .pme_code = 0x30008, .pme_short_desc = "Dispatched Starved (not held, nothing to dispatch)", .pme_long_desc = "Dispatched Starved (not held, nothing to dispatch)", }, [ POWER7_PME_PM_CMPLU_STALL_REJECT ] = { .pme_name = "PM_CMPLU_STALL_REJECT", .pme_code = 0x40016, .pme_short_desc = "Completion stall caused by reject", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered a load/store reject. This is a subset of PM_CMPLU_STALL_LSU.", }, [ POWER7_PME_PM_IC_BANK_CONFLICT ] = { .pme_name = "PM_IC_BANK_CONFLICT", .pme_code = 0x4082, .pme_short_desc = "Read blocked due to interleave conflict. ", .pme_long_desc = "Read blocked due to interleave conflict. 
", }, [ POWER7_PME_PM_BR_MPRED_CR_TA ] = { .pme_name = "PM_BR_MPRED_CR_TA", .pme_code = 0x48ae, .pme_short_desc = "Branch mispredict - taken/not taken and target", .pme_long_desc = "Branch mispredict - taken/not taken and target", }, [ POWER7_PME_PM_L2_INST_MISS ] = { .pme_name = "PM_L2_INST_MISS", .pme_code = 0x36082, .pme_short_desc = "Instruction Load Misses", .pme_long_desc = "Instruction Load Misses", }, [ POWER7_PME_PM_CMPLU_STALL_ERAT_MISS ] = { .pme_name = "PM_CMPLU_STALL_ERAT_MISS", .pme_code = 0x40018, .pme_short_desc = "Completion stall caused by ERAT miss", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered an ERAT miss. This is a subset of PM_CMPLU_STALL_REJECT.", }, [ POWER7_PME_PM_NEST_PAIR2_ADD ] = { .pme_name = "PM_NEST_PAIR2_ADD", .pme_code = 0x30881, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair2 ADD", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair2 ADD", }, [ POWER7_PME_PM_MRK_LSU_FLUSH ] = { .pme_name = "PM_MRK_LSU_FLUSH", .pme_code = 0xd08c, .pme_short_desc = "Flush: (marked) : All Cases", .pme_long_desc = "Marked flush initiated by LSU", }, [ POWER7_PME_PM_L2_LDST ] = { .pme_name = "PM_L2_LDST", .pme_code = 0x16880, .pme_short_desc = "Data Load+Store Count", .pme_long_desc = "Data Load+Store Count", }, [ POWER7_PME_PM_INST_FROM_L31_SHR ] = { .pme_name = "PM_INST_FROM_L31_SHR", .pme_code = 0x1404e, .pme_short_desc = "Instruction fetched from another L3 on same chip shared", .pme_long_desc = "Instruction fetched from another L3 on same chip shared", }, [ POWER7_PME_PM_VSU0_FIN ] = { .pme_name = "PM_VSU0_FIN", .pme_code = 0xa0bc, .pme_short_desc = "VSU0 Finished an instruction", .pme_long_desc = "VSU0 Finished an instruction", }, [ POWER7_PME_PM_LARX_LSU ] = { .pme_name = "PM_LARX_LSU", .pme_code = 0xc894, .pme_short_desc = "Larx Finished", .pme_long_desc = "Larx Finished", }, [ POWER7_PME_PM_INST_FROM_RMEM ] = { .pme_name = 
"PM_INST_FROM_RMEM", .pme_code = 0x34042, .pme_short_desc = "Instruction fetched from remote memory", .pme_long_desc = "An instruction fetch group was fetched from memory attached to a different module than this proccessor is located on. Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_DISP_CLB_HELD_TLBIE ] = { .pme_name = "PM_DISP_CLB_HELD_TLBIE", .pme_code = 0x2096, .pme_short_desc = "Dispatch Hold: Due to TLBIE", .pme_long_desc = "Dispatch Hold: Due to TLBIE", }, [ POWER7_PME_PM_MRK_DATA_FROM_DMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_DMEM_CYC", .pme_code = 0x2002e, .pme_short_desc = "Marked ld latency Data Source 1110 (Distant Memory)", .pme_long_desc = "Marked ld latency Data Source 1110 (Distant Memory)", }, [ POWER7_PME_PM_BR_PRED_CR ] = { .pme_name = "PM_BR_PRED_CR", .pme_code = 0x40a8, .pme_short_desc = "Branch predict - taken/not taken", .pme_long_desc = "A conditional branch instruction was predicted as taken or not taken.", }, [ POWER7_PME_PM_LSU_REJECT ] = { .pme_name = "PM_LSU_REJECT", .pme_code = 0x10064, .pme_short_desc = "LSU Reject (up to 2 per cycle)", .pme_long_desc = "The Load Store Unit rejected an instruction. Combined Unit 0 + 1", }, [ POWER7_PME_PM_GCT_UTIL_3_TO_6_SLOTS ] = { .pme_name = "PM_GCT_UTIL_3_TO_6_SLOTS", .pme_code = 0x209e, .pme_short_desc = "GCT Utilization 3-6 entries", .pme_long_desc = "GCT Utilization 3-6 entries", }, [ POWER7_PME_PM_CMPLU_STALL_END_GCT_NOSLOT ] = { .pme_name = "PM_CMPLU_STALL_END_GCT_NOSLOT", .pme_code = 0x10028, .pme_short_desc = "Count ended because GCT went empty", .pme_long_desc = "Count ended because GCT went empty", }, [ POWER7_PME_PM_LSU0_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU0_REJECT_LMQ_FULL", .pme_code = 0xc0a4, .pme_short_desc = "LS0 Reject: LMQ Full (LHR)", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. 
If all eight entries are full, subsequent load instructions are rejected.", }, [ POWER7_PME_PM_VSU_FEST ] = { .pme_name = "PM_VSU_FEST", .pme_code = 0xa8b8, .pme_short_desc = "Estimate instruction executed", .pme_long_desc = "Estimate instruction executed", }, [ POWER7_PME_PM_NEST_PAIR0_AND ] = { .pme_name = "PM_NEST_PAIR0_AND", .pme_code = 0x10883, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair0 AND", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair0 AND", }, [ POWER7_PME_PM_PTEG_FROM_L3 ] = { .pme_name = "PM_PTEG_FROM_L3", .pme_code = 0x2c050, .pme_short_desc = "PTEG loaded from L3", .pme_long_desc = "A Page Table Entry was loaded into the TLB from the local L3 due to a demand load.", }, [ POWER7_PME_PM_POWER_EVENT2 ] = { .pme_name = "PM_POWER_EVENT2", .pme_code = 0x2006e, .pme_short_desc = "Power Management Event 2", .pme_long_desc = "Power Management Event 2", }, [ POWER7_PME_PM_IC_PREF_CANCEL_PAGE ] = { .pme_name = "PM_IC_PREF_CANCEL_PAGE", .pme_code = 0x4090, .pme_short_desc = "Prefetch Canceled due to page boundary", .pme_long_desc = "Prefetch Canceled due to page boundary", }, [ POWER7_PME_PM_VSU0_FSQRT_FDIV ] = { .pme_name = "PM_VSU0_FSQRT_FDIV", .pme_code = 0xa088, .pme_short_desc = "four flops operation (fdiv,fsqrt,xsdiv,xssqrt) Scalar Instructions only!", .pme_long_desc = "four flops operation (fdiv,fsqrt,xsdiv,xssqrt) Scalar Instructions only!", }, [ POWER7_PME_PM_MRK_GRP_CMPL ] = { .pme_name = "PM_MRK_GRP_CMPL", .pme_code = 0x40030, .pme_short_desc = "Marked group complete", .pme_long_desc = "A group containing a sampled instruction completed. 
Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER7_PME_PM_VSU0_SCAL_DOUBLE_ISSUED ] = { .pme_name = "PM_VSU0_SCAL_DOUBLE_ISSUED", .pme_code = 0xb088, .pme_short_desc = "Double Precision scalar instruction issued on Pipe0", .pme_long_desc = "Double Precision scalar instruction issued on Pipe0", }, [ POWER7_PME_PM_GRP_DISP ] = { .pme_name = "PM_GRP_DISP", .pme_code = 0x3000a, .pme_short_desc = "dispatch_success (Group Dispatched)", .pme_long_desc = "A group was dispatched", }, [ POWER7_PME_PM_LSU0_LDX ] = { .pme_name = "PM_LSU0_LDX", .pme_code = 0xc088, .pme_short_desc = "LS0 Vector Loads", .pme_long_desc = "LS0 Vector Loads", }, [ POWER7_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x1c040, .pme_short_desc = "Data loaded from L2", .pme_long_desc = "The processor's Data Cache was reloaded from the local L2 due to a demand load.", }, [ POWER7_PME_PM_MRK_DATA_FROM_RL2L3_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_RL2L3_MOD", .pme_code = 0x1d042, .pme_short_desc = "Marked data loaded from remote L2 or L3 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from an L2 or L3 on a remote module due to a marked load.", }, [ POWER7_PME_PM_LD_REF_L1 ] = { .pme_name = "PM_LD_REF_L1", .pme_code = 0xc880, .pme_short_desc = " L1 D cache load references counted at finish", .pme_long_desc = " L1 D cache load references counted at finish", }, [ POWER7_PME_PM_VSU0_VECT_DOUBLE_ISSUED ] = { .pme_name = "PM_VSU0_VECT_DOUBLE_ISSUED", .pme_code = 0xb080, .pme_short_desc = "Double Precision vector instruction issued on Pipe0", .pme_long_desc = "Double Precision vector instruction issued on Pipe0", }, [ POWER7_PME_PM_VSU1_2FLOP_DOUBLE ] = { .pme_name = "PM_VSU1_2FLOP_DOUBLE", .pme_code = 0xa08e, .pme_short_desc = "two flop DP vector operation (xvadddp, xvmuldp, xvsubdp, xvcmpdp, xvseldp, xvabsdp, xvnabsdp, xvredp ,xvsqrtedp, vxnegdp) ", .pme_long_desc = "two flop DP vector 
operation (xvadddp, xvmuldp, xvsubdp, xvcmpdp, xvseldp, xvabsdp, xvnabsdp, xvredp ,xvsqrtedp, vxnegdp) ", }, [ POWER7_PME_PM_THRD_PRIO_6_7_CYC ] = { .pme_name = "PM_THRD_PRIO_6_7_CYC", .pme_code = 0x40b6, .pme_short_desc = " Cycles thread running at priority level 6 or 7", .pme_long_desc = " Cycles thread running at priority level 6 or 7", }, [ POWER7_PME_PM_BC_PLUS_8_RSLV_TAKEN ] = { .pme_name = "PM_BC_PLUS_8_RSLV_TAKEN", .pme_code = 0x40ba, .pme_short_desc = "BC+8 Resolve outcome was Taken, resulting in the conditional instruction being canceled", .pme_long_desc = "BC+8 Resolve outcome was Taken, resulting in the conditional instruction being canceled", }, [ POWER7_PME_PM_BR_MPRED_CR ] = { .pme_name = "PM_BR_MPRED_CR", .pme_code = 0x40ac, .pme_short_desc = "Branch mispredict - taken/not taken", .pme_long_desc = "A conditional branch instruction was incorrectly predicted as taken or not taken. The branch execution unit detects a branch mispredict because the CR value is opposite of the predicted value. This will result in a branch redirect flush if not overridden by a flush of an older instruction.", }, [ POWER7_PME_PM_L3_CO_MEM ] = { .pme_name = "PM_L3_CO_MEM", .pme_code = 0x4f082, .pme_short_desc = "L3 Castouts to L3.1", .pme_long_desc = "L3 Castouts to L3.1", }, [ POWER7_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x400f0, .pme_short_desc = "Load Missed L1", .pme_long_desc = "Load references that miss the Level 1 Data cache. 
Combined unit 0 + 1.", }, [ POWER7_PME_PM_DATA_FROM_RL2L3_MOD ] = { .pme_name = "PM_DATA_FROM_RL2L3_MOD", .pme_code = 0x1c042, .pme_short_desc = "Data loaded from remote L2 or L3 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from an L2 or L3 on a remote module due to a demand load", }, [ POWER7_PME_PM_LSU_SRQ_FULL_CYC ] = { .pme_name = "PM_LSU_SRQ_FULL_CYC", .pme_code = 0x1001a, .pme_short_desc = "Storage Queue is full and is blocking dispatch", .pme_long_desc = "Cycles the Store Request Queue is full.", }, [ POWER7_PME_PM_TABLEWALK_CYC ] = { .pme_name = "PM_TABLEWALK_CYC", .pme_code = 0x10026, .pme_short_desc = "Cycles when a tablewalk (I or D) is active", .pme_long_desc = "Cycles doing instruction or data tablewalks", }, [ POWER7_PME_PM_MRK_PTEG_FROM_RMEM ] = { .pme_name = "PM_MRK_PTEG_FROM_RMEM", .pme_code = 0x3d052, .pme_short_desc = "Marked PTEG loaded from remote memory", .pme_long_desc = "A Page Table Entry was loaded into the ERAT. POWER6 does not have a TLB", }, [ POWER7_PME_PM_LSU_SRQ_STFWD ] = { .pme_name = "PM_LSU_SRQ_STFWD", .pme_code = 0xc8a0, .pme_short_desc = "Load got data from a store", .pme_long_desc = "Data from a store instruction was forwarded to a load. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. If a load that hits L1 but becomes a store forward, then it's not treated as a load miss. Combined Unit 0 + 1.", }, [ POWER7_PME_PM_INST_PTEG_FROM_RMEM ] = { .pme_name = "PM_INST_PTEG_FROM_RMEM", .pme_code = 0x3e052, .pme_short_desc = "Instruction PTEG loaded from remote memory", .pme_long_desc = "Instruction PTEG loaded from remote memory", }, [ POWER7_PME_PM_FXU0_FIN ] = { .pme_name = "PM_FXU0_FIN", .pme_code = 0x10004, .pme_short_desc = "FXU0 Finished", .pme_long_desc = "The Fixed Point unit 0 finished an instruction and produced a result. 
Instructions that finish may not necessarily complete.", }, [ POWER7_PME_PM_LSU1_L1_SW_PREF ] = { .pme_name = "PM_LSU1_L1_SW_PREF", .pme_code = 0xc09e, .pme_short_desc = "LSU1 Software L1 Prefetches, including SW Transient Prefetches", .pme_long_desc = "LSU1 Software L1 Prefetches, including SW Transient Prefetches", }, [ POWER7_PME_PM_PTEG_FROM_L31_MOD ] = { .pme_name = "PM_PTEG_FROM_L31_MOD", .pme_code = 0x1c054, .pme_short_desc = "PTEG loaded from another L3 on same chip modified", .pme_long_desc = "PTEG loaded from another L3 on same chip modified", }, [ POWER7_PME_PM_PMC5_OVERFLOW ] = { .pme_name = "PM_PMC5_OVERFLOW", .pme_code = 0x10024, .pme_short_desc = "Overflow from counter 5", .pme_long_desc = "Overflows from PMC5 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER7_PME_PM_LD_REF_L1_LSU1 ] = { .pme_name = "PM_LD_REF_L1_LSU1", .pme_code = 0xc082, .pme_short_desc = "LS1 L1 D cache load references counted at finish", .pme_long_desc = "Load references to Level 1 Data Cache, by unit 1.", }, [ POWER7_PME_PM_INST_PTEG_FROM_L21_SHR ] = { .pme_name = "PM_INST_PTEG_FROM_L21_SHR", .pme_code = 0x4e056, .pme_short_desc = "Instruction PTEG loaded from another L2 on same chip shared", .pme_long_desc = "Instruction PTEG loaded from another L2 on same chip shared", }, [ POWER7_PME_PM_CMPLU_STALL_THRD ] = { .pme_name = "PM_CMPLU_STALL_THRD", .pme_code = 0x1001c, .pme_short_desc = "Completion Stalled due to thread conflict. Group ready to complete but it was another thread's turn", .pme_long_desc = "Completion Stalled due to thread conflict. 
Group ready to complete but it was another thread's turn", }, [ POWER7_PME_PM_DATA_FROM_RMEM ] = { .pme_name = "PM_DATA_FROM_RMEM", .pme_code = 0x3c042, .pme_short_desc = "Data loaded from remote memory", .pme_long_desc = "The processor's Data Cache was reloaded from memory attached to a different module than this processor is located on.", }, [ POWER7_PME_PM_VSU0_SCAL_SINGLE_ISSUED ] = { .pme_name = "PM_VSU0_SCAL_SINGLE_ISSUED", .pme_code = 0xb084, .pme_short_desc = "Single Precision scalar instruction issued on Pipe0", .pme_long_desc = "Single Precision scalar instruction issued on Pipe0", }, [ POWER7_PME_PM_BR_MPRED_LSTACK ] = { .pme_name = "PM_BR_MPRED_LSTACK", .pme_code = 0x40a6, .pme_short_desc = "Branch Mispredict due to Link Stack", .pme_long_desc = "Branch Mispredict due to Link Stack", }, [ POWER7_PME_PM_MRK_DATA_FROM_RL2L3_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_RL2L3_MOD_CYC", .pme_code = 0x40028, .pme_short_desc = "Marked ld latency Data source 1001 (L2.5/L3.5 M same 4 chip node)", .pme_long_desc = "Marked ld latency Data source 1001 (L2.5/L3.5 M same 4 chip node)", }, [ POWER7_PME_PM_LSU0_FLUSH_UST ] = { .pme_name = "PM_LSU0_FLUSH_UST", .pme_code = 0xc0b4, .pme_short_desc = "LS0 Flush: Unaligned Store", .pme_long_desc = "A store was flushed from unit 0 because it was unaligned (crossed a 4K boundary).", }, [ POWER7_PME_PM_LSU_NCST ] = { .pme_name = "PM_LSU_NCST", .pme_code = 0xc090, .pme_short_desc = "Non-cachable Stores sent to nest", .pme_long_desc = "Non-cachable Stores sent to nest", }, [ POWER7_PME_PM_BR_TAKEN ] = { .pme_name = "PM_BR_TAKEN", .pme_code = 0x20004, .pme_short_desc = "Branch Taken", .pme_long_desc = "A branch instruction was taken. 
This could have been a conditional branch or an unconditional branch", }, [ POWER7_PME_PM_INST_PTEG_FROM_LMEM ] = { .pme_name = "PM_INST_PTEG_FROM_LMEM", .pme_code = 0x4e052, .pme_short_desc = "Instruction PTEG loaded from local memory", .pme_long_desc = "Instruction PTEG loaded from local memory", }, [ POWER7_PME_PM_GCT_NOSLOT_BR_MPRED_IC_MISS ] = { .pme_name = "PM_GCT_NOSLOT_BR_MPRED_IC_MISS", .pme_code = 0x4001c, .pme_short_desc = "GCT empty by branch mispredict + IC miss", .pme_long_desc = "No slot in GCT caused by branch mispredict or I cache miss", }, [ POWER7_PME_PM_DTLB_MISS_4K ] = { .pme_name = "PM_DTLB_MISS_4K", .pme_code = 0x2c05a, .pme_short_desc = "Data TLB miss for 4K page", .pme_long_desc = "Data TLB references to 4KB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_PMC4_SAVED ] = { .pme_name = "PM_PMC4_SAVED", .pme_code = 0x30022, .pme_short_desc = "PMC4 Rewind Value saved (matched condition)", .pme_long_desc = "PMC4 was counting speculatively. The speculative condition was met and the counter value was committed by copying it to the backup register.", }, [ POWER7_PME_PM_VSU1_PERMUTE_ISSUED ] = { .pme_name = "PM_VSU1_PERMUTE_ISSUED", .pme_code = 0xb092, .pme_short_desc = "Permute VMX Instruction Issued", .pme_long_desc = "Permute VMX Instruction Issued", }, [ POWER7_PME_PM_SLB_MISS ] = { .pme_name = "PM_SLB_MISS", .pme_code = 0xd890, .pme_short_desc = "Data + Instruction SLB Miss - Total of all segment sizes", .pme_long_desc = "Total of all Segment Lookaside Buffer (SLB) misses, Instructions + Data.", }, [ POWER7_PME_PM_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_LSU1_FLUSH_LRQ", .pme_code = 0xc0ba, .pme_short_desc = "LS1 Flush: LRQ", .pme_long_desc = "Load Hit Load or Store Hit Load flush. 
A younger load was flushed from unit 1 because it executed before an older store and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER7_PME_PM_DTLB_MISS ] = { .pme_name = "PM_DTLB_MISS", .pme_code = 0x300fc, .pme_short_desc = "TLB reload valid", .pme_long_desc = "Data TLB misses, all page sizes.", }, [ POWER7_PME_PM_VSU1_FRSP ] = { .pme_name = "PM_VSU1_FRSP", .pme_code = 0xa0b6, .pme_short_desc = "Round to single precision instruction executed", .pme_long_desc = "Round to single precision instruction executed", }, [ POWER7_PME_PM_VSU_VECTOR_DOUBLE_ISSUED ] = { .pme_name = "PM_VSU_VECTOR_DOUBLE_ISSUED", .pme_code = 0xb880, .pme_short_desc = "Double Precision vector instruction issued on Pipe0", .pme_long_desc = "Double Precision vector instruction issued on Pipe0", }, [ POWER7_PME_PM_L2_CASTOUT_SHR ] = { .pme_name = "PM_L2_CASTOUT_SHR", .pme_code = 0x16182, .pme_short_desc = "L2 Castouts - Shared (T, Te, Si, S)", .pme_long_desc = "An L2 line in the Shared state was castout. 
Total for all slices.", }, [ POWER7_PME_PM_DATA_FROM_DL2L3_SHR ] = { .pme_name = "PM_DATA_FROM_DL2L3_SHR", .pme_code = 0x3c044, .pme_short_desc = "Data loaded from distant L2 or L3 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from an L2 or L3 on a distant module due to a demand load", }, [ POWER7_PME_PM_VSU1_STF ] = { .pme_name = "PM_VSU1_STF", .pme_code = 0xb08e, .pme_short_desc = "FPU store (SP or DP) issued on Pipe1", .pme_long_desc = "FPU store (SP or DP) issued on Pipe1", }, [ POWER7_PME_PM_ST_FIN ] = { .pme_name = "PM_ST_FIN", .pme_code = 0x200f0, .pme_short_desc = "Store Instructions Finished", .pme_long_desc = "Store requests sent to the nest.", }, [ POWER7_PME_PM_PTEG_FROM_L21_SHR ] = { .pme_name = "PM_PTEG_FROM_L21_SHR", .pme_code = 0x4c056, .pme_short_desc = "PTEG loaded from another L2 on same chip shared", .pme_long_desc = "PTEG loaded from another L2 on same chip shared", }, [ POWER7_PME_PM_L2_LOC_GUESS_WRONG ] = { .pme_name = "PM_L2_LOC_GUESS_WRONG", .pme_code = 0x26480, .pme_short_desc = "L2 guess loc and guess was not correct (ie data remote)", .pme_long_desc = "L2 guess loc and guess was not correct (ie data remote)", }, [ POWER7_PME_PM_MRK_STCX_FAIL ] = { .pme_name = "PM_MRK_STCX_FAIL", .pme_code = 0xd08e, .pme_short_desc = "Marked STCX failed", .pme_long_desc = "A marked stcx (stwcx or stdcx) failed", }, [ POWER7_PME_PM_LSU0_REJECT_LHS ] = { .pme_name = "PM_LSU0_REJECT_LHS", .pme_code = 0xc0ac, .pme_short_desc = "LS0 Reject: Load Hit Store", .pme_long_desc = "Load Store Unit 0 rejected a load instruction that had an address overlap with an older store in the store queue. 
The store must be committed and de-allocated from the Store Queue before the load can execute successfully.", }, [ POWER7_PME_PM_IC_PREF_CANCEL_HIT ] = { .pme_name = "PM_IC_PREF_CANCEL_HIT", .pme_code = 0x4092, .pme_short_desc = "Prefetch Canceled due to icache hit", .pme_long_desc = "Prefetch Canceled due to icache hit", }, [ POWER7_PME_PM_L3_PREF_BUSY ] = { .pme_name = "PM_L3_PREF_BUSY", .pme_code = 0x4f080, .pme_short_desc = "Prefetch machines >= threshold (8,16,20,24)", .pme_long_desc = "Prefetch machines >= threshold (8,16,20,24)", }, [ POWER7_PME_PM_MRK_BRU_FIN ] = { .pme_name = "PM_MRK_BRU_FIN", .pme_code = 0x2003a, .pme_short_desc = "bru marked instr finish", .pme_long_desc = "The branch unit finished a marked instruction. Instructions that finish may not necessarily complete.", }, [ POWER7_PME_PM_LSU1_NCLD ] = { .pme_name = "PM_LSU1_NCLD", .pme_code = 0xc08e, .pme_short_desc = "LS1 Non-cachable Loads counted at finish", .pme_long_desc = "A non-cacheable load was executed by Unit 0.", }, [ POWER7_PME_PM_INST_PTEG_FROM_L31_MOD ] = { .pme_name = "PM_INST_PTEG_FROM_L31_MOD", .pme_code = 0x1e054, .pme_short_desc = "Instruction PTEG loaded from another L3 on same chip modified", .pme_long_desc = "Instruction PTEG loaded from another L3 on same chip modified", }, [ POWER7_PME_PM_LSU_NCLD ] = { .pme_name = "PM_LSU_NCLD", .pme_code = 0xc88c, .pme_short_desc = "Non-cachable Loads counted at finish", .pme_long_desc = "A non-cacheable load was executed. 
Combined Unit 0 + 1.", }, [ POWER7_PME_PM_LSU_LDX ] = { .pme_name = "PM_LSU_LDX", .pme_code = 0xc888, .pme_short_desc = "All Vector loads (vsx vector + vmx vector)", .pme_long_desc = "All Vector loads (vsx vector + vmx vector)", }, [ POWER7_PME_PM_L2_LOC_GUESS_CORRECT ] = { .pme_name = "PM_L2_LOC_GUESS_CORRECT", .pme_code = 0x16480, .pme_short_desc = "L2 guess loc and guess was correct (ie data local)", .pme_long_desc = "L2 guess loc and guess was correct (ie data local)", }, [ POWER7_PME_PM_THRESH_TIMEO ] = { .pme_name = "PM_THRESH_TIMEO", .pme_code = 0x10038, .pme_short_desc = "Threshold timeout event", .pme_long_desc = "The threshold timer expired", }, [ POWER7_PME_PM_L3_PREF_ST ] = { .pme_name = "PM_L3_PREF_ST", .pme_code = 0xd0ae, .pme_short_desc = "L3 cache ST prefetches", .pme_long_desc = "L3 cache ST prefetches", }, [ POWER7_PME_PM_DISP_CLB_HELD_SYNC ] = { .pme_name = "PM_DISP_CLB_HELD_SYNC", .pme_code = 0x2098, .pme_short_desc = "Dispatch/CLB Hold: Sync type instruction", .pme_long_desc = "Dispatch/CLB Hold: Sync type instruction", }, [ POWER7_PME_PM_VSU_SIMPLE_ISSUED ] = { .pme_name = "PM_VSU_SIMPLE_ISSUED", .pme_code = 0xb894, .pme_short_desc = "Simple VMX instruction issued", .pme_long_desc = "Simple VMX instruction issued", }, [ POWER7_PME_PM_VSU1_SINGLE ] = { .pme_name = "PM_VSU1_SINGLE", .pme_code = 0xa0aa, .pme_short_desc = "FPU single precision", .pme_long_desc = "VSU1 executed single precision instruction", }, [ POWER7_PME_PM_DATA_TABLEWALK_CYC ] = { .pme_name = "PM_DATA_TABLEWALK_CYC", .pme_code = 0x3001a, .pme_short_desc = "Data Tablewalk Active", .pme_long_desc = "Cycles a translation tablewalk is active. 
While a tablewalk is active any request attempting to access the TLB will be rejected and retried.", }, [ POWER7_PME_PM_L2_RC_ST_DONE ] = { .pme_name = "PM_L2_RC_ST_DONE", .pme_code = 0x36380, .pme_short_desc = "RC did st to line that was Tx or Sx", .pme_long_desc = "RC did st to line that was Tx or Sx", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L21_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_L21_MOD", .pme_code = 0x3d056, .pme_short_desc = "Marked PTEG loaded from another L2 on same chip modified", .pme_long_desc = "Marked PTEG loaded from another L2 on same chip modified", }, [ POWER7_PME_PM_LARX_LSU1 ] = { .pme_name = "PM_LARX_LSU1", .pme_code = 0xc096, .pme_short_desc = "ls1 Larx Finished", .pme_long_desc = "A larx (lwarx or ldarx) was executed on side 1 ", }, [ POWER7_PME_PM_MRK_DATA_FROM_RMEM ] = { .pme_name = "PM_MRK_DATA_FROM_RMEM", .pme_code = 0x3d042, .pme_short_desc = "Marked data loaded from remote memory", .pme_long_desc = "The processor's Data Cache was reloaded due to a marked load from memory attached to a different module than this processor is located on.", }, [ POWER7_PME_PM_DISP_CLB_HELD ] = { .pme_name = "PM_DISP_CLB_HELD", .pme_code = 0x2090, .pme_short_desc = "CLB Hold: Any Reason", .pme_long_desc = "CLB Hold: Any Reason", }, [ POWER7_PME_PM_DERAT_MISS_4K ] = { .pme_name = "PM_DERAT_MISS_4K", .pme_code = 0x1c05c, .pme_short_desc = "DERAT misses for 4K page", .pme_long_desc = "A data request (load or store) missed the ERAT for 4K page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_L2_RCLD_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2_RCLD_DISP_FAIL_ADDR", .pme_code = 0x16282, .pme_short_desc = " L2 RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = " L2 RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", }, [ POWER7_PME_PM_SEG_EXCEPTION ] = { .pme_name = "PM_SEG_EXCEPTION", .pme_code = 0x28a4, .pme_short_desc = "ISEG + DSEG Exception", .pme_long_desc = "ISEG + DSEG Exception", }, [ 
POWER7_PME_PM_FLUSH_DISP_SB ] = { .pme_name = "PM_FLUSH_DISP_SB", .pme_code = 0x208c, .pme_short_desc = "Dispatch Flush: Scoreboard", .pme_long_desc = "Dispatch Flush: Scoreboard", }, [ POWER7_PME_PM_L2_DC_INV ] = { .pme_name = "PM_L2_DC_INV", .pme_code = 0x26182, .pme_short_desc = "Dcache invalidates from L2 ", .pme_long_desc = "The L2 invalidated a line in processor's data cache. This is caused by the L2 line being cast out or invalidated. Total for all slices", }, [ POWER7_PME_PM_PTEG_FROM_DL2L3_MOD ] = { .pme_name = "PM_PTEG_FROM_DL2L3_MOD", .pme_code = 0x4c054, .pme_short_desc = "PTEG loaded from distant L2 or L3 modified", .pme_long_desc = "A Page Table Entry was loaded into the ERAT with modified (M) data from an L2 or L3 on a distant module due to a demand load or store.", }, [ POWER7_PME_PM_DSEG ] = { .pme_name = "PM_DSEG", .pme_code = 0x20a6, .pme_short_desc = "DSEG Exception", .pme_long_desc = "DSEG Exception", }, [ POWER7_PME_PM_BR_PRED_LSTACK ] = { .pme_name = "PM_BR_PRED_LSTACK", .pme_code = 0x40a2, .pme_short_desc = "Link Stack Predictions", .pme_long_desc = "The target address of a Branch to Link instruction was predicted by the link stack.", }, [ POWER7_PME_PM_VSU0_STF ] = { .pme_name = "PM_VSU0_STF", .pme_code = 0xb08c, .pme_short_desc = "FPU store (SP or DP) issued on Pipe0", .pme_long_desc = "FPU store (SP or DP) issued on Pipe0", }, [ POWER7_PME_PM_LSU_FX_FIN ] = { .pme_name = "PM_LSU_FX_FIN", .pme_code = 0x10066, .pme_short_desc = "LSU Finished a FX operation (up to 2 per cycle)", .pme_long_desc = "LSU Finished a FX operation (up to 2 per cycle)", }, [ POWER7_PME_PM_DERAT_MISS_16M ] = { .pme_name = "PM_DERAT_MISS_16M", .pme_code = 0x3c05c, .pme_short_desc = "DERAT misses for 16M page", .pme_long_desc = "A data request (load or store) missed the ERAT for 16M page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_MRK_PTEG_FROM_DL2L3_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_DL2L3_MOD", .pme_code = 0x4d054, .pme_short_desc = "Marked PTEG loaded 
from distant L2 or L3 modified", .pme_long_desc = "A Page Table Entry was loaded into the ERAT with modified (M) data from an L2 or L3 on a distant module due to a marked load or store.", }, [ POWER7_PME_PM_GCT_UTIL_11_PLUS_SLOTS ] = { .pme_name = "PM_GCT_UTIL_11_PLUS_SLOTS", .pme_code = 0x20a2, .pme_short_desc = "GCT Utilization 11+ entries", .pme_long_desc = "GCT Utilization 11+ entries", }, [ POWER7_PME_PM_INST_FROM_L3 ] = { .pme_name = "PM_INST_FROM_L3", .pme_code = 0x14048, .pme_short_desc = "Instruction fetched from L3", .pme_long_desc = "An instruction fetch group was fetched from L3. Fetch Groups can contain up to 8 instructions", }, [ POWER7_PME_PM_MRK_IFU_FIN ] = { .pme_name = "PM_MRK_IFU_FIN", .pme_code = 0x3003a, .pme_short_desc = "IFU non-branch marked instruction finished", .pme_long_desc = "The Instruction Fetch Unit finished a marked instruction.", }, [ POWER7_PME_PM_ITLB_MISS ] = { .pme_name = "PM_ITLB_MISS", .pme_code = 0x400fc, .pme_short_desc = "ITLB Reloaded (always zero on POWER6)", .pme_long_desc = "A TLB miss for an Instruction Fetch has occurred", }, [ POWER7_PME_PM_VSU_STF ] = { .pme_name = "PM_VSU_STF", .pme_code = 0xb88c, .pme_short_desc = "FPU store (SP or DP) issued on Pipe0", .pme_long_desc = "FPU store (SP or DP) issued on Pipe0", }, [ POWER7_PME_PM_LSU_FLUSH_UST ] = { .pme_name = "PM_LSU_FLUSH_UST", .pme_code = 0xc8b4, .pme_short_desc = "Flush: Unaligned Store", .pme_long_desc = "A store was flushed because it was unaligned (crossed a 4K boundary). Combined Unit 0 + 1.", }, [ POWER7_PME_PM_L2_LDST_MISS ] = { .pme_name = "PM_L2_LDST_MISS", .pme_code = 0x26880, .pme_short_desc = "Data Load+Store Miss", .pme_long_desc = "Data Load+Store Miss", }, [ POWER7_PME_PM_FXU1_FIN ] = { .pme_name = "PM_FXU1_FIN", .pme_code = 0x40004, .pme_short_desc = "FXU1 Finished", .pme_long_desc = "The Fixed Point unit 1 finished an instruction and produced a result. 
Instructions that finish may not necessarily complete.", }, [ POWER7_PME_PM_SHL_DEALLOCATED ] = { .pme_name = "PM_SHL_DEALLOCATED", .pme_code = 0x5080, .pme_short_desc = "SHL Table entry deallocated", .pme_long_desc = "SHL Table entry deallocated", }, [ POWER7_PME_PM_L2_SN_M_WR_DONE ] = { .pme_name = "PM_L2_SN_M_WR_DONE", .pme_code = 0x46382, .pme_short_desc = "SNP dispatched for a write and was M", .pme_long_desc = "SNP dispatched for a write and was M", }, [ POWER7_PME_PM_LSU_REJECT_SET_MPRED ] = { .pme_name = "PM_LSU_REJECT_SET_MPRED", .pme_code = 0xc8a8, .pme_short_desc = "Reject: Set Predict Wrong", .pme_long_desc = "The Load Store Unit rejected an instruction because the cache set was improperly predicted. This is a fast reject and will be immediately redispatched. Combined Unit 0 + 1", }, [ POWER7_PME_PM_L3_PREF_LD ] = { .pme_name = "PM_L3_PREF_LD", .pme_code = 0xd0ac, .pme_short_desc = "L3 cache LD prefetches", .pme_long_desc = "L3 cache LD prefetches", }, [ POWER7_PME_PM_L2_SN_M_RD_DONE ] = { .pme_name = "PM_L2_SN_M_RD_DONE", .pme_code = 0x46380, .pme_short_desc = "SNP dispatched for a read and was M", .pme_long_desc = "SNP dispatched for a read and was M", }, [ POWER7_PME_PM_MRK_DERAT_MISS_16G ] = { .pme_name = "PM_MRK_DERAT_MISS_16G", .pme_code = 0x4d05c, .pme_short_desc = "Marked DERAT misses for 16G page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 16G page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_VSU_FCONV ] = { .pme_name = "PM_VSU_FCONV", .pme_code = 0xa8b0, .pme_short_desc = "Convert instruction executed", .pme_long_desc = "Convert instruction executed", }, [ POWER7_PME_PM_ANY_THRD_RUN_CYC ] = { .pme_name = "PM_ANY_THRD_RUN_CYC", .pme_code = 0x100fa, .pme_short_desc = "One of threads in run_cycles ", .pme_long_desc = "One of threads in run_cycles ", }, [ POWER7_PME_PM_LSU_LMQ_FULL_CYC ] = { .pme_name = "PM_LSU_LMQ_FULL_CYC", .pme_code = 0xd0a4, .pme_short_desc = "LMQ full", .pme_long_desc = "The Load Miss
Queue was full.", }, [ POWER7_PME_PM_MRK_LSU_REJECT_LHS ] = { .pme_name = "PM_MRK_LSU_REJECT_LHS", .pme_code = 0xd082, .pme_short_desc = " Reject(marked): Load Hit Store", .pme_long_desc = "The Load Store Unit rejected a marked load instruction that had an address overlap with an older store in the store queue. The store must be committed and de-allocated from the Store Queue before the load can execute successfully", }, [ POWER7_PME_PM_MRK_LD_MISS_L1_CYC ] = { .pme_name = "PM_MRK_LD_MISS_L1_CYC", .pme_code = 0x4003e, .pme_short_desc = "L1 data load miss cycles", .pme_long_desc = "L1 data load miss cycles", }, [ POWER7_PME_PM_MRK_DATA_FROM_L2_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L2_CYC", .pme_code = 0x20020, .pme_short_desc = "Marked ld latency Data source 0000 (L2 hit)", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER7_PME_PM_INST_IMC_MATCH_DISP ] = { .pme_name = "PM_INST_IMC_MATCH_DISP", .pme_code = 0x30016, .pme_short_desc = "IMC Matches dispatched", .pme_long_desc = "IMC Matches dispatched", }, [ POWER7_PME_PM_MRK_DATA_FROM_RMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_RMEM_CYC", .pme_code = 0x4002c, .pme_short_desc = "Marked ld latency Data source 1101 (Memory same 4 chip node)", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER7_PME_PM_VSU0_SIMPLE_ISSUED ] = { .pme_name = "PM_VSU0_SIMPLE_ISSUED", .pme_code = 0xb094, .pme_short_desc = "Simple VMX instruction issued", .pme_long_desc = "Simple VMX instruction issued", }, [ POWER7_PME_PM_CMPLU_STALL_DIV ] = { .pme_name = "PM_CMPLU_STALL_DIV", .pme_code = 0x40014, .pme_short_desc = "Completion stall caused by DIV instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a fixed point divide instruction. This is a subset of PM_CMPLU_STALL_FXU.", }, [ POWER7_PME_PM_MRK_PTEG_FROM_RL2L3_SHR ] = { .pme_name = "PM_MRK_PTEG_FROM_RL2L3_SHR", .pme_code = 0x2d054, .pme_short_desc = "Marked PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from memory attached to a different module than this proccessor is located on due to a marked load or store.", }, [ POWER7_PME_PM_VSU_FMA_DOUBLE ] = { .pme_name = "PM_VSU_FMA_DOUBLE", .pme_code = 0xa890, .pme_short_desc = "DP vector version of fmadd,fnmadd,fmsub,fnmsub", .pme_long_desc = "DP vector version of fmadd,fnmadd,fmsub,fnmsub", }, [ POWER7_PME_PM_VSU_4FLOP ] = { .pme_name = "PM_VSU_4FLOP", .pme_code = 0xa89c, .pme_short_desc = "four flops operation (scalar fdiv, fsqrt; DP vector version of fmadd, fnmadd, fmsub, fnmsub; SP vector versions of single flop instructions)", .pme_long_desc = "four flops operation (scalar fdiv, fsqrt; DP vector version of fmadd, fnmadd, fmsub, fnmsub; SP vector versions of single flop instructions)", }, [ POWER7_PME_PM_VSU1_FIN ] = { .pme_name = "PM_VSU1_FIN", .pme_code = 0xa0be, .pme_short_desc = "VSU1 Finished an instruction", .pme_long_desc = "VSU1 Finished an instruction", }, [ POWER7_PME_PM_NEST_PAIR1_AND ] = { .pme_name = "PM_NEST_PAIR1_AND", .pme_code = 0x20883, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair1 
AND", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair1 AND", }, [ POWER7_PME_PM_INST_PTEG_FROM_RL2L3_MOD ] = { .pme_name = "PM_INST_PTEG_FROM_RL2L3_MOD", .pme_code = 0x1e052, .pme_short_desc = "Instruction PTEG loaded from remote L2 or L3 modified", .pme_long_desc = "Instruction PTEG loaded from remote L2 or L3 modified", }, [ POWER7_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x200f4, .pme_short_desc = "Run_cycles", .pme_long_desc = "Processor Cycles gated by the run latch. Operating systems use the run latch to indicate when they are doing useful work. The run latch is typically cleared in the OS idle loop. Gating by the run latch filters out the idle loop.", }, [ POWER7_PME_PM_PTEG_FROM_RMEM ] = { .pme_name = "PM_PTEG_FROM_RMEM", .pme_code = 0x3c052, .pme_short_desc = "PTEG loaded from remote memory", .pme_long_desc = "A Page Table Entry was loaded into the TLB from memory attached to a different module than this proccessor is located on.", }, [ POWER7_PME_PM_LSU_LRQ_S0_VALID ] = { .pme_name = "PM_LSU_LRQ_S0_VALID", .pme_code = 0xd09e, .pme_short_desc = "Slot 0 of LRQ valid", .pme_long_desc = "This signal is asserted every cycle that the Load Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin. In SMT mode the LRQ is split between the two threads (16 entries each).", }, [ POWER7_PME_PM_LSU0_LDF ] = { .pme_name = "PM_LSU0_LDF", .pme_code = 0xc084, .pme_short_desc = "LS0 Scalar Loads", .pme_long_desc = "A floating point load was executed by LSU0", }, [ POWER7_PME_PM_FLUSH_COMPLETION ] = { .pme_name = "PM_FLUSH_COMPLETION", .pme_code = 0x30012, .pme_short_desc = "Completion Flush", .pme_long_desc = "Completion Flush", }, [ POWER7_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0x300f0, .pme_short_desc = "L1 D cache store misses", .pme_long_desc = "A store missed the dcache. 
Combined Unit 0 + 1.", }, [ POWER7_PME_PM_L2_NODE_PUMP ] = { .pme_name = "PM_L2_NODE_PUMP", .pme_code = 0x36480, .pme_short_desc = "RC req that was a local (aka node) pump attempt", .pme_long_desc = "RC req that was a local (aka node) pump attempt", }, [ POWER7_PME_PM_INST_FROM_DL2L3_SHR ] = { .pme_name = "PM_INST_FROM_DL2L3_SHR", .pme_code = 0x34044, .pme_short_desc = "Instruction fetched from distant L2 or L3 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (S) data from the L2 or L3 on a distant module. Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_MRK_STALL_CMPLU_CYC ] = { .pme_name = "PM_MRK_STALL_CMPLU_CYC", .pme_code = 0x3003e, .pme_short_desc = "Marked Group Completion Stall cycles ", .pme_long_desc = "Marked Group Completion Stall cycles ", }, [ POWER7_PME_PM_VSU1_DENORM ] = { .pme_name = "PM_VSU1_DENORM", .pme_code = 0xa0ae, .pme_short_desc = "FPU denorm operand", .pme_long_desc = "VSU1 received denormalized data", }, [ POWER7_PME_PM_MRK_DATA_FROM_L31_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L31_SHR_CYC", .pme_code = 0x20026, .pme_short_desc = "Marked ld latency Data source 0110 (L3.1 S) ", .pme_long_desc = "Marked load latency Data source 0110 (L3.1 S) ", }, [ POWER7_PME_PM_NEST_PAIR0_ADD ] = { .pme_name = "PM_NEST_PAIR0_ADD", .pme_code = 0x10881, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair0 ADD", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair0 ADD", }, [ POWER7_PME_PM_INST_FROM_L3MISS ] = { .pme_name = "PM_INST_FROM_L3MISS", .pme_code = 0x24048, .pme_short_desc = "Instruction fetched missed L3", .pme_long_desc = "An instruction fetch group was fetched from beyond L3. 
Fetch groups can contain up to 8 instructions.", }, [ POWER7_PME_PM_EE_OFF_EXT_INT ] = { .pme_name = "PM_EE_OFF_EXT_INT", .pme_code = 0x2080, .pme_short_desc = "ee off and external interrupt", .pme_long_desc = "Cycles when an interrupt due to an external exception is pending but external exceptions were masked.", }, [ POWER7_PME_PM_INST_PTEG_FROM_DMEM ] = { .pme_name = "PM_INST_PTEG_FROM_DMEM", .pme_code = 0x2e052, .pme_short_desc = "Instruction PTEG loaded from distant memory", .pme_long_desc = "Instruction PTEG loaded from distant memory", }, [ POWER7_PME_PM_INST_FROM_DL2L3_MOD ] = { .pme_name = "PM_INST_FROM_DL2L3_MOD", .pme_code = 0x3404c, .pme_short_desc = "Instruction fetched from distant L2 or L3 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from an L2 or L3 on a distant module. Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_PMC6_OVERFLOW ] = { .pme_name = "PM_PMC6_OVERFLOW", .pme_code = 0x30024, .pme_short_desc = "Overflow from counter 6", .pme_long_desc = "Overflows from PMC6 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER7_PME_PM_VSU_2FLOP_DOUBLE ] = { .pme_name = "PM_VSU_2FLOP_DOUBLE", .pme_code = 0xa88c, .pme_short_desc = "DP vector version of fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg", .pme_long_desc = "DP vector version of fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg", }, [ POWER7_PME_PM_TLB_MISS ] = { .pme_name = "PM_TLB_MISS", .pme_code = 0x20066, .pme_short_desc = "TLB Miss (I + D)", .pme_long_desc = "Total of Data TLB misses + Instruction TLB misses", }, [ POWER7_PME_PM_FXU_BUSY ] = { .pme_name = "PM_FXU_BUSY", .pme_code = 0x2000e, .pme_short_desc = "fxu0 busy and fxu1 busy.", .pme_long_desc = "Cycles when both FXU0 and FXU1 are busy.", }, [ POWER7_PME_PM_L2_RCLD_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2_RCLD_DISP_FAIL_OTHER", .pme_code = 0x26280, .pme_short_desc = " L2 RC load dispatch attempt failed due to other reasons", .pme_long_desc = " L2 RC load dispatch attempt failed due to other reasons", }, [ POWER7_PME_PM_LSU_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU_REJECT_LMQ_FULL", .pme_code = 0xc8a4, .pme_short_desc = "Reject: LMQ Full (LHR)", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. If all the eight entries are full, subsequent load instructions are rejected. Combined unit 0 + 1.", }, [ POWER7_PME_PM_IC_RELOAD_SHR ] = { .pme_name = "PM_IC_RELOAD_SHR", .pme_code = 0x4096, .pme_short_desc = "Reloading line to be shared between the threads", .pme_long_desc = "An Instruction Cache request was made by this thread and the cache line was already in the cache for the other thread. The line is marked valid for all threads.", }, [ POWER7_PME_PM_GRP_MRK ] = { .pme_name = "PM_GRP_MRK", .pme_code = 0x10031, .pme_short_desc = "IDU Marked Instruction", .pme_long_desc = "A group was sampled (marked).
The group is called a marked group. One instruction within the group is tagged for detailed monitoring. The sampled instruction is called a marked instruction. Events associated with the marked instruction are annotated with the marked term.", }, [ POWER7_PME_PM_MRK_ST_NEST ] = { .pme_name = "PM_MRK_ST_NEST", .pme_code = 0x20034, .pme_short_desc = "marked store sent to Nest", .pme_long_desc = "A sampled store has been sent to the memory subsystem", }, [ POWER7_PME_PM_VSU1_FSQRT_FDIV ] = { .pme_name = "PM_VSU1_FSQRT_FDIV", .pme_code = 0xa08a, .pme_short_desc = "four flops operation (fdiv,fsqrt,xsdiv,xssqrt) Scalar Instructions only!", .pme_long_desc = "four flops operation (fdiv,fsqrt,xsdiv,xssqrt) Scalar Instructions only!", }, [ POWER7_PME_PM_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_LSU0_FLUSH_LRQ", .pme_code = 0xc0b8, .pme_short_desc = "LS0 Flush: LRQ", .pme_long_desc = "Load Hit Load or Store Hit Load flush. A younger load was flushed from unit 0 because it executed before an older store and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER7_PME_PM_LARX_LSU0 ] = { .pme_name = "PM_LARX_LSU0", .pme_code = 0xc094, .pme_short_desc = "ls0 Larx Finished", .pme_long_desc = "A larx (lwarx or ldarx) was executed on side 0 ", }, [ POWER7_PME_PM_IBUF_FULL_CYC ] = { .pme_name = "PM_IBUF_FULL_CYC", .pme_code = 0x4084, .pme_short_desc = "Cycles No room in ibuff", .pme_long_desc = "Cycles when the Instruction Buffer was full.
The Instruction Buffer is a circular queue of 64 instructions per thread, organized as 16 groups of 4 instructions.", }, [ POWER7_PME_PM_MRK_DATA_FROM_DL2L3_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_DL2L3_SHR_CYC", .pme_code = 0x2002a, .pme_short_desc = "Marked ld latency Data Source 1010 (Distant L2.75/L3.75 S)", .pme_long_desc = "Marked ld latency Data Source 1010 (Distant L2.75/L3.75 S)", }, [ POWER7_PME_PM_LSU_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_LSU_DC_PREF_STREAM_ALLOC", .pme_code = 0xd8a8, .pme_short_desc = "D cache new prefetch stream allocated", .pme_long_desc = "D cache new prefetch stream allocated", }, [ POWER7_PME_PM_GRP_MRK_CYC ] = { .pme_name = "PM_GRP_MRK_CYC", .pme_code = 0x10030, .pme_short_desc = "cycles IDU marked instruction before dispatch", .pme_long_desc = "cycles IDU marked instruction before dispatch", }, [ POWER7_PME_PM_MRK_DATA_FROM_RL2L3_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_RL2L3_SHR_CYC", .pme_code = 0x20028, .pme_short_desc = "Marked ld latency Data Source 1000 (Remote L2.5/L3.5 S)", .pme_long_desc = "Marked load latency Data Source 1000 (Remote L2.5/L3.5 S)", }, [ POWER7_PME_PM_L2_GLOB_GUESS_CORRECT ] = { .pme_name = "PM_L2_GLOB_GUESS_CORRECT", .pme_code = 0x16482, .pme_short_desc = "L2 guess glb and guess was correct (ie data remote)", .pme_long_desc = "L2 guess glb and guess was correct (ie data remote)", }, [ POWER7_PME_PM_LSU_REJECT_LHS ] = { .pme_name = "PM_LSU_REJECT_LHS", .pme_code = 0xc8ac, .pme_short_desc = "Reject: Load Hit Store", .pme_long_desc = "The Load Store Unit rejected a load instruction that had an address overlap with an older store in the store queue. The store must be committed and de-allocated from the Store Queue before the load can execute successfully.
Combined Unit 0 + 1", }, [ POWER7_PME_PM_MRK_DATA_FROM_LMEM ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM", .pme_code = 0x3d04a, .pme_short_desc = "Marked data loaded from local memory", .pme_long_desc = "The processor's Data Cache was reloaded due to a marked load from memory attached to the same module this processor is located on.", }, [ POWER7_PME_PM_INST_PTEG_FROM_L3 ] = { .pme_name = "PM_INST_PTEG_FROM_L3", .pme_code = 0x2e050, .pme_short_desc = "Instruction PTEG loaded from L3", .pme_long_desc = "Instruction PTEG loaded from L3", }, [ POWER7_PME_PM_FREQ_DOWN ] = { .pme_name = "PM_FREQ_DOWN", .pme_code = 0x3000c, .pme_short_desc = "Frequency is being slewed down due to Power Management", .pme_long_desc = "Processor frequency was slowed down due to power management", }, [ POWER7_PME_PM_PB_RETRY_NODE_PUMP ] = { .pme_name = "PM_PB_RETRY_NODE_PUMP", .pme_code = 0x30081, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair2 Bit0", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair2 Bit0", }, [ POWER7_PME_PM_INST_FROM_RL2L3_SHR ] = { .pme_name = "PM_INST_FROM_RL2L3_SHR", .pme_code = 0x1404c, .pme_short_desc = "Instruction fetched from remote L2 or L3 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (S) data from the L2 or L3 on a remote module.
Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_MRK_INST_ISSUED ] = { .pme_name = "PM_MRK_INST_ISSUED", .pme_code = 0x10032, .pme_short_desc = "Marked instruction issued", .pme_long_desc = "A marked instruction was issued to an execution unit.", }, [ POWER7_PME_PM_PTEG_FROM_L3MISS ] = { .pme_name = "PM_PTEG_FROM_L3MISS", .pme_code = 0x2c058, .pme_short_desc = "PTEG loaded from L3 miss", .pme_long_desc = " Page Table Entry was loaded into the ERAT from beyond the L3 due to a demand load or store.", }, [ POWER7_PME_PM_RUN_PURR ] = { .pme_name = "PM_RUN_PURR", .pme_code = 0x400f4, .pme_short_desc = "Run_PURR", .pme_long_desc = "The Processor Utilization of Resources Register was incremented while the run latch was set. The PURR registers will be incremented roughly in the ratio in which the instructions are dispatched from the two threads. ", }, [ POWER7_PME_PM_MRK_GRP_IC_MISS ] = { .pme_name = "PM_MRK_GRP_IC_MISS", .pme_code = 0x40038, .pme_short_desc = "Marked group experienced I cache miss", .pme_long_desc = "A group containing a marked (sampled) instruction experienced an instruction cache miss.", }, [ POWER7_PME_PM_MRK_DATA_FROM_L3 ] = { .pme_name = "PM_MRK_DATA_FROM_L3", .pme_code = 0x1d048, .pme_short_desc = "Marked data loaded from L3", .pme_long_desc = "The processor's Data Cache was reloaded from the local L3 due to a marked load.", }, [ POWER7_PME_PM_CMPLU_STALL_DCACHE_MISS ] = { .pme_name = "PM_CMPLU_STALL_DCACHE_MISS", .pme_code = 0x20016, .pme_short_desc = "Completion stall caused by D cache miss", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered a Data Cache Miss. Data Cache Miss has higher priority than any other Load/Store delay, so if an instruction encounters multiple delays only the Data Cache Miss will be reported and the entire delay period will be charged to Data Cache Miss. 
This is a subset of PM_CMPLU_STALL_LSU.", }, [ POWER7_PME_PM_PTEG_FROM_RL2L3_SHR ] = { .pme_name = "PM_PTEG_FROM_RL2L3_SHR", .pme_code = 0x2c054, .pme_short_desc = "PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "A Page Table Entry was loaded into the ERAT with shared (T or SL) data from an L2 or L3 on a remote module due to a demand load or store.", }, [ POWER7_PME_PM_LSU_FLUSH_LRQ ] = { .pme_name = "PM_LSU_FLUSH_LRQ", .pme_code = 0xc8b8, .pme_short_desc = "Flush: LRQ", .pme_long_desc = "Load Hit Load or Store Hit Load flush. A younger load was flushed because it executed before an older store and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte. Combined Unit 0 + 1.", }, [ POWER7_PME_PM_MRK_DERAT_MISS_64K ] = { .pme_name = "PM_MRK_DERAT_MISS_64K", .pme_code = 0x2d05c, .pme_short_desc = "Marked DERAT misses for 64K page", .pme_long_desc = "A marked data request (load or store) missed the ERAT for 64K page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_INST_PTEG_FROM_DL2L3_MOD ] = { .pme_name = "PM_INST_PTEG_FROM_DL2L3_MOD", .pme_code = 0x4e054, .pme_short_desc = "Instruction PTEG loaded from distant L2 or L3 modified", .pme_long_desc = "Instruction PTEG loaded from distant L2 or L3 modified", }, [ POWER7_PME_PM_L2_ST_MISS ] = { .pme_name = "PM_L2_ST_MISS", .pme_code = 0x26082, .pme_short_desc = "Data Store Miss", .pme_long_desc = "Data Store Miss", }, [ POWER7_PME_PM_LWSYNC ] = { .pme_name = "PM_LWSYNC", .pme_code = 0xd094, .pme_short_desc = "lwsync count (easier to use than IMC)", .pme_long_desc = "lwsync count (easier to use than IMC)", }, [ POWER7_PME_PM_LSU0_DC_PREF_STREAM_CONFIRM_STRIDE ] = { .pme_name = "PM_LSU0_DC_PREF_STREAM_CONFIRM_STRIDE", .pme_code = 0xd0bc, .pme_short_desc = "LS0 Dcache Strided prefetch stream confirmed", .pme_long_desc = "LS0 Dcache Strided prefetch stream confirmed", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L21_SHR ] = { .pme_name 
= "PM_MRK_PTEG_FROM_L21_SHR", .pme_code = 0x4d056, .pme_short_desc = "Marked PTEG loaded from another L2 on same chip shared", .pme_long_desc = "Marked PTEG loaded from another L2 on same chip shared", }, [ POWER7_PME_PM_MRK_LSU_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_LRQ", .pme_code = 0xd088, .pme_short_desc = "Flush: (marked) LRQ", .pme_long_desc = "Load Hit Load or Store Hit Load flush. A marked load was flushed because it executed before an older store and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER7_PME_PM_INST_IMC_MATCH_CMPL ] = { .pme_name = "PM_INST_IMC_MATCH_CMPL", .pme_code = 0x100f0, .pme_short_desc = "IMC Match Count", .pme_long_desc = "Number of instructions resulting from the marked instructions expansion that completed.", }, [ POWER7_PME_PM_NEST_PAIR3_AND ] = { .pme_name = "PM_NEST_PAIR3_AND", .pme_code = 0x40883, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair3 AND", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair3 AND", }, [ POWER7_PME_PM_PB_RETRY_SYS_PUMP ] = { .pme_name = "PM_PB_RETRY_SYS_PUMP", .pme_code = 0x40081, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair3 Bit0", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair3 Bit0", }, [ POWER7_PME_PM_MRK_INST_FIN ] = { .pme_name = "PM_MRK_INST_FIN", .pme_code = 0x30030, .pme_short_desc = "marked instr finish any unit ", .pme_long_desc = "One of the execution units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ POWER7_PME_PM_MRK_PTEG_FROM_DL2L3_SHR ] = { .pme_name = "PM_MRK_PTEG_FROM_DL2L3_SHR", .pme_code = 0x3d054, .pme_short_desc = "Marked PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from memory attached to a different module than this processor is located on due to a marked load or store.", }, [ POWER7_PME_PM_INST_FROM_L31_MOD ] = { .pme_name = "PM_INST_FROM_L31_MOD", .pme_code = 0x14044, .pme_short_desc = "Instruction fetched from another L3 on same chip modified", .pme_long_desc = "Instruction fetched from another L3 on same chip modified", }, [ POWER7_PME_PM_MRK_DTLB_MISS_64K ] = { .pme_name = "PM_MRK_DTLB_MISS_64K", .pme_code = 0x3d05e, .pme_short_desc = "Marked Data TLB misses for 64K page", .pme_long_desc = "Data TLB references to 64KB pages by a marked instruction that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_LSU_FIN ] = { .pme_name = "PM_LSU_FIN", .pme_code = 0x30066, .pme_short_desc = "LSU Finished an instruction (up to 2 per cycle)", .pme_long_desc = "LSU Finished an instruction (up to 2 per cycle)", }, [ POWER7_PME_PM_MRK_LSU_REJECT ] = { .pme_name = "PM_MRK_LSU_REJECT", .pme_code = 0x40064, .pme_short_desc = "LSU marked reject (up to 2 per cycle)", .pme_long_desc = "LSU marked reject (up to 2 per cycle)", }, [ POWER7_PME_PM_L2_CO_FAIL_BUSY ] = { .pme_name = "PM_L2_CO_FAIL_BUSY", .pme_code = 0x16382, .pme_short_desc = " L2 RC Cast Out dispatch attempt failed due to all CO machines busy", .pme_long_desc = " L2 RC Cast Out dispatch attempt failed due to all CO machines busy", }, [ POWER7_PME_PM_MEM0_WQ_DISP ] = { .pme_name = "PM_MEM0_WQ_DISP", .pme_code = 0x40083, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair3 Bit1", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair3 Bit1", }, [ POWER7_PME_PM_DATA_FROM_L31_MOD ] = { .pme_name = "PM_DATA_FROM_L31_MOD", .pme_code = 0x1c044, .pme_short_desc =
"Data loaded from another L3 on same chip modified", .pme_long_desc = "Data loaded from another L3 on same chip modified", }, [ POWER7_PME_PM_THERMAL_WARN ] = { .pme_name = "PM_THERMAL_WARN", .pme_code = 0x10016, .pme_short_desc = "Processor in Thermal Warning", .pme_long_desc = "Processor in Thermal Warning", }, [ POWER7_PME_PM_VSU0_4FLOP ] = { .pme_name = "PM_VSU0_4FLOP", .pme_code = 0xa09c, .pme_short_desc = "four flops operation (scalar fdiv, fsqrt; DP vector version of fmadd, fnmadd, fmsub, fnmsub; SP vector versions of single flop instructions)", .pme_long_desc = "four flops operation (scalar fdiv, fsqrt; DP vector version of fmadd, fnmadd, fmsub, fnmsub; SP vector versions of single flop instructions)", }, [ POWER7_PME_PM_BR_MPRED_CCACHE ] = { .pme_name = "PM_BR_MPRED_CCACHE", .pme_code = 0x40a4, .pme_short_desc = "Branch Mispredict due to Count Cache prediction", .pme_long_desc = "A branch instruction target was incorrectly predicted by the ccount cache. This will result in a branch redirect flush if not overfidden by a flush of an older instruction.", }, [ POWER7_PME_PM_CMPLU_STALL_IFU ] = { .pme_name = "PM_CMPLU_STALL_IFU", .pme_code = 0x4004c, .pme_short_desc = "Completion stall due to IFU ", .pme_long_desc = "Completion stall due to IFU ", }, [ POWER7_PME_PM_L1_DEMAND_WRITE ] = { .pme_name = "PM_L1_DEMAND_WRITE", .pme_code = 0x408c, .pme_short_desc = "Instruction Demand sectors wriittent into IL1", .pme_long_desc = "Instruction Demand sectors wriittent into IL1", }, [ POWER7_PME_PM_FLUSH_BR_MPRED ] = { .pme_name = "PM_FLUSH_BR_MPRED", .pme_code = 0x2084, .pme_short_desc = "Flush caused by branch mispredict", .pme_long_desc = "A flush was caused by a branch mispredict.", }, [ POWER7_PME_PM_MRK_DTLB_MISS_16G ] = { .pme_name = "PM_MRK_DTLB_MISS_16G", .pme_code = 0x1d05e, .pme_short_desc = "Marked Data TLB misses for 16G page", .pme_long_desc = "Data TLB references to 16GB pages by a marked instruction that missed the TLB. 
Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_MRK_PTEG_FROM_DMEM ] = { .pme_name = "PM_MRK_PTEG_FROM_DMEM", .pme_code = 0x2d052, .pme_short_desc = "Marked PTEG loaded from distant memory", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from memory attached to a different module than this processor is located on due to a marked load or store.", }, [ POWER7_PME_PM_L2_RCST_DISP ] = { .pme_name = "PM_L2_RCST_DISP", .pme_code = 0x36280, .pme_short_desc = " L2 RC store dispatch attempt", .pme_long_desc = " L2 RC store dispatch attempt", }, [ POWER7_PME_PM_CMPLU_STALL ] = { .pme_name = "PM_CMPLU_STALL", .pme_code = 0x4000a, .pme_short_desc = "No groups completed, GCT not empty", .pme_long_desc = "No groups completed, GCT not empty", }, [ POWER7_PME_PM_LSU_PARTIAL_CDF ] = { .pme_name = "PM_LSU_PARTIAL_CDF", .pme_code = 0xc0aa, .pme_short_desc = "A partial cacheline was returned from the L3", .pme_long_desc = "A partial cacheline was returned from the L3", }, [ POWER7_PME_PM_DISP_CLB_HELD_SB ] = { .pme_name = "PM_DISP_CLB_HELD_SB", .pme_code = 0x20a8, .pme_short_desc = "Dispatch/CLB Hold: Scoreboard", .pme_long_desc = "Dispatch/CLB Hold: Scoreboard", }, [ POWER7_PME_PM_VSU0_FMA_DOUBLE ] = { .pme_name = "PM_VSU0_FMA_DOUBLE", .pme_code = 0xa090, .pme_short_desc = "four flop DP vector operations (xvmadddp, xvnmadddp, xvmsubdp, xvmsubdp)", .pme_long_desc = "four flop DP vector operations (xvmadddp, xvnmadddp, xvmsubdp, xvmsubdp)", }, [ POWER7_PME_PM_FXU0_BUSY_FXU1_IDLE ] = { .pme_name = "PM_FXU0_BUSY_FXU1_IDLE", .pme_code = 0x3000e, .pme_short_desc = "fxu0 busy and fxu1 idle", .pme_long_desc = "FXU0 is busy while FXU1 was idle", }, [ POWER7_PME_PM_IC_DEMAND_CYC ] = { .pme_name = "PM_IC_DEMAND_CYC", .pme_code = 0x10018, .pme_short_desc = "Cycles when a demand ifetch was pending", .pme_long_desc = "Cycles when a demand ifetch was pending", }, [ POWER7_PME_PM_MRK_DATA_FROM_L21_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L21_SHR", .pme_code = 0x3d04e,
.pme_short_desc = "Marked data loaded from another L2 on same chip shared", .pme_long_desc = "Marked data loaded from another L2 on same chip shared", }, [ POWER7_PME_PM_MRK_LSU_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU_FLUSH_UST", .pme_code = 0xd086, .pme_short_desc = "Flush: (marked) Unaligned Store", .pme_long_desc = "A marked store was flushed because it was unaligned", }, [ POWER7_PME_PM_INST_PTEG_FROM_L3MISS ] = { .pme_name = "PM_INST_PTEG_FROM_L3MISS", .pme_code = 0x2e058, .pme_short_desc = "Instruction PTEG loaded from L3 miss", .pme_long_desc = "Instruction PTEG loaded from L3 miss", }, [ POWER7_PME_PM_VSU_DENORM ] = { .pme_name = "PM_VSU_DENORM", .pme_code = 0xa8ac, .pme_short_desc = "Vector or Scalar denorm operand", .pme_long_desc = "Vector or Scalar denorm operand", }, [ POWER7_PME_PM_MRK_LSU_PARTIAL_CDF ] = { .pme_name = "PM_MRK_LSU_PARTIAL_CDF", .pme_code = 0xd080, .pme_short_desc = "A partial cacheline was returned from the L3 for a marked load", .pme_long_desc = "A partial cacheline was returned from the L3 for a marked load", }, [ POWER7_PME_PM_INST_FROM_L21_SHR ] = { .pme_name = "PM_INST_FROM_L21_SHR", .pme_code = 0x3404e, .pme_short_desc = "Instruction fetched from another L2 on same chip shared", .pme_long_desc = "Instruction fetched from another L2 on same chip shared", }, [ POWER7_PME_PM_IC_PREF_WRITE ] = { .pme_name = "PM_IC_PREF_WRITE", .pme_code = 0x408e, .pme_short_desc = "Instruction prefetch written into IL1", .pme_long_desc = "Number of Instruction Cache entries written because of prefetch. Prefetch entries are marked least recently used and are candidates for eviction if they are not needed to satisfy a demand fetch.", }, [ POWER7_PME_PM_BR_PRED ] = { .pme_name = "PM_BR_PRED", .pme_code = 0x409c, .pme_short_desc = "Branch Predictions made", .pme_long_desc = "A branch prediction was made.
This could have been a target prediction, a condition prediction, or both", }, [ POWER7_PME_PM_INST_FROM_DMEM ] = { .pme_name = "PM_INST_FROM_DMEM", .pme_code = 0x1404a, .pme_short_desc = "Instruction fetched from distant memory", .pme_long_desc = "An instruction fetch group was fetched from memory attached to a distant module. Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_IC_PREF_CANCEL_ALL ] = { .pme_name = "PM_IC_PREF_CANCEL_ALL", .pme_code = 0x4890, .pme_short_desc = "Prefetch Canceled due to page boundary or icache hit", .pme_long_desc = "Prefetch Canceled due to page boundary or icache hit", }, [ POWER7_PME_PM_LSU_DC_PREF_STREAM_CONFIRM ] = { .pme_name = "PM_LSU_DC_PREF_STREAM_CONFIRM", .pme_code = 0xd8b4, .pme_short_desc = "Dcache new prefetch stream confirmed", .pme_long_desc = "Dcache new prefetch stream confirmed", }, [ POWER7_PME_PM_MRK_LSU_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_SRQ", .pme_code = 0xd08a, .pme_short_desc = "Flush: (marked) SRQ", .pme_long_desc = "Load Hit Store flush. A marked load was flushed because it hits (overlaps) an older store that is already in the SRQ or in the same group. If the real addresses match but the effective addresses do not, an alias condition exists that prevents store forwarding. If the load and store are in the same group the load must be flushed to separate the two instructions. 
", }, [ POWER7_PME_PM_MRK_FIN_STALL_CYC ] = { .pme_name = "PM_MRK_FIN_STALL_CYC", .pme_code = 0x1003c, .pme_short_desc = "Marked instruction Finish Stall cycles (marked finish after NTC) ", .pme_long_desc = "Marked instruction Finish Stall cycles (marked finish after NTC) ", }, [ POWER7_PME_PM_L2_RCST_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2_RCST_DISP_FAIL_OTHER", .pme_code = 0x46280, .pme_short_desc = " L2 RC store dispatch attempt failed due to other reasons", .pme_long_desc = " L2 RC store dispatch attempt failed due to other reasons", }, [ POWER7_PME_PM_VSU1_DD_ISSUED ] = { .pme_name = "PM_VSU1_DD_ISSUED", .pme_code = 0xb098, .pme_short_desc = "64BIT Decimal Issued on Pipe1", .pme_long_desc = "64BIT Decimal Issued on Pipe1", }, [ POWER7_PME_PM_PTEG_FROM_L31_SHR ] = { .pme_name = "PM_PTEG_FROM_L31_SHR", .pme_code = 0x2c056, .pme_short_desc = "PTEG loaded from another L3 on same chip shared", .pme_long_desc = "PTEG loaded from another L3 on same chip shared", }, [ POWER7_PME_PM_DATA_FROM_L21_SHR ] = { .pme_name = "PM_DATA_FROM_L21_SHR", .pme_code = 0x3c04e, .pme_short_desc = "Data loaded from another L2 on same chip shared", .pme_long_desc = "Data loaded from another L2 on same chip shared", }, [ POWER7_PME_PM_LSU0_NCLD ] = { .pme_name = "PM_LSU0_NCLD", .pme_code = 0xc08c, .pme_short_desc = "LS0 Non-cachable Loads counted at finish", .pme_long_desc = "A non-cacheable load was executed by unit 0.", }, [ POWER7_PME_PM_VSU1_4FLOP ] = { .pme_name = "PM_VSU1_4FLOP", .pme_code = 0xa09e, .pme_short_desc = "four flops operation (scalar fdiv, fsqrt; DP vector version of fmadd, fnmadd, fmsub, fnmsub; SP vector versions of single flop instructions)", .pme_long_desc = "four flops operation (scalar fdiv, fsqrt; DP vector version of fmadd, fnmadd, fmsub, fnmsub; SP vector versions of single flop instructions)", }, [ POWER7_PME_PM_VSU1_8FLOP ] = { .pme_name = "PM_VSU1_8FLOP", .pme_code = 0xa0a2, .pme_short_desc = "eight flops operation (DP vector versions of fdiv,fsqrt and SP 
vector versions of fmadd,fnmadd,fmsub,fnmsub) ", .pme_long_desc = "eight flops operation (DP vector versions of fdiv,fsqrt and SP vector versions of fmadd,fnmadd,fmsub,fnmsub) ", }, [ POWER7_PME_PM_VSU_8FLOP ] = { .pme_name = "PM_VSU_8FLOP", .pme_code = 0xa8a0, .pme_short_desc = "eight flops operation (DP vector versions of fdiv,fsqrt and SP vector versions of fmadd,fnmadd,fmsub,fnmsub) ", .pme_long_desc = "eight flops operation (DP vector versions of fdiv,fsqrt and SP vector versions of fmadd,fnmadd,fmsub,fnmsub) ", }, [ POWER7_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_CYC", .pme_code = 0x2003e, .pme_short_desc = "LSU empty (lmq and srq empty)", .pme_long_desc = "Cycles when both the LMQ and SRQ are empty (LSU is idle)", }, [ POWER7_PME_PM_DTLB_MISS_64K ] = { .pme_name = "PM_DTLB_MISS_64K", .pme_code = 0x3c05e, .pme_short_desc = "Data TLB miss for 64K page", .pme_long_desc = "Data TLB references to 64KB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_THRD_CONC_RUN_INST ] = { .pme_name = "PM_THRD_CONC_RUN_INST", .pme_code = 0x300f4, .pme_short_desc = "Concurrent Run Instructions", .pme_long_desc = "Instructions completed by this thread when both threads had their run latches set.", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L2 ] = { .pme_name = "PM_MRK_PTEG_FROM_L2", .pme_code = 0x1d050, .pme_short_desc = "Marked PTEG loaded from L2", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from the local L2 due to a marked load or store.", }, [ POWER7_PME_PM_PB_SYS_PUMP ] = { .pme_name = "PM_PB_SYS_PUMP", .pme_code = 0x20081, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair1 Bit0", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair1 Bit0", }, [ POWER7_PME_PM_VSU_FIN ] = { .pme_name = "PM_VSU_FIN", .pme_code = 0xa8bc, .pme_short_desc = "VSU0 Finished an instruction", .pme_long_desc = "VSU0 Finished an instruction", }, [ POWER7_PME_PM_MRK_DATA_FROM_L31_MOD ] = { .pme_name = 
"PM_MRK_DATA_FROM_L31_MOD", .pme_code = 0x1d044, .pme_short_desc = "Marked data loaded from another L3 on same chip modified", .pme_long_desc = "Marked data loaded from another L3 on same chip modified", }, [ POWER7_PME_PM_THRD_PRIO_0_1_CYC ] = { .pme_name = "PM_THRD_PRIO_0_1_CYC", .pme_code = 0x40b0, .pme_short_desc = " Cycles thread running at priority level 0 or 1", .pme_long_desc = " Cycles thread running at priority level 0 or 1", }, [ POWER7_PME_PM_DERAT_MISS_64K ] = { .pme_name = "PM_DERAT_MISS_64K", .pme_code = 0x2c05c, .pme_short_desc = "DERAT misses for 64K page", .pme_long_desc = "A data request (load or store) missed the ERAT for 64K page and resulted in an ERAT reload.", }, [ POWER7_PME_PM_PMC2_REWIND ] = { .pme_name = "PM_PMC2_REWIND", .pme_code = 0x30020, .pme_short_desc = "PMC2 Rewind Event (did not match condition)", .pme_long_desc = "PMC2 was counting speculatively. The speculative condition was not met and the counter was restored to its previous value.", }, [ POWER7_PME_PM_INST_FROM_L2 ] = { .pme_name = "PM_INST_FROM_L2", .pme_code = 0x14040, .pme_short_desc = "Instruction fetched from L2", .pme_long_desc = "An instruction fetch group was fetched from L2. 
Fetch Groups can contain up to 8 instructions", }, [ POWER7_PME_PM_GRP_BR_MPRED_NONSPEC ] = { .pme_name = "PM_GRP_BR_MPRED_NONSPEC", .pme_code = 0x1000a, .pme_short_desc = "Group experienced non-speculative branch redirect", .pme_long_desc = "Group experienced non-speculative branch redirect", }, [ POWER7_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x200f2, .pme_short_desc = "# PPC Dispatched", .pme_long_desc = "Number of PowerPC instructions successfully dispatched.", }, [ POWER7_PME_PM_MEM0_RD_CANCEL_TOTAL ] = { .pme_name = "PM_MEM0_RD_CANCEL_TOTAL", .pme_code = 0x30083, .pme_short_desc = " Nest events (MC0/MC1/PB/GX), Pair2 Bit1", .pme_long_desc = " Nest events (MC0/MC1/PB/GX), Pair2 Bit1", }, [ POWER7_PME_PM_LSU0_DC_PREF_STREAM_CONFIRM ] = { .pme_name = "PM_LSU0_DC_PREF_STREAM_CONFIRM", .pme_code = 0xd0b4, .pme_short_desc = "LS0 Dcache prefetch stream confirmed", .pme_long_desc = "LS0 Dcache prefetch stream confirmed", }, [ POWER7_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0x300f6, .pme_short_desc = "L1 reload data source valid", .pme_long_desc = "The data source information is valid,the data cache has been reloaded. Prior to POWER5+ this included data cache reloads due to prefetch activity. 
With POWER5+ this now only includes reloads due to demand loads.", }, [ POWER7_PME_PM_VSU_SCALAR_DOUBLE_ISSUED ] = { .pme_name = "PM_VSU_SCALAR_DOUBLE_ISSUED", .pme_code = 0xb888, .pme_short_desc = "Double Precision scalar instruction issued on Pipe0", .pme_long_desc = "Double Precision scalar instruction issued on Pipe0", }, [ POWER7_PME_PM_L3_PREF_HIT ] = { .pme_name = "PM_L3_PREF_HIT", .pme_code = 0x3f080, .pme_short_desc = "L3 Prefetch Directory Hit", .pme_long_desc = "L3 Prefetch Directory Hit", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L31_MOD ] = { .pme_name = "PM_MRK_PTEG_FROM_L31_MOD", .pme_code = 0x1d054, .pme_short_desc = "Marked PTEG loaded from another L3 on same chip modified", .pme_long_desc = "Marked PTEG loaded from another L3 on same chip modified", }, [ POWER7_PME_PM_CMPLU_STALL_STORE ] = { .pme_name = "PM_CMPLU_STALL_STORE", .pme_code = 0x2004a, .pme_short_desc = "Completion stall due to store instruction", .pme_long_desc = "Completion stall due to store instruction", }, [ POWER7_PME_PM_MRK_FXU_FIN ] = { .pme_name = "PM_MRK_FXU_FIN", .pme_code = 0x20038, .pme_short_desc = "fxu marked instr finish", .pme_long_desc = "One of the Fixed Point Units finished a marked instruction. Instructions that finish may not necessarily complete.", }, [ POWER7_PME_PM_PMC4_OVERFLOW ] = { .pme_name = "PM_PMC4_OVERFLOW", .pme_code = 0x10010, .pme_short_desc = "Overflow from counter 4", .pme_long_desc = "Overflows from PMC4 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER7_PME_PM_MRK_PTEG_FROM_L3 ] = { .pme_name = "PM_MRK_PTEG_FROM_L3", .pme_code = 0x2d050, .pme_short_desc = "Marked PTEG loaded from L3", .pme_long_desc = "A Page Table Entry was loaded into the ERAT from the local L3 due to a marked load or store.", }, [ POWER7_PME_PM_LSU0_LMQ_LHR_MERGE ] = { .pme_name = "PM_LSU0_LMQ_LHR_MERGE", .pme_code = 0xd098, .pme_short_desc = "LS0 Load Merged with another cacheline request", .pme_long_desc = "LS0 Load Merged with another cacheline request", }, [ POWER7_PME_PM_BTAC_HIT ] = { .pme_name = "PM_BTAC_HIT", .pme_code = 0x508a, .pme_short_desc = "BTAC Correct Prediction", .pme_long_desc = "BTAC Correct Prediction", }, [ POWER7_PME_PM_L3_RD_BUSY ] = { .pme_name = "PM_L3_RD_BUSY", .pme_code = 0x4f082, .pme_short_desc = "Rd machines busy >= threshold (2,4,6,8)", .pme_long_desc = "Rd machines busy >= threshold (2,4,6,8)", }, [ POWER7_PME_PM_LSU0_L1_SW_PREF ] = { .pme_name = "PM_LSU0_L1_SW_PREF", .pme_code = 0xc09c, .pme_short_desc = "LSU0 Software L1 Prefetches, including SW Transient Prefetches", .pme_long_desc = "LSU0 Software L1 Prefetches, including SW Transient Prefetches", }, [ POWER7_PME_PM_INST_FROM_L2MISS ] = { .pme_name = "PM_INST_FROM_L2MISS", .pme_code = 0x44048, .pme_short_desc = "Instruction fetched missed L2", .pme_long_desc = "An instruction fetch group was fetched from beyond the local L2.", }, [ POWER7_PME_PM_LSU0_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_LSU0_DC_PREF_STREAM_ALLOC", .pme_code = 0xd0a8, .pme_short_desc = "LS0 D cache new prefetch stream allocated", .pme_long_desc = "LS0 D cache new prefetch stream allocated", }, [ POWER7_PME_PM_L2_ST ] = { .pme_name = "PM_L2_ST", .pme_code = 0x16082, .pme_short_desc = "Data Store Count", .pme_long_desc = "Data Store Count", }, [ POWER7_PME_PM_VSU0_DENORM ] = { .pme_name = "PM_VSU0_DENORM", .pme_code = 0xa0ac, .pme_short_desc = 
"FPU denorm operand", .pme_long_desc = "VSU0 received denormalized data", }, [ POWER7_PME_PM_MRK_DATA_FROM_DL2L3_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_DL2L3_SHR", .pme_code = 0x3d044, .pme_short_desc = "Marked data loaded from distant L2 or L3 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from an L2 or L3 on a distant module due to a marked load.", }, [ POWER7_PME_PM_BR_PRED_CR_TA ] = { .pme_name = "PM_BR_PRED_CR_TA", .pme_code = 0x48aa, .pme_short_desc = "Branch predict - taken/not taken and target", .pme_long_desc = "Both the condition (taken or not taken) and the target address of a branch instruction was predicted.", }, [ POWER7_PME_PM_VSU0_FCONV ] = { .pme_name = "PM_VSU0_FCONV", .pme_code = 0xa0b0, .pme_short_desc = "Convert instruction executed", .pme_long_desc = "Convert instruction executed", }, [ POWER7_PME_PM_MRK_LSU_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU_FLUSH_ULD", .pme_code = 0xd084, .pme_short_desc = "Flush: (marked) Unaligned Load", .pme_long_desc = "A marked load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ POWER7_PME_PM_BTAC_MISS ] = { .pme_name = "PM_BTAC_MISS", .pme_code = 0x5088, .pme_short_desc = "BTAC Mispredicted", .pme_long_desc = "BTAC Mispredicted", }, [ POWER7_PME_PM_MRK_LD_MISS_EXPOSED_CYC_COUNT ] = { .pme_name = "PM_MRK_LD_MISS_EXPOSED_CYC_COUNT", .pme_code = 0x1003f, .pme_short_desc = "Marked Load exposed Miss (use edge detect to count #)", .pme_long_desc = "Marked Load exposed Miss (use edge detect to count #)", }, [ POWER7_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x1d040, .pme_short_desc = "Marked data loaded from L2", .pme_long_desc = "The processor's Data Cache was reloaded from the local L2 due to a marked load.", }, [ POWER7_PME_PM_LSU_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_LSU_DCACHE_RELOAD_VALID", .pme_code = 0xd0a2, .pme_short_desc = "count per sector of lines reloaded in L1 
(demand + prefetch) ", .pme_long_desc = "count per sector of lines reloaded in L1 (demand + prefetch) ", }, [ POWER7_PME_PM_VSU_FMA ] = { .pme_name = "PM_VSU_FMA", .pme_code = 0xa884, .pme_short_desc = "two flops operation (fmadd, fnmadd, fmsub, fnmsub) Scalar instructions only!", .pme_long_desc = "two flops operation (fmadd, fnmadd, fmsub, fnmsub) Scalar instructions only!", }, [ POWER7_PME_PM_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_LSU0_FLUSH_SRQ", .pme_code = 0xc0bc, .pme_short_desc = "LS0 Flush: SRQ", .pme_long_desc = "Load Hit Store flush. A younger load was flushed from unit 0 because it hits (overlaps) an older store that is already in the SRQ or in the same group. If the real addresses match but the effective addresses do not, an alias condition exists that prevents store forwarding. If the load and store are in the same group the load must be flushed to separate the two instructions. ", }, [ POWER7_PME_PM_LSU1_L1_PREF ] = { .pme_name = "PM_LSU1_L1_PREF", .pme_code = 0xd0ba, .pme_short_desc = " LS1 L1 cache data prefetches", .pme_long_desc = " LS1 L1 cache data prefetches", }, [ POWER7_PME_PM_IOPS_CMPL ] = { .pme_name = "PM_IOPS_CMPL", .pme_code = 0x10014, .pme_short_desc = "Internal Operations completed", .pme_long_desc = "Number of internal operations that completed.", }, [ POWER7_PME_PM_L2_SYS_PUMP ] = { .pme_name = "PM_L2_SYS_PUMP", .pme_code = 0x36482, .pme_short_desc = "RC req that was a global (aka system) pump attempt", .pme_long_desc = "RC req that was a global (aka system) pump attempt", }, [ POWER7_PME_PM_L2_RCLD_BUSY_RC_FULL ] = { .pme_name = "PM_L2_RCLD_BUSY_RC_FULL", .pme_code = 0x46282, .pme_short_desc = " L2 activated Busy to the core for loads due to all RC full", .pme_long_desc = " L2 activated Busy to the core for loads due to all RC full", }, [ POWER7_PME_PM_LSU_LMQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LMQ_S0_ALLOC", .pme_code = 0xd0a1, .pme_short_desc = "Slot 0 of LMQ valid", .pme_long_desc = "Slot 0 of LMQ valid", }, [ 
POWER7_PME_PM_FLUSH_DISP_SYNC ] = { .pme_name = "PM_FLUSH_DISP_SYNC", .pme_code = 0x2088, .pme_short_desc = "Dispatch Flush: Sync", .pme_long_desc = "Dispatch Flush: Sync", }, [ POWER7_PME_PM_MRK_DATA_FROM_DL2L3_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_DL2L3_MOD_CYC", .pme_code = 0x4002a, .pme_short_desc = "Marked ld latency Data source 1011 (L2.75/L3.75 M different 4 chip node)", .pme_long_desc = "Marked ld latency Data source 1011 (L2.75/L3.75 M different 4 chip node)", }, [ POWER7_PME_PM_L2_IC_INV ] = { .pme_name = "PM_L2_IC_INV", .pme_code = 0x26180, .pme_short_desc = "Icache Invalidates from L2 ", .pme_long_desc = "Icache Invalidates from L2 ", }, [ POWER7_PME_PM_MRK_DATA_FROM_L21_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L21_MOD_CYC", .pme_code = 0x40024, .pme_short_desc = "Marked ld latency Data source 0101 (L2.1 M same chip)", .pme_long_desc = "Marked ld latency Data source 0101 (L2.1 M same chip)", }, [ POWER7_PME_PM_L3_PREF_LDST ] = { .pme_name = "PM_L3_PREF_LDST", .pme_code = 0xd8ac, .pme_short_desc = "L3 cache prefetches LD + ST", .pme_long_desc = "L3 cache prefetches LD + ST", }, [ POWER7_PME_PM_LSU_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_SRQ_EMPTY_CYC", .pme_code = 0x40008, .pme_short_desc = "ALL threads srq empty", .pme_long_desc = "The Store Request Queue is empty", }, [ POWER7_PME_PM_LSU_LMQ_S0_VALID ] = { .pme_name = "PM_LSU_LMQ_S0_VALID", .pme_code = 0xd0a0, .pme_short_desc = "Slot 0 of LMQ valid", .pme_long_desc = "This signal is asserted every cycle that the Load Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin. 
In SMT mode the LRQ is split between the two threads (16 entries each).", }, [ POWER7_PME_PM_FLUSH_PARTIAL ] = { .pme_name = "PM_FLUSH_PARTIAL", .pme_code = 0x2086, .pme_short_desc = "Partial flush", .pme_long_desc = "Partial flush", }, [ POWER7_PME_PM_VSU1_FMA_DOUBLE ] = { .pme_name = "PM_VSU1_FMA_DOUBLE", .pme_code = 0xa092, .pme_short_desc = "four flop DP vector operations (xvmadddp, xvnmadddp, xvmsubdp, xvmsubdp)", .pme_long_desc = "four flop DP vector operations (xvmadddp, xvnmadddp, xvmsubdp, xvmsubdp)", }, [ POWER7_PME_PM_1PLUS_PPC_DISP ] = { .pme_name = "PM_1PLUS_PPC_DISP", .pme_code = 0x400f2, .pme_short_desc = "Cycles at least one Instr Dispatched", .pme_long_desc = "A group containing at least one PPC instruction was dispatched. For microcoded instructions that span multiple groups, this will only occur once.", }, [ POWER7_PME_PM_DATA_FROM_L2MISS ] = { .pme_name = "PM_DATA_FROM_L2MISS", .pme_code = 0x200fe, .pme_short_desc = "Demand LD - L2 Miss (not L2 hit)", .pme_long_desc = "The processor's Data Cache was reloaded but not from the local L2.", }, [ POWER7_PME_PM_SUSPENDED ] = { .pme_name = "PM_SUSPENDED", .pme_code = 0x0, .pme_short_desc = "Counter OFF", .pme_long_desc = "The counter is suspended (does not count)", }, [ POWER7_PME_PM_VSU0_FMA ] = { .pme_name = "PM_VSU0_FMA", .pme_code = 0xa084, .pme_short_desc = "two flops operation (fmadd, fnmadd, fmsub, fnmsub, xsmadd, xsnmadd, xsmsub, xsnmsub) Scalar instructions only!", .pme_long_desc = "two flops operation (fmadd, fnmadd, fmsub, fnmsub, xsmadd, xsnmadd, xsmsub, xsnmsub) Scalar instructions only!", }, [ POWER7_PME_PM_CMPLU_STALL_SCALAR ] = { .pme_name = "PM_CMPLU_STALL_SCALAR", .pme_code = 0x40012, .pme_short_desc = "Completion stall caused by FPU instruction", .pme_long_desc = "Completion stall caused by FPU instruction", }, [ POWER7_PME_PM_STCX_FAIL ] = { .pme_name = "PM_STCX_FAIL", .pme_code = 0xc09a, .pme_short_desc = "STCX failed", .pme_long_desc = "A stcx (stwcx or stdcx) failed", }, [ 
POWER7_PME_PM_VSU0_FSQRT_FDIV_DOUBLE ] = { .pme_name = "PM_VSU0_FSQRT_FDIV_DOUBLE", .pme_code = 0xa094, .pme_short_desc = "eight flop DP vector operations (xvfdivdp, xvsqrtdp ", .pme_long_desc = "eight flop DP vector operations (xvfdivdp, xvsqrtdp ", }, [ POWER7_PME_PM_DC_PREF_DST ] = { .pme_name = "PM_DC_PREF_DST", .pme_code = 0xd0b0, .pme_short_desc = "Data Stream Touch", .pme_long_desc = "A prefetch stream was started using the DST instruction.", }, [ POWER7_PME_PM_VSU1_SCAL_SINGLE_ISSUED ] = { .pme_name = "PM_VSU1_SCAL_SINGLE_ISSUED", .pme_code = 0xb086, .pme_short_desc = "Single Precision scalar instruction issued on Pipe1", .pme_long_desc = "Single Precision scalar instruction issued on Pipe1", }, [ POWER7_PME_PM_L3_HIT ] = { .pme_name = "PM_L3_HIT", .pme_code = 0x1f080, .pme_short_desc = "L3 Hits", .pme_long_desc = "L3 Hits", }, [ POWER7_PME_PM_L2_GLOB_GUESS_WRONG ] = { .pme_name = "PM_L2_GLOB_GUESS_WRONG", .pme_code = 0x26482, .pme_short_desc = "L2 guess glb and guess was not correct (ie data local)", .pme_long_desc = "L2 guess glb and guess was not correct (ie data local)", }, [ POWER7_PME_PM_MRK_DFU_FIN ] = { .pme_name = "PM_MRK_DFU_FIN", .pme_code = 0x20032, .pme_short_desc = "Decimal Unit marked Instruction Finish", .pme_long_desc = "The Decimal Floating Point Unit finished a marked instruction.", }, [ POWER7_PME_PM_INST_FROM_L1 ] = { .pme_name = "PM_INST_FROM_L1", .pme_code = 0x4080, .pme_short_desc = "Instruction fetches from L1", .pme_long_desc = "An instruction fetch group was fetched from L1. 
Fetch Groups can contain up to 8 instructions", }, [ POWER7_PME_PM_BRU_FIN ] = { .pme_name = "PM_BRU_FIN", .pme_code = 0x10068, .pme_short_desc = "Branch Instruction Finished ", .pme_long_desc = "The Branch execution unit finished an instruction", }, [ POWER7_PME_PM_IC_DEMAND_REQ ] = { .pme_name = "PM_IC_DEMAND_REQ", .pme_code = 0x4088, .pme_short_desc = "Demand Instruction fetch request", .pme_long_desc = "Demand Instruction fetch request", }, [ POWER7_PME_PM_VSU1_FSQRT_FDIV_DOUBLE ] = { .pme_name = "PM_VSU1_FSQRT_FDIV_DOUBLE", .pme_code = 0xa096, .pme_short_desc = "eight flop DP vector operations (xvfdivdp, xvsqrtdp ", .pme_long_desc = "eight flop DP vector operations (xvfdivdp, xvsqrtdp ", }, [ POWER7_PME_PM_VSU1_FMA ] = { .pme_name = "PM_VSU1_FMA", .pme_code = 0xa086, .pme_short_desc = "two flops operation (fmadd, fnmadd, fmsub, fnmsub, xsmadd, xsnmadd, xsmsub, xsnmsub) Scalar instructions only!", .pme_long_desc = "two flops operation (fmadd, fnmadd, fmsub, fnmsub, xsmadd, xsnmadd, xsmsub, xsnmsub) Scalar instructions only!", }, [ POWER7_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x20036, .pme_short_desc = "Marked DL1 Demand Miss", .pme_long_desc = "Marked L1 D cache load misses", }, [ POWER7_PME_PM_VSU0_2FLOP_DOUBLE ] = { .pme_name = "PM_VSU0_2FLOP_DOUBLE", .pme_code = 0xa08c, .pme_short_desc = "two flop DP vector operation (xvadddp, xvmuldp, xvsubdp, xvcmpdp, xvseldp, xvabsdp, xvnabsdp, xvredp ,xvsqrtedp, vxnegdp) ", .pme_long_desc = "two flop DP vector operation (xvadddp, xvmuldp, xvsubdp, xvcmpdp, xvseldp, xvabsdp, xvnabsdp, xvredp ,xvsqrtedp, vxnegdp) ", }, [ POWER7_PME_PM_LSU_DC_PREF_STRIDED_STREAM_CONFIRM ] = { .pme_name = "PM_LSU_DC_PREF_STRIDED_STREAM_CONFIRM", .pme_code = 0xd8bc, .pme_short_desc = "Dcache Strided prefetch stream confirmed (software + hardware)", .pme_long_desc = "Dcache Strided prefetch stream confirmed (software + hardware)", }, [ POWER7_PME_PM_INST_PTEG_FROM_L31_SHR ] = { .pme_name = 
"PM_INST_PTEG_FROM_L31_SHR", .pme_code = 0x2e056, .pme_short_desc = "Instruction PTEG loaded from another L3 on same chip shared", .pme_long_desc = "Instruction PTEG loaded from another L3 on same chip shared", }, [ POWER7_PME_PM_MRK_LSU_REJECT_ERAT_MISS ] = { .pme_name = "PM_MRK_LSU_REJECT_ERAT_MISS", .pme_code = 0x30064, .pme_short_desc = "LSU marked reject due to ERAT (up to 2 per cycle)", .pme_long_desc = "LSU marked reject due to ERAT (up to 2 per cycle)", }, [ POWER7_PME_PM_MRK_DATA_FROM_L2MISS ] = { .pme_name = "PM_MRK_DATA_FROM_L2MISS", .pme_code = 0x4d048, .pme_short_desc = "Marked data loaded missed L2", .pme_long_desc = "DL1 was reloaded from beyond L2 due to a marked demand load.", }, [ POWER7_PME_PM_DATA_FROM_RL2L3_SHR ] = { .pme_name = "PM_DATA_FROM_RL2L3_SHR", .pme_code = 0x1c04c, .pme_short_desc = "Data loaded from remote L2 or L3 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from an L2 or L3 on a remote module due to a demand load", }, [ POWER7_PME_PM_INST_FROM_PREF ] = { .pme_name = "PM_INST_FROM_PREF", .pme_code = 0x14046, .pme_short_desc = "Instruction fetched from prefetch", .pme_long_desc = "An instruction fetch group was fetched from the prefetch buffer. 
Fetch groups can contain up to 8 instructions", }, [ POWER7_PME_PM_VSU1_SQ ] = { .pme_name = "PM_VSU1_SQ", .pme_code = 0xb09e, .pme_short_desc = "Store Vector Issued on Pipe1", .pme_long_desc = "Store Vector Issued on Pipe1", }, [ POWER7_PME_PM_L2_LD_DISP ] = { .pme_name = "PM_L2_LD_DISP", .pme_code = 0x36180, .pme_short_desc = "All successful load dispatches", .pme_long_desc = "All successful load dispatches", }, [ POWER7_PME_PM_L2_DISP_ALL ] = { .pme_name = "PM_L2_DISP_ALL", .pme_code = 0x46080, .pme_short_desc = "All successful LD/ST dispatches for this thread(i+d)", .pme_long_desc = "All successful LD/ST dispatches for this thread(i+d)", }, [ POWER7_PME_PM_THRD_GRP_CMPL_BOTH_CYC ] = { .pme_name = "PM_THRD_GRP_CMPL_BOTH_CYC", .pme_code = 0x10012, .pme_short_desc = "Cycles group completed by both threads", .pme_long_desc = "Cycles that both threads completed.", }, [ POWER7_PME_PM_VSU_FSQRT_FDIV_DOUBLE ] = { .pme_name = "PM_VSU_FSQRT_FDIV_DOUBLE", .pme_code = 0xa894, .pme_short_desc = "DP vector versions of fdiv,fsqrt ", .pme_long_desc = "DP vector versions of fdiv,fsqrt ", }, [ POWER7_PME_PM_BR_MPRED ] = { .pme_name = "PM_BR_MPRED", .pme_code = 0x400f6, .pme_short_desc = "Number of Branch Mispredicts", .pme_long_desc = "A branch instruction was incorrectly predicted. 
This could have been a target prediction, a condition prediction, or both", }, [ POWER7_PME_PM_INST_PTEG_FROM_DL2L3_SHR ] = { .pme_name = "PM_INST_PTEG_FROM_DL2L3_SHR", .pme_code = 0x3e054, .pme_short_desc = "Instruction PTEG loaded from remote L2 or L3 shared", .pme_long_desc = "Instruction PTEG loaded from remote L2 or L3 shared", }, [ POWER7_PME_PM_VSU_1FLOP ] = { .pme_name = "PM_VSU_1FLOP", .pme_code = 0xa880, .pme_short_desc = "one flop (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg) operation finished", .pme_long_desc = "one flop (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg) operation finished", }, [ POWER7_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x2000a, .pme_short_desc = "cycles in hypervisor mode ", .pme_long_desc = "Cycles when the processor is executing in Hypervisor (MSR[HV] = 1 and MSR[PR]=0)", }, [ POWER7_PME_PM_MRK_DATA_FROM_RL2L3_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_RL2L3_SHR", .pme_code = 0x1d04c, .pme_short_desc = "Marked data loaded from remote L2 or L3 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from an L2 or L3 on a remote module due to a marked load", }, [ POWER7_PME_PM_DTLB_MISS_16M ] = { .pme_name = "PM_DTLB_MISS_16M", .pme_code = 0x4c05e, .pme_short_desc = "Data TLB miss for 16M page", .pme_long_desc = "Data TLB references to 16MB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER7_PME_PM_MRK_LSU_FIN ] = { .pme_name = "PM_MRK_LSU_FIN", .pme_code = 0x40032, .pme_short_desc = "Marked LSU instruction finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ POWER7_PME_PM_LSU1_LMQ_LHR_MERGE ] = { .pme_name = "PM_LSU1_LMQ_LHR_MERGE", .pme_code = 0xd09a, .pme_short_desc = "LS1 Load Merge with another cacheline request", .pme_long_desc = "LS1 Load Merge with another cacheline request", }, [ POWER7_PME_PM_IFU_FIN ] = { .pme_name = "PM_IFU_FIN", .pme_code = 0x40066, .pme_short_desc = "IFU Finished a (non-branch) instruction", .pme_long_desc = "The Instruction Fetch Unit finished an instruction", }, [ POWER7_PME_PM_1THRD_CON_RUN_INSTR ] = { .pme_name = "PM_1THRD_CON_RUN_INSTR", .pme_code = 0x30062, .pme_short_desc = "1 thread Concurrent Run Instructions", .pme_long_desc = "1 thread Concurrent Run Instructions", }, [ POWER7_PME_PM_CMPLU_STALL_COUNT ] = { .pme_name = "PM_CMPLU_STALL_COUNT", .pme_code = 0x4000B, .pme_short_desc = "Marked LSU instruction finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER7_PME_PM_MEM0_PB_RD_CL ] = { .pme_name = "PM_MEM0_PB_RD_CL", .pme_code = 0x30083, .pme_short_desc = "Nest events (MC0/MC1/PB/GX), Pair2 Bit1", .pme_long_desc = "Nest events (MC0/MC1/PB/GX), Pair2 Bit1", }, [ POWER7_PME_PM_THRD_1_RUN_CYC ] = { .pme_name = "PM_THRD_1_RUN_CYC", .pme_code = 0x10060, .pme_short_desc = "1 thread in Run Cycles", .pme_long_desc = "At least one thread has set its run latch. Operating systems use the run latch to indicate when they are doing useful work. The run latch is typically cleared in the OS idle loop. 
This event does not respect FCWAIT.", }, [ POWER7_PME_PM_THRD_2_CONC_RUN_INSTR ] = { .pme_name = "PM_THRD_2_CONC_RUN_INSTR", .pme_code = 0x40062, .pme_short_desc = "2 thread Concurrent Run Instructions", .pme_long_desc = "2 thread Concurrent Run Instructions", }, [ POWER7_PME_PM_THRD_2_RUN_CYC ] = { .pme_name = "PM_THRD_2_RUN_CYC", .pme_code = 0x20060, .pme_short_desc = "2 thread in Run Cycles", .pme_long_desc = "2 thread in Run Cycles", }, [ POWER7_PME_PM_THRD_3_CONC_RUN_INST ] = { .pme_name = "PM_THRD_3_CONC_RUN_INST", .pme_code = 0x10062, .pme_short_desc = "3 thread in Run Cycles", .pme_long_desc = "3 thread in Run Cycles", }, [ POWER7_PME_PM_THRD_3_RUN_CYC ] = { .pme_name = "PM_THRD_3_RUN_CYC", .pme_code = 0x30060, .pme_short_desc = "3 thread in Run Cycles", .pme_long_desc = "3 thread in Run Cycles", }, [ POWER7_PME_PM_THRD_4_CONC_RUN_INST ] = { .pme_name = "PM_THRD_4_CONC_RUN_INST", .pme_code = 0x20062, .pme_short_desc = "4 thread in Run Cycles", .pme_long_desc = "4 thread in Run Cycles", }, [ POWER7_PME_PM_THRD_4_RUN_CYC ] = { .pme_name = "PM_THRD_4_RUN_CYC", .pme_code = 0x40060, .pme_short_desc = "4 thread in Run Cycles", .pme_long_desc = "4 thread in Run Cycles", }, }; #endif
/* papi-5.3.0/src/libpfm4/lib/events/power4_events.h */
/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __POWER4_EVENTS_H__ #define __POWER4_EVENTS_H__ /* * File: power4_events.h * CVS: * Author: Corey Ashford * cjashfor@us.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2009. All Rights Reserved. * Contributed by Corey Ashford * * Note: This code was automatically generated and should not be modified by * hand. 
* */ #define POWER4_PME_PM_MRK_LSU_SRQ_INST_VALID 0 #define POWER4_PME_PM_FPU1_SINGLE 1 #define POWER4_PME_PM_DC_PREF_OUT_STREAMS 2 #define POWER4_PME_PM_FPU0_STALL3 3 #define POWER4_PME_PM_TB_BIT_TRANS 4 #define POWER4_PME_PM_GPR_MAP_FULL_CYC 5 #define POWER4_PME_PM_MRK_ST_CMPL 6 #define POWER4_PME_PM_MRK_LSU_FLUSH_LRQ 7 #define POWER4_PME_PM_FPU0_STF 8 #define POWER4_PME_PM_FPU1_FMA 9 #define POWER4_PME_PM_L2SA_MOD_TAG 10 #define POWER4_PME_PM_MRK_DATA_FROM_L275_SHR 11 #define POWER4_PME_PM_1INST_CLB_CYC 12 #define POWER4_PME_PM_LSU1_FLUSH_ULD 13 #define POWER4_PME_PM_MRK_INST_FIN 14 #define POWER4_PME_PM_MRK_LSU0_FLUSH_UST 15 #define POWER4_PME_PM_FPU_FDIV 16 #define POWER4_PME_PM_LSU_LRQ_S0_ALLOC 17 #define POWER4_PME_PM_FPU0_FULL_CYC 18 #define POWER4_PME_PM_FPU_SINGLE 19 #define POWER4_PME_PM_FPU0_FMA 20 #define POWER4_PME_PM_MRK_LSU1_FLUSH_ULD 21 #define POWER4_PME_PM_LSU1_FLUSH_LRQ 22 #define POWER4_PME_PM_L2SA_ST_HIT 23 #define POWER4_PME_PM_L2SB_SHR_INV 24 #define POWER4_PME_PM_DTLB_MISS 25 #define POWER4_PME_PM_MRK_ST_MISS_L1 26 #define POWER4_PME_PM_EXT_INT 27 #define POWER4_PME_PM_MRK_LSU1_FLUSH_LRQ 28 #define POWER4_PME_PM_MRK_ST_GPS 29 #define POWER4_PME_PM_GRP_DISP_SUCCESS 30 #define POWER4_PME_PM_LSU1_LDF 31 #define POWER4_PME_PM_FAB_CMD_ISSUED 32 #define POWER4_PME_PM_LSU0_SRQ_STFWD 33 #define POWER4_PME_PM_CR_MAP_FULL_CYC 34 #define POWER4_PME_PM_MRK_LSU0_FLUSH_ULD 35 #define POWER4_PME_PM_LSU_DERAT_MISS 36 #define POWER4_PME_PM_FPU0_SINGLE 37 #define POWER4_PME_PM_FPU1_FDIV 38 #define POWER4_PME_PM_FPU1_FEST 39 #define POWER4_PME_PM_FPU0_FRSP_FCONV 40 #define POWER4_PME_PM_MRK_ST_CMPL_INT 41 #define POWER4_PME_PM_FXU_FIN 42 #define POWER4_PME_PM_FPU_STF 43 #define POWER4_PME_PM_DSLB_MISS 44 #define POWER4_PME_PM_DATA_FROM_L275_SHR 45 #define POWER4_PME_PM_FXLS1_FULL_CYC 46 #define POWER4_PME_PM_L3B0_DIR_MIS 47 #define POWER4_PME_PM_2INST_CLB_CYC 48 #define POWER4_PME_PM_MRK_STCX_FAIL 49 #define POWER4_PME_PM_LSU_LMQ_LHR_MERGE 50 #define 
POWER4_PME_PM_FXU0_BUSY_FXU1_IDLE 51 #define POWER4_PME_PM_L3B1_DIR_REF 52 #define POWER4_PME_PM_MRK_LSU_FLUSH_UST 53 #define POWER4_PME_PM_MRK_DATA_FROM_L25_SHR 54 #define POWER4_PME_PM_LSU_FLUSH_ULD 55 #define POWER4_PME_PM_MRK_BRU_FIN 56 #define POWER4_PME_PM_IERAT_XLATE_WR 57 #define POWER4_PME_PM_LSU0_BUSY 58 #define POWER4_PME_PM_L2SA_ST_REQ 59 #define POWER4_PME_PM_DATA_FROM_MEM 60 #define POWER4_PME_PM_FPR_MAP_FULL_CYC 61 #define POWER4_PME_PM_FPU1_FULL_CYC 62 #define POWER4_PME_PM_FPU0_FIN 63 #define POWER4_PME_PM_3INST_CLB_CYC 64 #define POWER4_PME_PM_DATA_FROM_L35 65 #define POWER4_PME_PM_L2SA_SHR_INV 66 #define POWER4_PME_PM_MRK_LSU_FLUSH_SRQ 67 #define POWER4_PME_PM_THRESH_TIMEO 68 #define POWER4_PME_PM_FPU_FSQRT 69 #define POWER4_PME_PM_MRK_LSU0_FLUSH_LRQ 70 #define POWER4_PME_PM_FXLS0_FULL_CYC 71 #define POWER4_PME_PM_DATA_TABLEWALK_CYC 72 #define POWER4_PME_PM_FPU0_ALL 73 #define POWER4_PME_PM_FPU0_FEST 74 #define POWER4_PME_PM_DATA_FROM_L25_MOD 75 #define POWER4_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC 76 #define POWER4_PME_PM_FPU_FEST 77 #define POWER4_PME_PM_0INST_FETCH 78 #define POWER4_PME_PM_LARX_LSU1 79 #define POWER4_PME_PM_LD_MISS_L1_LSU0 80 #define POWER4_PME_PM_L1_PREF 81 #define POWER4_PME_PM_FPU1_STALL3 82 #define POWER4_PME_PM_BRQ_FULL_CYC 83 #define POWER4_PME_PM_LARX 84 #define POWER4_PME_PM_MRK_DATA_FROM_L35 85 #define POWER4_PME_PM_WORK_HELD 86 #define POWER4_PME_PM_MRK_LD_MISS_L1_LSU0 87 #define POWER4_PME_PM_FXU_IDLE 88 #define POWER4_PME_PM_INST_CMPL 89 #define POWER4_PME_PM_LSU1_FLUSH_UST 90 #define POWER4_PME_PM_LSU0_FLUSH_ULD 91 #define POWER4_PME_PM_INST_FROM_L2 92 #define POWER4_PME_PM_DATA_FROM_L3 93 #define POWER4_PME_PM_FPU0_DENORM 94 #define POWER4_PME_PM_FPU1_FMOV_FEST 95 #define POWER4_PME_PM_GRP_DISP_REJECT 96 #define POWER4_PME_PM_INST_FETCH_CYC 97 #define POWER4_PME_PM_LSU_LDF 98 #define POWER4_PME_PM_INST_DISP 99 #define POWER4_PME_PM_L2SA_MOD_INV 100 #define POWER4_PME_PM_DATA_FROM_L25_SHR 101 #define 
POWER4_PME_PM_FAB_CMD_RETRIED 102 #define POWER4_PME_PM_L1_DCACHE_RELOAD_VALID 103 #define POWER4_PME_PM_MRK_GRP_ISSUED 104 #define POWER4_PME_PM_FPU_FULL_CYC 105 #define POWER4_PME_PM_FPU_FMA 106 #define POWER4_PME_PM_MRK_CRU_FIN 107 #define POWER4_PME_PM_MRK_LSU1_FLUSH_UST 108 #define POWER4_PME_PM_MRK_FXU_FIN 109 #define POWER4_PME_PM_BR_ISSUED 110 #define POWER4_PME_PM_EE_OFF 111 #define POWER4_PME_PM_INST_FROM_L3 112 #define POWER4_PME_PM_ITLB_MISS 113 #define POWER4_PME_PM_FXLS_FULL_CYC 114 #define POWER4_PME_PM_FXU1_BUSY_FXU0_IDLE 115 #define POWER4_PME_PM_GRP_DISP_VALID 116 #define POWER4_PME_PM_L2SC_ST_HIT 117 #define POWER4_PME_PM_MRK_GRP_DISP 118 #define POWER4_PME_PM_L2SB_MOD_TAG 119 #define POWER4_PME_PM_INST_FROM_L25_L275 120 #define POWER4_PME_PM_LSU_FLUSH_UST 121 #define POWER4_PME_PM_L2SB_ST_HIT 122 #define POWER4_PME_PM_FXU1_FIN 123 #define POWER4_PME_PM_L3B1_DIR_MIS 124 #define POWER4_PME_PM_4INST_CLB_CYC 125 #define POWER4_PME_PM_GRP_CMPL 126 #define POWER4_PME_PM_DC_PREF_L2_CLONE_L3 127 #define POWER4_PME_PM_FPU_FRSP_FCONV 128 #define POWER4_PME_PM_5INST_CLB_CYC 129 #define POWER4_PME_PM_MRK_LSU0_FLUSH_SRQ 130 #define POWER4_PME_PM_MRK_LSU_FLUSH_ULD 131 #define POWER4_PME_PM_8INST_CLB_CYC 132 #define POWER4_PME_PM_LSU_LMQ_FULL_CYC 133 #define POWER4_PME_PM_ST_REF_L1_LSU0 134 #define POWER4_PME_PM_LSU0_DERAT_MISS 135 #define POWER4_PME_PM_LSU_SRQ_SYNC_CYC 136 #define POWER4_PME_PM_FPU_STALL3 137 #define POWER4_PME_PM_MRK_DATA_FROM_L2 138 #define POWER4_PME_PM_FPU0_FMOV_FEST 139 #define POWER4_PME_PM_LSU0_FLUSH_SRQ 140 #define POWER4_PME_PM_LD_REF_L1_LSU0 141 #define POWER4_PME_PM_L2SC_SHR_INV 142 #define POWER4_PME_PM_LSU1_FLUSH_SRQ 143 #define POWER4_PME_PM_LSU_LMQ_S0_ALLOC 144 #define POWER4_PME_PM_ST_REF_L1 145 #define POWER4_PME_PM_LSU_SRQ_EMPTY_CYC 146 #define POWER4_PME_PM_FPU1_STF 147 #define POWER4_PME_PM_L3B0_DIR_REF 148 #define POWER4_PME_PM_RUN_CYC 149 #define POWER4_PME_PM_LSU_LMQ_S0_VALID 150 #define POWER4_PME_PM_LSU_LRQ_S0_VALID 
151 #define POWER4_PME_PM_LSU0_LDF 152 #define POWER4_PME_PM_MRK_IMR_RELOAD 153 #define POWER4_PME_PM_7INST_CLB_CYC 154 #define POWER4_PME_PM_MRK_GRP_TIMEO 155 #define POWER4_PME_PM_FPU_FMOV_FEST 156 #define POWER4_PME_PM_GRP_DISP_BLK_SB_CYC 157 #define POWER4_PME_PM_XER_MAP_FULL_CYC 158 #define POWER4_PME_PM_ST_MISS_L1 159 #define POWER4_PME_PM_STOP_COMPLETION 160 #define POWER4_PME_PM_MRK_GRP_CMPL 161 #define POWER4_PME_PM_ISLB_MISS 162 #define POWER4_PME_PM_CYC 163 #define POWER4_PME_PM_LD_MISS_L1_LSU1 164 #define POWER4_PME_PM_STCX_FAIL 165 #define POWER4_PME_PM_LSU1_SRQ_STFWD 166 #define POWER4_PME_PM_GRP_DISP 167 #define POWER4_PME_PM_DATA_FROM_L2 168 #define POWER4_PME_PM_L2_PREF 169 #define POWER4_PME_PM_FPU0_FPSCR 170 #define POWER4_PME_PM_FPU1_DENORM 171 #define POWER4_PME_PM_MRK_DATA_FROM_L25_MOD 172 #define POWER4_PME_PM_L2SB_ST_REQ 173 #define POWER4_PME_PM_L2SB_MOD_INV 174 #define POWER4_PME_PM_FPU0_FSQRT 175 #define POWER4_PME_PM_LD_REF_L1 176 #define POWER4_PME_PM_MRK_L1_RELOAD_VALID 177 #define POWER4_PME_PM_L2SB_SHR_MOD 178 #define POWER4_PME_PM_INST_FROM_L1 179 #define POWER4_PME_PM_1PLUS_PPC_CMPL 180 #define POWER4_PME_PM_EE_OFF_EXT_INT 181 #define POWER4_PME_PM_L2SC_SHR_MOD 182 #define POWER4_PME_PM_LSU_LRQ_FULL_CYC 183 #define POWER4_PME_PM_IC_PREF_INSTALL 184 #define POWER4_PME_PM_MRK_LSU1_FLUSH_SRQ 185 #define POWER4_PME_PM_GCT_FULL_CYC 186 #define POWER4_PME_PM_INST_FROM_MEM 187 #define POWER4_PME_PM_FXU_BUSY 188 #define POWER4_PME_PM_ST_REF_L1_LSU1 189 #define POWER4_PME_PM_MRK_LD_MISS_L1 190 #define POWER4_PME_PM_MRK_LSU1_INST_FIN 191 #define POWER4_PME_PM_L1_WRITE_CYC 192 #define POWER4_PME_PM_BIQ_IDU_FULL_CYC 193 #define POWER4_PME_PM_MRK_LSU0_INST_FIN 194 #define POWER4_PME_PM_L2SC_ST_REQ 195 #define POWER4_PME_PM_LSU1_BUSY 196 #define POWER4_PME_PM_FPU_ALL 197 #define POWER4_PME_PM_LSU_SRQ_S0_ALLOC 198 #define POWER4_PME_PM_GRP_MRK 199 #define POWER4_PME_PM_FPU1_FIN 200 #define POWER4_PME_PM_DC_PREF_STREAM_ALLOC 201 #define 
POWER4_PME_PM_BR_MPRED_CR 202 #define POWER4_PME_PM_BR_MPRED_TA 203 #define POWER4_PME_PM_CRQ_FULL_CYC 204 #define POWER4_PME_PM_INST_FROM_PREF 205 #define POWER4_PME_PM_LD_MISS_L1 206 #define POWER4_PME_PM_STCX_PASS 207 #define POWER4_PME_PM_DC_INV_L2 208 #define POWER4_PME_PM_LSU_SRQ_FULL_CYC 209 #define POWER4_PME_PM_LSU0_FLUSH_LRQ 210 #define POWER4_PME_PM_LSU_SRQ_S0_VALID 211 #define POWER4_PME_PM_LARX_LSU0 212 #define POWER4_PME_PM_GCT_EMPTY_CYC 213 #define POWER4_PME_PM_FPU1_ALL 214 #define POWER4_PME_PM_FPU1_FSQRT 215 #define POWER4_PME_PM_FPU_FIN 216 #define POWER4_PME_PM_L2SA_SHR_MOD 217 #define POWER4_PME_PM_MRK_LD_MISS_L1_LSU1 218 #define POWER4_PME_PM_LSU_SRQ_STFWD 219 #define POWER4_PME_PM_FXU0_FIN 220 #define POWER4_PME_PM_MRK_FPU_FIN 221 #define POWER4_PME_PM_LSU_BUSY 222 #define POWER4_PME_PM_INST_FROM_L35 223 #define POWER4_PME_PM_FPU1_FRSP_FCONV 224 #define POWER4_PME_PM_SNOOP_TLBIE 225 #define POWER4_PME_PM_FPU0_FDIV 226 #define POWER4_PME_PM_LD_REF_L1_LSU1 227 #define POWER4_PME_PM_MRK_DATA_FROM_L275_MOD 228 #define POWER4_PME_PM_HV_CYC 229 #define POWER4_PME_PM_6INST_CLB_CYC 230 #define POWER4_PME_PM_LR_CTR_MAP_FULL_CYC 231 #define POWER4_PME_PM_L2SC_MOD_INV 232 #define POWER4_PME_PM_FPU_DENORM 233 #define POWER4_PME_PM_DATA_FROM_L275_MOD 234 #define POWER4_PME_PM_LSU1_DERAT_MISS 235 #define POWER4_PME_PM_IC_PREF_REQ 236 #define POWER4_PME_PM_MRK_LSU_FIN 237 #define POWER4_PME_PM_MRK_DATA_FROM_L3 238 #define POWER4_PME_PM_MRK_DATA_FROM_MEM 239 #define POWER4_PME_PM_LSU0_FLUSH_UST 240 #define POWER4_PME_PM_LSU_FLUSH_LRQ 241 #define POWER4_PME_PM_LSU_FLUSH_SRQ 242 #define POWER4_PME_PM_L2SC_MOD_TAG 243 static const pme_power_entry_t power4_pe[] = { [ POWER4_PME_PM_MRK_LSU_SRQ_INST_VALID ] = { .pme_name = "PM_MRK_LSU_SRQ_INST_VALID", .pme_code = 0x933, .pme_short_desc = "Marked instruction valid in SRQ", .pme_long_desc = "This signal is asserted every cycle when a marked request is resident in the Store Request Queue", }, [ 
POWER4_PME_PM_FPU1_SINGLE ] = { .pme_name = "PM_FPU1_SINGLE", .pme_code = 0x127, .pme_short_desc = "FPU1 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing single precision instruction.", }, [ POWER4_PME_PM_DC_PREF_OUT_STREAMS ] = { .pme_name = "PM_DC_PREF_OUT_STREAMS", .pme_code = 0xc36, .pme_short_desc = "Out of prefetch streams", .pme_long_desc = "A new prefetch stream was detected, but no more stream entries were available", }, [ POWER4_PME_PM_FPU0_STALL3 ] = { .pme_name = "PM_FPU0_STALL3", .pme_code = 0x121, .pme_short_desc = "FPU0 stalled in pipe3", .pme_long_desc = "This signal indicates that fp0 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. ", }, [ POWER4_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x8005, .pme_short_desc = "Time Base bit transition", .pme_long_desc = "When the selected time base bit (as specified in MMCR0[TBSEL])transitions from 0 to 1 ", }, [ POWER4_PME_PM_GPR_MAP_FULL_CYC ] = { .pme_name = "PM_GPR_MAP_FULL_CYC", .pme_code = 0x235, .pme_short_desc = "Cycles GPR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the gpr mapper cannot accept any more groups. Dispatch is stopped. 
Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ POWER4_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x1003, .pme_short_desc = "Marked store instruction completed", .pme_long_desc = "A sampled store has completed (data home)", }, [ POWER4_PME_PM_MRK_LSU_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_LRQ", .pme_code = 0x3910, .pme_short_desc = "Marked LRQ flushes", .pme_long_desc = "A marked load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER4_PME_PM_FPU0_STF ] = { .pme_name = "PM_FPU0_STF", .pme_code = 0x122, .pme_short_desc = "FPU0 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a store instruction.", }, [ POWER4_PME_PM_FPU1_FMA ] = { .pme_name = "PM_FPU1_FMA", .pme_code = 0x105, .pme_short_desc = "FPU1 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER4_PME_PM_L2SA_MOD_TAG ] = { .pme_name = "PM_L2SA_MOD_TAG", .pme_code = 0xf06, .pme_short_desc = "L2 slice A transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER4_PME_PM_MRK_DATA_FROM_L275_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L275_SHR", .pme_code = 0x6c76, .pme_short_desc = "Marked data loaded from L2.75 shared", .pme_long_desc = "DL1 was reloaded with shared (T) data from the L2 of another MCM due to a marked demand load", }, [ POWER4_PME_PM_1INST_CLB_CYC ] = { .pme_name = "PM_1INST_CLB_CYC", .pme_code = 0x450, .pme_short_desc = "Cycles 1 instruction in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) corresponding to the number of cycles there are 8 instructions in the queue and full(7) corresponding to the number of cycles there is 1 instruction in the queue. This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_LSU1_FLUSH_ULD ] = { .pme_name = "PM_LSU1_FLUSH_ULD", .pme_code = 0xc04, .pme_short_desc = "LSU1 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 1 because it was unaligned (crossed a 64-byte boundary, or 32-byte if it missed the L1)", }, [ POWER4_PME_PM_MRK_INST_FIN ] = { .pme_name = "PM_MRK_INST_FIN", .pme_code = 0x7005, .pme_short_desc = "Marked instruction finished", .pme_long_desc = "One of the execution units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER4_PME_PM_MRK_LSU0_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU0_FLUSH_UST", .pme_code = 0x911, .pme_short_desc = "LSU0 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 0 because it was unaligned", }, [ POWER4_PME_PM_FPU_FDIV ] = { .pme_name = "PM_FPU_FDIV", .pme_code = 0x1100, .pme_short_desc = "FPU executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when FPU is executing a divide instruction. 
This could be fdiv, fdivs, fdiv. fdivs. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_LSU_LRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LRQ_S0_ALLOC", .pme_code = 0xc26, .pme_short_desc = "LRQ slot 0 allocated", .pme_long_desc = "LRQ slot zero was allocated", }, [ POWER4_PME_PM_FPU0_FULL_CYC ] = { .pme_name = "PM_FPU0_FULL_CYC", .pme_code = 0x203, .pme_short_desc = "Cycles FPU0 issue queue full", .pme_long_desc = "The issue queue for FPU unit 0 cannot accept any more instructions. Issue is stopped", }, [ POWER4_PME_PM_FPU_SINGLE ] = { .pme_name = "PM_FPU_SINGLE", .pme_code = 0x5120, .pme_short_desc = "FPU executed single precision instruction", .pme_long_desc = "FPU is executing single precision instruction. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_FPU0_FMA ] = { .pme_name = "PM_FPU0_FMA", .pme_code = 0x101, .pme_short_desc = "FPU0 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER4_PME_PM_MRK_LSU1_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU1_FLUSH_ULD", .pme_code = 0x914, .pme_short_desc = "LSU1 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 1 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ POWER4_PME_PM_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_LSU1_FLUSH_LRQ", .pme_code = 0xc06, .pme_short_desc = "LSU1 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER4_PME_PM_L2SA_ST_HIT ] = { .pme_name = "PM_L2SA_ST_HIT", .pme_code = 0xf11, .pme_short_desc = "L2 slice A store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. 
This event is provided on each of the three L2 slices A,B, and C.", }, [ POWER4_PME_PM_L2SB_SHR_INV ] = { .pme_name = "PM_L2SB_SHR_INV", .pme_code = 0xf21, .pme_short_desc = "L2 slice B transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A,B,and C. NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER4_PME_PM_DTLB_MISS ] = { .pme_name = "PM_DTLB_MISS", .pme_code = 0x904, .pme_short_desc = "Data TLB misses", .pme_long_desc = "A TLB miss for a data request occurred. Requests that miss the TLB may be retried until the instruction is in the next to complete group (unless HID4 is set to allow speculative tablewalks). This may result in multiple TLB misses for the same instruction.", }, [ POWER4_PME_PM_MRK_ST_MISS_L1 ] = { .pme_name = "PM_MRK_ST_MISS_L1", .pme_code = 0x923, .pme_short_desc = "Marked L1 D cache store misses", .pme_long_desc = "A marked store missed the dcache", }, [ POWER4_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x8002, .pme_short_desc = "External interrupts", .pme_long_desc = "An external interrupt occurred", }, [ POWER4_PME_PM_MRK_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_LRQ", .pme_code = 0x916, .pme_short_desc = "LSU1 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER4_PME_PM_MRK_ST_GPS ] = { .pme_name = "PM_MRK_ST_GPS", .pme_code = 0x6003, .pme_short_desc = "Marked store sent to GPS", .pme_long_desc = "A sampled store has been sent to the memory subsystem", }, [ POWER4_PME_PM_GRP_DISP_SUCCESS ] = { 
.pme_name = "PM_GRP_DISP_SUCCESS", .pme_code = 0x5001, .pme_short_desc = "Group dispatch success", .pme_long_desc = "Number of groups successfully dispatched (not rejected)", }, [ POWER4_PME_PM_LSU1_LDF ] = { .pme_name = "PM_LSU1_LDF", .pme_code = 0x934, .pme_short_desc = "LSU1 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 1", }, [ POWER4_PME_PM_FAB_CMD_ISSUED ] = { .pme_name = "PM_FAB_CMD_ISSUED", .pme_code = 0xf16, .pme_short_desc = "Fabric command issued", .pme_long_desc = "A bus command was issued on the MCM to MCM fabric from the local (this chip's) Fabric Bus Controller. This event is scaled to the fabric frequency and must be adjusted for a true count. For example, if the fabric is running 2:1, divide the count by 2.", }, [ POWER4_PME_PM_LSU0_SRQ_STFWD ] = { .pme_name = "PM_LSU0_SRQ_STFWD", .pme_code = 0xc20, .pme_short_desc = "LSU0 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 0", }, [ POWER4_PME_PM_CR_MAP_FULL_CYC ] = { .pme_name = "PM_CR_MAP_FULL_CYC", .pme_code = 0x204, .pme_short_desc = "Cycles CR logical operation mapper full", .pme_long_desc = "The ISU sends a signal indicating that the cr mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ POWER4_PME_PM_MRK_LSU0_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU0_FLUSH_ULD", .pme_code = 0x910, .pme_short_desc = "LSU0 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 0 because it was unaligned (crossed a 64-byte boundary, or 32-byte if it missed the L1)", }, [ POWER4_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x6900, .pme_short_desc = "DERAT misses", .pme_long_desc = "Total D-ERAT Misses (Unit 0 + Unit 1). Requests that miss the D-ERAT are rejected and retried until the request hits in the ERAT. 
This may result in multiple ERAT misses for the same instruction.", }, [ POWER4_PME_PM_FPU0_SINGLE ] = { .pme_name = "PM_FPU0_SINGLE", .pme_code = 0x123, .pme_short_desc = "FPU0 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a single precision instruction.", }, [ POWER4_PME_PM_FPU1_FDIV ] = { .pme_name = "PM_FPU1_FDIV", .pme_code = 0x104, .pme_short_desc = "FPU1 executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp1 is executing a divide instruction. This could be fdiv, fdivs, fdiv., or fdivs.", }, [ POWER4_PME_PM_FPU1_FEST ] = { .pme_name = "PM_FPU1_FEST", .pme_code = 0x116, .pme_short_desc = "FPU1 executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. ", }, [ POWER4_PME_PM_FPU0_FRSP_FCONV ] = { .pme_name = "PM_FPU0_FRSP_FCONV", .pme_code = 0x111, .pme_short_desc = "FPU0 executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when fp0 is executing frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER4_PME_PM_MRK_ST_CMPL_INT ] = { .pme_name = "PM_MRK_ST_CMPL_INT", .pme_code = 0x3003, .pme_short_desc = "Marked store completed with intervention", .pme_long_desc = "A marked store previously sent to the memory subsystem completed (data home) after requiring intervention", }, [ POWER4_PME_PM_FXU_FIN ] = { .pme_name = "PM_FXU_FIN", .pme_code = 0x3230, .pme_short_desc = "FXU produced a result", .pme_long_desc = "The fixed point unit (Unit 0 + Unit 1) finished a marked instruction. 
Instructions that finish may not necessary complete.", }, [ POWER4_PME_PM_FPU_STF ] = { .pme_name = "PM_FPU_STF", .pme_code = 0x6120, .pme_short_desc = "FPU executed store instruction", .pme_long_desc = "FPU is executing a store instruction. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_DSLB_MISS ] = { .pme_name = "PM_DSLB_MISS", .pme_code = 0x905, .pme_short_desc = "Data SLB misses", .pme_long_desc = "A SLB miss for a data request occurred. SLB misses trap to the operating system to resolve", }, [ POWER4_PME_PM_DATA_FROM_L275_SHR ] = { .pme_name = "PM_DATA_FROM_L275_SHR", .pme_code = 0x6c66, .pme_short_desc = "Data loaded from L2.75 shared", .pme_long_desc = "DL1 was reloaded with shared (T) data from the L2 of another MCM due to a demand load", }, [ POWER4_PME_PM_FXLS1_FULL_CYC ] = { .pme_name = "PM_FXLS1_FULL_CYC", .pme_code = 0x214, .pme_short_desc = "Cycles FXU1/LS1 queue full", .pme_long_desc = "The issue queue for FXU/LSU unit 1 cannot accept any more instructions. Issue is stopped", }, [ POWER4_PME_PM_L3B0_DIR_MIS ] = { .pme_name = "PM_L3B0_DIR_MIS", .pme_code = 0xf01, .pme_short_desc = "L3 bank 0 directory misses", .pme_long_desc = "A reference was made to the local L3 directory by a local CPU and it missed in the L3. Only requests from on-MCM CPUs are counted. This event is scaled to the L3 speed and the count must be scaled. i.e. if the L3 is running 3:1, divide the count by 3", }, [ POWER4_PME_PM_2INST_CLB_CYC ] = { .pme_name = "PM_2INST_CLB_CYC", .pme_code = 0x451, .pme_short_desc = "Cycles 2 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) correspanding to the number of cycles there are 8 instructions in the queue and full (7) corresponding to the number of cycles there is 1 instruction in the queue. 
This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_MRK_STCX_FAIL ] = { .pme_name = "PM_MRK_STCX_FAIL", .pme_code = 0x925, .pme_short_desc = "Marked STCX failed", .pme_long_desc = "A marked stcx (stwcx or stdcx) failed", }, [ POWER4_PME_PM_LSU_LMQ_LHR_MERGE ] = { .pme_name = "PM_LSU_LMQ_LHR_MERGE", .pme_code = 0x926, .pme_short_desc = "LMQ LHR merges", .pme_long_desc = "A dcache miss occurred for the same real cache line address as an earlier request already in the Load Miss Queue and was merged into the LMQ entry.", }, [ POWER4_PME_PM_FXU0_BUSY_FXU1_IDLE ] = { .pme_name = "PM_FXU0_BUSY_FXU1_IDLE", .pme_code = 0x7002, .pme_short_desc = "FXU0 busy FXU1 idle", .pme_long_desc = "FXU0 is busy while FXU1 is idle", }, [ POWER4_PME_PM_L3B1_DIR_REF ] = { .pme_name = "PM_L3B1_DIR_REF", .pme_code = 0xf02, .pme_short_desc = "L3 bank 1 directory references", .pme_long_desc = "A reference was made to the local L3 directory by a local CPU. Only requests from on-MCM CPUs are counted. This event is scaled to the L3 speed and the count must be scaled. i.e. 
if the L3 is running 3:1, divide the count by 3", }, [ POWER4_PME_PM_MRK_LSU_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU_FLUSH_UST", .pme_code = 0x7910, .pme_short_desc = "Marked unaligned store flushes", .pme_long_desc = "A marked store was flushed because it was unaligned", }, [ POWER4_PME_PM_MRK_DATA_FROM_L25_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR", .pme_code = 0x5c76, .pme_short_desc = "Marked data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a marked demand load", }, [ POWER4_PME_PM_LSU_FLUSH_ULD ] = { .pme_name = "PM_LSU_FLUSH_ULD", .pme_code = 0x1c00, .pme_short_desc = "LRQ unaligned load flushes", .pme_long_desc = "A load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ POWER4_PME_PM_MRK_BRU_FIN ] = { .pme_name = "PM_MRK_BRU_FIN", .pme_code = 0x2005, .pme_short_desc = "Marked instruction BRU processing finished", .pme_long_desc = "The branch unit finished a marked instruction. Instructions that finish may not necessary complete", }, [ POWER4_PME_PM_IERAT_XLATE_WR ] = { .pme_name = "PM_IERAT_XLATE_WR", .pme_code = 0x327, .pme_short_desc = "Translation written to ierat", .pme_long_desc = "This signal will be asserted each time the I-ERAT is written. This indicates that an ERAT miss has been serviced. ERAT misses will initiate a sequence resulting in the ERAT being written. 
ERAT misses that are later ignored will not be counted unless the ERAT is written before the instruction stream is changed. This should be a fairly accurate count of ERAT misses (best available).", }, [ POWER4_PME_PM_LSU0_BUSY ] = { .pme_name = "PM_LSU0_BUSY", .pme_code = 0xc33, .pme_short_desc = "LSU0 busy", .pme_long_desc = "LSU unit 0 is busy rejecting instructions", }, [ POWER4_PME_PM_L2SA_ST_REQ ] = { .pme_name = "PM_L2SA_ST_REQ", .pme_code = 0xf10, .pme_short_desc = "L2 slice A store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. The event is provided on each of the three slices A, B, and C.", }, [ POWER4_PME_PM_DATA_FROM_MEM ] = { .pme_name = "PM_DATA_FROM_MEM", .pme_code = 0x2c66, .pme_short_desc = "Data loaded from memory", .pme_long_desc = "DL1 was reloaded from memory due to a demand load", }, [ POWER4_PME_PM_FPR_MAP_FULL_CYC ] = { .pme_name = "PM_FPR_MAP_FULL_CYC", .pme_code = 0x201, .pme_short_desc = "Cycles FPR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the FPR mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ POWER4_PME_PM_FPU1_FULL_CYC ] = { .pme_name = "PM_FPU1_FULL_CYC", .pme_code = 0x207, .pme_short_desc = "Cycles FPU1 issue queue full", .pme_long_desc = "The issue queue for FPU unit 1 cannot accept any more instructions. Issue is stopped", }, [ POWER4_PME_PM_FPU0_FIN ] = { .pme_name = "PM_FPU0_FIN", .pme_code = 0x113, .pme_short_desc = "FPU0 produced a result", .pme_long_desc = "fp0 finished and produced a result. This only indicates finish, not completion. ", }, [ POWER4_PME_PM_3INST_CLB_CYC ] = { .pme_name = "PM_3INST_CLB_CYC", .pme_code = 0x452, .pme_short_desc = "Cycles 3 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. 
Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) corresponding to the number of cycles there are 8 instructions in the queue and full(7) corresponding to the number of cycles there is 1 instruction in the queue. This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_DATA_FROM_L35 ] = { .pme_name = "PM_DATA_FROM_L35", .pme_code = 0x3c66, .pme_short_desc = "Data loaded from L3.5", .pme_long_desc = "DL1 was reloaded from the L3 of another MCM due to a demand load", }, [ POWER4_PME_PM_L2SA_SHR_INV ] = { .pme_name = "PM_L2SA_SHR_INV", .pme_code = 0xf05, .pme_short_desc = "L2 slice A transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A, B, and C. NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER4_PME_PM_MRK_LSU_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_SRQ", .pme_code = 0x4910, .pme_short_desc = "Marked SRQ flushes", .pme_long_desc = "A marked store was flushed because a younger load hit an older store that is already in the SRQ or in the same group.", }, [ POWER4_PME_PM_THRESH_TIMEO ] = { .pme_name = "PM_THRESH_TIMEO", .pme_code = 0x2003, .pme_short_desc = "Threshold timeout", .pme_long_desc = "The threshold timer expired", }, [ POWER4_PME_PM_FPU_FSQRT ] = { .pme_name = "PM_FPU_FSQRT", .pme_code = 0x6100, .pme_short_desc = "FPU executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when FPU is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs. 
Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_MRK_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_LRQ", .pme_code = 0x912, .pme_short_desc = "LSU0 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 0 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER4_PME_PM_FXLS0_FULL_CYC ] = { .pme_name = "PM_FXLS0_FULL_CYC", .pme_code = 0x210, .pme_short_desc = "Cycles FXU0/LS0 queue full", .pme_long_desc = "The issue queue for FXU/LSU unit 0 cannot accept any more instructions. Issue is stopped", }, [ POWER4_PME_PM_DATA_TABLEWALK_CYC ] = { .pme_name = "PM_DATA_TABLEWALK_CYC", .pme_code = 0x936, .pme_short_desc = "Cycles doing data tablewalks", .pme_long_desc = "This signal is asserted every cycle when a tablewalk is active. While a tablewalk is active any request attempting to access the TLB will be rejected and retried.", }, [ POWER4_PME_PM_FPU0_ALL ] = { .pme_name = "PM_FPU0_ALL", .pme_code = 0x103, .pme_short_desc = "FPU0 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo", }, [ POWER4_PME_PM_FPU0_FEST ] = { .pme_name = "PM_FPU0_FEST", .pme_code = 0x112, .pme_short_desc = "FPU0 executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. 
", }, [ POWER4_PME_PM_DATA_FROM_L25_MOD ] = { .pme_name = "PM_DATA_FROM_L25_MOD", .pme_code = 0x8c66, .pme_short_desc = "Data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a demand load", }, [ POWER4_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_CYC", .pme_code = 0x2002, .pme_short_desc = "Cycles LMQ and SRQ empty", .pme_long_desc = "Cycles when both the LMQ and SRQ are empty (LSU is idle)", }, [ POWER4_PME_PM_FPU_FEST ] = { .pme_name = "PM_FPU_FEST", .pme_code = 0x3110, .pme_short_desc = "FPU executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. Combined Unit 0 + Unit 1.", }, [ POWER4_PME_PM_0INST_FETCH ] = { .pme_name = "PM_0INST_FETCH", .pme_code = 0x8327, .pme_short_desc = "No instructions fetched", .pme_long_desc = "No instructions were fetched this cycles (due to IFU hold, redirect, or icache miss)", }, [ POWER4_PME_PM_LARX_LSU1 ] = { .pme_name = "PM_LARX_LSU1", .pme_code = 0xc77, .pme_short_desc = "Larx executed on LSU1", .pme_long_desc = "Invalid event, larx instructions are never executed on unit 1", }, [ POWER4_PME_PM_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_LD_MISS_L1_LSU0", .pme_code = 0xc12, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "A load, executing on unit 0, missed the dcache", }, [ POWER4_PME_PM_L1_PREF ] = { .pme_name = "PM_L1_PREF", .pme_code = 0xc35, .pme_short_desc = "L1 cache data prefetches", .pme_long_desc = "A request to prefetch data into the L1 was made", }, [ POWER4_PME_PM_FPU1_STALL3 ] = { .pme_name = "PM_FPU1_STALL3", .pme_code = 0x125, .pme_short_desc = "FPU1 stalled in pipe3", .pme_long_desc = "This signal indicates that fp1 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). 
This signal is active during the entire duration of the stall. ", }, [ POWER4_PME_PM_BRQ_FULL_CYC ] = { .pme_name = "PM_BRQ_FULL_CYC", .pme_code = 0x205, .pme_short_desc = "Cycles branch queue full", .pme_long_desc = "The ISU sends a signal indicating that the issue queue that feeds the ifu br unit cannot accept any more group (queue is full of groups).", }, [ POWER4_PME_PM_LARX ] = { .pme_name = "PM_LARX", .pme_code = 0x4c70, .pme_short_desc = "Larx executed", .pme_long_desc = "A Larx (lwarx or ldarx) was executed. This is the combined count from LSU0 + LSU1, but these instructions only execute on LSU0", }, [ POWER4_PME_PM_MRK_DATA_FROM_L35 ] = { .pme_name = "PM_MRK_DATA_FROM_L35", .pme_code = 0x3c76, .pme_short_desc = "Marked data loaded from L3.5", .pme_long_desc = "DL1 was reloaded from the L3 of another MCM due to a marked demand load", }, [ POWER4_PME_PM_WORK_HELD ] = { .pme_name = "PM_WORK_HELD", .pme_code = 0x2001, .pme_short_desc = "Work held", .pme_long_desc = "RAS Unit has signaled completion to stop and there are groups waiting to complete", }, [ POWER4_PME_PM_MRK_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU0", .pme_code = 0x920, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "A marked load, executing on unit 0, missed the dcache", }, [ POWER4_PME_PM_FXU_IDLE ] = { .pme_name = "PM_FXU_IDLE", .pme_code = 0x5002, .pme_short_desc = "FXU idle", .pme_long_desc = "FXU0 and FXU1 are both idle", }, [ POWER4_PME_PM_INST_CMPL ] = { .pme_name = "PM_INST_CMPL", .pme_code = 0x8001, .pme_short_desc = "Instructions completed", .pme_long_desc = "Number of Eligible Instructions that completed. 
", }, [ POWER4_PME_PM_LSU1_FLUSH_UST ] = { .pme_name = "PM_LSU1_FLUSH_UST", .pme_code = 0xc05, .pme_short_desc = "LSU1 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", }, [ POWER4_PME_PM_LSU0_FLUSH_ULD ] = { .pme_name = "PM_LSU0_FLUSH_ULD", .pme_code = 0xc00, .pme_short_desc = "LSU0 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 0 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ POWER4_PME_PM_INST_FROM_L2 ] = { .pme_name = "PM_INST_FROM_L2", .pme_code = 0x3327, .pme_short_desc = "Instructions fetched from L2", .pme_long_desc = "An instruction fetch group was fetched from L2. Fetch Groups can contain up to 8 instructions", }, [ POWER4_PME_PM_DATA_FROM_L3 ] = { .pme_name = "PM_DATA_FROM_L3", .pme_code = 0x1c66, .pme_short_desc = "Data loaded from L3", .pme_long_desc = "DL1 was reloaded from the local L3 due to a demand load", }, [ POWER4_PME_PM_FPU0_DENORM ] = { .pme_name = "PM_FPU0_DENORM", .pme_code = 0x120, .pme_short_desc = "FPU0 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ POWER4_PME_PM_FPU1_FMOV_FEST ] = { .pme_name = "PM_FPU1_FMOV_FEST", .pme_code = 0x114, .pme_short_desc = "FPU1 executing FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when fp1 is executing a move kind of instruction or one of the estimate instructions.. 
This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ", }, [ POWER4_PME_PM_GRP_DISP_REJECT ] = { .pme_name = "PM_GRP_DISP_REJECT", .pme_code = 0x8003, .pme_short_desc = "Group dispatch rejected", .pme_long_desc = "A group that previously attempted dispatch was rejected.", }, [ POWER4_PME_PM_INST_FETCH_CYC ] = { .pme_name = "PM_INST_FETCH_CYC", .pme_code = 0x323, .pme_short_desc = "Cycles at least 1 instruction fetched", .pme_long_desc = "Asserted each cycle when the IFU sends at least one instruction to the IDU. ", }, [ POWER4_PME_PM_LSU_LDF ] = { .pme_name = "PM_LSU_LDF", .pme_code = 0x8930, .pme_short_desc = "LSU executed Floating Point load instruction", .pme_long_desc = "LSU executed Floating Point load instruction", }, [ POWER4_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x221, .pme_short_desc = "Instructions dispatched", .pme_long_desc = "The ISU sends the number of instructions dispatched.", }, [ POWER4_PME_PM_L2SA_MOD_INV ] = { .pme_name = "PM_L2SA_MOD_INV", .pme_code = 0xf07, .pme_short_desc = "L2 slice A transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A,B,and C.", }, [ POWER4_PME_PM_DATA_FROM_L25_SHR ] = { .pme_name = "PM_DATA_FROM_L25_SHR", .pme_code = 0x5c66, .pme_short_desc = "Data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a demand load", }, [ POWER4_PME_PM_FAB_CMD_RETRIED ] = { .pme_name = "PM_FAB_CMD_RETRIED", .pme_code = 0xf17, .pme_short_desc = "Fabric command retried", .pme_long_desc = "A bus command on the MCM to MCM fabric was retried. 
This event is the total count of all retried fabric commands for the local MCM (all four chips report the same value). This event is scaled to the fabric frequency and must be adjusted for a true count. i.e. if the fabric is running 2:1, divide the count by 2.", }, [ POWER4_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0xc64, .pme_short_desc = "L1 reload data source valid", .pme_long_desc = "The data source information is valid", }, [ POWER4_PME_PM_MRK_GRP_ISSUED ] = { .pme_name = "PM_MRK_GRP_ISSUED", .pme_code = 0x6005, .pme_short_desc = "Marked group issued", .pme_long_desc = "A sampled instruction was issued", }, [ POWER4_PME_PM_FPU_FULL_CYC ] = { .pme_name = "PM_FPU_FULL_CYC", .pme_code = 0x5200, .pme_short_desc = "Cycles FPU issue queue full", .pme_long_desc = "Cycles when one or both FPU issue queues are full", }, [ POWER4_PME_PM_FPU_FMA ] = { .pme_name = "PM_FPU_FMA", .pme_code = 0x2100, .pme_short_desc = "FPU executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_MRK_CRU_FIN ] = { .pme_name = "PM_MRK_CRU_FIN", .pme_code = 0x4005, .pme_short_desc = "Marked instruction CRU processing finished", .pme_long_desc = "The Condition Register Unit finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ POWER4_PME_PM_MRK_LSU1_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU1_FLUSH_UST", .pme_code = 0x915, .pme_short_desc = "LSU1 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", }, [ POWER4_PME_PM_MRK_FXU_FIN ] = { .pme_name = "PM_MRK_FXU_FIN", .pme_code = 0x6004, .pme_short_desc = "Marked instruction FXU processing finished", .pme_long_desc = "One of the Fixed Point Units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER4_PME_PM_BR_ISSUED ] = { .pme_name = "PM_BR_ISSUED", .pme_code = 0x330, .pme_short_desc = "Branches issued", .pme_long_desc = "This signal will be asserted each time the ISU issues a branch instruction. This signal will be asserted each time the ISU selects a branch instruction to issue.", }, [ POWER4_PME_PM_EE_OFF ] = { .pme_name = "PM_EE_OFF", .pme_code = 0x233, .pme_short_desc = "Cycles MSR(EE) bit off", .pme_long_desc = "The number of Cycles MSR(EE) bit was off.", }, [ POWER4_PME_PM_INST_FROM_L3 ] = { .pme_name = "PM_INST_FROM_L3", .pme_code = 0x5327, .pme_short_desc = "Instruction fetched from L3", .pme_long_desc = "An instruction fetch group was fetched from L3. 
Fetch Groups can contain up to 8 instructions", }, [ POWER4_PME_PM_ITLB_MISS ] = { .pme_name = "PM_ITLB_MISS", .pme_code = 0x900, .pme_short_desc = "Instruction TLB misses", .pme_long_desc = "A TLB miss for an Instruction Fetch has occurred", }, [ POWER4_PME_PM_FXLS_FULL_CYC ] = { .pme_name = "PM_FXLS_FULL_CYC", .pme_code = 0x8210, .pme_short_desc = "Cycles FXLS queue is full", .pme_long_desc = "Cycles when one or both FXU/LSU issue queue are full", }, [ POWER4_PME_PM_FXU1_BUSY_FXU0_IDLE ] = { .pme_name = "PM_FXU1_BUSY_FXU0_IDLE", .pme_code = 0x4002, .pme_short_desc = "FXU1 busy FXU0 idle", .pme_long_desc = "FXU0 was idle while FXU1 was busy", }, [ POWER4_PME_PM_GRP_DISP_VALID ] = { .pme_name = "PM_GRP_DISP_VALID", .pme_code = 0x223, .pme_short_desc = "Group dispatch valid", .pme_long_desc = "Dispatch has been attempted for a valid group. Some groups may be rejected. The total number of successful dispatches is the number of dispatch valid minus dispatch reject.", }, [ POWER4_PME_PM_L2SC_ST_HIT ] = { .pme_name = "PM_L2SC_ST_HIT", .pme_code = 0xf15, .pme_short_desc = "L2 slice C store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. This event is provided on each of the three L2 slices A,B, and C.", }, [ POWER4_PME_PM_MRK_GRP_DISP ] = { .pme_name = "PM_MRK_GRP_DISP", .pme_code = 0x1002, .pme_short_desc = "Marked group dispatched", .pme_long_desc = "A group containing a sampled instruction was dispatched", }, [ POWER4_PME_PM_L2SB_MOD_TAG ] = { .pme_name = "PM_L2SB_MOD_TAG", .pme_code = 0xf22, .pme_short_desc = "L2 slice B transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A,B,and C.", }, [ POWER4_PME_PM_INST_FROM_L25_L275 ] = { .pme_name = "PM_INST_FROM_L25_L275", .pme_code = 0x2327, .pme_short_desc = "Instruction fetched from L2.5/L2.75", .pme_long_desc = "An instruction fetch group was fetched from the L2 of another chip. Fetch Groups can contain up to 8 instructions", }, [ POWER4_PME_PM_LSU_FLUSH_UST ] = { .pme_name = "PM_LSU_FLUSH_UST", .pme_code = 0x2c00, .pme_short_desc = "SRQ unaligned store flushes", .pme_long_desc = "A store was flushed because it was unaligned", }, [ POWER4_PME_PM_L2SB_ST_HIT ] = { .pme_name = "PM_L2SB_ST_HIT", .pme_code = 0xf13, .pme_short_desc = "L2 slice B store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. This event is provided on each of the three L2 slices A,B, and C.", }, [ POWER4_PME_PM_FXU1_FIN ] = { .pme_name = "PM_FXU1_FIN", .pme_code = 0x236, .pme_short_desc = "FXU1 produced a result", .pme_long_desc = "The Fixed Point unit 1 finished an instruction and produced a result", }, [ POWER4_PME_PM_L3B1_DIR_MIS ] = { .pme_name = "PM_L3B1_DIR_MIS", .pme_code = 0xf03, .pme_short_desc = "L3 bank 1 directory misses", .pme_long_desc = "A reference was made to the local L3 directory by a local CPU and it missed in the L3. Only requests from on-MCM CPUs are counted. This event is scaled to the L3 speed and the count must be scaled. i.e. if the L3 is running 3:1, divide the count by 3", }, [ POWER4_PME_PM_4INST_CLB_CYC ] = { .pme_name = "PM_4INST_CLB_CYC", .pme_code = 0x453, .pme_short_desc = "Cycles 4 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) corresponding to the number of cycles there are 8 instructions in the queue and full (7) corresponding to the number of cycles there is 1 instruction in the queue. 
This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_GRP_CMPL ] = { .pme_name = "PM_GRP_CMPL", .pme_code = 0x7003, .pme_short_desc = "Group completed", .pme_long_desc = "A group completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER4_PME_PM_DC_PREF_L2_CLONE_L3 ] = { .pme_name = "PM_DC_PREF_L2_CLONE_L3", .pme_code = 0xc27, .pme_short_desc = "L2 prefetch cloned with L3", .pme_long_desc = "A prefetch request was made to the L2 with a cloned request sent to the L3", }, [ POWER4_PME_PM_FPU_FRSP_FCONV ] = { .pme_name = "PM_FPU_FRSP_FCONV", .pme_code = 0x7110, .pme_short_desc = "FPU executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when executing frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_5INST_CLB_CYC ] = { .pme_name = "PM_5INST_CLB_CYC", .pme_code = 0x454, .pme_short_desc = "Cycles 5 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) corresponding to the number of cycles there are 8 instructions in the queue and full (7) corresponding to the number of cycles there is 1 instruction in the queue. 
This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_MRK_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_SRQ", .pme_code = 0x913, .pme_short_desc = "LSU0 marked SRQ flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER4_PME_PM_MRK_LSU_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU_FLUSH_ULD", .pme_code = 0x8910, .pme_short_desc = "Marked unaligned load flushes", .pme_long_desc = "A marked load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ POWER4_PME_PM_8INST_CLB_CYC ] = { .pme_name = "PM_8INST_CLB_CYC", .pme_code = 0x457, .pme_short_desc = "Cycles 8 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) corresponding to the number of cycles there are 8 instructions in the queue and full (7) corresponding to the number of cycles there is 1 instruction in the queue. This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_LSU_LMQ_FULL_CYC ] = { .pme_name = "PM_LSU_LMQ_FULL_CYC", .pme_code = 0x927, .pme_short_desc = "Cycles LMQ full", .pme_long_desc = "The LMQ was full", }, [ POWER4_PME_PM_ST_REF_L1_LSU0 ] = { .pme_name = "PM_ST_REF_L1_LSU0", .pme_code = 0xc11, .pme_short_desc = "LSU0 L1 D cache store references", .pme_long_desc = "A store executed on unit 0", }, [ POWER4_PME_PM_LSU0_DERAT_MISS ] = { .pme_name = "PM_LSU0_DERAT_MISS", .pme_code = 0x902, .pme_short_desc = "LSU0 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 0 missed the ERAT and resulted in an ERAT reload. 
Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ POWER4_PME_PM_LSU_SRQ_SYNC_CYC ] = { .pme_name = "PM_LSU_SRQ_SYNC_CYC", .pme_code = 0x932, .pme_short_desc = "SRQ sync duration", .pme_long_desc = "This signal is asserted every cycle when a sync is in the SRQ.", }, [ POWER4_PME_PM_FPU_STALL3 ] = { .pme_name = "PM_FPU_STALL3", .pme_code = 0x2120, .pme_short_desc = "FPU stalled in pipe3", .pme_long_desc = "FPU has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x4c76, .pme_short_desc = "Marked data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a marked demand load", }, [ POWER4_PME_PM_FPU0_FMOV_FEST ] = { .pme_name = "PM_FPU0_FMOV_FEST", .pme_code = 0x110, .pme_short_desc = "FPU0 executed FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a move kind of instruction or one of the estimate instructions.. 
This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ", }, [ POWER4_PME_PM_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_LSU0_FLUSH_SRQ", .pme_code = 0xc03, .pme_short_desc = "LSU0 SRQ flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER4_PME_PM_LD_REF_L1_LSU0 ] = { .pme_name = "PM_LD_REF_L1_LSU0", .pme_code = 0xc10, .pme_short_desc = "LSU0 L1 D cache load references", .pme_long_desc = "A load executed on unit 0", }, [ POWER4_PME_PM_L2SC_SHR_INV ] = { .pme_name = "PM_L2SC_SHR_INV", .pme_code = 0xf25, .pme_short_desc = "L2 slice C transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A,B,and C. NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER4_PME_PM_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_LSU1_FLUSH_SRQ", .pme_code = 0xc07, .pme_short_desc = "LSU1 SRQ flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group. 
", }, [ POWER4_PME_PM_LSU_LMQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LMQ_S0_ALLOC", .pme_code = 0x935, .pme_short_desc = "LMQ slot 0 allocated", .pme_long_desc = "The first entry in the LMQ was allocated.", }, [ POWER4_PME_PM_ST_REF_L1 ] = { .pme_name = "PM_ST_REF_L1", .pme_code = 0x7c10, .pme_short_desc = "L1 D cache store references", .pme_long_desc = "Total DL1 Store references", }, [ POWER4_PME_PM_LSU_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_SRQ_EMPTY_CYC", .pme_code = 0x4003, .pme_short_desc = "Cycles SRQ empty", .pme_long_desc = "The Store Request Queue is empty", }, [ POWER4_PME_PM_FPU1_STF ] = { .pme_name = "PM_FPU1_STF", .pme_code = 0x126, .pme_short_desc = "FPU1 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing a store instruction.", }, [ POWER4_PME_PM_L3B0_DIR_REF ] = { .pme_name = "PM_L3B0_DIR_REF", .pme_code = 0xf00, .pme_short_desc = "L3 bank 0 directory references", .pme_long_desc = "A reference was made to the local L3 directory by a local CPU. Only requests from on-MCM CPUs are counted. This event is scaled to the L3 speed and the count must be scaled. i.e. if the L3 is running 3:1, divide the count by 3", }, [ POWER4_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x1005, .pme_short_desc = "Run cycles", .pme_long_desc = "Processor Cycles gated by the run latch", }, [ POWER4_PME_PM_LSU_LMQ_S0_VALID ] = { .pme_name = "PM_LSU_LMQ_S0_VALID", .pme_code = 0x931, .pme_short_desc = "LMQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle when the first entry in the LMQ is valid. The LMQ had eight entries that are allocated FIFO", }, [ POWER4_PME_PM_LSU_LRQ_S0_VALID ] = { .pme_name = "PM_LSU_LRQ_S0_VALID", .pme_code = 0xc22, .pme_short_desc = "LRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Load Request Queue slot zero is valid. 
The SRQ is 32 entries long and is allocated round-robin.", }, [ POWER4_PME_PM_LSU0_LDF ] = { .pme_name = "PM_LSU0_LDF", .pme_code = 0x930, .pme_short_desc = "LSU0 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 0", }, [ POWER4_PME_PM_MRK_IMR_RELOAD ] = { .pme_name = "PM_MRK_IMR_RELOAD", .pme_code = 0x922, .pme_short_desc = "Marked IMR reloaded", .pme_long_desc = "A DL1 reload occurred due to marked load", }, [ POWER4_PME_PM_7INST_CLB_CYC ] = { .pme_name = "PM_7INST_CLB_CYC", .pme_code = 0x456, .pme_short_desc = "Cycles 7 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) corresponding to the number of cycles there are 8 instructions in the queue and full (7) corresponding to the number of cycles there is 1 instruction in the queue. This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_MRK_GRP_TIMEO ] = { .pme_name = "PM_MRK_GRP_TIMEO", .pme_code = 0x5005, .pme_short_desc = "Marked group completion timeout", .pme_long_desc = "The sampling timeout expired indicating that the previously sampled instruction is no longer in the processor", }, [ POWER4_PME_PM_FPU_FMOV_FEST ] = { .pme_name = "PM_FPU_FMOV_FEST", .pme_code = 0x8110, .pme_short_desc = "FPU executing FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when executing a move kind of instruction or one of the estimate instructions. This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ . 
Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_GRP_DISP_BLK_SB_CYC ] = { .pme_name = "PM_GRP_DISP_BLK_SB_CYC", .pme_code = 0x231, .pme_short_desc = "Cycles group dispatch blocked by scoreboard", .pme_long_desc = "The ISU sends a signal indicating that dispatch is blocked by scoreboard.", }, [ POWER4_PME_PM_XER_MAP_FULL_CYC ] = { .pme_name = "PM_XER_MAP_FULL_CYC", .pme_code = 0x202, .pme_short_desc = "Cycles XER mapper full", .pme_long_desc = "The ISU sends a signal indicating that the xer mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ POWER4_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0xc23, .pme_short_desc = "L1 D cache store misses", .pme_long_desc = "A store missed the dcache", }, [ POWER4_PME_PM_STOP_COMPLETION ] = { .pme_name = "PM_STOP_COMPLETION", .pme_code = 0x3001, .pme_short_desc = "Completion stopped", .pme_long_desc = "RAS Unit has signaled completion to stop", }, [ POWER4_PME_PM_MRK_GRP_CMPL ] = { .pme_name = "PM_MRK_GRP_CMPL", .pme_code = 0x4004, .pme_short_desc = "Marked group completed", .pme_long_desc = "A group containing a sampled instruction completed. 
Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER4_PME_PM_ISLB_MISS ] = { .pme_name = "PM_ISLB_MISS", .pme_code = 0x901, .pme_short_desc = "Instruction SLB misses", .pme_long_desc = "A SLB miss for an instruction fetch has occurred", }, [ POWER4_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0x7, .pme_short_desc = "Processor cycles", .pme_long_desc = "Processor cycles", }, [ POWER4_PME_PM_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_LD_MISS_L1_LSU1", .pme_code = 0xc16, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "A load, executing on unit 1, missed the dcache", }, [ POWER4_PME_PM_STCX_FAIL ] = { .pme_name = "PM_STCX_FAIL", .pme_code = 0x921, .pme_short_desc = "STCX failed", .pme_long_desc = "A stcx (stwcx or stdcx) failed", }, [ POWER4_PME_PM_LSU1_SRQ_STFWD ] = { .pme_name = "PM_LSU1_SRQ_STFWD", .pme_code = 0xc24, .pme_short_desc = "LSU1 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 1", }, [ POWER4_PME_PM_GRP_DISP ] = { .pme_name = "PM_GRP_DISP", .pme_code = 0x2004, .pme_short_desc = "Group dispatches", .pme_long_desc = "A group was dispatched", }, [ POWER4_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x4c66, .pme_short_desc = "Data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a demand load", }, [ POWER4_PME_PM_L2_PREF ] = { .pme_name = "PM_L2_PREF", .pme_code = 0xc34, .pme_short_desc = "L2 cache prefetches", .pme_long_desc = "A request to prefetch data into L2 was made", }, [ POWER4_PME_PM_FPU0_FPSCR ] = { .pme_name = "PM_FPU0_FPSCR", .pme_code = 0x130, .pme_short_desc = "FPU0 executed FPSCR instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing fpscr move related instruction. This could be mtfsfi*, mtfsb0*, mtfsb1*. 
mffs*, mtfsf*, mcrsf* where XYZ* means XYZ, XYZs, XYZ., XYZs", }, [ POWER4_PME_PM_FPU1_DENORM ] = { .pme_name = "PM_FPU1_DENORM", .pme_code = 0x124, .pme_short_desc = "FPU1 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ POWER4_PME_PM_MRK_DATA_FROM_L25_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD", .pme_code = 0x8c76, .pme_short_desc = "Marked data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a marked demand load", }, [ POWER4_PME_PM_L2SB_ST_REQ ] = { .pme_name = "PM_L2SB_ST_REQ", .pme_code = 0xf12, .pme_short_desc = "L2 slice B store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. The event is provided on each of the three slices A,B,and C.", }, [ POWER4_PME_PM_L2SB_MOD_INV ] = { .pme_name = "PM_L2SB_MOD_INV", .pme_code = 0xf23, .pme_short_desc = "L2 slice B transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A,B,and C.", }, [ POWER4_PME_PM_FPU0_FSQRT ] = { .pme_name = "PM_FPU0_FSQRT", .pme_code = 0x102, .pme_short_desc = "FPU0 executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp0 is executing a square root instruction. 
This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER4_PME_PM_LD_REF_L1 ] = { .pme_name = "PM_LD_REF_L1", .pme_code = 0x8c10, .pme_short_desc = "L1 D cache load references", .pme_long_desc = "Total DL1 Load references", }, [ POWER4_PME_PM_MRK_L1_RELOAD_VALID ] = { .pme_name = "PM_MRK_L1_RELOAD_VALID", .pme_code = 0xc74, .pme_short_desc = "Marked L1 reload data source valid", .pme_long_desc = "The source information is valid and is for a marked load", }, [ POWER4_PME_PM_L2SB_SHR_MOD ] = { .pme_name = "PM_L2SB_SHR_MOD", .pme_code = 0xf20, .pme_short_desc = "L2 slice B transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A,B,and C. ", }, [ POWER4_PME_PM_INST_FROM_L1 ] = { .pme_name = "PM_INST_FROM_L1", .pme_code = 0x6327, .pme_short_desc = "Instruction fetched from L1", .pme_long_desc = "An instruction fetch group was fetched from L1. Fetch Groups can contain up to 8 instructions", }, [ POWER4_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x5003, .pme_short_desc = "One or more PPC instruction completed", .pme_long_desc = "A group containing at least one PPC instruction completed. 
For microcoded instructions that span multiple groups, this will only occur once.", }, [ POWER4_PME_PM_EE_OFF_EXT_INT ] = { .pme_name = "PM_EE_OFF_EXT_INT", .pme_code = 0x237, .pme_short_desc = "Cycles MSR(EE) bit off and external interrupt pending", .pme_long_desc = "Cycles MSR(EE) bit off and external interrupt pending", }, [ POWER4_PME_PM_L2SC_SHR_MOD ] = { .pme_name = "PM_L2SC_SHR_MOD", .pme_code = 0xf24, .pme_short_desc = "L2 slice C transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A,B,and C. ", }, [ POWER4_PME_PM_LSU_LRQ_FULL_CYC ] = { .pme_name = "PM_LSU_LRQ_FULL_CYC", .pme_code = 0x212, .pme_short_desc = "Cycles LRQ full", .pme_long_desc = "The isu sends this signal when the lrq is full.", }, [ POWER4_PME_PM_IC_PREF_INSTALL ] = { .pme_name = "PM_IC_PREF_INSTALL", .pme_code = 0x325, .pme_short_desc = "Instruction prefetched installed in prefetch buffer", .pme_long_desc = "This signal is asserted when a prefetch buffer entry (line) is allocated but the request is not a demand fetch.", }, [ POWER4_PME_PM_MRK_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_SRQ", .pme_code = 0x917, .pme_short_desc = "LSU1 marked SRQ flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER4_PME_PM_GCT_FULL_CYC ] = { .pme_name = "PM_GCT_FULL_CYC", .pme_code = 0x200, .pme_short_desc = "Cycles GCT full", .pme_long_desc = "The ISU sends a signal indicating the gct is full. 
", }, [ POWER4_PME_PM_INST_FROM_MEM ] = { .pme_name = "PM_INST_FROM_MEM", .pme_code = 0x1327, .pme_short_desc = "Instruction fetched from memory", .pme_long_desc = "An instruction fetch group was fetched from memory. Fetch Groups can contain up to 8 instructions", }, [ POWER4_PME_PM_FXU_BUSY ] = { .pme_name = "PM_FXU_BUSY", .pme_code = 0x6002, .pme_short_desc = "FXU busy", .pme_long_desc = "FXU0 and FXU1 are both busy", }, [ POWER4_PME_PM_ST_REF_L1_LSU1 ] = { .pme_name = "PM_ST_REF_L1_LSU1", .pme_code = 0xc15, .pme_short_desc = "LSU1 L1 D cache store references", .pme_long_desc = "A store executed on unit 1", }, [ POWER4_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x1920, .pme_short_desc = "Marked L1 D cache load misses", .pme_long_desc = "Marked L1 D cache load misses", }, [ POWER4_PME_PM_MRK_LSU1_INST_FIN ] = { .pme_name = "PM_MRK_LSU1_INST_FIN", .pme_code = 0xc32, .pme_short_desc = "LSU1 finished a marked instruction", .pme_long_desc = "LSU unit 1 finished a marked instruction", }, [ POWER4_PME_PM_L1_WRITE_CYC ] = { .pme_name = "PM_L1_WRITE_CYC", .pme_code = 0x333, .pme_short_desc = "Cycles writing to instruction L1", .pme_long_desc = "This signal is asserted each cycle a cache write is active.", }, [ POWER4_PME_PM_BIQ_IDU_FULL_CYC ] = { .pme_name = "PM_BIQ_IDU_FULL_CYC", .pme_code = 0x324, .pme_short_desc = "Cycles BIQ or IDU full", .pme_long_desc = "This signal will be asserted each time either the IDU is full or the BIQ is full.", }, [ POWER4_PME_PM_MRK_LSU0_INST_FIN ] = { .pme_name = "PM_MRK_LSU0_INST_FIN", .pme_code = 0xc31, .pme_short_desc = "LSU0 finished a marked instruction", .pme_long_desc = "LSU unit 0 finished a marked instruction", }, [ POWER4_PME_PM_L2SC_ST_REQ ] = { .pme_name = "PM_L2SC_ST_REQ", .pme_code = 0xf14, .pme_short_desc = "L2 slice C store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. 
The event is provided on each of the three slices A,B,and C.", }, [ POWER4_PME_PM_LSU1_BUSY ] = { .pme_name = "PM_LSU1_BUSY", .pme_code = 0xc37, .pme_short_desc = "LSU1 busy", .pme_long_desc = "LSU unit 1 is busy rejecting instructions ", }, [ POWER4_PME_PM_FPU_ALL ] = { .pme_name = "PM_FPU_ALL", .pme_code = 0x5100, .pme_short_desc = "FPU executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_LSU_SRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_SRQ_S0_ALLOC", .pme_code = 0xc25, .pme_short_desc = "SRQ slot 0 allocated", .pme_long_desc = "SRQ Slot zero was allocated", }, [ POWER4_PME_PM_GRP_MRK ] = { .pme_name = "PM_GRP_MRK", .pme_code = 0x5004, .pme_short_desc = "Group marked in IDU", .pme_long_desc = "A group was sampled (marked)", }, [ POWER4_PME_PM_FPU1_FIN ] = { .pme_name = "PM_FPU1_FIN", .pme_code = 0x117, .pme_short_desc = "FPU1 produced a result", .pme_long_desc = "fp1 finished, produced a result. This only indicates finish, not completion. ", }, [ POWER4_PME_PM_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_DC_PREF_STREAM_ALLOC", .pme_code = 0x907, .pme_short_desc = "D cache new prefetch stream allocated", .pme_long_desc = "A new Prefetch Stream was allocated", }, [ POWER4_PME_PM_BR_MPRED_CR ] = { .pme_name = "PM_BR_MPRED_CR", .pme_code = 0x331, .pme_short_desc = "Branch mispredictions due CR bit setting", .pme_long_desc = "This signal is asserted when the branch execution unit detects a branch mispredict because the CR value is opposite of the predicted value. 
This signal is asserted after a branch issue event and will result in a branch redirect flush if not overridden by a flush of an older instruction.", }, [ POWER4_PME_PM_BR_MPRED_TA ] = { .pme_name = "PM_BR_MPRED_TA", .pme_code = 0x332, .pme_short_desc = "Branch mispredictions due to target address", .pme_long_desc = "branch miss predict due to a target address prediction. This signal will be asserted each time the branch execution unit detects an incorrect target address prediction. This signal will be asserted after a valid branch execution unit issue and will cause a branch mispredict flush unless a flush is detected from an older instruction.", }, [ POWER4_PME_PM_CRQ_FULL_CYC ] = { .pme_name = "PM_CRQ_FULL_CYC", .pme_code = 0x211, .pme_short_desc = "Cycles CR issue queue full", .pme_long_desc = "The ISU sends a signal indicating that the issue queue that feeds the ifu cr unit cannot accept any more group (queue is full of groups).", }, [ POWER4_PME_PM_INST_FROM_PREF ] = { .pme_name = "PM_INST_FROM_PREF", .pme_code = 0x7327, .pme_short_desc = "Instructions fetched from prefetch", .pme_long_desc = "An instruction fetch group was fetched from the prefetch buffer. 
Fetch Groups can contain up to 8 instructions", }, [ POWER4_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x3c10, .pme_short_desc = "L1 D cache load misses", .pme_long_desc = "Total DL1 Load references that miss the DL1", }, [ POWER4_PME_PM_STCX_PASS ] = { .pme_name = "PM_STCX_PASS", .pme_code = 0xc75, .pme_short_desc = "Stcx passes", .pme_long_desc = "A stcx (stwcx or stdcx) instruction was successful", }, [ POWER4_PME_PM_DC_INV_L2 ] = { .pme_name = "PM_DC_INV_L2", .pme_code = 0xc17, .pme_short_desc = "L1 D cache entries invalidated from L2", .pme_long_desc = "A dcache invalidated was received from the L2 because a line in L2 was castout.", }, [ POWER4_PME_PM_LSU_SRQ_FULL_CYC ] = { .pme_name = "PM_LSU_SRQ_FULL_CYC", .pme_code = 0x213, .pme_short_desc = "Cycles SRQ full", .pme_long_desc = "The isu sends this signal when the srq is full.", }, [ POWER4_PME_PM_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_LSU0_FLUSH_LRQ", .pme_code = 0xc02, .pme_short_desc = "LSU0 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER4_PME_PM_LSU_SRQ_S0_VALID ] = { .pme_name = "PM_LSU_SRQ_S0_VALID", .pme_code = 0xc21, .pme_short_desc = "SRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Store Request Queue slot zero is valid. 
The SRQ is 32 entries long and is allocated round-robin.", }, [ POWER4_PME_PM_LARX_LSU0 ] = { .pme_name = "PM_LARX_LSU0", .pme_code = 0xc73, .pme_short_desc = "Larx executed on LSU0", .pme_long_desc = "A larx (lwarx or ldarx) was executed on side 0 (there is no corresponding unit 1 event since larx instructions can only execute on unit 0)", }, [ POWER4_PME_PM_GCT_EMPTY_CYC ] = { .pme_name = "PM_GCT_EMPTY_CYC", .pme_code = 0x1004, .pme_short_desc = "Cycles GCT empty", .pme_long_desc = "The Global Completion Table is completely empty", }, [ POWER4_PME_PM_FPU1_ALL ] = { .pme_name = "PM_FPU1_ALL", .pme_code = 0x107, .pme_short_desc = "FPU1 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo", }, [ POWER4_PME_PM_FPU1_FSQRT ] = { .pme_name = "PM_FPU1_FSQRT", .pme_code = 0x106, .pme_short_desc = "FPU1 executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp1 is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER4_PME_PM_FPU_FIN ] = { .pme_name = "PM_FPU_FIN", .pme_code = 0x4110, .pme_short_desc = "FPU produced a result", .pme_long_desc = "FPU finished, produced a result. This only indicates finish, not completion. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_L2SA_SHR_MOD ] = { .pme_name = "PM_L2SA_SHR_MOD", .pme_code = 0xf04, .pme_short_desc = "L2 slice A transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. 
The event is provided on each of the three slices A,B,and C. ", }, [ POWER4_PME_PM_MRK_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU1", .pme_code = 0x924, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "A marked load, executing on unit 1, missed the dcache", }, [ POWER4_PME_PM_LSU_SRQ_STFWD ] = { .pme_name = "PM_LSU_SRQ_STFWD", .pme_code = 0x1c20, .pme_short_desc = "SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load", }, [ POWER4_PME_PM_FXU0_FIN ] = { .pme_name = "PM_FXU0_FIN", .pme_code = 0x232, .pme_short_desc = "FXU0 produced a result", .pme_long_desc = "The Fixed Point unit 0 finished an instruction and produced a result", }, [ POWER4_PME_PM_MRK_FPU_FIN ] = { .pme_name = "PM_MRK_FPU_FIN", .pme_code = 0x7004, .pme_short_desc = "Marked instruction FPU processing finished", .pme_long_desc = "One of the Floating Point Units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER4_PME_PM_LSU_BUSY ] = { .pme_name = "PM_LSU_BUSY", .pme_code = 0x4c30, .pme_short_desc = "LSU busy", .pme_long_desc = "LSU (unit 0 + unit 1) is busy rejecting instructions ", }, [ POWER4_PME_PM_INST_FROM_L35 ] = { .pme_name = "PM_INST_FROM_L35", .pme_code = 0x4327, .pme_short_desc = "Instructions fetched from L3.5", .pme_long_desc = "An instruction fetch group was fetched from the L3 of another module. Fetch Groups can contain up to 8 instructions", }, [ POWER4_PME_PM_FPU1_FRSP_FCONV ] = { .pme_name = "PM_FPU1_FRSP_FCONV", .pme_code = 0x115, .pme_short_desc = "FPU1 executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when fp1 is executing frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER4_PME_PM_SNOOP_TLBIE ] = { .pme_name = "PM_SNOOP_TLBIE", .pme_code = 0x903, .pme_short_desc = "Snoop TLBIE", .pme_long_desc = "A TLB miss for a data request occurred. 
Requests that miss the TLB may be retried until the instruction is in the next to complete group (unless HID4 is set to allow speculative tablewalks). This may result in multiple TLB misses for the same instruction.", }, [ POWER4_PME_PM_FPU0_FDIV ] = { .pme_name = "PM_FPU0_FDIV", .pme_code = 0x100, .pme_short_desc = "FPU0 executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp0 is executing a divide instruction. This could be fdiv, fdivs, fdiv., fdivs.", }, [ POWER4_PME_PM_LD_REF_L1_LSU1 ] = { .pme_name = "PM_LD_REF_L1_LSU1", .pme_code = 0xc14, .pme_short_desc = "LSU1 L1 D cache load references", .pme_long_desc = "A load executed on unit 1", }, [ POWER4_PME_PM_MRK_DATA_FROM_L275_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L275_MOD", .pme_code = 0x7c76, .pme_short_desc = "Marked data loaded from L2.75 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of another MCM due to a marked demand load. ", }, [ POWER4_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x3004, .pme_short_desc = "Hypervisor Cycles", .pme_long_desc = "Cycles when the processor is executing in Hypervisor (MSR[HV] = 0 and MSR[PR]=0)", }, [ POWER4_PME_PM_6INST_CLB_CYC ] = { .pme_name = "PM_6INST_CLB_CYC", .pme_code = 0x455, .pme_short_desc = "Cycles 6 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is an 8-deep, 4-wide instruction buffer. Fullness is indicated in the 8 valid bits associated with each of the 4-wide slots with full(0) corresponding to the number of cycles there are 8 instructions in the queue and full (7) corresponding to the number of cycles there is 1 instruction in the queue. 
This signal gives a real time history of the number of instruction quads valid in the instruction queue.", }, [ POWER4_PME_PM_LR_CTR_MAP_FULL_CYC ] = { .pme_name = "PM_LR_CTR_MAP_FULL_CYC", .pme_code = 0x206, .pme_short_desc = "Cycles LR/CTR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the lr/ctr mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ POWER4_PME_PM_L2SC_MOD_INV ] = { .pme_name = "PM_L2SC_MOD_INV", .pme_code = 0xf27, .pme_short_desc = "L2 slice C transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A,B,and C.", }, [ POWER4_PME_PM_FPU_DENORM ] = { .pme_name = "PM_FPU_DENORM", .pme_code = 0x1120, .pme_short_desc = "FPU received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized. Combined Unit 0 + Unit 1", }, [ POWER4_PME_PM_DATA_FROM_L275_MOD ] = { .pme_name = "PM_DATA_FROM_L275_MOD", .pme_code = 0x7c66, .pme_short_desc = "Data loaded from L2.75 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of another MCM due to a demand load. ", }, [ POWER4_PME_PM_LSU1_DERAT_MISS ] = { .pme_name = "PM_LSU1_DERAT_MISS", .pme_code = 0x906, .pme_short_desc = "LSU1 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 1 missed the ERAT and resulted in an ERAT reload. 
Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ POWER4_PME_PM_IC_PREF_REQ ] = { .pme_name = "PM_IC_PREF_REQ", .pme_code = 0x326, .pme_short_desc = "Instruction prefetch requests", .pme_long_desc = "Asserted when a non-canceled prefetch is made to the cache interface unit (CIU).", }, [ POWER4_PME_PM_MRK_LSU_FIN ] = { .pme_name = "PM_MRK_LSU_FIN", .pme_code = 0x8004, .pme_short_desc = "Marked instruction LSU processing finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER4_PME_PM_MRK_DATA_FROM_L3 ] = { .pme_name = "PM_MRK_DATA_FROM_L3", .pme_code = 0x1c76, .pme_short_desc = "Marked data loaded from L3", .pme_long_desc = "DL1 was reloaded from the local L3 due to a marked demand load", }, [ POWER4_PME_PM_MRK_DATA_FROM_MEM ] = { .pme_name = "PM_MRK_DATA_FROM_MEM", .pme_code = 0x2c76, .pme_short_desc = "Marked data loaded from memory", .pme_long_desc = "DL1 was reloaded from memory due to a marked demand load", }, [ POWER4_PME_PM_LSU0_FLUSH_UST ] = { .pme_name = "PM_LSU0_FLUSH_UST", .pme_code = 0xc01, .pme_short_desc = "LSU0 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 0 because it was unaligned (crossed a 4k boundary)", }, [ POWER4_PME_PM_LSU_FLUSH_LRQ ] = { .pme_name = "PM_LSU_FLUSH_LRQ", .pme_code = 0x6c00, .pme_short_desc = "LRQ flushes", .pme_long_desc = "A load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER4_PME_PM_LSU_FLUSH_SRQ ] = { .pme_name = "PM_LSU_FLUSH_SRQ", .pme_code = 0x5c00, .pme_short_desc = "SRQ flushes", .pme_long_desc = "A store was flushed because a younger load hits an older store that is already in the SRQ or in the same group.", }, [ POWER4_PME_PM_L2SC_MOD_TAG ] = { .pme_name = 
"PM_L2SC_MOD_TAG", .pme_code = 0xf26, .pme_short_desc = "L2 slice C transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A,B,and C.", } }; #endif
papi-5.3.0/src/libpfm4/lib/events/intel_snb_events.h
/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. 
* * PMU: snb (Intel Sandy Bridge) */ static const intel_x86_umask_t snb_agu_bypass_cancel[]={ { .uname = "COUNT", .udesc = "This event counts executed load operations", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_arith[]={ { .uname = "FPU_DIV_ACTIVE", .udesc = "Cycles that the divider is active, includes integer and floating point", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "FPU_DIV", .udesc = "Number of cycles the divider is activated, includes integer and floating point", .uequiv = "FPU_DIV_ACTIVE:c=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_br_inst_exec[]={ { .uname = "NONTAKEN_COND", .udesc = "All macro conditional non-taken branch instructions", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_COND", .udesc = "All macro conditional taken branch instructions", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_DIRECT_JUMP", .udesc = "All macro unconditional non-taken branch instructions, excluding calls and indirects", .ucode = 0x4200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_JUMP", .udesc = "All macro unconditional taken branch instructions, excluding calls and indirects", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All non-taken indirect branches that are not calls nor returns", .ucode = 0x4400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All taken indirect branches that are not calls nor returns", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_RETURN_NEAR", .udesc = "All taken indirect branches that have a return mnemonic", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_NEAR_CALL", .udesc = "All taken non-indirect calls", .ucode = 0x9000, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"TAKEN_INDIRECT_NEAR_CALL", .udesc = "All taken indirect calls, including both register and memory indirect", .ucode = 0xa000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "All near executed branches instructions (not necessarily retired)", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL_CONDITIONAL", .udesc = "All macro conditional branch instructions", .ucode = 0xc100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_COND", .udesc = "All macro conditional branch instructions", .ucode = 0xc100, .uequiv = "ALL_CONDITIONAL", .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All indirect branches that are not calls nor returns", .ucode = 0xc400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_DIRECT_NEAR_CALL", .udesc = "All non-indirect calls", .ucode = 0xd000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_br_inst_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "All taken and not taken macro branches including far branches (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "CONDITIONAL", .udesc = "All taken and not taken macro conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "FAR_BRANCH", .udesc = "Number of far branch instructions retired (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "All macro direct and indirect near calls, does not count far calls (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_RETURN", .udesc = "Number of near ret instructions retired (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_TAKEN", .udesc = "Number of near branch taken instructions retired (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = 
"NOT_TAKEN", .udesc = "All not taken macro branch instructions retired (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_br_misp_exec[]={ { .uname = "NONTAKEN_COND", .udesc = "All non-taken mispredicted macro conditional branch instructions", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_COND", .udesc = "All taken mispredicted macro conditional branch instructions", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All non-taken mispredicted indirect branches that are not calls nor returns", .ucode = 0x4400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All taken mispredicted indirect branches that are not calls nor returns", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_RETURN_NEAR", .udesc = "All non-taken mispredicted indirect branches that have a return mnemonic", .ucode = 0x4800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_RETURN_NEAR", .udesc = "All taken mispredicted indirect branches that have a return mnemonic", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_DIRECT_NEAR_CALL", .udesc = "All non-taken mispredicted non-indirect calls", .ucode = 0x5000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_NEAR_CALL", .udesc = "All taken mispredicted non-indirect calls", .ucode = 0x9000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_INDIRECT_NEAR_CALL", .udesc = "All nontaken mispredicted indirect calls, including both register and memory indirect", .ucode = 0x6000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_NEAR_CALL", .udesc = "All taken mispredicted indirect calls, including both register and memory indirect", .ucode = 0xa000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_COND", .udesc = "All mispredicted macro conditional branch instructions", .ucode = 0xc100, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"ANY_RETURN_NEAR", .udesc = "All mispredicted indirect branches that have a return mnemonic", .ucode = 0xc800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_DIRECT_NEAR_CALL", .udesc = "All mispredicted non-indirect calls", .ucode = 0xd000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All mispredicted indirect branches that are not calls nor returns", .ucode = 0xc400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "All mispredicted branch instructions", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_br_misp_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "All mispredicted macro branches (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "CONDITIONAL", .udesc = "All mispredicted macro conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "All macro direct and indirect near calls (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NOT_TAKEN", .udesc = "Number of branch instructions retired that were mispredicted and not-taken (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "TAKEN", .udesc = "Number of branch instructions retired that were mispredicted and taken (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_lock_cycles[]={ { .uname = "SPLIT_LOCK_UC_LOCK_DURATION", .udesc = "Cycles in which the L1D and L2 are locked, due to a UC lock or split lock", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CACHE_LOCK_DURATION", .udesc = "Cycles in which the L1D is locked", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_cpl_cycles[]={ { .uname = "RING0", .udesc = "Unhalted core cycles the thread was in ring 0", .ucode = 
0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RING0_TRANS", .udesc = "Transitions from rings 1, 2, or 3 to ring 0", .uequiv = "RING0:c=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "RING123", .udesc = "Unhalted core cycles the thread was in rings 1, 2, or 3", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_cpu_clk_unhalted[]={ { .uname = "REF_P", .udesc = "Cycles when the core is unhalted (count at 100 MHz)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "THREAD_P", .udesc = "Cycles when thread is not halted", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_dsb2mite_switches[]={ { .uname = "COUNT", .udesc = "Number of DSB to MITE switches", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "PENALTY_CYCLES", .udesc = "Cycles DSB to MITE switches caused delay", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_dsb_fill[]={ { .uname = "ALL_CANCEL", .udesc = "Number of times a valid DSB fill has been cancelled for any reason", .ucode = 0xa00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "EXCEED_DSB_LINES", .udesc = "DSB Fill encountered > 3 DSB lines", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "OTHER_CANCEL", .udesc = "Number of times a valid DSB fill has been cancelled not because of exceeding way limit", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_dtlb_load_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Demand load miss in all TLB levels which causes a page walk of any page size", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CAUSES_A_WALK", .udesc = "Demand load miss in all TLB levels which causes a page walk of any page size", .ucode = 0x100, .uequiv = "MISS_CAUSES_A_WALK", .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "Number of DTLB lookups for loads 
which missed first level DTLB but hit second level DTLB (STLB); No page walk.", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Demand load miss in all TLB levels which causes a page walk that completes for any page size", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles PMH is busy with a walk", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_dtlb_store_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Miss in all TLB levels that causes a page walk of any page size (4K/2M/4M/1G)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CAUSES_A_WALK", .udesc = "Miss in all TLB levels that causes a page walk of any page size (4K/2M/4M/1G)", .ucode = 0x100, .uequiv = "MISS_CAUSES_A_WALK", .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "First level miss but second level hit; no page walk. Only relevant if multiple levels", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Miss in all TLB levels that causes a page walk that completes of any page size (4K/2M/4M/1G)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles PMH is busy with this walk", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_fp_assist[]={ { .uname = "ANY", .udesc = "Cycles with any input/output SSE or FP assists", .ucode = 0x1e00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "SIMD_INPUT", .udesc = "Number of SIMD FP assists due to input values", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SIMD_OUTPUT", .udesc = "Number of SIMD FP assists due to output values", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87_INPUT", .udesc = "Number of X87 assists due to input value", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87_OUTPUT", .udesc = "Number of X87 assists due to output value", .ucode = 0x200, .uflags= 
INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_fp_comp_ops_exe[]={ { .uname = "X87", .udesc = "Number of X87 uops executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_PACKED_DOUBLE", .udesc = "Number of SSE double precision FP packed uops executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_SCALAR_SINGLE", .udesc = "Number of SSE single precision FP scalar uops executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_PACKED_SINGLE", .udesc = "Number of SSE single precision FP packed uops executed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_SCALAR_DOUBLE", .udesc = "Number of SSE double precision FP scalar uops executed", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_hw_interrupts[]={ { .uname = "RECEIVED", .udesc = "Number of hardware interrupts received by the processor", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_hw_pre_req[]={ { .uname = "L1D_MISS", .udesc = "Hardware prefetch requests that miss the L1D cache. A request is counted each time it accesses the cache and misses it, including if a block is applicable or if it hits the full buffer, for example. This accounts for both L1 streamer and IP-based Hw prefetchers", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_icache[]={ { .uname = "MISSES", .udesc = "Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. 
Includes UC accesses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_idq[]={ { .uname = "EMPTY", .udesc = "Cycles IDQ is empty", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MITE_UOPS", .udesc = "Number of uops delivered to IDQ from MITE path", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DSB_UOPS", .udesc = "Number of uops delivered to IDQ from DSB path", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS", .udesc = "Number of uops delivered to IDQ when MS busy by DSB", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_MITE_UOPS", .udesc = "Number of uops delivered to IDQ when MS busy by MITE", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_UOPS", .udesc = "Number of uops delivered to IDQ from MS by either DSB or MITE", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MITE_UOPS_CYCLES", .udesc = "Cycles where uops are delivered to IDQ from MITE (MITE active)", .uequiv = "MITE_UOPS:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DSB_UOPS_CYCLES", .udesc = "Cycles where uops are delivered to IDQ from DSB (DSB active)", .ucode = 0x800 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ when MS busy by DSB", .uequiv = "MS_DSB_UOPS:c=1", .ucode = 0x1000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_MITE_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ when MS busy by MITE", .uequiv = "MS_MITE_UOPS:c=1", .ucode = 0x2000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ from MS by either DSB or MITE", .uequiv = "MS_UOPS:c=1", .ucode = 0x3000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DSB_UOPS", .udesc = "Number of 
uops delivered from either DSB path", .ucode = 0x1800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DSB_CYCLES", .udesc = "Cycles MITE/MS deliver anything", .ucode = 0x1800 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_MITE_UOPS", .udesc = "Number of uops delivered from either MITE path", .ucode = 0x2400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_MITE_CYCLES", .udesc = "Cycles DSB/MS deliver anything", .ucode = 0x2400 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_UOPS", .udesc = "Number of uops delivered to IDQ from any path", .ucode = 0x3c00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS_OCCUR", .udesc = "Occurrences of DSB MS going active", .uequiv = "MS_DSB_UOPS:c=1:e=1", .ucode = 0x1000 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_idq_uops_not_delivered[]={ { .uname = "CORE", .udesc = "Number of non-delivered uops to RAT (use cmask to qualify further)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_ild_stall[]={ { .uname = "LCP", .udesc = "Stall caused by changing prefix length of the instruction", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IQ_FULL", .udesc = "Stall cycles due to IQ full", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_insts_written_to_iq[]={ { .uname = "INSTS", .udesc = "Number of instructions written to IQ every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_inst_retired[]={ { .uname = "ANY_P", .udesc = "Number of instructions retired", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "PREC_DIST", .udesc = "Precise instruction retired event to reduce effect of PEBS shadow IP distribution (Precise Event)", .ucntmsk = 0x2, .ucode = 0x100, .uflags= 
INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_int_misc[]={ { .uname = "RAT_STALL_CYCLES", .udesc = "Cycles RAT external stall is sent to IDQ for this thread", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RECOVERY_CYCLES", .udesc = "Cycles waiting to be recovered after Machine Clears due to all other cases except JEClear", .ucode = 0x300 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RECOVERY_STALLS_COUNT", .udesc = "Number of times need to wait after Machine Clears due to all other cases except JEClear", .ucode = 0x300 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_E | _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_itlb[]={ { .uname = "ITLB_FLUSH", .udesc = "Number of ITLB flushes, includes 4k/2M/4M pages", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FLUSH", .udesc = "Number of ITLB flushes, includes 4k/2M/4M pages", .ucode = 0x100, .uequiv = "ITLB_FLUSH", .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l1d[]={ { .uname = "ALLOCATED_IN_M", .udesc = "Number of allocations of L1D cache lines in modified (M) state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_M_REPLACEMENT", .udesc = "Number of cache lines in M-state evicted of L1D due to snoop HITM or dirty line replacement", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_EVICT", .udesc = "Number of modified lines evicted from L1D due to replacement", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REPLACEMENT", .udesc = "Number of cache lines brought into the L1D cache", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l1d_blocks[]={ { .uname = "BANK_CONFLICT", .udesc = "Number of dispatched loads cancelled due to L1D bank conflicts with other load ports", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"BANK_CONFLICT_CYCLES", .udesc = "Cycles with l1d blocks due to bank conflicts", .ucode = 0x500, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_l1d_pend_miss[]={ { .uname = "OCCURRENCES", .udesc = "Occurrences of L1D_PEND_MISS going active", .uequiv = "PENDING:e=1:c=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "EDGE", .udesc = "Occurrences of L1D_PEND_MISS going active", .uequiv = "OCCURRENCES", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "PENDING", .udesc = "Number of L1D load misses outstanding every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PENDING_CYCLES", .udesc = "Cycles with L1D load misses outstanding", .uequiv = "PENDING:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l2_l1d_wb_rqsts[]={ { .uname = "HIT_E", .udesc = "Non rejected writebacks from L1D to L2 cache lines in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HIT_M", .udesc = "Non rejected writebacks from L1D to L2 cache lines in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l2_lines_in[]={ { .uname = "ANY", .udesc = "L2 cache lines filling (counting does not cover rejects)", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "E", .udesc = "L2 cache lines in E state (counting does not cover rejects)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "I", .udesc = "L2 cache lines in I state (counting does not cover rejects)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S", .udesc = "L2 cache lines in S state (counting does not cover rejects)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l2_lines_out[]={ { .uname = "DEMAND_CLEAN", .udesc = "L2 clean line evicted by a demand", .ucode = 0x100, .uflags= 
INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DIRTY", .udesc = "L2 dirty line evicted by a demand", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_CLEAN", .udesc = "L2 clean line evicted by a prefetch", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_DIRTY", .udesc = "L2 dirty line evicted by an MLC Prefetch", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRTY_ANY", .udesc = "Any L2 dirty line evicted (does not cover rejects)", .ucode = 0xa00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l2_rqsts[]={ { .uname = "ALL_CODE_RD", .udesc = "Any ifetch request to L2 cache", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_HIT", .udesc = "L2 cache hits when fetching instructions", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_MISS", .udesc = "L2 cache misses when fetching instructions", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_DATA_RD", .udesc = "Demand data read requests to L2 cache", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_RD_HIT", .udesc = "Demand data read requests that hit L2", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_PF", .udesc = "Any L2 HW prefetch request to L2 cache", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_HIT", .udesc = "Requests from the L2 hardware prefetchers that hit L2 cache", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_MISS", .udesc = "Requests from the L2 hardware prefetchers that miss L2 cache", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_ANY", .udesc = "Any RFO requests to L2 cache", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_HITS", .udesc = "RFO requests that hit L2 cache", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_MISS", .udesc = "RFO requests that miss L2 cache", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l2_store_lock_rqsts[]={ 
{ .uname = "HIT_E", .udesc = "RFOs that hit cache lines in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISS", .udesc = "RFOs that miss cache (I state)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HIT_M", .udesc = "RFOs that hit cache lines in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL", .udesc = "RFOs that access cache lines in any state", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_l2_trans[]={ { .uname = "ALL", .udesc = "Transactions accessing MLC pipe", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD", .udesc = "L2 cache accesses when fetching instructions", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_WB", .udesc = "L1D writebacks that access L2 cache", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOAD", .udesc = "Demand Data Read* requests that access L2 cache", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L2_FILL", .udesc = "L2 fill requests that access L2 cache", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L2_WB", .udesc = "L2 writebacks that access L2 cache", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_PREFETCH", .udesc = "L2 or L3 HW prefetches that access L2 cache (including rejects)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO", .udesc = "RFO requests that access L2 cache", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_ld_blocks[]={ { .uname = "DATA_UNKNOWN", .udesc = "Blocked loads due to store buffer blocks with unknown data", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STORE_FORWARD", .udesc = "Loads blocked by overlapping with store buffer that cannot be forwarded", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NO_SR", .udesc = "Number of split loads blocked due to resource not available", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"ALL_BLOCK", .udesc = "Number of cases where any load is blocked but has not DCU miss", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_ld_blocks_partial[]={ { .uname = "ADDRESS_ALIAS", .udesc = "False dependencies in MOB due to partial compare on address", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_STA_BLOCK", .udesc = "Number of times that load operations are temporarily blocked because of older stores, with addresses that are not yet known. A load operation may incur more than one block of this type", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_load_hit_pre[]={ { .uname = "HW_PF", .udesc = "Non sw-prefetch load dispatches that hit the fill buffer allocated for HW prefetch", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SW_PF", .udesc = "Non sw-prefetch load dispatches that hit the fill buffer allocated for SW prefetch", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_l3_lat_cache[]={ { .uname = "MISS", .udesc = "Core-originated cacheable demand requests missed L3", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REFERENCE", .udesc = "Core-originated cacheable demand requests that refer to L3", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_machine_clears[]={ { .uname = "MASKMOV", .udesc = "The number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MEMORY_ORDERING", .udesc = "Number of Memory Ordering Machine Clears detected", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SMC", .udesc = "Self-Modifying Code detected", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_mem_load_uops_llc_hit_retired[]={ { .uname = "XSNP_HIT", .udesc = "Load LLC Hit and a cross-core Snoop hits in on-pkg core cache (Precise 
Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_HITM", .udesc = "Load had HitM Response from a core on same socket (shared LLC) (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_MISS", .udesc = "Load LLC Hit and a cross-core Snoop missed in on-pkg core cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_NONE", .udesc = "Load hit in last-level (L3) cache with no snoop needed (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_mem_load_uops_misc_retired[]={ { .uname = "LLC_MISS", .udesc = "Counts load driven L3 misses and some non simd split loads (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_mem_load_uops_retired[]={ { .uname = "HIT_LFB", .udesc = "A load missed L1D but hit the Fill Buffer (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L1_HIT", .udesc = "Load hit in nearest-level (L1D) cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_HIT", .udesc = "Load hit in mid-level (L2) cache (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_HIT", .udesc = "Load hit in last-level (L3) cache with no snoop needed (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_MISS", .udesc = "Retired load uops which data sources were data missed LLC (excluding unknown data source)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_mem_trans_retired[]={ { .uname = "LATENCY_ABOVE_THRESHOLD", .udesc = "Memory load instructions retired above programmed clocks, minimum threshold value is 3 (Precise Event and ldlat required)", .ucode = 0x100, .uflags= 
INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_LDLAT, }, { .uname = "PRECISE_STORE", .udesc = "Capture where stores occur, must use with PEBS (Precise Event required)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_mem_uops_retired[]={ { .uname = "ALL_LOADS", .udesc = "Any retired loads (Precise Event)", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ANY_LOADS", .udesc = "Any retired loads (Precise Event)", .ucode = 0x8100, .uequiv = "ALL_LOADS", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ALL_STORES", .udesc = "Any retired stores (Precise Event)", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ANY_STORES", .udesc = "Any retired stores (Precise Event)", .ucode = 0x8200, .uequiv = "ALL_STORES", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCK_LOADS", .udesc = "Locked retired loads (Precise Event)", .ucode = 0x2100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCK_STORES", .udesc = "Locked retired stores (Precise Event)", .ucode = 0x2200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_LOADS", .udesc = "Retired loads causing cacheline splits (Precise Event)", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_STORES", .udesc = "Retired stores causing cacheline splits (Precise Event)", .ucode = 0x4200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STLB_MISS_LOADS", .udesc = "STLB misses due to retired loads (Precise Event)", .ucode = 0x1100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STLB_MISS_STORES", .udesc = "STLB misses due to retired stores (Precise Event)", .ucode = 0x1200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_misalign_mem_ref[]={ { .uname = "LOADS", .udesc = "Speculative cache-line split load uops dispatched to the L1D", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"STORES", .udesc = "Speculative cache-line split Store-address uops dispatched to L1D", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_offcore_requests[]={ { .uname = "ALL_DATA_RD", .udesc = "Demand and prefetch read requests sent to uncore", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_READ", .udesc = "Demand and prefetch read requests sent to uncore", .uequiv = "ALL_DATA_RD", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD", .udesc = "Offcore code read requests, including cacheable and un-cacheables", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD", .udesc = "Demand Data Read requests sent to uncore", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Offcore Demand RFOs, includes regular RFO, Locks, ItoM", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_offcore_requests_buffer[]={ { .uname = "SQ_FULL", .udesc = "Offcore requests buffer cannot take more entries for this thread core", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_offcore_requests_outstanding[]={ { .uname = "ALL_DATA_RD_CYCLES", .udesc = "Cycles with cacheable data read transactions in the superQ", .uequiv = "ALL_DATA_RD:c=1", .ucode = 0x800 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD_CYCLES", .udesc = "Cycles with demand code reads transactions in the superQ", .uequiv = "DEMAND_CODE_RD:c=1", .ucode = 0x200 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD_CYCLES", .udesc = "Cycles with demand data read transactions in the superQ", .uequiv = "DEMAND_DATA_RD:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_RD", .udesc = "Cacheable data read transactions in the superQ every cycle", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"DEMAND_CODE_RD", .udesc = "Code read transactions in the superQ every cycle", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD", .udesc = "Demand data read transactions in the superQ every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Outstanding RFO (store) transactions in the superQ every cycle", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO_CYCLES", .udesc = "Cycles with outstanding RFO (store) transactions in the superQ", .uequiv = "DEMAND_RFO:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_other_assists[]={ { .uname = "ITLB_MISS_RETIRED", .udesc = "Number of instructions that experienced an ITLB miss", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AVX_TO_SSE", .udesc = "Number of transitions from AVX-256 to legacy SSE when penalty applicable", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_TO_AVX", .udesc = "Number of transitions from legacy SSE to AVX-256 when penalty applicable", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_partial_rat_stalls[]={ { .uname = "FLAGS_MERGE_UOP", .udesc = "Number of flags-merge uops in flight in each cycle", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CYCLES_FLAGS_MERGE_UOP", .udesc = "Cycles in which flags-merge uops in flight", .uequiv = "FLAGS_MERGE_UOP:c=1", .ucode = 0x2000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MUL_SINGLE_UOP", .udesc = "Number of Multiply packed/scalar single precision uops allocated", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SLOW_LEA_WINDOW", .udesc = "Number of cycles with at least one slow LEA uop allocated", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_resource_stalls[]={ { .uname = "ANY", .udesc = "Cycles stalled due to Resource Related reason", .ucode = 0x100, 
.uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "LB", .udesc = "Cycles stalled due to lack of load buffers", .ucode = 0x200, }, { .uname = "RS", .udesc = "Cycles stalled due to no eligible RS entry available", .ucode = 0x400, }, { .uname = "SB", .udesc = "Cycles stalled due to no store buffers available (not including draining from sync)", .ucode = 0x800, }, { .uname = "ROB", .udesc = "Cycles stalled due to re-order buffer full", .ucode = 0x1000, }, { .uname = "FCSW", .udesc = "Cycles stalled due to writing the FPU control word", .ucode = 0x2000, }, { .uname = "MXCSR", .udesc = "Cycles stalled due to the MXCSR register rename occurring too close to a previous MXCSR rename", .ucode = 0x4000, }, { .uname = "MEM_RS", .udesc = "Cycles stalled due to LB, SB or RS being completely in use", .ucode = 0xe00, .uequiv = "LB:SB:RS", }, }; static const intel_x86_umask_t snb_resource_stalls2[]={ { .uname = "ALL_FL_EMPTY", .udesc = "Cycles stalled due to free list empty", .ucode = 0xc00, }, { .uname = "ALL_PRF_CONTROL", .udesc = "Cycles stalled due to control structures full for physical registers", .ucode = 0xf00, }, { .uname = "ANY_PRF_CONTROL", .udesc = "Cycles stalled due to control structures full for physical registers", .ucode = 0xf00, .uequiv = "ALL_PRF_CONTROL", }, { .uname = "BOB_FULL", .udesc = "Cycles Allocator is stalled due to Branch Order Buffer", .ucode = 0x4000, }, { .uname = "OOO_RSRC", .udesc = "Cycles stalled due to out of order resources full", .ucode = 0x4f00, }, }; static const intel_x86_umask_t snb_rob_misc_events[]={ { .uname = "LBR_INSERTS", .udesc = "Count each time a new LBR record is saved by HW", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_rs_events[]={ { .uname = "EMPTY_CYCLES", .udesc = "Cycles the RS is empty for this thread", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_simd_fp_256[]={ { .uname = "PACKED_SINGLE", .udesc = "Counts 
256-bit packed single-precision", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_DOUBLE", .udesc = "Counts 256-bit packed double-precision", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_sq_misc[]={ { .uname = "SPLIT_LOCK", .udesc = "Split locks in SQ", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_tlb_flush[]={ { .uname = "DTLB_THREAD", .udesc = "Number of DTLB flushes of thread-specific entries", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_ANY", .udesc = "Number of STLB flushes", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_uops_dispatched[]={ { .uname = "CORE", .udesc = "Counts total number of uops dispatched from any thread", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STALL_CYCLES", .udesc = "Counts number of cycles no uops were dispatched on this thread", .uequiv = "THREAD:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "THREAD", .udesc = "Counts total number of uops to be dispatched per-thread each cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_uops_dispatched_port[]={ { .uname = "PORT_0", .udesc = "Cycles which a Uop is dispatched on port 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_1", .udesc = "Cycles which a Uop is dispatched on port 1", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_2_LD", .udesc = "Cycles in which a load uop is dispatched on port 2", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_2_STA", .udesc = "Cycles in which a store uop is dispatched on port 2", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_2", .udesc = "Cycles in which a uop is dispatched on port 2", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_3_LD", .udesc = "Cycles in which a load 
uop is dispatched on port 3", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_3_STA", .udesc = "Cycles in which a store uop is dispatched on port 3", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_3", .udesc = "Cycles in which a uop is dispatched on port 3", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_4", .udesc = "Cycles which a uop is dispatched on port 4", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_5", .udesc = "Cycles which a Uop is dispatched on port 5", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_uops_issued[]={ { .uname = "ANY", .udesc = "Number of uops issued by the RAT to the Reservation Station (RS)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "CORE_STALL_CYCLES", .udesc = "Cycles no uops issued on this core (by any thread)", .uequiv = "ANY:c=1:i=1:t=1", .ucode = 0x100 | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no uops issued by this thread", .uequiv = "ANY:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_uops_retired[]={ { .uname = "ALL", .udesc = "All uops that actually retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "All uops that actually retired (Precise Event)", .ucode = 0x100, .uequiv= "ALL", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "RETIRE_SLOTS", .udesc = "Number of retirement slots used (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no executable uop retired (Precise Event)", .uequiv = "ALL:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = 
"TOTAL_CYCLES", .udesc = "Total cycles using precise uop retired event (Precise Event)", .uequiv = "ALL:c=16", .ucode = 0x100 | (0x10 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snb_offcore_response[]={ { .uname = "DMND_DATA_RD", .udesc = "Request: number of demand and DCU prefetch data reads of full and partial cachelines as well as demand data page table entry cacheline reads. Does not count L2 data read prefetches or instruction fetches", .ucode = 1ULL << (0 + 8), .grpid = 0, }, { .uname = "DMND_RFO", .udesc = "Request: number of demand and DCU prefetch reads for ownership (RFO) requests generated by a write to data cacheline. Does not count L2 RFO prefetches", .ucode = 1ULL << (1 + 8), .grpid = 0, }, { .uname = "DMND_IFETCH", .udesc = "Request: number of demand and DCU prefetch instruction cacheline reads. Does not count L2 code read prefetches", .ucode = 1ULL << (2 + 8), .grpid = 0, }, { .uname = "WB", .udesc = "Request: number of writebacks (modified to exclusive) transactions", .ucode = 1ULL << (3 + 8), .grpid = 0, }, { .uname = "PF_DATA_RD", .udesc = "Request: number of data cacheline reads generated by L2 prefetchers", .ucode = 1ULL << (4 + 8), .grpid = 0, }, { .uname = "PF_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetchers", .ucode = 1ULL << (5 + 8), .grpid = 0, }, { .uname = "PF_IFETCH", .udesc = "Request: number of code reads generated by L2 prefetchers", .ucode = 1ULL << (6 + 8), .grpid = 0, }, { .uname = "PF_LLC_DATA_RD", .udesc = "Request: number of L3 prefetcher requests to L2 for loads", .ucode = 1ULL << (7 + 8), .grpid = 0, }, { .uname = "PF_LLC_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetcher", .ucode = 1ULL << (8 + 8), .grpid = 0, }, { .uname = "PF_LLC_IFETCH", .udesc = "Request: number of L2 prefetcher requests to L3 for instruction fetches", .ucode = 1ULL << (9 + 8), .grpid = 0, }, { .uname = "BUS_LOCKS", .udesc = "Request: number 
bus lock and split lock requests", .ucode = 1ULL << (10 + 8), .grpid = 0, }, { .uname = "STRM_ST", .udesc = "Request: number of streaming store requests", .ucode = 1ULL << (11 + 8), .grpid = 0, }, { .uname = "OTHER", .udesc = "Request: counts one of the following transaction types, including L3 invalidate, I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences, lock, unlock, split lock", .ucode = 1ULL << (15+8), .grpid = 0, }, { .uname = "ANY_IFETCH", .udesc = "Request: combination of PF_IFETCH | DMND_IFETCH | PF_LLC_IFETCH", .uequiv = "PF_IFETCH:DMND_IFETCH:PF_LLC_IFETCH", .ucode = 0x24400, .grpid = 0, }, { .uname = "ANY_REQUEST", .udesc = "Request: combination of all request umasks", .uequiv = "DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER", .ucode = 0x8fff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "ANY_DATA", .udesc = "Request: combination of DMND_DATA | PF_DATA_RD | PF_LLC_DATA_RD", .uequiv = "DMND_DATA_RD:PF_DATA_RD:PF_LLC_DATA_RD", .ucode = 0x9100, .grpid = 0, }, { .uname = "ANY_RFO", .udesc = "Request: combination of DMND_RFO | PF_RFO | PF_LLC_RFO", .uequiv = "DMND_RFO:PF_RFO:PF_LLC_RFO", .ucode = 0x12200, .grpid = 0, }, { .uname = "ANY_RESPONSE", .udesc = "Response: count any response type", .ucode = 1ULL << (16+8), .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL | INTEL_X86_EXCL_GRP_GT, .grpid = 1, }, { .uname = "NO_SUPP", .udesc = "Supplier: counts number of times supplier information is not available", .ucode = 1ULL << (17+8), .grpid = 1, }, { .uname = "LLC_HITM", .udesc = "Supplier: counts L3 hits in M-state (initial lookup)", .ucode = 1ULL << (18+8), .grpid = 1, }, { .uname = "LLC_HITE", .udesc = "Supplier: counts L3 hits in E-state", .ucode = 1ULL << (19+8), .grpid = 1, }, { .uname = "LLC_HITS", .udesc = "Supplier: counts L3 hits in S-state", .ucode = 1ULL << (20+8), .grpid = 1, }, { .uname = "LLC_HITF", .udesc = "Supplier: 
counts L3 hits in F-state", .ucode = 1ULL << (21+8), .grpid = 1, }, { .uname = "LLC_MISS_LOCAL_DRAM", .udesc = "Supplier: counts L3 misses to local DRAM", .ucode = 1ULL << (22+8), .uequiv = "MISS_DRAM", .grpid = 1, }, { .uname = "LLC_MISS_LOCAL", .udesc = "Supplier: counts L3 misses to local DRAM", .ucode = 1ULL << (22+8), .uequiv = "MISS_DRAM", .grpid = 1, }, { .uname = "MISS_DRAM", .udesc = "Supplier: counts L3 misses to local DRAM", .ucode = 1ULL << (22+8), .grpid = 1, }, { .uname = "LLC_HITMESF", .udesc = "Supplier: counts L3 hits in any state (M, E, S, F)", .ucode = 0xfULL << (18+8), .uequiv = "LLC_HITM:LLC_HITE:LLC_HITS:LLC_HITF", .grpid = 1, }, { .uname = "SNP_NONE", .udesc = "Snoop: counts number of times no snoop-related information is available", .ucode = 1ULL << (31+8), .grpid = 2, }, { .uname = "SNP_NOT_NEEDED", .udesc = "Snoop: counts the number of times no snoop was needed to satisfy the request", .ucode = 1ULL << (32+8), .grpid = 2, }, { .uname = "NO_SNP_NEEDED", .udesc = "Snoop: counts the number of times no snoop was needed to satisfy the request", .ucode = 1ULL << (32+8), .uequiv = "SNP_NOT_NEEDED", .grpid = 2, }, { .uname = "SNP_MISS", .udesc = "Snoop: counts number of times a snoop was needed and it missed all snooped caches", .ucode = 1ULL << (33+8), .grpid = 2, }, { .uname = "SNP_NO_FWD", .udesc = "Snoop: counts number of times a snoop was needed and it hit in at least one snooped cache", .ucode = 1ULL << (34+8), .grpid = 2, }, { .uname = "SNP_FWD", .udesc = "Snoop: counts number of times a snoop was needed and data was forwarded from a remote socket", .ucode = 1ULL << (35+8), .grpid = 2, }, { .uname = "HITM", .udesc = "Snoop: counts number of times a snoop was needed and it hitM-ed in local or remote cache", .ucode = 1ULL << (36+8), .grpid = 2, }, { .uname = "NON_DRAM", .udesc = "Snoop: counts number of times target was a non-DRAM system address. 
This includes MMIO transactions", .ucode = 1ULL << (37+8), .grpid = 2, }, { .uname = "SNP_ANY", .udesc = "Snoop: any snoop reason", .ucode = 0x7fULL << (31+8), .uequiv = "SNP_NONE:SNP_NOT_NEEDED:SNP_MISS:SNP_NO_FWD:SNP_FWD:HITM:NON_DRAM", .uflags= INTEL_X86_DFL, .grpid = 2, }, }; static const intel_x86_umask_t snb_baclears[]={ { .uname = "ANY", .udesc = "Counts the number of times the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_cycle_activity[]={ { .uname = "CYCLES_L2_PENDING", .udesc = "Cycles with pending L2 miss loads", .ucode = 0x0100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags = INTEL_X86_NCOMBO, .ucntmsk= 0xf, }, { .uname = "CYCLES_L1D_PENDING", .udesc = "Cycles with pending L1D load cache misses", .ucode = 0x0200 | (0x2 << INTEL_X86_CMASK_BIT), .ucntmsk= 0x4, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CYCLES_NO_DISPATCH", .udesc = "Cycles of dispatch stalls", .ucode = 0x0400 | (0x4 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_L2_PENDING", .udesc = "Execution stalls due to L2 pending loads", .ucode = 0x0500 | (0x5 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_L1D_PENDING", .udesc = "Execution stalls due to L1D pending loads", .ucode = 0x0600 | (0x6 << INTEL_X86_CMASK_BIT), .ucntmsk= 0x4, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snb_pe[]={ { .name = "AGU_BYPASS_CANCEL", .desc = "Number of executed load operations with all the following traits: 1. addressing of the format [base + offset], 2. the offset is between 1 and 2047, 3. 
the address specified in the base register is in one page and the address [base+offset] is in another page", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb6, .numasks = LIBPFM_ARRAY_SIZE(snb_agu_bypass_cancel), .ngrp = 1, .umasks = snb_agu_bypass_cancel, }, { .name = "ARITH", .desc = "Counts arithmetic multiply operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x14, .numasks = LIBPFM_ARRAY_SIZE(snb_arith), .ngrp = 1, .umasks = snb_arith, }, { .name = "BACLEARS", .desc = "Branch resteered", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xe6, .numasks = LIBPFM_ARRAY_SIZE(snb_baclears), .ngrp = 1, .umasks = snb_baclears, }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x88, .numasks = LIBPFM_ARRAY_SIZE(snb_br_inst_exec), .ngrp = 1, .umasks = snb_br_inst_exec, }, { .name = "BR_INST_RETIRED", .desc = "Retired branch instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc4, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_br_inst_retired), .ngrp = 1, .umasks = snb_br_inst_retired, }, { .name = "BR_MISP_EXEC", .desc = "Mispredicted branches executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x89, .numasks = LIBPFM_ARRAY_SIZE(snb_br_misp_exec), .ngrp = 1, .umasks = snb_br_misp_exec, }, { .name = "BR_MISP_RETIRED", .desc = "Mispredicted retired branches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc5, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_br_misp_retired), .ngrp = 1, .umasks = snb_br_misp_retired, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Count branch instructions at retirement. Specifically, this event counts the retirement of the last micro-op of a branch instruction", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_INST_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc4, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Count mispredicted branch instructions at retirement. 
Specifically, this event counts at retirement of the last micro-op of a branch instruction in the architectural path of the execution and experienced misprediction in the branch prediction hardware", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_MISP_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc5, }, { .name = "LOCK_CYCLES", .desc = "Locked cycles in L1D and L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(snb_lock_cycles), .ngrp = 1, .umasks = snb_lock_cycles, }, { .name = "CPL_CYCLES", .desc = "Unhalted core cycles at a specific ring level", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5c, .numasks = LIBPFM_ARRAY_SIZE(snb_cpl_cycles), .ngrp = 1, .umasks = snb_cpl_cycles, }, { .name = "CPU_CLK_UNHALTED", .desc = "Cycles when processor is not in halted state", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(snb_cpu_clk_unhalted), .ngrp = 1, .umasks = snb_cpu_clk_unhalted, }, { .name = "DSB2MITE_SWITCHES", .desc = "Number of DSB to MITE switches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xab, .numasks = LIBPFM_ARRAY_SIZE(snb_dsb2mite_switches), .ngrp = 1, .umasks = snb_dsb2mite_switches, }, { .name = "DSB_FILL", .desc = "DSB fills", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xac, .numasks = LIBPFM_ARRAY_SIZE(snb_dsb_fill), .ngrp = 1, .umasks = snb_dsb_fill, }, { .name = "DTLB_LOAD_MISSES", .desc = "Data TLB load misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(snb_dtlb_load_misses), .ngrp = 1, .umasks = snb_dtlb_load_misses, }, { .name = "DTLB_STORE_MISSES", .desc = "Data TLB store misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x49, .numasks = LIBPFM_ARRAY_SIZE(snb_dtlb_store_misses), .ngrp = 1, .umasks = snb_dtlb_store_misses, }, { .name = "FP_ASSIST", .desc = "X87 Floating point assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xca, .numasks = LIBPFM_ARRAY_SIZE(snb_fp_assist), .ngrp = 1, .umasks = 
snb_fp_assist, }, { .name = "FP_COMP_OPS_EXE", .desc = "Counts number of floating point events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x10, .numasks = LIBPFM_ARRAY_SIZE(snb_fp_comp_ops_exe), .ngrp = 1, .umasks = snb_fp_comp_ops_exe, }, { .name = "HW_INTERRUPTS", .desc = "Number of hardware interrupts received by the processor", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(snb_hw_interrupts), .ngrp = 1, .umasks = snb_hw_interrupts, }, { .name = "HW_PRE_REQ", .desc = "Hardware prefetch requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x4e, .numasks = LIBPFM_ARRAY_SIZE(snb_hw_pre_req), .ngrp = 1, .umasks = snb_hw_pre_req, }, { .name = "ICACHE", .desc = "Instruction Cache accesses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(snb_icache), .ngrp = 1, .umasks = snb_icache, }, { .name = "IDQ", .desc = "IDQ operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x79, .numasks = LIBPFM_ARRAY_SIZE(snb_idq), .ngrp = 1, .umasks = snb_idq, }, { .name = "IDQ_UOPS_NOT_DELIVERED", .desc = "Uops not delivered", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x9c, .numasks = LIBPFM_ARRAY_SIZE(snb_idq_uops_not_delivered), .ngrp = 1, .umasks = snb_idq_uops_not_delivered, }, { .name = "ILD_STALL", .desc = "Instruction Length Decoder stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x87, .numasks = LIBPFM_ARRAY_SIZE(snb_ild_stall), .ngrp = 1, .umasks = snb_ild_stall, }, { .name = "INSTS_WRITTEN_TO_IQ", .desc = "Instructions written to IQ", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x17, .numasks = LIBPFM_ARRAY_SIZE(snb_insts_written_to_iq), .ngrp = 1, .umasks = snb_insts_written_to_iq, }, { .name = "INST_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_inst_retired), .ngrp = 1, .umasks = snb_inst_retired, }, { .name = "INSTRUCTION_RETIRED", .desc = 
"Number of instructions at retirement", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_V3_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INT_MISC", .desc = "Miscellaneous internals", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd, .numasks = LIBPFM_ARRAY_SIZE(snb_int_misc), .ngrp = 1, .umasks = snb_int_misc, }, { .name = "ITLB", .desc = "Instruction TLB", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xae, .numasks = LIBPFM_ARRAY_SIZE(snb_itlb), .ngrp = 1, .umasks = snb_itlb, }, { .name = "ITLB_MISSES", .desc = "Instruction TLB misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(snb_dtlb_store_misses), .ngrp = 1, .umasks = snb_dtlb_store_misses, /* identical to actual umasks list for this event */ }, { .name = "L1D", .desc = "L1D cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x51, .numasks = LIBPFM_ARRAY_SIZE(snb_l1d), .ngrp = 1, .umasks = snb_l1d, }, { .name = "L1D_BLOCKS", .desc = "L1D is blocking", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xbf, .numasks = LIBPFM_ARRAY_SIZE(snb_l1d_blocks), .ngrp = 1, .umasks = snb_l1d_blocks, }, { .name = "L1D_PEND_MISS", .desc = "L1D pending misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x4, .code = 0x48, .numasks = LIBPFM_ARRAY_SIZE(snb_l1d_pend_miss), .ngrp = 1, .umasks = snb_l1d_pend_miss, }, { .name = "L2_L1D_WB_RQSTS", .desc = "Writeback requests from L1D to L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(snb_l2_l1d_wb_rqsts), .ngrp = 1, .umasks = snb_l2_l1d_wb_rqsts, }, { .name = "L2_LINES_IN", .desc = "L2 lines allocated", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf1, .numasks = LIBPFM_ARRAY_SIZE(snb_l2_lines_in), .ngrp = 1, .umasks = snb_l2_lines_in, }, { .name = "L2_LINES_OUT", .desc = "L2 lines evicted", .modmsk = 
INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf2, .numasks = LIBPFM_ARRAY_SIZE(snb_l2_lines_out), .ngrp = 1, .umasks = snb_l2_lines_out, }, { .name = "L2_RQSTS", .desc = "L2 requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(snb_l2_rqsts), .ngrp = 1, .umasks = snb_l2_rqsts, }, { .name = "L2_STORE_LOCK_RQSTS", .desc = "L2 store lock requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(snb_l2_store_lock_rqsts), .ngrp = 1, .umasks = snb_l2_store_lock_rqsts, }, { .name = "L2_TRANS", .desc = "L2 transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf0, .numasks = LIBPFM_ARRAY_SIZE(snb_l2_trans), .ngrp = 1, .umasks = snb_l2_trans, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for L3_LAT_CACHE:MISS", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:MISS", .cntmsk = 0xff, .code = 0x412e, }, { .name = "LLC_MISSES", .desc = "Alias for LAST_LEVEL_CACHE_MISSES", .modmsk = INTEL_V3_ATTRS, .equiv = "LAST_LEVEL_CACHE_MISSES", .cntmsk = 0xff, .code = 0x412e, }, { .name = "LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for L3_LAT_CACHE:REFERENCE", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:REFERENCE", .cntmsk = 0xff, .code = 0x4f2e, }, { .name = "LLC_REFERENCES", .desc = "Alias for LAST_LEVEL_CACHE_REFERENCES", .modmsk = INTEL_V3_ATTRS, .equiv = "LAST_LEVEL_CACHE_REFERENCES", .cntmsk = 0xff, .code = 0x4f2e, }, { .name = "LD_BLOCKS", .desc = "Blocking loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(snb_ld_blocks), .ngrp = 1, .umasks = snb_ld_blocks, }, { .name = "LD_BLOCKS_PARTIAL", .desc = "Partial load blocks", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(snb_ld_blocks_partial), .ngrp = 1, .umasks = snb_ld_blocks_partial, }, { .name = "LOAD_HIT_PRE", .desc = "Load dispatches that hit fill buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x4c, .numasks = 
LIBPFM_ARRAY_SIZE(snb_load_hit_pre), .ngrp = 1, .umasks = snb_load_hit_pre, }, { .name = "L3_LAT_CACHE", .desc = "Core-originated cacheable demand requests to L3", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(snb_l3_lat_cache), .ngrp = 1, .umasks = snb_l3_lat_cache, }, { .name = "MACHINE_CLEARS", .desc = "Machine clear asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc3, .numasks = LIBPFM_ARRAY_SIZE(snb_machine_clears), .ngrp = 1, .umasks = snb_machine_clears, }, { .name = "MEM_LOAD_UOPS_LLC_HIT_RETIRED", .desc = "L3 hit loads uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_load_uops_llc_hit_retired), .ngrp = 1, .umasks = snb_mem_load_uops_llc_hit_retired, }, { .name = "MEM_LOAD_LLC_HIT_RETIRED", .desc = "L3 hit loads uops retired (deprecated use MEM_LOAD_UOPS_LLC_HIT_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd2, .equiv = "MEM_LOAD_UOPS_LLC_HIT_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_load_uops_llc_hit_retired), .ngrp = 1, .umasks = snb_mem_load_uops_llc_hit_retired, }, { .name = "MEM_LOAD_UOPS_MISC_RETIRED", .desc = "Loads and some non simd split loads uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd4, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_load_uops_misc_retired), .ngrp = 1, .umasks = snb_mem_load_uops_misc_retired, }, { .name = "MEM_LOAD_MISC_RETIRED", .desc = "Loads and some non simd split loads uops retired (deprecated use MEM_LOAD_UOPS_MISC_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd4, .equiv = "MEM_LOAD_UOPS_MISC_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_load_uops_misc_retired), .ngrp = 1, .umasks = snb_mem_load_uops_misc_retired, }, { .name = "MEM_LOAD_UOPS_RETIRED", .desc = "Memory loads uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd1, .flags= INTEL_X86_PEBS, 
.numasks = LIBPFM_ARRAY_SIZE(snb_mem_load_uops_retired), .ngrp = 1, .umasks = snb_mem_load_uops_retired, }, { .name = "MEM_LOAD_RETIRED", .desc = "Memory loads uops retired (deprecated use MEM_LOAD_UOPS_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd1, .equiv = "MEM_LOAD_UOPS_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_load_uops_retired), .ngrp = 1, .umasks = snb_mem_load_uops_retired, }, { .name = "MEM_TRANS_RETIRED", .desc = "Memory transactions retired", .modmsk = INTEL_V3_ATTRS | _INTEL_X86_ATTR_LDLAT, .cntmsk = 0x8, .code = 0xcd, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_trans_retired), .ngrp = 1, .umasks = snb_mem_trans_retired, }, { .name = "MEM_UOPS_RETIRED", .desc = "Memory uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_uops_retired), .ngrp = 1, .umasks = snb_mem_uops_retired, }, { .name = "MEM_UOP_RETIRED", .desc = "Memory uops retired (deprecated use MEM_UOPS_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd0, .equiv = "MEM_UOPS_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_mem_uops_retired), .ngrp = 1, .umasks = snb_mem_uops_retired, }, { .name = "MISALIGN_MEM_REF", .desc = "Misaligned memory references", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(snb_misalign_mem_ref), .ngrp = 1, .umasks = snb_misalign_mem_ref, }, { .name = "OFFCORE_REQUESTS", .desc = "Offcore requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb0, .numasks = LIBPFM_ARRAY_SIZE(snb_offcore_requests), .ngrp = 1, .umasks = snb_offcore_requests, }, { .name = "OFFCORE_REQUESTS_BUFFER", .desc = "Offcore requests buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb2, .numasks = LIBPFM_ARRAY_SIZE(snb_offcore_requests_buffer), .ngrp = 1, .umasks = snb_offcore_requests_buffer, }, { .name = "OFFCORE_REQUESTS_OUTSTANDING", .desc = "Outstanding offcore 
requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(snb_offcore_requests_outstanding), .ngrp = 1, .umasks = snb_offcore_requests_outstanding, }, { .name = "OTHER_ASSISTS", .desc = "Count hardware assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc1, .numasks = LIBPFM_ARRAY_SIZE(snb_other_assists), .ngrp = 1, .umasks = snb_other_assists, }, { .name = "PARTIAL_RAT_STALLS", .desc = "Partial Register Allocation Table stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x59, .numasks = LIBPFM_ARRAY_SIZE(snb_partial_rat_stalls), .ngrp = 1, .umasks = snb_partial_rat_stalls, }, { .name = "RESOURCE_STALLS", .desc = "Resource related stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xa2, .numasks = LIBPFM_ARRAY_SIZE(snb_resource_stalls), .ngrp = 1, .umasks = snb_resource_stalls, }, { .name = "RESOURCE_STALLS2", .desc = "Resource related stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5b, .numasks = LIBPFM_ARRAY_SIZE(snb_resource_stalls2), .ngrp = 1, .umasks = snb_resource_stalls2, }, { .name = "ROB_MISC_EVENTS", .desc = "Reorder buffer events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(snb_rob_misc_events), .ngrp = 1, .umasks = snb_rob_misc_events, }, { .name = "RS_EVENTS", .desc = "Reservation station events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5e, .numasks = LIBPFM_ARRAY_SIZE(snb_rs_events), .ngrp = 1, .umasks = snb_rs_events, }, { .name = "SIMD_FP_256", .desc = "Counts 256-bit packed floating point instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x11, .numasks = LIBPFM_ARRAY_SIZE(snb_simd_fp_256), .ngrp = 1, .umasks = snb_simd_fp_256, }, { .name = "SQ_MISC", .desc = "SuperQ events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf4, .numasks = LIBPFM_ARRAY_SIZE(snb_sq_misc), .ngrp = 1, .umasks = snb_sq_misc, }, { .name = "TLB_FLUSH", .desc = "TLB flushes", .modmsk = INTEL_V3_ATTRS, .cntmsk = 
0xff, .code = 0xbd, .numasks = LIBPFM_ARRAY_SIZE(snb_tlb_flush), .ngrp = 1, .umasks = snb_tlb_flush, }, { .name = "UNHALTED_CORE_CYCLES", .desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x20000000full, .code = 0x3c, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED3_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "UOPS_DISPATCHED", .desc = "Uops dispatched", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb1, .numasks = LIBPFM_ARRAY_SIZE(snb_uops_dispatched), .ngrp = 1, .umasks = snb_uops_dispatched, }, { .name = "UOPS_DISPATCHED_PORT", .desc = "Uops dispatch to specific ports", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xa1, .numasks = LIBPFM_ARRAY_SIZE(snb_uops_dispatched_port), .ngrp = 1, .umasks = snb_uops_dispatched_port, }, { .name = "UOPS_ISSUED", .desc = "Uops issued", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xe, .numasks = LIBPFM_ARRAY_SIZE(snb_uops_issued), .ngrp = 1, .umasks = snb_uops_issued, }, { .name = "UOPS_RETIRED", .desc = "Uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_uops_retired), .ngrp = 1, .umasks = snb_uops_retired, }, { .name = "CYCLE_ACTIVITY", .desc = "Stalled cycles", .modmsk = INTEL_V3_ATTRS & ~_INTEL_X86_ATTR_C, .cntmsk = 0xff, .code = 0xa3, .numasks = LIBPFM_ARRAY_SIZE(snb_cycle_activity), .ngrp = 1, .umasks = snb_cycle_activity, }, { .name = "OFFCORE_RESPONSE_0", .desc = "Offcore response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1b7, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(snb_offcore_response), .ngrp = 3, .umasks = snb_offcore_response, }, { .name = "OFFCORE_RESPONSE_1", .desc = "Offcore 
response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1bb, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(snb_offcore_response), .ngrp = 3, .umasks = snb_offcore_response, /* identical to actual umasks list for this event */ }, }; papi-5.3.0/src/libpfm4/lib/events/perf_events.h0000600003276200002170000002542412247131124021035 0ustar ralphundrgrad/* * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
*/ #define CACHE_ST_ACCESS(n, d, e) \ {\ .name = #n"-STORES",\ .desc = d" store accesses",\ .id = PERF_COUNT_HW_CACHE_##e,\ .type = PERF_TYPE_HW_CACHE,\ .modmsk = PERF_ATTR_HW,\ .umask_ovfl_idx = -1,\ .equiv = "PERF_COUNT_HW_CACHE_"#e":WRITE:ACCESS"\ },\ {\ .name = #n"-STORE-MISSES",\ .desc = d" store misses",\ .id = PERF_COUNT_HW_CACHE_##e,\ .type = PERF_TYPE_HW_CACHE,\ .modmsk = PERF_ATTR_HW,\ .umask_ovfl_idx = -1,\ .equiv = "PERF_COUNT_HW_CACHE_"#e":WRITE:MISS"\ } #define CACHE_PF_ACCESS(n, d, e) \ {\ .name = #n"-PREFETCHES",\ .desc = d" prefetch accesses",\ .id = PERF_COUNT_HW_CACHE_##e,\ .type = PERF_TYPE_HW_CACHE,\ .modmsk = PERF_ATTR_HW,\ .umask_ovfl_idx = -1,\ .equiv = "PERF_COUNT_HW_CACHE_"#e":PREFETCH:ACCESS"\ },\ {\ .name = #n"-PREFETCH-MISSES",\ .desc = d" prefetch misses",\ .id = PERF_COUNT_HW_CACHE_##e,\ .type = PERF_TYPE_HW_CACHE,\ .modmsk = PERF_ATTR_HW,\ .umask_ovfl_idx = -1,\ .equiv = "PERF_COUNT_HW_CACHE_"#e":PREFETCH:MISS"\ } #define CACHE_LD_ACCESS(n, d, e) \ {\ .name = #n"-LOADS",\ .desc = d" load accesses",\ .id = PERF_COUNT_HW_CACHE_##e,\ .type = PERF_TYPE_HW_CACHE,\ .modmsk = PERF_ATTR_HW,\ .umask_ovfl_idx = -1,\ .equiv = "PERF_COUNT_HW_CACHE_"#e":READ:ACCESS"\ },\ {\ .name = #n"-LOAD-MISSES",\ .desc = d" load misses",\ .id = PERF_COUNT_HW_CACHE_##e,\ .type = PERF_TYPE_HW_CACHE,\ .modmsk = PERF_ATTR_HW,\ .umask_ovfl_idx = -1,\ .equiv = "PERF_COUNT_HW_CACHE_"#e":READ:MISS"\ } #define CACHE_ACCESS(n, d, e) \ CACHE_LD_ACCESS(n, d, e), \ CACHE_ST_ACCESS(n, d, e), \ CACHE_PF_ACCESS(n, d, e) #define ICACHE_ACCESS(n, d, e) \ CACHE_LD_ACCESS(n, d, e), \ CACHE_PF_ACCESS(n, d, e) static perf_event_t perf_static_events[]={ PCL_EVT_HW(CPU_CYCLES), PCL_EVT_AHW(CYCLES, CPU_CYCLES), PCL_EVT_AHW(CPU-CYCLES, CPU_CYCLES), PCL_EVT_HW(INSTRUCTIONS), PCL_EVT_AHW(INSTRUCTIONS, INSTRUCTIONS), PCL_EVT_HW(CACHE_REFERENCES), PCL_EVT_AHW(CACHE-REFERENCES, CACHE_REFERENCES), PCL_EVT_HW(CACHE_MISSES), PCL_EVT_AHW(CACHE-MISSES,CACHE_MISSES), 
PCL_EVT_HW(BRANCH_INSTRUCTIONS), PCL_EVT_AHW(BRANCH-INSTRUCTIONS, BRANCH_INSTRUCTIONS), PCL_EVT_AHW(BRANCHES, BRANCH_INSTRUCTIONS), PCL_EVT_HW(BRANCH_MISSES), PCL_EVT_AHW(BRANCH-MISSES, BRANCH_MISSES), PCL_EVT_HW(BUS_CYCLES), PCL_EVT_AHW(BUS-CYCLES, BUS_CYCLES), PCL_EVT_HW(STALLED_CYCLES_FRONTEND), PCL_EVT_AHW(STALLED-CYCLES-FRONTEND, STALLED_CYCLES_FRONTEND), PCL_EVT_AHW(IDLE-CYCLES-FRONTEND, STALLED_CYCLES_FRONTEND), PCL_EVT_HW(STALLED_CYCLES_BACKEND), PCL_EVT_AHW(STALLED-CYCLES-BACKEND, STALLED_CYCLES_BACKEND), PCL_EVT_AHW(IDLE-CYCLES-BACKEND, STALLED_CYCLES_BACKEND), PCL_EVT_HW(REF_CPU_CYCLES), PCL_EVT_AHW(REF-CYCLES,REF_CPU_CYCLES), PCL_EVT_SW(CPU_CLOCK), PCL_EVT_ASW(CPU-CLOCK, CPU_CLOCK), PCL_EVT_SW(TASK_CLOCK), PCL_EVT_ASW(TASK-CLOCK, TASK_CLOCK), PCL_EVT_SW(PAGE_FAULTS), PCL_EVT_ASW(PAGE-FAULTS, PAGE_FAULTS), PCL_EVT_ASW(FAULTS, PAGE_FAULTS), PCL_EVT_SW(CONTEXT_SWITCHES), PCL_EVT_ASW(CONTEXT-SWITCHES, CONTEXT_SWITCHES), PCL_EVT_ASW(CS, CONTEXT_SWITCHES), PCL_EVT_SW(CPU_MIGRATIONS), PCL_EVT_ASW(CPU-MIGRATIONS, CPU_MIGRATIONS), PCL_EVT_ASW(MIGRATIONS, CPU_MIGRATIONS), PCL_EVT_SW(PAGE_FAULTS_MIN), PCL_EVT_ASW(MINOR-FAULTS, PAGE_FAULTS_MIN), PCL_EVT_SW(PAGE_FAULTS_MAJ), PCL_EVT_ASW(MAJOR-FAULTS, PAGE_FAULTS_MAJ), { .name = "PERF_COUNT_HW_CACHE_L1D", .desc = "L1 data cache", .id = PERF_COUNT_HW_CACHE_L1D, .type = PERF_TYPE_HW_CACHE, .numasks = 5, .modmsk = PERF_ATTR_HW, .umask_ovfl_idx = -1, .ngrp = 2, .umasks = { { .uname = "READ", .udesc = "read access", .uid = PERF_COUNT_HW_CACHE_OP_READ << 8, .uflags= PERF_FL_DEFAULT, .grpid = 0, }, { .uname = "WRITE", .udesc = "write access", .uid = PERF_COUNT_HW_CACHE_OP_WRITE << 8, .grpid = 0, }, { .uname = "PREFETCH", .udesc = "prefetch access", .uid = PERF_COUNT_HW_CACHE_OP_PREFETCH << 8, .grpid = 0, }, { .uname = "ACCESS", .udesc = "hit access", .uid = PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16, .grpid = 1, }, { .uname = "MISS", .udesc = "miss access", .uid = PERF_COUNT_HW_CACHE_RESULT_MISS << 16, .uflags= 
PERF_FL_DEFAULT, .grpid = 1, } } }, CACHE_ACCESS(L1-DCACHE, "L1 cache", L1D), { .name = "PERF_COUNT_HW_CACHE_L1I", .desc = "L1 instruction cache", .id = PERF_COUNT_HW_CACHE_L1I, .type = PERF_TYPE_HW_CACHE, .numasks = 4, .modmsk = PERF_ATTR_HW, .umask_ovfl_idx = -1, .ngrp = 2, .umasks = { { .uname = "READ", .udesc = "read access", .uid = PERF_COUNT_HW_CACHE_OP_READ << 8, .uflags= PERF_FL_DEFAULT, .grpid = 0, }, { .uname = "PREFETCH", .udesc = "prefetch access", .uid = PERF_COUNT_HW_CACHE_OP_PREFETCH << 8, .grpid = 0, }, { .uname = "ACCESS", .udesc = "hit access", .uid = PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16, .grpid = 1, }, { .uname = "MISS", .udesc = "miss access", .uid = PERF_COUNT_HW_CACHE_RESULT_MISS << 16, .uflags= PERF_FL_DEFAULT, .grpid = 1, } } }, ICACHE_ACCESS(L1-ICACHE, "L1I cache", L1I), { .name = "PERF_COUNT_HW_CACHE_LL", .desc = "Last level cache", .id = PERF_COUNT_HW_CACHE_LL, .type = PERF_TYPE_HW_CACHE, .numasks = 5, .modmsk = PERF_ATTR_HW, .umask_ovfl_idx = -1, .ngrp = 2, .umasks = { { .uname = "READ", .udesc = "read access", .uid = PERF_COUNT_HW_CACHE_OP_READ << 8, .uflags= PERF_FL_DEFAULT, .grpid = 0, }, { .uname = "WRITE", .udesc = "write access", .uid = PERF_COUNT_HW_CACHE_OP_WRITE << 8, .grpid = 0, }, { .uname = "PREFETCH", .udesc = "prefetch access", .uid = PERF_COUNT_HW_CACHE_OP_PREFETCH << 8, .grpid = 0, }, { .uname = "ACCESS", .udesc = "hit access", .uid = PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16, .grpid = 1, }, { .uname = "MISS", .udesc = "miss access", .uid = PERF_COUNT_HW_CACHE_RESULT_MISS << 16, .uflags= PERF_FL_DEFAULT, .grpid = 1, } } }, CACHE_ACCESS(LLC, "Last level cache", LL), { .name = "PERF_COUNT_HW_CACHE_DTLB", .desc = "Data Translation Lookaside Buffer", .id = PERF_COUNT_HW_CACHE_DTLB, .type = PERF_TYPE_HW_CACHE, .numasks = 5, .modmsk = PERF_ATTR_HW, .umask_ovfl_idx = -1, .ngrp = 2, .umasks = { { .uname = "READ", .udesc = "read access", .uid = PERF_COUNT_HW_CACHE_OP_READ << 8, .uflags= PERF_FL_DEFAULT, .grpid = 0, }, { .uname = 
"WRITE", .udesc = "write access", .uid = PERF_COUNT_HW_CACHE_OP_WRITE << 8, .grpid = 0, }, { .uname = "PREFETCH", .udesc = "prefetch access", .uid = PERF_COUNT_HW_CACHE_OP_PREFETCH << 8, .grpid = 0, }, { .uname = "ACCESS", .udesc = "hit access", .uid = PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16, .grpid = 1, }, { .uname = "MISS", .udesc = "miss access", .uid = PERF_COUNT_HW_CACHE_RESULT_MISS << 16, .uflags= PERF_FL_DEFAULT, .grpid = 1, } } }, CACHE_ACCESS(DTLB, "Data TLB", DTLB), { .name = "PERF_COUNT_HW_CACHE_ITLB", .desc = "Instruction Translation Lookaside Buffer", .id = PERF_COUNT_HW_CACHE_ITLB, .type = PERF_TYPE_HW_CACHE, .numasks = 3, .modmsk = PERF_ATTR_HW, .umask_ovfl_idx = -1, .ngrp = 2, .umasks = { { .uname = "READ", .udesc = "read access", .uid = PERF_COUNT_HW_CACHE_OP_READ << 8, .uflags= PERF_FL_DEFAULT, .grpid = 0, }, { .uname = "ACCESS", .udesc = "hit access", .uid = PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16, .grpid = 1, }, { .uname = "MISS", .udesc = "miss access", .uid = PERF_COUNT_HW_CACHE_RESULT_MISS << 16, .uflags= PERF_FL_DEFAULT, .grpid = 1, } } }, CACHE_LD_ACCESS(ITLB, "Instruction TLB", ITLB), { .name = "PERF_COUNT_HW_CACHE_BPU", .desc = "Branch Prediction Unit", .id = PERF_COUNT_HW_CACHE_BPU, .type = PERF_TYPE_HW_CACHE, .numasks = 3, .modmsk = PERF_ATTR_HW, .umask_ovfl_idx = -1, .ngrp = 2, .umasks = { { .uname = "READ", .udesc = "read access", .uid = PERF_COUNT_HW_CACHE_OP_READ << 8, .uflags= PERF_FL_DEFAULT, .grpid = 0, }, { .uname = "ACCESS", .udesc = "hit access", .uid = PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16, .grpid = 1, }, { .uname = "MISS", .udesc = "miss access", .uid = PERF_COUNT_HW_CACHE_RESULT_MISS << 16, .uflags= PERF_FL_DEFAULT, .grpid = 1, } } }, CACHE_LD_ACCESS(BRANCH, "Branch ", BPU), { .name = "PERF_COUNT_HW_CACHE_NODE", .desc = "Node memory access", .id = PERF_COUNT_HW_CACHE_NODE, .type = PERF_TYPE_HW_CACHE, .numasks = 5, .modmsk = PERF_ATTR_HW, .umask_ovfl_idx = -1, .ngrp = 2, .umasks = { { .uname = "READ", .udesc = "read 
access", .uid = PERF_COUNT_HW_CACHE_OP_READ << 8, .uflags= PERF_FL_DEFAULT, .grpid = 0, }, { .uname = "WRITE", .udesc = "write access", .uid = PERF_COUNT_HW_CACHE_OP_WRITE << 8, .grpid = 0, }, { .uname = "PREFETCH", .udesc = "prefetch access", .uid = PERF_COUNT_HW_CACHE_OP_PREFETCH << 8, .grpid = 0, }, { .uname = "ACCESS", .udesc = "hit access", .uid = PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16, .grpid = 1, }, { .uname = "MISS", .udesc = "miss access", .uid = PERF_COUNT_HW_CACHE_RESULT_MISS << 16, .uflags= PERF_FL_DEFAULT, .grpid = 1, } }, }, CACHE_ACCESS(NODE, "Node ", NODE) }; #define PME_PERF_EVENT_COUNT (sizeof(perf_static_events)/sizeof(perf_event_t)) papi-5.3.0/src/libpfm4/lib/events/cell_events.h0000600003276200002170000033644112247131123021023 0ustar ralphundrgrad/* * Copyright (c) 2007 TOSHIBA CORPORATION based on code from * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ static pme_cell_entry_t cell_pe[] = { {.pme_name = "CYCLES", .pme_desc = "CPU cycles", .pme_code = 0x0, /* 0 */ .pme_enable_word = WORD_NONE, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BRANCH_COMMIT_TH0", .pme_desc = "Branch instruction committed.", .pme_code = 0x834, /* 2100 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BRANCH_FLUSH_TH0", .pme_desc = "Branch instruction that caused a misprediction flush is committed. Branch misprediction includes", .pme_code = 0x835, /* 2101 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "INST_BUFF_EMPTY_TH0", .pme_desc = "Instruction buffer empty.", .pme_code = 0x836, /* 2102 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "INST_ERAT_MISS_TH0", .pme_desc = "Instruction effective-address-to-real-address translation (I-ERAT) miss.", .pme_code = 0x837, /* 2103 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L1_ICACHE_MISS_CYCLES_TH0", .pme_desc = "L1 Instruction cache miss cycles. 
Counts the cycles from the miss event until the returned instruction is dispatched or cancelled due to branch misprediction, completion restart, or exceptions.", .pme_code = 0x838, /* 2104 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "DISPATCH_BLOCKED_TH0", .pme_desc = "Valid instruction available for dispatch, but dispatch is blocked.", .pme_code = 0x83a, /* 2106 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "INST_FLUSH_TH0", .pme_desc = "Instruction in pipeline stage EX7 causes a flush.", .pme_code = 0x83d, /* 2109 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "PPC_INST_COMMIT_TH0", .pme_desc = "Two PowerPC instructions committed. For microcode sequences, only the last microcode operation is counted. Committed instructions are counted two at a time. If only one instruction has committed for a given cycle, this event will not be raised until another instruction has been committed in a future cycle.", .pme_code = 0x83f, /* 2111 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BRANCH_COMMIT_TH1", .pme_desc = "Branch instruction committed.", .pme_code = 0x847, /* 2119 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BRANCH_FLUSH_TH1", .pme_desc = "Branch instruction that caused a misprediction flush is committed. 
Branch misprediction includes", .pme_code = 0x848, /* 2120 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "INST_BUFF_EMPTY_TH1", .pme_desc = "Instruction buffer empty.", .pme_code = 0x849, /* 2121 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "INST_ERAT_MISS_TH1", .pme_desc = "Instruction effective-address-to-real-address translation (I-ERAT) miss.", .pme_code = 0x84a, /* 2122 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L1_ICACHE_MISS_CYCLES_TH1", .pme_desc = "L1 Instruction cache miss cycles. Counts the cycles from the miss event until the returned instruction is dispatched or cancelled due to branch misprediction, completion restart, or exceptions.", .pme_code = 0x84b, /* 2123 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "DISPATCH_BLOCKED_TH1", .pme_desc = "Valid instruction available for dispatch, but dispatch is blocked.", .pme_code = 0x84d, /* 2125 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "INST_FLUSH_TH1", .pme_desc = "Instruction in pipeline stage EX7 causes a flush.", .pme_code = 0x850, /* 2128 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "PPC_INST_COMMIT_TH1", .pme_desc = "Two PowerPC instructions committed. For microcode sequences, only the last microcode operation is counted. Committed instructions are counted two at a time. 
If only one instruction has committed for a given cycle, this event will not be raised until another instruction has been committed in a future cycle.", .pme_code = 0x852, /* 2130 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "DATA_ERAT_MISS_TH0", .pme_desc = "Data effective-address-to-real-address translation (D-ERAT) miss. Not speculative.", .pme_code = 0x89a, /* 2202 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "ST_REQ_TH0", .pme_desc = "Store request counted at the L2 interface. Counts microcoded PPE sequences more than once. (Thread 0 and 1)", .pme_code = 0x89b, /* 2203 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "LD_VALID_TH0", .pme_desc = "Load valid at a particular pipe stage. Speculative, since flushed operations are counted as well. Counts microcoded PPE sequences more than once. Misaligned flushes might be counted the first time as well. Load operations include all loads that read data from the cache, dcbt and dcbtst. Does not include load Vector/SIMD multimedia extension pattern instructions.", .pme_code = 0x89c, /* 2204 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "L1_DCACHE_MISS_TH0", .pme_desc = "L1 D-cache load miss. Pulsed when there is a miss request that has a tag miss but not an ERAT miss. Speculative, since flushed operations are counted as well.", .pme_code = 0x89d, /* 2205 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "DATA_ERAT_MISS_TH1", .pme_desc = "Data effective-address-to-real-address translation (D-ERAT) miss. 
Not speculative.", .pme_code = 0x8aa, /* 2218 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "LD_VALID_TH1", .pme_desc = "Load valid at a particular pipe stage. Speculative, since flushed operations are counted as well. Counts microcoded PPE sequences more than once. Misaligned flushes might be counted the first time as well. Load operations include all loads that read data from the cache, dcbt and dcbtst. Does not include load Vector/SIMD multimedia extension pattern instructions.", .pme_code = 0x8ac, /* 2220 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "L1_DCACHE_MISS_TH1", .pme_desc = "L1 D-cache load miss. Pulsed when there is a miss request that has a tag miss but not an ERAT miss. Speculative, since flushed operations are counted as well.", .pme_code = 0x8ad, /* 2221 */ .pme_enable_word = WORD_0_AND_1, .pme_freq = PFM_CELL_PME_FREQ_PPU_MFC, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "LD_MFC_MMIO", .pme_desc = "Load from MFC memory-mapped I/O (MMIO) space.", .pme_code = 0xc1c, /* 3100 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "ST_MFC_MMIO", .pme_desc = "Stores to MFC MMIO space.", .pme_code = 0xc1d, /* 3101 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "REQ_TOKEN_TYPE", .pme_desc = "Request token for even memory bank numbers 0-14.", .pme_code = 0xc22, /* 3106 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "RCV_8BEAT_DATA", .pme_desc = "Receive 8-beat data from the Element Interconnect Bus (EIB).", .pme_code = 0xc2b, /* 3115 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = 
"SEND_8BEAT_DATA", .pme_desc = "Send 8-beat data to the EIB.", .pme_code = 0xc2c, /* 3116 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "SEND_CMD", .pme_desc = "Send a command to the EIB; includes retried commands.", .pme_code = 0xc2d, /* 3117 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "DATA_GRANT_CYCLES", .pme_desc = "Cycles between data request and data grant.", .pme_code = 0xc2e, /* 3118 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_ST_Q_NOT_EMPTY_CYCLES", .pme_desc = "The five-entry Non-Cacheable Unit (NCU) Store Command queue not empty.", .pme_code = 0xc33, /* 3123 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "L2_CACHE_HIT", .pme_desc = "Cache hit for core interface unit (CIU) loads and stores.", .pme_code = 0xc80, /* 3200 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_CACHE_MISS", .pme_desc = "Cache miss for CIU loads and stores.", .pme_code = 0xc81, /* 3201 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_LD_MISS", .pme_desc = "CIU load miss.", .pme_code = 0xc84, /* 3204 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_ST_MISS", .pme_desc = "CIU store to Invalid state (miss).", .pme_code = 0xc85, /* 3205 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_LWARX_LDARX_MISS_TH0", .pme_desc = "Load word and reserve indexed (lwarx/ldarx) for Thread 0 hits Invalid cache state", .pme_code = 0xc87, /* 3207 */ .pme_enable_word = WORD_0_AND_2, 
.pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_STWCX_STDCX_MISS_TH0", .pme_desc = "Store word conditional indexed (stwcx/stdcx) for Thread 0 hits Invalid cache state when reservation is set.", .pme_code = 0xc8e, /* 3214 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_ALL_SNOOP_SM_BUSY", .pme_desc = "All four snoop state machines busy.", .pme_code = 0xc99, /* 3225 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "L2_DCLAIM_GOOD", .pme_desc = "Data line claim (dclaim) that received good combined response; includes store/stcx/dcbz to Shared (S), Shared Last (SL), or Tagged (T) cache state; does not include dcbz to Invalid (I) cache state.", .pme_code = 0xce8, /* 3304 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_DCLAIM_TO_RWITM", .pme_desc = "Dclaim converted into rwitm; may still not get to the bus if stcx is aborted.", .pme_code = 0xcef, /* 3311 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_ST_TO_M_MU_E", .pme_desc = "Store to modified (M), modified unsolicited (MU), or exclusive (E) cache state.", .pme_code = 0xcf0, /* 3312 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_ST_Q_FULL", .pme_desc = "8-entry store queue (STQ) full.", .pme_code = 0xcf1, /* 3313 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "L2_ST_TO_RC_ACKED", .pme_desc = "Store dispatched to RC machine is acknowledged.", .pme_code = 0xcf2, /* 3314 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_GATHERABLE_ST", 
.pme_desc = "Gatherable store (type = 00000) received from CIU.", .pme_code = 0xcf3, /* 3315 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_SNOOP_PUSH", .pme_desc = "Snoop push.", .pme_code = 0xcf6, /* 3318 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_INTERVENTION_FROM_SL_E_SAME_MODE", .pme_desc = "Send intervention from (SL | E) cache state to a destination within the same CBE chip.", .pme_code = 0xcf7, /* 3319 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_INTERVENTION_FROM_M_MU_SAME_MODE", .pme_desc = "Send intervention from (M | MU) cache state to a destination within the same CBE chip.", .pme_code = 0xcf8, /* 3320 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_SNOOP_RETRY_CONFLICTS", .pme_desc = "Respond with Retry to a snooped request due to one of the following conflicts", .pme_code = 0xcfd, /* 3325 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_SNOOP_RETRY_BUSY", .pme_desc = "Respond with Retry to a snooped request because all snoop machines are busy.", .pme_code = 0xcfe, /* 3326 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_SNOOP_RESP_MMU_TO_EST", .pme_desc = "Snooped response causes a cache state transition from (M | MU) to (E | S | T).", .pme_code = 0xcff, /* 3327 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_SNOOP_RESP_E_TO_S", .pme_desc = "Snooped response causes a cache state transition from E to S.", .pme_code = 0xd00, /* 3328 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, 
.pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_SNOOP_RESP_ESLST_TO_I", .pme_desc = "Snooped response causes a cache state transition from (E | SL | S | T) to Invalid (I).", .pme_code = 0xd01, /* 3329 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_SNOOP_RESP_MMU_TO_I", .pme_desc = "Snooped response causes a cache state transition from (M | MU) to I.", .pme_code = 0xd02, /* 3330 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_LWARX_LDARX_MISS_TH1", .pme_desc = "Load and reserve indexed (lwarx/ldarx) for Thread 1 hits Invalid cache state.", .pme_code = 0xd54, /* 3412 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "L2_STWCX_STDCX_MISS_TH1", .pme_desc = "Store conditional indexed (stwcx/stdcx) for Thread 1 hits Invalid cache state.", .pme_code = 0xd5b, /* 3419 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_NON_CACHEABLE_ST_ALL", .pme_desc = "Non-cacheable store request received from CIU; includes all synchronization operations such as sync and eieio.", .pme_code = 0xdac, /* 3500 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_SYNC_REQ", .pme_desc = "sync received from CIU.", .pme_code = 0xdad, /* 3501 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_NON_CACHEABLE_ST", .pme_desc = "Non-cacheable store request received from CIU; includes only stores.", .pme_code = 0xdb0, /* 3504 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_EIEIO_REQ", .pme_desc = "eieio received from CIU.", .pme_code = 0xdb2, /* 3506 */ .pme_enable_word 
= WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_TLBIE_REQ", .pme_desc = "tlbie received from CIU.", .pme_code = 0xdb3, /* 3507 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_SYNC_WAIT", .pme_desc = "sync at the bottom of the store queue, while waiting on st_done signal from the Bus Interface Unit (BIU) and sync_done signal from L2.", .pme_code = 0xdb4, /* 3508 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_LWSYNC_WAIT", .pme_desc = "lwsync at the bottom of the store queue, while waiting for a sync_done signal from the L2.", .pme_code = 0xdb5, /* 3509 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_EIEIO_WAIT", .pme_desc = "eieio at the bottom of the store queue, while waiting for a st_done signal from the BIU and a sync_done signal from the L2.", .pme_code = 0xdb6, /* 3510 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_TLBIE_WAIT", .pme_desc = "tlbie at the bottom of the store queue, while waiting for a st_done signal from the BIU.", .pme_code = 0xdb7, /* 3511 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_COMBINED_NON_CACHEABLE_ST", .pme_desc = "Non-cacheable store combined with the previous non-cacheable store with a contiguous address.", .pme_code = 0xdb8, /* 3512 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_ALL_ST_GATHER_BUFFS_FULL", .pme_desc = "All four store-gather buffers full.", .pme_code = 0xdbb, /* 3515 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_LD_REQ", .pme_desc = "Non-cacheable load request received from CIU; includes instruction and data fetches.", .pme_code = 0xdbc, /* 3516 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "NCU_ST_Q_NOT_EMPTY", .pme_desc = "The four-deep store queue not empty.", .pme_code = 0xdbd, /* 3517 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_ST_Q_FULL", .pme_desc = "The four-deep store queue full.", .pme_code = 0xdbe, /* 3518 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "NCU_AT_LEAST_ONE_ST_GATHER_BUFF_NOT_EMPTY", .pme_desc = "At least one store gather buffer not empty.", .pme_code = 0xdbf, /* 3519 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_DUAL_INST_COMMITTED", .pme_desc = "A dual instruction is committed.", .pme_code = 0x1004, /* 4100 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_SINGLE_INST_COMMITTED", .pme_desc = "A single instruction is committed.", .pme_code = 0x1005, /* 4101 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_PIPE0_INST_COMMITTED", .pme_desc = "A pipeline 0 instruction is committed.", .pme_code = 0x1006, /* 4102 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_PIPE1_INST_COMMITTED", .pme_desc = "A pipeline 1 instruction is committed.", .pme_code = 0x1007, /* 4103 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_LS_BUSY", .pme_desc = "Local storage is busy.", 
.pme_code = 0x1009, /* 4105 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_DMA_CONFLICT_LD_ST", .pme_desc = "A direct memory access (DMA) might conflict with a load or store.", .pme_code = 0x100a, /* 4106 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_LS_ST", .pme_desc = "A store instruction to local storage is issued.", .pme_code = 0x100b, /* 4107 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_LS_LD", .pme_desc = "A load instruction from local storage is issued.", .pme_code = 0x100c, /* 4108 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_FP_EXCEPTION", .pme_desc = "A floating-point unit exception occurred.", .pme_code = 0x100d, /* 4109 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_BRANCH_COMMIT", .pme_desc = "A branch instruction is committed.", .pme_code = 0x100e, /* 4110 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_NON_SEQ_PC", .pme_desc = "A nonsequential change of the SPU program counter has occurred. This can be caused by branch, asynchronous interrupt, stalled wait on channel, error-correction code (ECC) error, and so forth.", .pme_code = 0x100f, /* 4111 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_BRANCH_NOT_TAKEN", .pme_desc = "A branch was not taken.", .pme_code = 0x1010, /* 4112 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_BRANCH_MISS_PREDICTION", .pme_desc = "Branch miss prediction. 
This count is not exact. Certain other code sequences can cause additional pulses on this signal.", .pme_code = 0x1011, /* 4113 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_BRANCH_HINT_MISS_PREDICTION", .pme_desc = "Branch hint miss prediction. This count is not exact. Certain other code sequences can cause additional pulses on this signal.", .pme_code = 0x1012, /* 4114 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_INST_SEQ_ERROR", .pme_desc = "Instruction sequence error", .pme_code = 0x1013, /* 4115 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "SPU_STALL_CH_WRITE", .pme_desc = "Stalled waiting on any blocking channel write.", .pme_code = 0x1015, /* 4117 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_EXTERNAL_EVENT_CH0", .pme_desc = "Stalled waiting on external event status (Channel 0).", .pme_code = 0x1016, /* 4118 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_SIGNAL_1_CH3", .pme_desc = "Stalled waiting on SPU Signal Notification 1 (Channel 3).", .pme_code = 0x1017, /* 4119 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_SIGNAL_2_CH4", .pme_desc = "Stalled waiting on SPU Signal Notification 2 (Channel 4).", .pme_code = 0x1018, /* 4120 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_DMA_CH21", .pme_desc = "Stalled waiting on DMA Command Opcode or ClassID Register (Channel 21).", .pme_code = 0x1019, /* 4121 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type 
= COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_MFC_READ_CH24", .pme_desc = "Stalled waiting on memory flow control (MFC) Read Tag-Group Status (Channel 24).", .pme_code = 0x101a, /* 4122 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_MFC_READ_CH25", .pme_desc = "Stalled waiting on MFC Read List Stall-and-Notify Tag Status (Channel 25).", .pme_code = 0x101b, /* 4123 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_OUTBOUND_MAILBOX_WRITE_CH28", .pme_desc = "Stalled waiting on SPU Write Outbound Mailbox (Channel 28).", .pme_code = 0x101c, /* 4124 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_STALL_MAILBOX_CH29", .pme_desc = "Stalled waiting on SPU Mailbox (Channel 29).", .pme_code = 0x1022, /* 4130 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_TR_STALL_CH", .pme_desc = "Stalled waiting on a channel operation.", .pme_code = 0x10a1, /* 4257 */ .pme_enable_word = WORD_NONE, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_EV_INST_FETCH_STALL", .pme_desc = "Instruction fetch stall", .pme_code = 0x1107, /* 4359 */ .pme_enable_word = WORD_NONE, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "SPU_EV_ADDR_TRACE", .pme_desc = "Serialized SPU address (program counter) trace.", .pme_code = 0x110b, /* 4363 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_SPU, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_LD", .pme_desc = "An atomic load was received from direct memory access controller (DMAC).", .pme_code = 0x13ed, /* 5101 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = 
"MFC_ATOMIC_DCLAIM", .pme_desc = "An atomic dclaim was sent to synergistic bus interface (SBI); includes retried requests.", .pme_code = 0x13ee, /* 5102 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_RWITM", .pme_desc = "An atomic rwitm performed was sent to SBI; includes retried requests.", .pme_code = 0x13ef, /* 5103 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_LD_CACHE_MISS_MU", .pme_desc = "An atomic load miss caused MU cache state.", .pme_code = 0x13f0, /* 5104 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_LD_CACHE_MISS_E", .pme_desc = "An atomic load miss caused E cache state.", .pme_code = 0x13f1, /* 5105 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_LD_CACHE_MISS_SL", .pme_desc = "An atomic load miss caused SL cache state.", .pme_code = 0x13f2, /* 5106 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_LD_CACHE_HIT", .pme_desc = "An atomic load hits cache.", .pme_code = 0x13f3, /* 5107 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_LD_CACHE_MISS_INTERVENTION", .pme_desc = "Atomic load misses cache with data intervention; sum of signals 4 and 6 in this group.", .pme_code = 0x13f4, /* 5108 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ATOMIC_PUTLLXC_CACHE_MISS_WO_INTERVENTION", .pme_desc = "putllc or putlluc misses cache without data intervention; for putllc, counts only when reservation is set for the address.", .pme_code = 0x13fa, /* 5114 */ .pme_enable_word 
= WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_SNOOP_MACHINE_BUSY", .pme_desc = "Snoop machine busy.", .pme_code = 0x13fd, /* 5117 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MFC_SNOOP_MMU_TO_I", .pme_desc = "A snoop caused cache transition from [M | MU] to I.", .pme_code = 0x13ff, /* 5119 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_SNOOP_ESSL_TO_I", .pme_desc = "A snoop caused cache transition from [E | S | SL] to I.", .pme_code = 0x1401, /* 5121 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_SNOOP_MU_TO_T", .pme_desc = "A snoop caused cache transition from MU to T cache state.", .pme_code = 0x1403, /* 5123 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_SENT_INTERVENTION_LOCAL", .pme_desc = "Sent modified data intervention to a destination within the same CBE chip.", .pme_code = 0x1407, /* 5127 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ANY_DMA_GET", .pme_desc = "Any flavor of DMA get[] command issued to Synergistic Bus Interface (SBI); sum of signals 17-25 in this group.", .pme_code = 0x1450, /* 5200 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ANY_DMA_PUT", .pme_desc = "Any flavor of DMA put[] command issued to SBI; sum of signals 2-16 in this group.", .pme_code = 0x1451, /* 5201 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_DMA_PUT", .pme_desc = "DMA put (put) is issued to SBI.", .pme_code = 0x1452, /* 5202 */ .pme_enable_word = 
WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_DMA_GET", .pme_desc = "DMA get data from effective address to local storage (get) issued to SBI.", .pme_code = 0x1461, /* 5217 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_LD_REQ", .pme_desc = "Load request sent to element interconnect bus (EIB); includes read, read atomic, rwitm, rwitm atomic, and retried commands.", .pme_code = 0x14b8, /* 5304 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_ST_REQ", .pme_desc = "Store request sent to EIB; includes wwf, wwc, wwk, dclaim, dclaim atomic, and retried commands.", .pme_code = 0x14b9, /* 5305 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_RECV_DATA", .pme_desc = "Received data from EIB, including partial cache line data.", .pme_code = 0x14ba, /* 5306 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_SENT_DATA", .pme_desc = "Sent data to EIB, both as a master and a snooper.", .pme_code = 0x14bb, /* 5307 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_SBI_Q_NOT_EMPTY", .pme_desc = "16-deep synergistic bus interface (SBI) queue with outgoing requests not empty; does not include atomic requests.", .pme_code = 0x14bc, /* 5308 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MFC_SBI_Q_FULL", .pme_desc = "16-deep SBI queue with outgoing requests full; does not include atomic requests.", .pme_code = 0x14bd, /* 5309 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = 
"MFC_SENT_REQ", .pme_desc = "Sent request to EIB.", .pme_code = 0x14be, /* 5310 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_RECV_DATA_BUS_GRANT", .pme_desc = "Received data bus grant; includes data sent for MMIO operations.", .pme_code = 0x14c0, /* 5312 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_WAIT_DATA_BUS_GRANT", .pme_desc = "Cycles between data bus request and data bus grant.", .pme_code = 0x14c1, /* 5313 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MFC_CMD_O_MEM", .pme_desc = "Command (read or write) for an odd-numbered memory bank; valid only when resource allocation is turned on.", .pme_code = 0x14c2, /* 5314 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_CMD_E_MEM", .pme_desc = "Command (read or write) for an even-numbered memory bank; valid only when resource allocation is turned on.", .pme_code = 0x14c3, /* 5315 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_RECV_RETRY_RESP", .pme_desc = "Request gets the Retry response; includes local and global requests.", .pme_code = 0x14c6, /* 5318 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_SENT_DATA_BUS_REQ", .pme_desc = "Sent data bus request to EIB.", .pme_code = 0x14c7, /* 5319 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_TLB_MISS", .pme_desc = "Translation Lookaside Buffer (TLB) miss without parity or protection errors.", .pme_code = 0x1518, /* 5400 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_OCCURRENCE, }, {.pme_name = "MFC_TLB_CYCLES", .pme_desc = "TLB miss (cycles).", .pme_code = 0x1519, /* 5401 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MFC_TLB_HIT", .pme_desc = "TLB hit.", .pme_code = 0x151a, /* 5402 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_READ_RWITM_1", .pme_desc = "Number of read and rwitm commands (including atomic) AC1 to AC0. (Group 1)", .pme_code = 0x17d4, /* 6100 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_DCLAIM_1", .pme_desc = "Number of dclaim commands (including atomic) AC1 to AC0. (Group 1)", .pme_code = 0x17d5, /* 6101 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_WWK_WWC_WWF_1", .pme_desc = "Number of wwk, wwc, and wwf commands from AC1 to AC0. (Group 1)", .pme_code = 0x17d6, /* 6102 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SYNC_TLBSYNC_EIEIO_1", .pme_desc = "Number of sync, tlbsync, and eieio commands from AC1 to AC0. (Group 1)", .pme_code = 0x17d7, /* 6103 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_TLBIE_1", .pme_desc = "Number of tlbie commands from AC1 to AC0. (Group 1)", .pme_code = 0x17d8, /* 6104 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_PAAM_CAM_HIT_1", .pme_desc = "Previous adjacent address match (PAAM) Content Addressable Memory (CAM) hit. 
(Group 1)", .pme_code = 0x17df, /* 6111 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_PAAM_CAM_MISS_1", .pme_desc = "PAAM CAM miss. (Group 1)", .pme_code = 0x17e0, /* 6112 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_CMD_REFLECTED_1", .pme_desc = "Command reflected. (Group 1)", .pme_code = 0x17e2, /* 6114 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_READ_RWITM_2", .pme_desc = "Number of read and rwitm commands (including atomic) AC1 to AC0. (Group 2)", .pme_code = 0x17e4, /* 6116 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_DCLAIM_2", .pme_desc = "Number of dclaim commands (including atomic) AC1 to AC0. (Group 2)", .pme_code = 0x17e5, /* 6117 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_WWK_WWC_WWF_2", .pme_desc = "Number of wwk, wwc, and wwf commands from AC1 to AC0. (Group 2)", .pme_code = 0x17e6, /* 6118 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SYNC_TLBSYNC_EIEIO_2", .pme_desc = "Number of sync, tlbsync, and eieio commands from AC1 to AC0. (Group 2)", .pme_code = 0x17e7, /* 6119 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_TLBIE_2", .pme_desc = "Number of tlbie commands from AC1 to AC0. (Group 2)", .pme_code = 0x17e8, /* 6120 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_PAAM_CAM_HIT_2", .pme_desc = "PAAM CAM hit. 
(Group 2)", .pme_code = 0x17ef, /* 6127 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_PAAM_CAM_MISS_2", .pme_desc = "PAAM CAM miss. (Group 2)", .pme_code = 0x17f0, /* 6128 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_CMD_REFLECTED_2", .pme_desc = "Command reflected. (Group 2)", .pme_code = 0x17f2, /* 6130 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CMD_FROM_SPE6", .pme_desc = "Local command from SPE 6.", .pme_code = 0x1839, /* 6201 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CMD_FROM_SPE4", .pme_desc = "Local command from SPE 4.", .pme_code = 0x183a, /* 6202 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CME_FROM_SPE2", .pme_desc = "Local command from SPE 2.", .pme_code = 0x183b, /* 6203 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CMD_FROM_PPE", .pme_desc = "Local command from PPE.", .pme_code = 0x183d, /* 6205 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CMD_FROM_SPE1", .pme_desc = "Local command from SPE 1.", .pme_code = 0x183e, /* 6206 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CMD_FROM_SPE3", .pme_desc = "Local command from SPE 3.", .pme_code = 0x183f, /* 6207 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CMD_FROM_SPE5", .pme_desc = "Local command from SPE 5.", .pme_code = 0x1840, /* 
6208 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_LOCAL_CMD_FROM_SPE7", .pme_desc = "Local command from SPE 7.", .pme_code = 0x1841, /* 6209 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE6", .pme_desc = "AC1-to-AC0 global command from SPE 6.", .pme_code = 0x1844, /* 6212 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE4", .pme_desc = "AC1-to-AC0 global command from SPE 4.", .pme_code = 0x1845, /* 6213 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE2", .pme_desc = "AC1-to-AC0 global command from SPE 2.", .pme_code = 0x1846, /* 6214 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE0", .pme_desc = "AC1-to-AC0 global command from SPE 0.", .pme_code = 0x1847, /* 6215 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_PPE", .pme_desc = "AC1-to-AC0 global command from PPE.", .pme_code = 0x1848, /* 6216 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE1", .pme_desc = "AC1-to-AC0 global command from SPE 1.", .pme_code = 0x1849, /* 6217 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE3", .pme_desc = "AC1-to-AC0 global command from SPE 3.", .pme_code = 0x184a, /* 6218 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE5", 
.pme_desc = "AC1-to-AC0 global command from SPE 5.", .pme_code = 0x184b, /* 6219 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GLOBAL_CMD_FROM_SPE7", .pme_desc = "AC1-to-AC0 global command from SPE 7", .pme_code = 0x184c, /* 6220 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_AC1_REFLECTING_LOCAL_CMD", .pme_desc = "AC1 is reflecting any local command.", .pme_code = 0x184e, /* 6222 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_AC1_SEND_GLOBAL_CMD", .pme_desc = "AC1 sends a global command to AC0.", .pme_code = 0x184f, /* 6223 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_AC0_REFLECT_GLOBAL_CMD", .pme_desc = "AC0 reflects a global command back to AC1.", .pme_code = 0x1850, /* 6224 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_AC1_REFLECT_CMD_TO_BM", .pme_desc = "AC1 reflects a command back to the bus masters.", .pme_code = 0x1851, /* 6225 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GRANT_DATA_RING0_1", .pme_desc = "Grant on data ring 0.", .pme_code = 0x189c, /* 6300 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GRANT_DATA_RING1_1", .pme_desc = "Grant on data ring 1.", .pme_code = 0x189d, /* 6301 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GRANT_DATA_RING2_1", .pme_desc = "Grant on data ring 2.", .pme_code = 0x189e, /* 6302 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GRANT_DATA_RING3_1", .pme_desc = "Grant on data ring 3.", .pme_code = 0x189f, /* 6303 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_DATA_RING0_INUSE_1", .pme_desc = "Data ring 0 is in use.", .pme_code = 0x18a0, /* 6304 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_DATA_RING1_INUSE_1", .pme_desc = "Data ring 1 is in use.", .pme_code = 0x18a1, /* 6305 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_DATA_RING2_INUSE_1", .pme_desc = "Data ring 2 is in use.", .pme_code = 0x18a2, /* 6306 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_DATA_RING3_INUSE_1", .pme_desc = "Data ring 3 is in use.", .pme_code = 0x18a3, /* 6307 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_ALL_DATA_RINGS_IDLE_1", .pme_desc = "All data rings are idle.", .pme_code = 0x18a4, /* 6308 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_ONE_DATA_RING_BUSY_1", .pme_desc = "One data ring is busy.", .pme_code = 0x18a5, /* 6309 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TWO_OR_THREE_DATA_RINGS_BUSY_1", .pme_desc = "Two or three data rings are busy.", .pme_code = 0x18a6, /* 6310 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_ALL_DATA_RINGS_BUSY_1", .pme_desc = "All data rings are busy.", .pme_code = 0x18a7, /* 6311 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_IOIF0_DATA_REQ_PENDING_1", .pme_desc = "BIC(IOIF0) data request pending.", .pme_code = 0x18a8, /* 6312 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE6_DATA_REQ_PENDING_1", .pme_desc = "SPE 6 data request pending.", .pme_code = 0x18a9, /* 6313 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE4_DATA_REQ_PENDING_1", .pme_desc = "SPE 4 data request pending.", .pme_code = 0x18aa, /* 6314 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE2_DATA_REQ_PENDING_1", .pme_desc = "SPE 2 data request pending.", .pme_code = 0x18ab, /* 6315 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE0_DATA_REQ_PENDING_1", .pme_desc = "SPE 0 data request pending.", .pme_code = 0x18ac, /* 6316 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_MIC_DATA_REQ_PENDING_1", .pme_desc = "MIC data request pending.", .pme_code = 0x18ad, /* 6317 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_PPE_DATA_REQ_PENDING_1", .pme_desc = "PPE data request pending.", .pme_code = 0x18ae, /* 6318 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE1_DATA_REQ_PENDING_1", .pme_desc = "SPE 1 data request pending.", .pme_code = 0x18af, /* 6319 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE3_DATA_REQ_PENDING_1", .pme_desc = "SPE 3 data request pending.", .pme_code = 0x18b0, /* 6320 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = 
PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE5_DATA_REQ_PENDING_1", .pme_desc = "SPE 5 data request pending.", .pme_code = 0x18b1, /* 6321 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE7_DATA_REQ_PENDING_1", .pme_desc = "SPE 7 data request pending.", .pme_code = 0x18b2, /* 6322 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_IOIF0_DATA_DEST_1", .pme_desc = "IOIF0 is data destination.", .pme_code = 0x18b4, /* 6324 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE6_DATA_DEST_1", .pme_desc = "SPE 6 is data destination.", .pme_code = 0x18b5, /* 6325 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE4_DATA_DEST_1", .pme_desc = "SPE 4 is data destination.", .pme_code = 0x18b6, /* 6326 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE2_DATA_DEST_1", .pme_desc = "SPE 2 is data destination.", .pme_code = 0x18b7, /* 6327 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE0_DATA_DEST_1", .pme_desc = "SPE 0 is data destination.", .pme_code = 0x18b8, /* 6328 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_MIC_DATA_DEST_1", .pme_desc = "MIC is data destination.", .pme_code = 0x18b9, /* 6329 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_PPE_DATA_DEST_1", .pme_desc = "PPE is data destination.", .pme_code = 0x18ba, /* 6330 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE1_DATA_DEST_1", .pme_desc = "SPE 1 is data destination.", .pme_code = 0x18bb, /* 6331 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_IOIF0_DATA_REQ_PENDING_2", .pme_desc = "BIC(IOIF0) data request pending.", .pme_code = 0x1900, /* 6400 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE6_DATA_REQ_PENDING_2", .pme_desc = "SPE 6 data request pending.", .pme_code = 0x1901, /* 6401 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE4_DATA_REQ_PENDING_2", .pme_desc = "SPE 4 data request pending.", .pme_code = 0x1902, /* 6402 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE2_DATA_REQ_PENDING_2", .pme_desc = "SPE 2 data request pending.", .pme_code = 0x1903, /* 6403 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE0_DATA_REQ_PENDING_2", .pme_desc = "SPE 0 data request pending.", .pme_code = 0x1904, /* 6404 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_MIC_DATA_REQ_PENDING_2", .pme_desc = "MIC data request pending.", .pme_code = 0x1905, /* 6405 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_PPE_DATA_REQ_PENDING_2", .pme_desc = "PPE data request pending.", .pme_code = 0x1906, /* 6406 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE1_DATA_REQ_PENDING_2", .pme_desc = "SPE 1 data request pending.", .pme_code = 0x1907, /* 6407 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, 
.pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE3_DATA_REQ_PENDING_2", .pme_desc = "SPE 3 data request pending.", .pme_code = 0x1908, /* 6408 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE5_DATA_REQ_PENDING_2", .pme_desc = "SPE 5 data request pending.", .pme_code = 0x1909, /* 6409 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_SPE7_DATA_REQ_PENDING_2", .pme_desc = "SPE 7 data request pending.", .pme_code = 0x190a, /* 6410 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_IOIF1_DATA_REQ_PENDING_2", .pme_desc = "IOIF1 data request pending.", .pme_code = 0x190b, /* 6411 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "EIB_IOIF0_DATA_DEST_2", .pme_desc = "IOIF0 is data destination.", .pme_code = 0x190c, /* 6412 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE6_DATA_DEST_2", .pme_desc = "SPE 6 is data destination.", .pme_code = 0x190d, /* 6413 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE4_DATA_DEST_2", .pme_desc = "SPE 4 is data destination.", .pme_code = 0x190e, /* 6414 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE2_DATA_DEST_2", .pme_desc = "SPE 2 is data destination.", .pme_code = 0x190f, /* 6415 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE0_DATA_DEST_2", .pme_desc = "SPE 0 is data destination.", .pme_code = 0x1910, /* 6416 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_MIC_DATA_DEST_2", .pme_desc = "MIC is data destination.", .pme_code = 0x1911, /* 6417 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_PPE_DATA_DEST_2", .pme_desc = "PPE is data destination.", .pme_code = 0x1912, /* 6418 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE1_DATA_DEST_2", .pme_desc = "SPE 1 is data destination.", .pme_code = 0x1913, /* 6419 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE3_DATA_DEST_2", .pme_desc = "SPE 3 is data destination.", .pme_code = 0x1914, /* 6420 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE5_DATA_DEST_2", .pme_desc = "SPE 5 is data destination.", .pme_code = 0x1915, /* 6421 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_SPE7_DATA_DEST_2", .pme_desc = "SPE 7 is data destination.", .pme_code = 0x1916, /* 6422 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_IOIF1_DATA_DEST_2", .pme_desc = "IOIF1 is data destination.", .pme_code = 0x1917, /* 6423 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GRANT_DATA_RING0_2", .pme_desc = "Grant on data ring 0.", .pme_code = 0x1918, /* 6424 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GRANT_DATA_RING1_2", .pme_desc = "Grant on data ring 1.", .pme_code = 0x1919, /* 6425 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = 
"EIB_GRANT_DATA_RING2_2", .pme_desc = "Grant on data ring 2.", .pme_code = 0x191a, /* 6426 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_GRANT_DATA_RING3_2", .pme_desc = "Grant on data ring 3.", .pme_code = 0x191b, /* 6427 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "EIB_ALL_DATA_RINGS_IDLE_2", .pme_desc = "All data rings are idle.", .pme_code = 0x191c, /* 6428 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_ONE_DATA_RING_BUSY_2", .pme_desc = "One data ring is busy.", .pme_code = 0x191d, /* 6429 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TWO_OR_THREE_DATA_RINGS_BUSY_2", .pme_desc = "Two or three data rings are busy.", .pme_code = 0x191e, /* 6430 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_ALL_DATA_RINGS_BUSY_2", .pme_desc = "All four data rings are busy.", .pme_code = 0x191f, /* 6431 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_XIO_UNUSED", .pme_desc = "Even XIO token unused by RAG 0.", .pme_code = 0xfe4c, /* 65100 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_XIO_UNUSED", .pme_desc = "Odd XIO token unused by RAG 0.", .pme_code = 0xfe4d, /* 65101 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_BANK_UNUSED", .pme_desc = "Even bank token unused by RAG 0.", .pme_code = 0xfe4e, /* 65102 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_BANK_UNUSED", .pme_desc = "Odd bank token unused by RAG 0.", .pme_code = 0xfe4f, /* 65103 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE0", .pme_desc = "Token granted for SPE 0.", .pme_code = 0xfe54, /* 65108 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE1", .pme_desc = "Token granted for SPE 1.", .pme_code = 0xfe55, /* 65109 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE2", .pme_desc = "Token granted for SPE 2.", .pme_code = 0xfe56, /* 65110 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE3", .pme_desc = "Token granted for SPE 3.", .pme_code = 0xfe57, /* 65111 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE4", .pme_desc = "Token granted for SPE 4.", .pme_code = 0xfe58, /* 65112 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE5", .pme_desc = "Token granted for SPE 5.", .pme_code = 0xfe59, /* 65113 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE6", .pme_desc = "Token granted for SPE 6.", .pme_code = 0xfe5a, /* 65114 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_SPE7", .pme_desc = "Token granted for SPE 7.", .pme_code = 0xfe5b, /* 65115 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = 
COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_XIO_WASTED", .pme_desc = "Even XIO token wasted by RAG 0; valid only when Unused Enable (UE) = 1 in TKM_CR register.", .pme_code = 0xfeb0, /* 65200 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_XIO_WASTED", .pme_desc = "Odd XIO token wasted by RAG 0; valid only when Unused Enable (UE) = 1 in TKM_CR register.", .pme_code = 0xfeb1, /* 65201 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_BANK_WASTED", .pme_desc = "Even bank token wasted by RAG 0; valid only when Unused Enable (UE) = 1 in TKM_CR register.", .pme_code = 0xfeb2, /* 65202 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_BANK_WASTED", .pme_desc = "Odd bank token wasted by RAG 0; valid only when Unused Enable (UE) = 1 in TKM_CR register.", .pme_code = 0xfeb3, /* 65203 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_E_XIO_WASTED", .pme_desc = "Even XIO token wasted by RAG U.", .pme_code = 0xfebc, /* 65212 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_O_XIO_WASTED", .pme_desc = "Odd XIO token wasted by RAG U.", .pme_code = 0xfebd, /* 65213 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_E_BANK_WASTED", .pme_desc = "Even bank token wasted by RAG U.", .pme_code = 0xfebe, /* 65214 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_O_BANK_WASTED", .pme_desc = "Odd bank token wasted by RAG U.", .pme_code = 0xfebf, /* 65215 */ .pme_enable_word = 
WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_XIO_RAG1", .pme_desc = "Even XIO token from RAG 0 shared with RAG 1", .pme_code = 0xff14, /* 65300 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_XIO_RAG2", .pme_desc = "Even XIO token from RAG 0 shared with RAG 2", .pme_code = 0xff15, /* 65301 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_XIO_RAG3", .pme_desc = "Even XIO token from RAG 0 shared with RAG 3", .pme_code = 0xff16, /* 65302 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_XIO_RAG1", .pme_desc = "Odd XIO token from RAG 0 shared with RAG 1", .pme_code = 0xff17, /* 65303 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_XIO_RAG2", .pme_desc = "Odd XIO token from RAG 0 shared with RAG 2", .pme_code = 0xff18, /* 65304 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_XIO_RAG3", .pme_desc = "Odd XIO token from RAG 0 shared with RAG 3", .pme_code = 0xff19, /* 65305 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_BANK_RAG1", .pme_desc = "Even bank token from RAG 0 shared with RAG 1", .pme_code = 0xff1a, /* 65306 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_E_BANK_RAG2", .pme_desc = "Even bank token from RAG 0 shared with RAG 2", .pme_code = 0xff1b, /* 65307 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name 
= "EIB_RAG0_E_BANK_RAG3", .pme_desc = "Even bank token from RAG 0 shared with RAG 3", .pme_code = 0xff1c, /* 65308 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_BANK_RAG1", .pme_desc = "Odd bank token from RAG 0 shared with RAG 1", .pme_code = 0xff1d, /* 65309 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_BANK_RAG2", .pme_desc = "Odd bank token from RAG 0 shared with RAG 2", .pme_code = 0xff1e, /* 65310 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_O_BANK_RAG3", .pme_desc = "Odd bank token from RAG 0 shared with RAG 3", .pme_code = 0xff1f, /* 65311 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_XIO_UNUSED", .pme_desc = "Even XIO token was unused by RAG 1.", .pme_code = 0xff88, /* 65416 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_XIO_UNUSED", .pme_desc = "Odd XIO token was unused by RAG 1.", .pme_code = 0xff89, /* 65417 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_BANK_UNUSED", .pme_desc = "Even bank token was unused by RAG 1.", .pme_code = 0xff8a, /* 65418 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_BANK_UNUSED", .pme_desc = "Odd bank token was unused by RAG 1.", .pme_code = 0xff8b, /* 65419 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_IOC0", .pme_desc = "Token was granted for IOC0.", .pme_code = 0xff91, /* 65425 */ .pme_enable_word 
= WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_TOKEN_GRANTED_IOC1", .pme_desc = "Token was granted for IOC1.", .pme_code = 0xff92, /* 65426 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_XIO_WASTED", .pme_desc = "Even XIO token was wasted by RAG 1. This is valid only when UE = 1 in TKM_CR.", .pme_code = 0xffec, /* 65516 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_XIO_WASTED", .pme_desc = "Odd XIO token was wasted by RAG 1. This is valid only when UE = 1 in TKM_CR.", .pme_code = 0xffed, /* 65517 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_BANK_WASTED", .pme_desc = "Even bank token was wasted by RAG 1. This is valid only when UE = 1 in TKM_CR.", .pme_code = 0xffee, /* 65518 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_BANK_WASTED", .pme_desc = "Odd bank token was wasted by RAG 1. 
This is valid only when UE = 1 in TKM_CR.", .pme_code = 0xffef, /* 65519 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_XIO_RAG0", .pme_desc = "Even XIO token from RAG 1 shared with RAG 0", .pme_code = 0x10050, /* 65616 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_XIO_RAG2", .pme_desc = "Even XIO token from RAG 1 shared with RAG 2", .pme_code = 0x10051, /* 65617 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_XIO_RAG3", .pme_desc = "Even XIO token from RAG 1 shared with RAG 3", .pme_code = 0x10052, /* 65618 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_XIO_RAG0", .pme_desc = "Odd XIO token from RAG 1 shared with RAG 0", .pme_code = 0x10053, /* 65619 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_XIO_RAG2", .pme_desc = "Odd XIO token from RAG 1 shared with RAG 2", .pme_code = 0x10054, /* 65620 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_XIO_RAG3", .pme_desc = "Odd XIO token from RAG 1 shared with RAG 3", .pme_code = 0x10055, /* 65621 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_BANK_RAG0", .pme_desc = "Even bank token from RAG 1 shared with RAG 0", .pme_code = 0x10056, /* 65622 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_BANK_RAG2", .pme_desc = "Even bank token from RAG 1 shared with RAG 2", .pme_code = 0x10057, /* 65623 */ .pme_enable_word = 
WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_E_BANK_RAG3", .pme_desc = "Even bank token from RAG 1 shared with RAG 3", .pme_code = 0x10058, /* 65624 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_BANK_RAG0", .pme_desc = "Odd bank token from RAG 1 shared with RAG 0", .pme_code = 0x10059, /* 65625 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_BANK_RAG2", .pme_desc = "Odd bank token from RAG 1 shared with RAG 2", .pme_code = 0x1005a, /* 65626 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG1_O_BANK_RAG3", .pme_desc = "Odd bank token from RAG 1 shared with RAG 3", .pme_code = 0x1005b, /* 65627 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_E_XIO_RAG1", .pme_desc = "Even XIO token from RAG U shared with RAG 1", .pme_code = 0x1005c, /* 65628 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_O_XIO_RAG1", .pme_desc = "Odd XIO token from RAG U shared with RAG 1", .pme_code = 0x1005d, /* 65629 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_E_BANK_RAG1", .pme_desc = "Even bank token from RAG U shared with RAG 1", .pme_code = 0x1005e, /* 65630 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAGU_O_BANK_RAG1", .pme_desc = "Odd bank token from RAG U shared with RAG 1", .pme_code = 0x1005f, /* 65631 */ .pme_enable_word = WORD_0_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, 
}, {.pme_name = "EIB_RAG2_E_XIO_UNUSED", .pme_desc = "Even XIO token unused by RAG 2", .pme_code = 0x100e4, /* 65764 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_XIO_UNUSED", .pme_desc = "Odd XIO token unused by RAG 2", .pme_code = 0x100e5, /* 65765 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_BANK_UNUSED", .pme_desc = "Even bank token unused by RAG 2", .pme_code = 0x100e6, /* 65766 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_BANK_UNUSED", .pme_desc = "Odd bank token unused by RAG 2", .pme_code = 0x100e7, /* 65767 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF0_IN_TOKEN_UNUSED", .pme_desc = "IOIF0 In token unused by RAG 0", .pme_code = 0x100e8, /* 65768 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF0_OUT_TOKEN_UNUSED", .pme_desc = "IOIF0 Out token unused by RAG 0", .pme_code = 0x100e9, /* 65769 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF1_IN_TOKEN_UNUSED", .pme_desc = "IOIF1 In token unused by RAG 0", .pme_code = 0x100ea, /* 65770 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF1_OUT_TOKEN_UNUSED", .pme_desc = "IOIF1 Out token unused by RAG 0", .pme_code = 0x100eb, /* 65771 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_XIO_WASTED", .pme_desc = "Even XIO token wasted by RAG 2", .pme_code = 0x10148, /* 65864 */ .pme_enable_word = 
WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_XIO_WASTED", .pme_desc = "Odd XIO token wasted by RAG 2", .pme_code = 0x10149, /* 65865 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_BANK_WASTED", .pme_desc = "Even bank token wasted by RAG 2", .pme_code = 0x1014a, /* 65866 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_BANK_WASTED", .pme_desc = "Odd bank token wasted by RAG 2", .pme_code = 0x1014b, /* 65867 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_XIO_RAG0", .pme_desc = "Even XIO token from RAG 2 shared with RAG 0", .pme_code = 0x101ac, /* 65964 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_XIO_RAG1", .pme_desc = "Even XIO token from RAG 2 shared with RAG 1", .pme_code = 0x101ad, /* 65965 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_XIO_RAG3", .pme_desc = "Even XIO token from RAG 2 shared with RAG 3", .pme_code = 0x101ae, /* 65966 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_XIO_RAG0", .pme_desc = "Odd XIO token from RAG 2 shared with RAG 0", .pme_code = 0x101af, /* 65967 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_XIO_RAG1", .pme_desc = "Odd XIO token from RAG 2 shared with RAG 1", .pme_code = 0x101b0, /* 65968 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_XIO_RAG3", 
.pme_desc = "Odd XIO token from RAG 2 shared with RAG 3", .pme_code = 0x101b1, /* 65969 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_BANK_RAG0", .pme_desc = "Even bank token from RAG 2 shared with RAG 0", .pme_code = 0x101b2, /* 65970 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_BANK_RAG1", .pme_desc = "Even bank token from RAG 2 shared with RAG 1", .pme_code = 0x101b3, /* 65971 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_E_BANK_RAG3", .pme_desc = "Even bank token from RAG 2 shared with RAG 3", .pme_code = 0x101b4, /* 65972 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_BANK_RAG0", .pme_desc = "Odd bank token from RAG 2 shared with RAG 0", .pme_code = 0x101b5, /* 65973 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_BANK_RAG1", .pme_desc = "Odd bank token from RAG 2 shared with RAG 1", .pme_code = 0x101b6, /* 65974 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG2_O_BANK_RAG3", .pme_desc = "Odd bank token from RAG 2 shared with RAG 3", .pme_code = 0x101b7, /* 65975 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF0_IN_TOKEN_WASTED", .pme_desc = "IOIF0 In token wasted by RAG 0", .pme_code = 0x9ef38, /* 651064 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF0_OUT_TOKEN_WASTED", .pme_desc = "IOIF0 Out token wasted by RAG 0", .pme_code = 0x9ef39, /* 
651065 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF1_IN_TOKEN_WASTED", .pme_desc = "IOIF1 In token wasted by RAG 0", .pme_code = 0x9ef3a, /* 651066 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG0_IOIF1_OUT_TOKEN_WASTED", .pme_desc = "IOIF1 Out token wasted by RAG 0", .pme_code = 0x9ef3b, /* 651067 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_XIO_UNUSED", .pme_desc = "Even XIO token was unused by RAG 3.", .pme_code = 0x9efac, /* 651180 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_XIO_UNUSED", .pme_desc = "Odd XIO token was unused by RAG 3.", .pme_code = 0x9efad, /* 651181 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_BANK_UNUSED", .pme_desc = "Even bank token was unused by RAG 3.", .pme_code = 0x9efae, /* 651182 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_BANK_UNUSED", .pme_desc = "Odd bank token was unused by RAG 3.", .pme_code = 0x9efaf, /* 651183 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_XIO_WASTED", .pme_desc = "Even XIO token wasted by RAG 3", .pme_code = 0x9f010, /* 651280 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_XIO_WASTED", .pme_desc = "Odd XIO token wasted by RAG 3", .pme_code = 0x9f011, /* 651281 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = 
"EIB_RAG3_E_BANK_WASTED", .pme_desc = "Even bank token wasted by RAG 3", .pme_code = 0x9f012, /* 651282 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_BANK_WASTED", .pme_desc = "Odd bank token wasted by RAG 3", .pme_code = 0x9f013, /* 651283 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_XIO_RAG0", .pme_desc = "Even XIO token from RAG 3 shared with RAG 0", .pme_code = 0x9f074, /* 651380 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_XIO_RAG1", .pme_desc = "Even XIO token from RAG 3 shared with RAG 1", .pme_code = 0x9f075, /* 651381 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_XIO_RAG2", .pme_desc = "Even XIO token from RAG 3 shared with RAG 2", .pme_code = 0x9f076, /* 651382 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_XIO_RAG0", .pme_desc = "Odd XIO token from RAG 3 shared with RAG 0", .pme_code = 0x9f077, /* 651383 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_XIO_RAG1", .pme_desc = "Odd XIO token from RAG 3 shared with RAG 1", .pme_code = 0x9f078, /* 651384 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_XIO_RAG2", .pme_desc = "Odd XIO token from RAG 3 shared with RAG 2", .pme_code = 0x9f079, /* 651385 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_BANK_RAG0", .pme_desc = "Even bank token from RAG 3 shared with RAG 0", .pme_code = 0x9f07a, /* 
651386 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_BANK_RAG1", .pme_desc = "Even bank token from RAG 3 shared with RAG 1", .pme_code = 0x9f07b, /* 651387 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_E_BANK_RAG2", .pme_desc = "Even bank token from RAG 3 shared with RAG 2", .pme_code = 0x9f07c, /* 651388 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_BANK_RAG0", .pme_desc = "Odd bank token from RAG 3 shared with RAG 0", .pme_code = 0x9f07d, /* 651389 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_BANK_RAG1", .pme_desc = "Odd bank token from RAG 3 shared with RAG 1", .pme_code = 0x9f07e, /* 651390 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "EIB_RAG3_O_BANK_RAG2", .pme_desc = "Odd bank token from RAG 3 shared with RAG 2", .pme_code = 0x9f07f, /* 651391 */ .pme_enable_word = WORD_2_ONLY, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_READ_CMD_Q_EMPTY", .pme_desc = "XIO1 - Read command queue is empty.", .pme_code = 0x1bc5, /* 7109 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_WRITE_CMD_Q_EMPTY", .pme_desc = "XIO1 - Write command queue is empty.", .pme_code = 0x1bc6, /* 7110 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_READ_CMD_Q_FULL", .pme_desc = "XIO1 - Read command queue is full.", .pme_code = 0x1bc8, /* 7112 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, 
{.pme_name = "MIC_XIO1_RESPONDS_READ_RETRY", .pme_desc = "XIO1 - MIC responds with a Retry for a read command because the read command queue is full.", .pme_code = 0x1bc9, /* 7113 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_WRITE_CMD_Q_FULL", .pme_desc = "XIO1 - Write command queue is full.", .pme_code = 0x1bca, /* 7114 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_RESPONDS_WRITE_RETRY", .pme_desc = "XIO1 - MIC responds with a Retry for a write command because the write command queue is full.", .pme_code = 0x1bcb, /* 7115 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_READ_CMD_DISPATCHED", .pme_desc = "XIO1 - Read command dispatched; includes high-priority and fast-path reads.", .pme_code = 0x1bde, /* 7134 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_WRITE_CMD_DISPATCHED", .pme_desc = "XIO1 - Write command dispatched.", .pme_code = 0x1bdf, /* 7135 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_READ_MOD_WRITE_CMD_DISPATCHED", .pme_desc = "XIO1 - Read-Modify-Write command (data size < 16 bytes) dispatched.", .pme_code = 0x1be0, /* 7136 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_REFRESH_DISPATCHED", .pme_desc = "XIO1 - Refresh dispatched.", .pme_code = 0x1be1, /* 7137 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_BYTE_MSK_WRITE_CMD_DISPATCHED", .pme_desc = "XIO1 - Byte-masking write command (data size >= 16 bytes) dispatched.", .pme_code = 0x1be3, /* 7139 */ .pme_enable_word = 0xF, .pme_freq = 
PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_WRITE_CMD_DISPATCHED_AFTER_READ", .pme_desc = "XIO1 - Write command dispatched after a read command was previously dispatched.", .pme_code = 0x1be5, /* 7141 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO1_READ_CMD_DISPATCHED_AFTER_WRITE", .pme_desc = "XIO1 - Read command dispatched after a write command was previously dispatched.", .pme_code = 0x1be6, /* 7142 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_READ_CMD_Q_EMPTY", .pme_desc = "XIO0 - Read command queue is empty.", .pme_code = 0x1c29, /* 7209 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_WRITE_CMD_Q_EMPTY", .pme_desc = "XIO0 - Write command queue is empty.", .pme_code = 0x1c2a, /* 7210 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_READ_CMD_Q_FULL", .pme_desc = "XIO0 - Read command queue is full.", .pme_code = 0x1c2c, /* 7212 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_RESPONDS_READ_RETRY", .pme_desc = "XIO0 - MIC responds with a Retry for a read command because the read command queue is full.", .pme_code = 0x1c2d, /* 7213 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_WRITE_CMD_Q_FULL", .pme_desc = "XIO0 - Write command queue is full.", .pme_code = 0x1c2e, /* 7214 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_RESPONDS_WRITE_RETRY", .pme_desc = "XIO0 - MIC responds with a Retry for a write command because the write command queue is full.", .pme_code = 0x1c2f, 
/* 7215 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_READ_CMD_DISPATCHED", .pme_desc = "XIO0 - Read command dispatched; includes high-priority and fast-path reads.", .pme_code = 0x1c42, /* 7234 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_WRITE_CMD_DISPATCHED", .pme_desc = "XIO0 - Write command dispatched.", .pme_code = 0x1c43, /* 7235 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_READ_MOD_WRITE_CMD_DISPATCHED", .pme_desc = "XIO0 - Read-Modify-Write command (data size < 16 bytes) dispatched.", .pme_code = 0x1c44, /* 7236 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_REFRESH_DISPATCHED", .pme_desc = "XIO0 - Refresh dispatched.", .pme_code = 0x1c45, /* 7237 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_WRITE_CMD_DISPATCHED_AFTER_READ", .pme_desc = "XIO0 - Write command dispatched after a read command was previously dispatched.", .pme_code = 0x1c49, /* 7241 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_READ_CMD_DISPATCHED_AFTER_WRITE", .pme_desc = "XIO0 - Read command dispatched after a write command was previously dispatched.", .pme_code = 0x1c4a, /* 7242 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_WRITE_CMD_DISPATCHED_2", .pme_desc = "XIO0 - Write command dispatched.", .pme_code = 0x1ca7, /* 7335 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_READ_MOD_WRITE_CMD_DISPATCHED_2", .pme_desc = "XIO0 - Read-Modify-Write 
command (data size < 16 bytes) dispatched.", .pme_code = 0x1ca8, /* 7336 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_REFRESH_DISPATCHED_2", .pme_desc = "XIO0 - Refresh dispatched.", .pme_code = 0x1ca9, /* 7337 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "MIC_XIO0_BYTE_MSK_WRITE_CMD_DISPATCHED", .pme_desc = "XIO0 - Byte-masking write command (data size >= 16 bytes) dispatched.", .pme_code = 0x1cab, /* 7339 */ .pme_enable_word = 0xF, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_TYPEA_DATA_PLG", .pme_desc = "Type A data physical layer group (PLG). Does not include header-only or credit-only data PLGs. In IOIF mode, counts I/O device read data; in BIF mode, counts all outbound data.", .pme_code = 0x1fb0, /* 8112 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_TYPEB_DATA_PLG", .pme_desc = "Type B data PLG. In IOIF mode, counts I/O device read data; in BIF mode, counts all outbound data.", .pme_code = 0x1fb1, /* 8113 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_IOIF_TYPEA_DATA_PLG", .pme_desc = "Type A data PLG. Does not include header-only or credit-only PLGs. In IOIF mode, counts CBE store data to I/O device. Does not apply in BIF mode.", .pme_code = 0x1fb2, /* 8114 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_IOIF_TYPEB_DATA_PLG", .pme_desc = "Type B data PLG. In IOIF mode, counts CBE store data to an I/O device. 
Does not apply in BIF mode.", .pme_code = 0x1fb3, /* 8115 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_DATA_PLG", .pme_desc = "Data PLG. Does not include header-only or credit-only PLGs.", .pme_code = 0x1fb4, /* 8116 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_CMD_PLG", .pme_desc = "Command PLG (no credit-only PLG). In IOIF mode, counts I/O command or reply PLGs. In BIF mode, counts command/reflected command or snoop/combined responses.", .pme_code = 0x1fb5, /* 8117 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_TYPEA_TRANSFER", .pme_desc = "Type A data transfer regardless of length. Can also be used to count Type A data header PLGs (but not credit-only PLGs).", .pme_code = 0x1fb6, /* 8118 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_TYPEB_TRANSFER", .pme_desc = "Type B data transfer.", .pme_code = 0x1fb7, /* 8119 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_CMD_GREDIT_ONLY_PLG", .pme_desc = "Command-credit-only command PLG in either IOIF or BIF mode.", .pme_code = 0x1fb8, /* 8120 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_DATA_CREDIT_ONLY_PLG", .pme_desc = "Data-credit-only data PLG sent in either IOIF or BIF mode.", .pme_code = 0x1fb9, /* 8121 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_NON_NULL_ENVLP_SENT", .pme_desc = "Non-null envelope sent (does not include long envelopes).", .pme_code = 0x1fba, /* 8122 */ 
.pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_NULL_ENVLP_SENT", .pme_desc = "Null envelope sent.", .pme_code = 0x1fbc, /* 8124 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_NO_VALID_DATA_SENT", .pme_desc = "No valid data sent this cycle.", .pme_code = 0x1fbd, /* 8125 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_NORMAL_ENVLP_SENT", .pme_desc = "Normal envelope sent.", .pme_code = 0x1fbe, /* 8126 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_LONG_ENVLP_SENT", .pme_desc = "Long envelope sent.", .pme_code = 0x1fbf, /* 8127 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_NULL_PLG_INSERTED", .pme_desc = "A Null PLG inserted in an outgoing envelope.", .pme_code = 0x1fc0, /* 8128 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_OUTBOUND_ENV_ARRAY_FULL", .pme_desc = "Outbound envelope array is full.", .pme_code = 0x1fc1, /* 8129 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_TYPEB_TRANSFER", .pme_desc = "Type B data transfer.", .pme_code = 0x201b, /* 8219 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_NULL_ENVLP_RECV", .pme_desc = "Null envelope received.", .pme_code = 0x206d, /* 8301 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_CMD_PLG_2", .pme_desc = "Command PLG, but not credit-only PLG. 
In IOIF mode, counts I/O command or reply PLGs. In BIF mode, counts command/reflected command or snoop/combined responses.", .pme_code = 0x207a, /* 8314 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_CMD_GREDIT_ONLY_PLG_2", .pme_desc = "Command-credit-only command PLG.", .pme_code = 0x207b, /* 8315 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_NORMAL_ENVLP_RECV", .pme_desc = "Normal envelope received is good.", .pme_code = 0x2080, /* 8320 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_LONG_ENVLP_RECV", .pme_desc = "Long envelope received is good.", .pme_code = 0x2081, /* 8321 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF0_DATA_GREDIT_ONLY_PLG_2", .pme_desc = "Data-credit-only data PLG in either IOIF or BIF mode; will count a maximum of one per envelope.", .pme_code = 0x2082, /* 8322 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_NON_NULL_ENVLP", .pme_desc = "Non-null envelope; does not include long envelopes; includes retried envelopes.", .pme_code = 0x2083, /* 8323 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_DATA_GRANT_RECV", .pme_desc = "Data grant received.", .pme_code = 0x2084, /* 8324 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_DATA_PLG_2", .pme_desc = "Data PLG. 
Does not include header-only or credit-only PLGs.", .pme_code = 0x2088, /* 8328 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_TYPEA_TRANSFER_2", .pme_desc = "Type A data transfer regardless of length. Can also be used to count Type A data header PLGs, but not credit-only PLGs.", .pme_code = 0x2089, /* 8329 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF0_TYPEB_TRANSFER_2", .pme_desc = "Type B data transfer.", .pme_code = 0x208a, /* 8330 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_NULL_ENVLP_RECV", .pme_desc = "Null envelope received.", .pme_code = 0x20d1, /* 8401 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF1_CMD_PLG_2", .pme_desc = "Command PLG (no credit-only PLG). 
Counts I/O command or reply PLGs.", .pme_code = 0x20de, /* 8414 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_CMD_GREDIT_ONLY_PLG_2", .pme_desc = "Command-credit-only command PLG.", .pme_code = 0x20df, /* 8415 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_NORMAL_ENVLP_RECV", .pme_desc = "Normal envelope received is good.", .pme_code = 0x20e4, /* 8420 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF1_LONG_ENVLP_RECV", .pme_desc = "Long envelope received is good.", .pme_code = 0x20e5, /* 8421 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "BIF_IOIF1_DATA_GREDIT_ONLY_PLG_2", .pme_desc = "Data-credit-only data PLG received; will count a maximum of one per envelope.", .pme_code = 0x20e6, /* 8422 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_NON_NULL_ENVLP", .pme_desc = "Non-Null envelope received; does not include long envelopes; includes retried envelopes.", .pme_code = 0x20e7, /* 8423 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_DATA_GRANT_RECV", .pme_desc = "Data grant received.", .pme_code = 0x20e8, /* 8424 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_DATA_PLG_2", .pme_desc = "Data PLG received. 
Does not include header-only or credit-only PLGs.", .pme_code = 0x20ec, /* 8428 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_TYPEA_TRANSFER_2", .pme_desc = "Type A data transfer regardless of length. Can also be used to count Type A data header PLGs (but not credit-only PLGs).", .pme_code = 0x20ed, /* 8429 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "BIF_IOIF1_TYPEB_TRANSFER_2", .pme_desc = "Type B data transfer received.", .pme_code = 0x20ee, /* 8430 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_MMIO_READ_IOIF1", .pme_desc = "Received MMIO read targeted to IOIF1.", .pme_code = 0x213c, /* 8508 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_MMIO_WRITE_IOIF1", .pme_desc = "Received MMIO write targeted to IOIF1.", .pme_code = 0x213d, /* 8509 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_MMIO_READ_IOIF0", .pme_desc = "Received MMIO read targeted to IOIF0.", .pme_code = 0x213e, /* 8510 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_MMIO_WRITE_IOIF0", .pme_desc = "Received MMIO write targeted to IOIF0.", .pme_code = 0x213f, /* 8511 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_CMD_TO_IOIF0", .pme_desc = "Sent command to IOIF0.", .pme_code = 0x2140, /* 8512 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_CMD_TO_IOIF1", .pme_desc = "Sent command to IOIF1.", .pme_code = 0x2141, /* 8513 */ .pme_enable_word = 
WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_MATRIX3_OCCUPIED", .pme_desc = "IOIF0 Dependency Matrix 3 is occupied by a dependent command.", .pme_code = 0x219d, /* 8605 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "IOC_IOIF0_MATRIX4_OCCUPIED", .pme_desc = "IOIF0 Dependency Matrix 4 is occupied by a dependent command.", .pme_code = 0x219e, /* 8606 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "IOC_IOIF0_MATRIX5_OCCUPIED", .pme_desc = "IOIF0 Dependency Matrix 5 is occupied by a dependent command.", .pme_code = 0x219f, /* 8607 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_BOTH_TYPE, }, {.pme_name = "IOC_DMA_READ_IOIF0", .pme_desc = "Received read request from IOIF0.", .pme_code = 0x21a2, /* 8610 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_DMA_WRITE_IOIF0", .pme_desc = "Received write request from IOIF0.", .pme_code = 0x21a3, /* 8611 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_INTERRUPT_IOIF0", .pme_desc = "Received interrupt from the IOIF0.", .pme_code = 0x21a6, /* 8614 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_E_MEM", .pme_desc = "IOIF0 request for token for even memory banks 0-14.", .pme_code = 0x220c, /* 8716 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_O_MEM", .pme_desc = "IOIF0 request for token for odd memory banks 1-15.", .pme_code = 0x220d, /* 8717 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, 
.pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_1357", .pme_desc = "IOIF0 request for token type 1, 3, 5, or 7.", .pme_code = 0x220e, /* 8718 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_9111315", .pme_desc = "IOIF0 request for token type 9, 11, 13, or 15.", .pme_code = 0x220f, /* 8719 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_16", .pme_desc = "IOIF0 request for token type 16.", .pme_code = 0x2214, /* 8724 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_17", .pme_desc = "IOIF0 request for token type 17.", .pme_code = 0x2215, /* 8725 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_18", .pme_desc = "IOIF0 request for token type 18.", .pme_code = 0x2216, /* 8726 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOIF0_REQ_TOKEN_19", .pme_desc = "IOIF0 request for token type 19.", .pme_code = 0x2217, /* 8727 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_CUMULATIVE_LEN, }, {.pme_name = "IOC_IOPT_CACHE_HIT", .pme_desc = "I/O page table cache hit for commands from IOIF.", .pme_code = 0x2260, /* 8800 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_IOPT_CACHE_MISS", .pme_desc = "I/O page table cache miss for commands from IOIF.", .pme_code = 0x2261, /* 8801 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_IOST_CACHE_HIT", .pme_desc = "I/O segment table cache 
hit.", .pme_code = 0x2263, /* 8803 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_IOST_CACHE_MISS", .pme_desc = "I/O segment table cache miss.", .pme_code = 0x2264, /* 8804 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_INTERRUPT_FROM_SPU", .pme_desc = "Interrupt received from any SPU (reflected cmd when IIC has sent ACK response).", .pme_code = 0x2278, /* 8824 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_IIC_INTERRUPT_TO_PPU_TH0", .pme_desc = "Internal interrupt controller (IIC) generated interrupt to PPU thread 0.", .pme_code = 0x2279, /* 8825 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_IIC_INTERRUPT_TO_PPU_TH1", .pme_desc = "IIC generated interrupt to PPU thread 1.", .pme_code = 0x227a, /* 8826 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_RECV_EXTERNAL_INTERRUPT_TO_TH0", .pme_desc = "Received external interrupt (using MMIO) from PPU to PPU thread 0.", .pme_code = 0x227b, /* 8827 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, {.pme_name = "IOC_RECV_EXTERNAL_INTERRUPT_TO_TH1", .pme_desc = "Received external interrupt (using MMIO) from PPU to PPU thread 1.", .pme_code = 0x227c, /* 8828 */ .pme_enable_word = WORD_0_AND_2, .pme_freq = PFM_CELL_PME_FREQ_HALF, .pme_type = COUNT_TYPE_OCCURRENCE, }, }; /*--- The number of events : 435 ---*/ #define PME_CELL_EVENT_COUNT (sizeof(cell_pe)/sizeof(pme_cell_entry_t)) papi-5.3.0/src/libpfm4/lib/events/intel_ppro_events.h0000600003276200002170000004656512247131123022264 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * 
Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. 
* * PMU: ppro (Intel Pentium Pro) */ static const intel_x86_umask_t ppro_l2_ifetch[]={ { .uname = "I", .udesc = "Invalid state", .ucode = 0x100, }, { .uname = "S", .udesc = "Shared state", .ucode = 0x200, }, { .uname = "E", .udesc = "Exclusive state", .ucode = 0x400, }, { .uname = "M", .udesc = "Modified state", .ucode = 0x800, }, }; static const intel_x86_umask_t ppro_bus_drdy_clocks[]={ { .uname = "SELF", .udesc = "Clocks when processor is driving bus", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "Clocks when any agent is driving bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_ppro_pe[]={ { .name = "CPU_CLK_UNHALTED", .desc = "Number of cycles during which the processor is not halted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x79, }, { .name = "INST_RETIRED", .desc = "Number of instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc0, }, { .name = "DATA_MEM_REFS", .desc = "All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores but also internal retries. 80-bit floating point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. Memory accesses are only counted when they are actually performed (such as a load that gets squashed because a previous cache miss is outstanding to the same address, and which finally gets performed, is only counted once). 
Does not include I/O accesses or other non-memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x43, }, { .name = "DCU_LINES_IN", .desc = "Total lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x45, }, { .name = "DCU_M_LINES_IN", .desc = "Number of M state lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x46, }, { .name = "DCU_M_LINES_OUT", .desc = "Number of M state lines evicted from the DCU. This includes evictions via snoop HITM, intervention or replacement", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x47, }, { .name = "DCU_MISS_OUTSTANDING", .desc = "Weighted number of cycles while a DCU miss is outstanding, incremented by the number of cache misses at any particular time. Cacheable read requests only are considered. Uncacheable requests are excluded. Read-for-ownerships are counted, as well as line fills, invalidates, and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x48, }, { .name = "IFU_IFETCH", .desc = "Number of instruction fetches, both cacheable and noncacheable including UC fetches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x80, }, { .name = "IFU_IFETCH_MISS", .desc = "Number of instruction fetch misses. All instruction fetches that do not hit the IFU (i.e., that produce memory requests). Includes UC accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x81, }, { .name = "ITLB_MISS", .desc = "Number of ITLB misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x85, }, { .name = "IFU_MEM_STALL", .desc = "Number of cycles instruction fetch is stalled for any reason. Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x86, }, { .name = "ILD_STALL", .desc = "Number of cycles that the instruction length decoder is stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x87, }, { .name = "L2_IFETCH", .desc = "Number of L2 instruction fetches. 
This event indicates that a normal instruction fetch was received by the L2. The count includes only L2 cacheable instruction fetches: it does not include UC instruction fetches. It does not include ITLB miss accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(ppro_l2_ifetch), .ngrp = 1, .umasks = ppro_l2_ifetch, }, { .name = "L2_ST", .desc = "Number of L2 data stores. This event indicates that a normal, unlocked, store memory access was received by the L2. Specifically, it indicates that the DCU sent a read-for-ownership request to the L2. It also includes Invalid to Modified requests sent by the DCU to the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT memory accesses. It does include L2 cacheable TLB miss memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(ppro_l2_ifetch), .ngrp = 1, .umasks = ppro_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_M_LINES_INM", .desc = "Number of modified lines allocated in the L2", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x25, }, { .name = "L2_RQSTS", .desc = "Total number of L2 requests", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(ppro_l2_ifetch), .ngrp = 1, .umasks = ppro_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_ADS", .desc = "Number of L2 address strobes", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x21, }, { .name = "L2_DBUS_BUSY", .desc = "Number of cycles during which the L2 cache data bus was busy", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x22, }, { .name = "L2_DBUS_BUSY_RD", .desc = "Number of cycles during which the data bus was busy transferring read data from L2 to the processor", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x23, }, { .name = "BUS_DRDY_CLOCKS", .desc = "Number of clocks 
during which DRDY# is asserted. Utilization of the external system data bus during data transfers", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, }, { .name = "BUS_LOCK_CLOCKS", .desc = "Number of clocks during which LOCK# is asserted on the external system bus", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_REQ_OUTSTANDING", .desc = "Number of bus requests outstanding. This counter is incremented by the number of cacheable read bus requests outstanding in any given cycle", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x60, }, { .name = "BUS_TRANS_BRD", .desc = "Number of burst read transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_RFO", .desc = "Number of completed read for ownership transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_WB", .desc = "Number of completed write back transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_IFETCH", .desc = "Number of completed instruction fetch transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = 
"BUS_TRAN_INVAL", .desc = "Number of completed invalidate transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_PWR", .desc = "Number of completed partial write transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6a, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_P", .desc = "Number of completed partial transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6b, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IO", .desc = "Number of completed I/O transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_DEF", .desc = "Number of completed deferred transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_BURST", .desc = "Number of completed burst transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6e, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_ANY", .desc = "Number of all completed bus transactions. Address bus utilization can be calculated knowing the minimum address bus occupancy. 
Includes special cycles, etc.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x70, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_MEM", .desc = "Number of completed memory transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6f, .numasks = LIBPFM_ARRAY_SIZE(ppro_bus_drdy_clocks), .ngrp = 1, .umasks = ppro_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_DATA_RECV", .desc = "Number of bus clock cycles during which this processor is receiving data", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x64, }, { .name = "BUS_BNR_DRV", .desc = "Number of bus clock cycles during which this processor is driving the BNR# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x61, }, { .name = "BUS_HIT_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HIT# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7a, }, { .name = "BUS_HITM_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HITM# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7b, }, { .name = "BUS_SNOOP_STALL", .desc = "Number of clock cycles during which the bus is snoop stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7e, }, { .name = "FLOPS", .desc = "Number of computational floating-point operations retired. Excludes floating-point computational operations that cause traps or assists. Includes internal sub-operations for complex floating-point instructions like transcendentals. Excludes floating point loads and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0xc1, }, { .name = "FP_COMP_OPS_EXE", .desc = "Number of computational floating-point operations executed. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. 
This number does not include the number of cycles, but the number of operations. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x10, }, { .name = "FP_ASSIST", .desc = "Number of floating-point exception cases handled by microcode.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x11, }, { .name = "MUL", .desc = "Number of multiplies. This count includes integer as well as FP multiplies and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x12, }, { .name = "DIV", .desc = "Number of divides. This count includes integer as well as FP divides and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x13, }, { .name = "CYCLES_DIV_BUSY", .desc = "Number of cycles during which the divider is busy, and cannot accept new divides. This includes integer and FP divides, FPREM, FPSQRT, etc. and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x14, }, { .name = "LD_BLOCKS", .desc = "Number of load operations delayed due to store buffer blocks. Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3, }, { .name = "SB_DRAINS", .desc = "Number of store buffer drain cycles. Incremented every cycle the store buffer is draining. Draining is caused by serializing operations like CPUID, synchronizing operations like XCHG, interrupt acknowledgment, as well as other conditions (such as cache flushing).", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4, }, { .name = "MISALIGN_MEM_REF", .desc = "Number of misaligned data memory references. 
Incremented by 1 every cycle during which either the processor's load or store pipeline dispatches a misaligned micro-op. Counting is performed if it is the first or second half or if it is blocked, squashed, or missed. In this context, misaligned means crossing a 64-bit boundary", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x5, }, { .name = "UOPS_RETIRED", .desc = "Number of micro-ops retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc2, }, { .name = "INST_DECODED", .desc = "Number of instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd0, }, { .name = "HW_INT_RX", .desc = "Number of hardware interrupts received", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc8, }, { .name = "CYCLES_INT_MASKED", .desc = "Number of processor cycles for which interrupts are disabled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc6, }, { .name = "CYCLES_INT_PENDING_AND_MASKED", .desc = "Number of processor cycles for which interrupts are disabled and interrupts are pending.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc7, }, { .name = "BR_INST_RETIRED", .desc = "Number of branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc4, }, { .name = "BR_MISS_PRED_RETIRED", .desc = "Number of mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc5, }, { .name = "BR_TAKEN_RETIRED", .desc = "Number of taken branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc9, }, { .name = "BR_MISS_PRED_TAKEN_RET", .desc = "Number of taken mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xca, }, { .name = "BR_INST_DECODED", .desc = "Number of branch instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe0, }, { .name = "BTB_MISSES", .desc = "Number of branches for which the BTB did not produce a prediction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe2, }, { .name = "BR_BOGUS", .desc = "Number of bogus 
branches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe4, }, { .name = "BACLEARS", .desc = "Number of times BACLEAR is asserted. This is the number of times that a static branch prediction was made, in which the branch decoder decided to make a branch prediction because the BTB did not", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe6, }, { .name = "RESOURCE_STALLS", .desc = "Incremented by 1 during every cycle for which there is a resource related stall. Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc. In addition to resource related stalls, this event counts some other events. Includes stalls arising during branch misprediction recovery, such as if retirement of the mispredicted branch is delayed and stalls arising while store buffer is draining from synchronizing operations", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xa2, }, { .name = "PARTIAL_RAT_STALLS", .desc = "Number of cycles or events for partial stalls. This includes flag partial stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd2, }, { .name = "SEGMENT_REG_LOADS", .desc = "Number of segment register loads.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6, }, { .name = "L2_LD", .desc = "Number of L2 data loads. This event indicates that a normal, unlocked, load memory access was received by the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT memory accesses. 
It does include L2 cacheable TLB miss memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(ppro_l2_ifetch), .ngrp = 1, .umasks = ppro_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_IN", .desc = "Number of lines allocated in the L2", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x24, }, { .name = "L2_LINES_OUT", .desc = "Number of lines removed from the L2 for any reason", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x26, }, { .name = "L2_M_LINES_OUTM", .desc = "Number of modified lines removed from the L2 for any reason", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x27, }, }; papi-5.3.0/src/libpfm4/lib/events/sparc_ultra12_events.h0000600003276200002170000000616012247131124022557 0ustar ralphundrgradstatic const sparc_entry_t ultra12_pe[] = { /* These two must always be first. */ { .name = "Cycle_cnt", .desc = "Accumulated cycles", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x0, }, { .name = "Instr_cnt", .desc = "Number of instructions completed", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x1, }, { .name = "Dispatch0_IC_miss", .desc = "I-buffer is empty from I-Cache miss", .ctrl = PME_CTRL_S0, .code = 0x2, }, /* PIC0 events for UltraSPARC-I/II/IIi/IIe */ { .name = "Dispatch0_storeBuf", .desc = "Store buffer can not hold additional stores", .ctrl = PME_CTRL_S0, .code = 0x3, }, { .name = "IC_ref", .desc = "I-cache references", .ctrl = PME_CTRL_S0, .code = 0x8, }, { .name = "DC_rd", .desc = "D-cache read references (including accesses that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0x9, }, { .name = "DC_wr", .desc = "D-cache write references (including accesses that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0xa, }, { .name = "Load_use", .desc = "An instruction in the execute stage depends on an earlier load result that is not yet available", .ctrl = PME_CTRL_S0, .code = 0xb, }, { .name = "EC_ref", .desc = "Total E-cache references", .ctrl = 
PME_CTRL_S0, .code = 0xc, }, { .name = "EC_write_hit_RDO", .desc = "E-cache hits that do a read for ownership UPA transaction", .ctrl = PME_CTRL_S0, .code = 0xd, }, { .name = "EC_snoop_inv", .desc = "E-cache invalidates from the following UPA transactions: S_INV_REQ, S_CPI_REQ", .ctrl = PME_CTRL_S0, .code = 0xe, }, { .name = "EC_rd_hit", .desc = "E-cache read hits from D-cache misses", .ctrl = PME_CTRL_S0, .code = 0xf, }, /* PIC1 events for UltraSPARC-I/II/IIi/IIe */ { .name = "Dispatch0_mispred", .desc = "I-buffer is empty from Branch misprediction", .ctrl = PME_CTRL_S1, .code = 0x2, }, { .name = "Dispatch0_FP_use", .desc = "First instruction in the group depends on an earlier floating point result that is not yet available", .ctrl = PME_CTRL_S1, .code = 0x3, }, { .name = "IC_hit", .desc = "I-cache hits", .ctrl = PME_CTRL_S1, .code = 0x8, }, { .name = "DC_rd_hit", .desc = "D-cache read hits", .ctrl = PME_CTRL_S1, .code = 0x9, }, { .name = "DC_wr_hit", .desc = "D-cache write hits", .ctrl = PME_CTRL_S1, .code = 0xa, }, { .name = "Load_use_RAW", .desc = "There is a load use in the execute stage and there is a read-after-write hazard on the oldest outstanding load", .ctrl = PME_CTRL_S1, .code = 0xb, }, { .name = "EC_hit", .desc = "Total E-cache hits", .ctrl = PME_CTRL_S1, .code = 0xc, }, { .name = "EC_wb", .desc = "E-cache misses that do writebacks", .ctrl = PME_CTRL_S1, .code = 0xd, }, { .name = "EC_snoop_cb", .desc = "E-cache snoop copy-backs from the following UPA transactions: S_CPB_REQ, S_CPI_REQ, S_CPD_REQ, S_CPB_MIS_REQ", .ctrl = PME_CTRL_S1, .code = 0xe, }, { .name = "EC_ic_hit", .desc = "E-cache read hits from I-cache misses", .ctrl = PME_CTRL_S1, .code = 0xf, }, }; #define PME_SPARC_ULTRA12_EVENT_COUNT (sizeof(ultra12_pe)/sizeof(sparc_entry_t)) papi-5.3.0/src/libpfm4/lib/events/sparc_niagara2_events.h0000600003276200002170000001473612247131124022761 0ustar ralphundrgradstatic const sparc_entry_t niagara2_pe[] = { /* PIC0 Niagara-2 events */ { .name = 
"All_strands_idle", .desc = "Cycles when no strand can be picked for the physical core on which the monitoring strand resides.", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x0, }, { .name = "Instr_cnt", .desc = "Number of instructions completed", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x2, .umasks = { { .uname = "branches", .udesc = "Completed branches", .ubit = 0, }, { .uname = "taken_branches", .udesc = "Taken branches, which are always mispredicted", .ubit = 1, }, { .uname = "FGU_arith", .udesc = "All FADD, FSUB, FCMP, convert, FMUL, FDIV, FNEG, FABS, FSQRT, FMOV, FPADD, FPSUB, FPACK, FEXPAND, FPMERGE, FMUL8, FMULD8, FALIGNDATA, BSHUFFLE, FZERO, FONE, FSRC, FNOT1, FNOT2, FOR, FNOR, FAND, FNAND, FXOR, FXNOR, FORNOT1, FORNOT2, FANDNOT1, FANDNOT2, PDIST, SIAM", .ubit = 2, }, { .uname = "Loads", .udesc = "Load instructions", .ubit = 3, }, { .uname = "Stores", .udesc = "Store instructions", .ubit = 4, }, { .uname = "SW_count", .udesc = "Software count 'sethi %hi(fc00), %g0' instructions", .ubit = 5, }, { .uname = "other", .udesc = "Instructions not covered by other mask bits", .ubit = 6, }, { .uname = "atomics", .udesc = "Atomics are LDSTUB/A, CASA/XA, SWAP/A", .ubit = 7, }, }, .numasks = 8, }, { .name = "cache", .desc = "Cache events", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x3, .umasks = { { .uname = "IC_miss", .udesc = "I-cache misses. This counts only primary instruction cache misses, and does not count duplicate instruction cache misses. Also, only 'true' misses are counted. If a thread encounters an I$ miss, but the thread is redirected (due to a branch misprediction or trap, for example) before the line returns from L2 and is loaded into the I$, then the miss is not counted.", .ubit = 0, }, { .uname = "DC_miss", .udesc = "D-cache misses. This counts both primary and duplicate data cache misses.", .ubit = 1, }, { .uname = "L2IC_miss", .udesc = "L2 cache instruction misses", .ubit = 4, }, { .uname = "L2LD_miss", .udesc = "L2 cache load misses. 
Block loads are treated as one L2 miss event. In reality, each individual load can hit or miss in the L2 since the block load is not atomic.", .ubit = 5, }, }, .numasks = 4, }, { .name = "TLB", .desc = "TLB events", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x4, .umasks = { { .uname = "ITLB_L2ref", .udesc = "ITLB references to L2. For each ITLB miss with hardware tablewalk enabled, count each access the ITLB hardware tablewalk makes to L2.", .ubit = 2, }, { .uname = "DTLB_L2ref", .udesc = "DTLB references to L2. For each DTLB miss with hardware tablewalk enabled, count each access the DTLB hardware tablewalk makes to L2.", .ubit = 3, }, { .uname = "ITLB_L2miss", .udesc = "For each ITLB miss with hardware tablewalk enabled, count each access the ITLB hardware tablewalk makes to L2 which misses in L2. Note: Depending upon the hardware table walk configuration, each ITLB miss may issue from 1 to 4 requests to L2 to search TSBs.", .ubit = 4, }, { .uname = "DTLB_L2miss", .udesc = "For each DTLB miss with hardware tablewalk enabled, count each access the DTLB hardware tablewalk makes to L2 which misses in L2. Note: Depending upon the hardware table walk configuration, each DTLB miss may issue from 1 to 4 requests to L2 to search TSBs.", .ubit = 5, }, }, .numasks = 4, }, { .name = "mem", .desc = "Memory operations", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x5, .umasks = { { .uname = "stream_load", .udesc = "Stream Unit load operations to L2", .ubit = 0, }, { .uname = "stream_store", .udesc = "Stream Unit store operations to L2", .ubit = 1, }, { .uname = "cpu_load", .udesc = "CPU loads to L2", .ubit = 2, }, { .uname = "cpu_ifetch", .udesc = "CPU instruction fetches to L2", .ubit = 3, }, { .uname = "cpu_store", .udesc = "CPU stores to L2", .ubit = 6, }, { .uname = "mmu_load", .udesc = "MMU loads to L2", .ubit = 7, }, }, .numasks = 6, }, { .name = "spu_ops", .desc = "Stream Unit operations. 
User, supervisor, and hypervisor counting must all be enabled to properly count these events.", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x6, .umasks = { { .uname = "DES", .udesc = "Increment for each CWQ or ASI operation that uses DES/3DES unit", .ubit = 0, }, { .uname = "AES", .udesc = "Increment for each CWQ or ASI operation that uses AES unit", .ubit = 1, }, { .uname = "RC4", .udesc = "Increment for each CWQ or ASI operation that uses RC4 unit", .ubit = 2, }, { .uname = "HASH", .udesc = "Increment for each CWQ or ASI operation that uses MD5/SHA-1/SHA-256 unit", .ubit = 3, }, { .uname = "MA", .udesc = "Increment for each CWQ or ASI modular arithmetic operation", .ubit = 4, }, { .uname = "CSUM", .udesc = "Increment for each iSCSI CRC or TCP/IP checksum operation", .ubit = 5, }, }, .numasks = 6, }, { .name = "spu_busy", .desc = "Stream Unit busy cycles. User, supervisor, and hypervisor counting must all be enabled to properly count these events.", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x07, .umasks = { { .uname = "DES", .udesc = "Cycles the DES/3DES unit is busy", .ubit = 0, }, { .uname = "AES", .udesc = "Cycles the AES unit is busy", .ubit = 1, }, { .uname = "RC4", .udesc = "Cycles the RC4 unit is busy", .ubit = 2, }, { .uname = "HASH", .udesc = "Cycles the MD5/SHA-1/SHA-256 unit is busy", .ubit = 3, }, { .uname = "MA", .udesc = "Cycles the modular arithmetic unit is busy", .ubit = 4, }, { .uname = "CSUM", .udesc = "Cycles the CRC/MPA/checksum unit is busy", .ubit = 5, }, }, .numasks = 6, }, { .name = "tlb_miss", .desc = "TLB misses", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0xb, .umasks = { { .uname = "ITLB", .udesc = "I-TLB misses", .ubit = 2, }, { .uname = "DTLB", .udesc = "D-TLB misses", .ubit = 3, }, }, .numasks = 2, }, }; #define PME_SPARC_NIAGARA2_EVENT_COUNT (sizeof(niagara2_pe)/sizeof(sparc_entry_t)) papi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_ubo_events.h0000600003276200002170000000451612247131123024113 0ustar ralphundrgrad/* * Copyright 
(c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
* * PMU: snbep_unc_ubo (Intel SandyBridge-EP U-Box uncore PMU) */ static const intel_x86_umask_t snbep_unc_u_event_msg[]={ { .uname = "DOORBELL_RCVD", .udesc = "TBD", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "INT_PRIO", .udesc = "TBD", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IPI_RCVD", .udesc = "TBD", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MSI_RCVD", .udesc = "TBD", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "VLW_RCVD", .udesc = "TBD", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snbep_unc_u_pe[]={ { .name = "UNC_U_EVENT_MSG", .desc = "VLW Received", .code = 0x42, .cntmsk = 0x3, .ngrp = 1, .modmsk = SNBEP_UNC_UBO_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_u_event_msg), .umasks = snbep_unc_u_event_msg }, { .name = "UNC_U_LOCK_CYCLES", .desc = "IDI Lock/SplitLock Cycles", .code = 0x44, .cntmsk = 0x3, .modmsk = SNBEP_UNC_UBO_ATTRS, }, }; papi-5.3.0/src/libpfm4/lib/events/intel_snbep_events.h0000600003276200002170000023070612247131123022403 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: snb (Intel Sandy Bridge EP) */ static const intel_x86_umask_t snbep_agu_bypass_cancel[]={ { .uname = "COUNT", .udesc = "This event counts executed load operations", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_arith[]={ { .uname = "FPU_DIV_ACTIVE", .udesc = "Cycles that the divider is active, includes integer and floating point", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "FPU_DIV", .udesc = "Number of cycles the divider is activated, includes integer and floating point", .uequiv = "FPU_DIV_ACTIVE:c=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_br_inst_exec[]={ { .uname = "NONTAKEN_COND", .udesc = "All macro conditional non-taken branch instructions", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_COND", .udesc = "All macro conditional taken branch instructions", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_DIRECT_JUMP", .udesc = "All macro unconditional non-taken branch instructions, excluding calls and indirects", .ucode = 0x4200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_JUMP", .udesc = "All macro unconditional taken branch instructions, excluding calls and indirects", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All non-taken indirect branches that are not calls nor returns", .ucode = 0x4400, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All taken indirect branches that are not calls nor returns", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_RETURN_NEAR", .udesc = "All taken indirect branches that have a return mnemonic", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_NEAR_CALL", .udesc = "All taken non-indirect calls", .ucode = 0x9000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_NEAR_CALL", .udesc = "All taken indirect calls, including both register and memory indirect", .ucode = 0xa000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "All near executed branches instructions (not necessarily retired)", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL_CONDITIONAL", .udesc = "All macro conditional branch instructions", .ucode = 0xc100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_COND", .udesc = "All macro conditional branch instructions", .ucode = 0xc100, .uequiv = "ALL_CONDITIONAL", .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All indirect branches that are not calls nor returns", .ucode = 0xc400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_DIRECT_NEAR_CALL", .udesc = "All non-indirect calls", .ucode = 0xd000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_br_inst_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "All taken and not taken macro branches including far branches (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "CONDITIONAL", .udesc = "All taken and not taken macro conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "FAR_BRANCH", .udesc = "Number of far branch instructions retired (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "All macro direct and indirect near calls, does not 
count far calls (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_RETURN", .udesc = "Number of near ret instructions retired (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_TAKEN", .udesc = "Number of near branch taken instructions retired (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NOT_TAKEN", .udesc = "All not taken macro branch instructions retired (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_br_misp_exec[]={ { .uname = "NONTAKEN_COND", .udesc = "All non-taken mispredicted macro conditional branch instructions", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_COND", .udesc = "All taken mispredicted macro conditional branch instructions", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All non-taken mispredicted indirect branches that are not calls nor returns", .ucode = 0x4400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All taken mispredicted indirect branches that are not calls nor returns", .ucode = 0x8400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_RETURN_NEAR", .udesc = "All non-taken mispredicted indirect branches that have a return mnemonic", .ucode = 0x4800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_RETURN_NEAR", .udesc = "All taken mispredicted indirect branches that have a return mnemonic", .ucode = 0x8800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_DIRECT_NEAR_CALL", .udesc = "All non-taken mispredicted non-indirect calls", .ucode = 0x5000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_NEAR_CALL", .udesc = "All taken mispredicted non-indirect calls", .ucode = 0x9000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NONTAKEN_INDIRECT_NEAR_CALL", .udesc = "All nontaken mispredicted indirect 
calls, including both register and memory indirect", .ucode = 0x6000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_NEAR_CALL", .udesc = "All taken mispredicted indirect calls, including both register and memory indirect", .ucode = 0xa000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_COND", .udesc = "All mispredicted macro conditional branch instructions", .ucode = 0xc100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_RETURN_NEAR", .udesc = "All mispredicted indirect branches that have a return mnemonic", .ucode = 0xc800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_DIRECT_NEAR_CALL", .udesc = "All mispredicted non-indirect calls", .ucode = 0xd000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All mispredicted indirect branches that are not calls nor returns", .ucode = 0xc400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "All mispredicted branch instructions", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_br_misp_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "All mispredicted macro branches (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "CONDITIONAL", .udesc = "All mispredicted macro conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "All macro direct and indirect near calls (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NOT_TAKEN", .udesc = "Number of branch instructions retired that were mispredicted and not-taken (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "TAKEN", .udesc = "Number of branch instructions retired that were mispredicted and taken (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_lock_cycles[]={ { 
.uname = "SPLIT_LOCK_UC_LOCK_DURATION", .udesc = "Cycles in which the L1D and L2 are locked, due to a UC lock or split lock", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CACHE_LOCK_DURATION", .udesc = "Cycles in which the L1D is locked", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_cpl_cycles[]={ { .uname = "RING0", .udesc = "Unhalted core cycles the thread was in ring 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RING0_TRANS", .udesc = "Transitions from rings 1, 2, or 3 to ring 0", .uequiv = "RING0:c=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "RING123", .udesc = "Unhalted core cycles the thread was in rings 1, 2, or 3", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_cpu_clk_unhalted[]={ { .uname = "REF_P", .udesc = "Cycles when the core is unhalted (count at 100 MHz)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "THREAD_P", .udesc = "Cycles when thread is not halted", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_dsb2mite_switches[]={ { .uname = "COUNT", .udesc = "Number of DSB to MITE switches", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "PENALTY_CYCLES", .udesc = "Cycles DSB to MITE switches caused delay", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_dsb_fill[]={ { .uname = "ALL_CANCEL", .udesc = "Number of times a valid DSB fill has been cancelled for any reason", .ucode = 0xa00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "EXCEED_DSB_LINES", .udesc = "DSB Fill encountered > 3 DSB lines", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "OTHER_CANCEL", .udesc = "Number of times a valid DSB fill has been cancelled not because of exceeding way limit", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t 
snbep_dtlb_load_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Demand load miss in all TLB levels which causes a page walk of any page size", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CAUSES_A_WALK", .udesc = "Demand load miss in all TLB levels which causes a page walk of any page size", .ucode = 0x100, .uequiv = "MISS_CAUSES_A_WALK", .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "Number of DTLB lookups for loads which missed first level DTLB but hit second level DTLB (STLB); No page walk.", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Demand load miss in all TLB levels which causes a page walk that completes for any page size", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles PMH is busy with a walk", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_dtlb_store_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Miss in all TLB levels that causes a page walk of any page size (4K/2M/4M/1G)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CAUSES_A_WALK", .udesc = "Miss in all TLB levels that causes a page walk of any page size (4K/2M/4M/1G)", .ucode = 0x100, .uequiv = "MISS_CAUSES_A_WALK", .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "First level miss but second level hit; no page walk. 
Only relevant if multiple levels", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Miss in all TLB levels that causes a page walk that completes of any page size (4K/2M/4M/1G)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles PMH is busy with this walk", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_fp_assist[]={ { .uname = "ANY", .udesc = "Cycles with any input/output SSE or FP assists", .ucode = 0x1e00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "SIMD_INPUT", .udesc = "Number of SIMD FP assists due to input values", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SIMD_OUTPUT", .udesc = "Number of SIMD FP assists due to output values", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87_INPUT", .udesc = "Number of X87 assists due to input value", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87_OUTPUT", .udesc = "Number of X87 assists due to output value", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_fp_comp_ops_exe[]={ { .uname = "X87", .udesc = "Number of X87 uops executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_PACKED_DOUBLE", .udesc = "Number of SSE double precision FP packed uops executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_SCALAR_SINGLE", .udesc = "Number of SSE single precision FP scalar uops executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_PACKED_SINGLE", .udesc = "Number of SSE single precision FP packed uops executed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_SCALAR_DOUBLE", .udesc = "Number of SSE double precision FP scalar uops executed", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_hw_interrupts[]={ { .uname = "RECEIVED", .udesc = "Number of hardware interrupts received by the processor", .ucode = 
0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_hw_pre_req[]={ { .uname = "L1D_MISS", .udesc = "Hardware prefetch requests that miss the L1D cache. A request is counted each time it accesses the cache and misses it, including if a block is applicable or if it hits the full buffer, for example. This accounts for both L1 streamer and IP-based HW prefetchers", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_icache[]={ { .uname = "MISSES", .udesc = "Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. Includes UC accesses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_idq[]={ { .uname = "EMPTY", .udesc = "Cycles IDQ is empty", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MITE_UOPS", .udesc = "Number of uops delivered to IDQ from MITE path", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DSB_UOPS", .udesc = "Number of uops delivered to IDQ from DSB path", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS", .udesc = "Number of uops delivered to IDQ when MS busy by DSB", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_MITE_UOPS", .udesc = "Number of uops delivered to IDQ when MS busy by MITE", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_UOPS", .udesc = "Number of uops delivered to IDQ from MS by either DSB or MITE", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MITE_UOPS_CYCLES", .udesc = "Cycles where uops are delivered to IDQ from MITE (MITE active)", .uequiv = "MITE_UOPS:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DSB_UOPS_CYCLES", .udesc = "Cycles where uops are delivered to IDQ from DSB (DSB active)", .ucode = 0x800 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS_CYCLES", .udesc = 
"Cycles where uops delivered to IDQ when MS busy by DSB", .uequiv = "MS_DSB_UOPS:c=1", .ucode = 0x1000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_MITE_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ when MS busy by MITE", .uequiv = "MS_MITE_UOPS:c=1", .ucode = 0x2000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_UOPS_CYCLES", .udesc = "Cycles where uops delivered to IDQ from MS by either DSB or MITE", .uequiv = "MS_UOPS:c=1", .ucode = 0x3000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DSB_UOPS", .udesc = "Number of uops delivered from either DSB paths", .ucode = 0x1800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DSB_CYCLES", .udesc = "Cycles MITE/MS deliver anything", .ucode = 0x1800 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_MITE_UOPS", .udesc = "Number of uops delivered from either MITE paths", .ucode = 0x2400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_MITE_CYCLES", .udesc = "Cycles DSB/MS deliver anything", .ucode = 0x2400 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_UOPS", .udesc = "Number of uops delivered to IDQ from any path", .ucode = 0x3c00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS_OCCUR", .udesc = "Occurrences of DSB MS going active", .uequiv = "MS_DSB_UOPS:c=1:e=1", .ucode = 0x1000 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_idq_uops_not_delivered[]={ { .uname = "CORE", .udesc = "Number of non-delivered uops to RAT (use cmask to qualify further)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_ild_stall[]={ { .uname = "LCP", .udesc = "Stall caused by changing prefix length of the instruction", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IQ_FULL", .udesc = "Stall 
cycles due to IQ full", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_insts_written_to_iq[]={ { .uname = "INSTS", .udesc = "Number of instructions written to IQ every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_inst_retired[]={ { .uname = "ANY_P", .udesc = "Number of instructions retired", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "PREC_DIST", .udesc = "Precise instruction retired event to reduce effect of PEBS shadow IP distribution (Precise Event)", .ucntmsk = 0x2, .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_int_misc[]={ { .uname = "RAT_STALL_CYCLES", .udesc = "Cycles RAT external stall is sent to IDQ for this thread", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RECOVERY_CYCLES", .udesc = "Cycles waiting to be recovered after Machine Clears due to all other cases except JEClear", .ucode = 0x300 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RECOVERY_STALLS_COUNT", .udesc = "Number of times need to wait after Machine Clears due to all other cases except JEClear", .ucode = 0x300 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_E | _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_itlb[]={ { .uname = "ITLB_FLUSH", .udesc = "Number of ITLB flushes, includes 4k/2M/4M pages", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FLUSH", .udesc = "Number of ITLB flushes, includes 4k/2M/4M pages", .ucode = 0x100, .uequiv = "ITLB_FLUSH", .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l1d[]={ { .uname = "ALLOCATED_IN_M", .udesc = "Number of allocations of L1D cache lines in modified (M) state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_M_REPLACEMENT", .udesc = "Number of cache lines in 
M-state evicted of L1D due to snoop HITM or dirty line replacement", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_EVICT", .udesc = "Number of modified lines evicted from L1D due to replacement", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REPLACEMENT", .udesc = "Number of cache lines brought into the L1D cache", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l1d_blocks[]={ { .uname = "BANK_CONFLICT", .udesc = "Number of dispatched loads cancelled due to L1D bank conflicts with other load ports", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "BANK_CONFLICT_CYCLES", .udesc = "Cycles with l1d blocks due to bank conflicts", .ucode = 0x500, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_l1d_pend_miss[]={ { .uname = "OCCURRENCES", .udesc = "Occurrences of L1D_PEND_MISS going active", .uequiv = "PENDING:e=1:c=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "EDGE", .udesc = "Occurrences of L1D_PEND_MISS going active", .uequiv = "OCCURRENCES", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "PENDING", .udesc = "Number of L1D load misses outstanding every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PENDING_CYCLES", .udesc = "Cycles with L1D load misses outstanding", .uequiv = "PENDING:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l2_l1d_wb_rqsts[]={ { .uname = "HIT_E", .udesc = "Non rejected writebacks from L1D to L2 cache lines in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HIT_M", .udesc = "Non rejected writebacks from L1D to L2 cache lines in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l2_lines_in[]={ { .uname = "ANY", .udesc = "L2 cache lines filling (counting 
does not cover rejects)", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "E", .udesc = "L2 cache lines in E state (counting does not cover rejects)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "I", .udesc = "L2 cache lines in I state (counting does not cover rejects)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S", .udesc = "L2 cache lines in S state (counting does not cover rejects)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l2_lines_out[]={ { .uname = "DEMAND_CLEAN", .udesc = "L2 clean line evicted by a demand", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DIRTY", .udesc = "L2 dirty line evicted by a demand", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_CLEAN", .udesc = "L2 clean line evicted by a prefetch", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_DIRTY", .udesc = "L2 dirty line evicted by an MLC Prefetch", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRTY_ANY", .udesc = "Any L2 dirty line evicted (does not cover rejects)", .ucode = 0xa00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l2_rqsts[]={ { .uname = "ALL_CODE_RD", .udesc = "Any ifetch request to L2 cache", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_HIT", .udesc = "L2 cache hits when fetching instructions", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_MISS", .udesc = "L2 cache misses when fetching instructions", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_DATA_RD", .udesc = "Demand data read requests to L2 cache", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_RD_HIT", .udesc = "Demand data read requests that hit L2", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_PF", .udesc = "Any L2 HW prefetch request to L2 cache", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_HIT", .udesc = "Requests from 
the L2 hardware prefetchers that hit L2 cache", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PF_MISS", .udesc = "Requests from the L2 hardware prefetchers that miss L2 cache", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_ANY", .udesc = "Any RFO requests to L2 cache", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_HITS", .udesc = "RFO requests that hit L2 cache", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_MISS", .udesc = "RFO requests that miss L2 cache", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l2_store_lock_rqsts[]={ { .uname = "HIT_E", .udesc = "RFOs that hit cache lines in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISS", .udesc = "RFOs that miss cache (I state)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HIT_M", .udesc = "RFOs that hit cache lines in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL", .udesc = "RFOs that access cache lines in any state", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_l2_trans[]={ { .uname = "ALL", .udesc = "Transactions accessing MLC pipe", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_RD", .udesc = "L2 cache accesses when fetching instructions", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_WB", .udesc = "L1D writebacks that access L2 cache", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOAD", .udesc = "Demand Data Read* requests that access L2 cache", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L2_FILL", .udesc = "L2 fill requests that access L2 cache", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L2_WB", .udesc = "L2 writebacks that access L2 cache", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_PREFETCH", .udesc = "L2 or L3 HW prefetches that access L2 cache (including rejects)", .ucode = 0x800, .uflags= 
INTEL_X86_NCOMBO, }, { .uname = "RFO", .udesc = "RFO requests that access L2 cache", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_ld_blocks[]={ { .uname = "DATA_UNKNOWN", .udesc = "Blocked loads due to store buffer blocks with unknown data", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STORE_FORWARD", .udesc = "Loads blocked by overlapping with store buffer that cannot be forwarded", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NO_SR", .udesc = "Number of split loads blocked due to resource not available", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_BLOCK", .udesc = "Number of cases where any load is blocked but has no DCU miss", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_ld_blocks_partial[]={ { .uname = "ADDRESS_ALIAS", .udesc = "False dependencies in MOB due to partial compare on address", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_STA_BLOCK", .udesc = "Number of times that load operations are temporarily blocked because of older stores, with addresses that are not yet known. 
A load operation may incur more than one block of this type", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_load_hit_pre[]={ { .uname = "HW_PF", .udesc = "Non sw-prefetch load dispatches that hit the fill buffer allocated for HW prefetch", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SW_PF", .udesc = "Non sw-prefetch load dispatches that hit the fill buffer allocated for SW prefetch", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_l3_lat_cache[]={ { .uname = "MISS", .udesc = "Core-originated cacheable demand requests missed L3", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REFERENCE", .udesc = "Core-originated cacheable demand requests that refer to L3", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_machine_clears[]={ { .uname = "MASKMOV", .udesc = "The number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MEMORY_ORDERING", .udesc = "Number of Memory Ordering Machine Clears detected", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SMC", .udesc = "Self-Modifying Code detected", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_mem_load_uops_llc_hit_retired[]={ { .uname = "XSNP_HIT", .udesc = "Load LLC Hit and a cross-core Snoop hits in on-pkg core cache (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_HITM", .udesc = "Load had HitM Response from a core on same socket (shared LLC) (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_MISS", .udesc = "Load LLC Hit and a cross-core Snoop missed in on-pkg core cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_NONE", .udesc = "Load hit in last-level (L3) cache with 
no snoop needed (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_mem_load_uops_llc_miss_retired[]={ { .uname = "LOCAL_DRAM", .udesc = "Load uops that miss in the L3 and hit local DRAM", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "REMOTE_DRAM", .udesc = "Load uops that miss in the L3 and hit remote DRAM", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_mem_load_uops_misc_retired[]={ { .uname = "LLC_MISS", .udesc = "Counts load driven L3 misses and some non simd split loads (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_mem_load_uops_retired[]={ { .uname = "HIT_LFB", .udesc = "A load missed L1D but hit the Fill Buffer (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L1_HIT", .udesc = "Load hit in nearest-level (L1D) cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_HIT", .udesc = "Load hit in mid-level (L2) cache (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_HIT", .udesc = "Load hit in last-level (L3) cache with no snoop needed (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_MISS", .udesc = "Retired load uops which data sources were data missed LLC (excluding unknown data source)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_mem_trans_retired[]={ { .uname = "LATENCY_ABOVE_THRESHOLD", .udesc = "Memory load instructions retired above programmed clocks, minimum value threshold is 4 (Precise Event required)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_LDLAT | INTEL_X86_NO_AUTOENCODE, }, { .uname = "PRECISE_STORE", .udesc = "Capture where stores 
occur, must use with PEBS (Precise Event required)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_mem_uops_retired[]={ { .uname = "ALL_LOADS", .udesc = "Any retired loads (Precise Event)", .ucode = 0x8100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ANY_LOADS", .udesc = "Any retired loads (Precise Event)", .ucode = 0x8100, .uequiv = "ALL_LOADS", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ALL_STORES", .udesc = "Any retired stores (Precise Event)", .ucode = 0x8200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ANY_STORES", .udesc = "Any retired stores (Precise Event)", .ucode = 0x8200, .uequiv = "ALL_STORES", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCK_LOADS", .udesc = "Locked retired loads (Precise Event)", .ucode = 0x2100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCK_STORES", .udesc = "Locked retired stores (Precise Event)", .ucode = 0x2200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_LOADS", .udesc = "Retired loads causing cacheline splits (Precise Event)", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_STORES", .udesc = "Retired stores causing cacheline splits (Precise Event)", .ucode = 0x4200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STLB_MISS_LOADS", .udesc = "STLB misses due to retired loads (Precise Event)", .ucode = 0x1100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STLB_MISS_STORES", .udesc = "STLB misses due to retired stores (Precise Event)", .ucode = 0x1200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_misalign_mem_ref[]={ { .uname = "LOADS", .udesc = "Speculative cache-line split load uops dispatched to the L1D", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STORES", .udesc = "Speculative cache-line split Store-address uops dispatched to L1D", .ucode = 0x200, .uflags= 
INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_offcore_requests[]={ { .uname = "ALL_DATA_RD", .udesc = "Demand and prefetch read requests sent to uncore", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_READ", .udesc = "Demand and prefetch read requests sent to uncore", .uequiv = "ALL_DATA_RD", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD", .udesc = "Offcore code read requests, including cacheable and un-cacheables", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD", .udesc = "Demand Data Read requests sent to uncore", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Offcore Demand RFOs, includes regular RFO, Locks, ItoM", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_offcore_requests_buffer[]={ { .uname = "SQ_FULL", .udesc = "Offcore requests buffer cannot take more entries for this thread core", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_offcore_requests_outstanding[]={ { .uname = "ALL_DATA_RD_CYCLES", .udesc = "Cycles with cacheable data read transactions in the superQ", .uequiv = "ALL_DATA_RD:c=1", .ucode = 0x800 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD_CYCLES", .udesc = "Cycles with demand code reads transactions in the superQ", .uequiv = "DEMAND_CODE_RD:c=1", .ucode = 0x200 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD_CYCLES", .udesc = "Cycles with demand data read transactions in the superQ", .uequiv = "DEMAND_DATA_RD:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_RD", .udesc = "Cacheable data read transactions in the superQ every cycle", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD", .udesc = "Code read transactions in the superQ every cycle", .ucode = 0x200, .uflags= 
INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD", .udesc = "Demand data read transactions in the superQ every cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Outstanding RFO (store) transactions in the superQ every cycle", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO_CYCLES", .udesc = "Cycles with outstanding RFO (store) transactions in the superQ", .uequiv = "DEMAND_RFO:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_other_assists[]={ { .uname = "ITLB_MISS_RETIRED", .udesc = "Number of instructions that experienced an ITLB miss", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "AVX_TO_SSE", .udesc = "Number of transitions from AVX-256 to legacy SSE when penalty applicable", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_TO_AVX", .udesc = "Number of transitions from legacy SSE to AVX-256 when penalty applicable", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_partial_rat_stalls[]={ { .uname = "FLAGS_MERGE_UOP", .udesc = "Number of flags-merge uops in flight in each cycle", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CYCLES_FLAGS_MERGE_UOP", .udesc = "Cycles in which flags-merge uops in flight", .uequiv = "FLAGS_MERGE_UOP:c=1", .ucode = 0x2000 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MUL_SINGLE_UOP", .udesc = "Number of Multiply packed/scalar single precision uops allocated", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SLOW_LEA_WINDOW", .udesc = "Number of cycles with at least one slow LEA uop allocated", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_resource_stalls[]={ { .uname = "ANY", .udesc = "Cycles stalled due to Resource Related reason", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "LB", .udesc = "Cycles stalled due to lack of 
load buffers", .ucode = 0x200, }, { .uname = "RS", .udesc = "Cycles stalled due to no eligible RS entry available", .ucode = 0x400, }, { .uname = "SB", .udesc = "Cycles stalled due to no store buffers available (not including draining from sync)", .ucode = 0x800, }, { .uname = "ROB", .udesc = "Cycles stalled due to re-order buffer full", .ucode = 0x1000, }, { .uname = "FCSW", .udesc = "Cycles stalled due to writing the FPU control word", .ucode = 0x2000, }, { .uname = "MXCSR", .udesc = "Cycles stalled due to the MXCSR register rename occurring too close to a previous MXCSR rename", .ucode = 0x4000, }, { .uname = "MEM_RS", .udesc = "Cycles stalled due to LB, SB or RS being completely in use", .ucode = 0xe00, .uequiv = "LB:SB:RS", }, }; static const intel_x86_umask_t snbep_resource_stalls2[]={ { .uname = "ALL_FL_EMPTY", .udesc = "Cycles stalled due to free list empty", .ucode = 0xc00, }, { .uname = "ALL_PRF_CONTROL", .udesc = "Cycles stalled due to control structures full for physical registers", .ucode = 0xf00, }, { .uname = "ANY_PRF_CONTROL", .udesc = "Cycles stalled due to control structures full for physical registers", .ucode = 0xf00, .uequiv = "ALL_PRF_CONTROL", }, { .uname = "BOB_FULL", .udesc = "Cycles Allocator is stalled due to Branch Order Buffer", .ucode = 0x4000, }, { .uname = "OOO_RSRC", .udesc = "Cycles stalled due to out of order resources full", .ucode = 0x4f00, }, }; static const intel_x86_umask_t snbep_rob_misc_events[]={ { .uname = "LBR_INSERTS", .udesc = "Count each time a new LBR record is saved by HW", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_rs_events[]={ { .uname = "EMPTY_CYCLES", .udesc = "Cycles the RS is empty for this thread", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_simd_fp_256[]={ { .uname = "PACKED_SINGLE", .udesc = "Counts 256-bit packed single-precision", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"PACKED_DOUBLE", .udesc = "Counts 256-bit packed double-precision", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_sq_misc[]={ { .uname = "SPLIT_LOCK", .udesc = "Split locks in SQ", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_tlb_flush[]={ { .uname = "DTLB_THREAD", .udesc = "Number of DTLB flushes of thread-specific entries", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_ANY", .udesc = "Number of STLB flushes", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_uops_dispatched[]={ { .uname = "CORE", .udesc = "Counts total number of uops dispatched from any thread", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STALL_CYCLES", .udesc = "Counts number of cycles no uops were dispatched on this thread", .uequiv = "THREAD:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "THREAD", .udesc = "Counts total number of uops to be dispatched per-thread each cycle", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_uops_dispatched_port[]={ { .uname = "PORT_0", .udesc = "Cycles which a Uop is dispatched on port 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_1", .udesc = "Cycles which a Uop is dispatched on port 1", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_2_LD", .udesc = "Cycles in which a load uop is dispatched on port 2", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_2_STA", .udesc = "Cycles in which a store uop is dispatched on port 2", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_2", .udesc = "Cycles in which a uop is dispatched on port 2", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_3_LD", .udesc = "Cycles in which a load uop is disptached on port 3", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { 
.uname = "PORT_3_STA", .udesc = "Cycles in which a store uop is dispatched on port 3", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_3", .udesc = "Cycles in which a uop is dispatched on port 3", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_4", .udesc = "Cycles in which a uop is dispatched on port 4", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT_5", .udesc = "Cycles in which a uop is dispatched on port 5", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_uops_issued[]={ { .uname = "ANY", .udesc = "Number of uops issued by the RAT to the Reservation Station (RS)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "CORE_STALL_CYCLES", .udesc = "Cycles no uops issued on this core (by any thread)", .uequiv = "ANY:c=1:i=1:t=1", .ucode = 0x100 | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no uops issued by this thread", .uequiv = "ANY:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_uops_retired[]={ { .uname = "ALL", .udesc = "All uops that actually retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "All uops that actually retired (Precise Event)", .ucode = 0x100, .uequiv= "ALL", .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "RETIRE_SLOTS", .udesc = "Number of retirement slots used (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no executable uop retired (Precise Event)", .uequiv = "ALL:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "TOTAL_CYCLES", .udesc = "Total cycles using precise uop retired event (Precise 
Event)", .uequiv = "ALL:c=16", .ucode = 0x100 | (0x10 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t snbep_offcore_response[]={ { .uname = "DMND_DATA_RD", .udesc = "Request: number of demand and DCU prefetch data reads of full and partial cachelines as well as demand data page table entry cacheline reads. Does not count L2 data read prefetches or instruction fetches", .ucode = 1ULL << (0 + 8), .grpid = 0, }, { .uname = "DMND_RFO", .udesc = "Request: number of demand and DCU prefetch reads for ownership (RFO) requests generated by a write to data cacheline. Does not count L2 RFO prefetches", .ucode = 1ULL << (1 + 8), .grpid = 0, }, { .uname = "DMND_IFETCH", .udesc = "Request: number of demand and DCU prefetch instruction cacheline reads. Does not count L2 code read prefetches", .ucode = 1ULL << (2 + 8), .grpid = 0, }, { .uname = "WB", .udesc = "Request: number of writebacks (modified to exclusive) transactions", .ucode = 1ULL << (3 + 8), .grpid = 0, }, { .uname = "PF_DATA_RD", .udesc = "Request: number of data cacheline reads generated by L2 prefetchers", .ucode = 1ULL << (4 + 8), .grpid = 0, }, { .uname = "PF_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetchers", .ucode = 1ULL << (5 + 8), .grpid = 0, }, { .uname = "PF_IFETCH", .udesc = "Request: number of code reads generated by L2 prefetchers", .ucode = 1ULL << (6 + 8), .grpid = 0, }, { .uname = "PF_LLC_DATA_RD", .udesc = "Request: number of L3 prefetcher requests to L2 for loads", .ucode = 1ULL << (7 + 8), .grpid = 0, }, { .uname = "PF_LLC_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetcher", .ucode = 1ULL << (8 + 8), .grpid = 0, }, { .uname = "PF_LLC_IFETCH", .udesc = "Request: number of L2 prefetcher requests to L3 for instruction fetches", .ucode = 1ULL << (9 + 8), .grpid = 0, }, { .uname = "BUS_LOCKS", .udesc = "Request: number of bus lock and split lock requests", .ucode = 1ULL << (10 + 8), .grpid = 0, }, { 
.uname = "STRM_ST", .udesc = "Request: number of streaming store requests", .ucode = 1ULL << (11 + 8), .grpid = 0, }, { .uname = "OTHER", .udesc = "Request: counts one of the following transaction types, including L3 invalidate, I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences, lock, unlock, split lock", .ucode = 1ULL << (15+8), .grpid = 0, }, { .uname = "ANY_IFETCH", .udesc = "Request: combination of PF_IFETCH | DMND_IFETCH | PF_LLC_IFETCH", .uequiv = "PF_IFETCH:DMND_IFETCH:PF_LLC_IFETCH", .ucode = 0x24400, .grpid = 0, }, { .uname = "ANY_REQUEST", .udesc = "Request: combination of all request umasks", .uequiv = "DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER", .ucode = 0x8fff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "ANY_DATA", .udesc = "Request: combination of DMND_DATA | PF_DATA_RD | PF_LLC_DATA_RD", .uequiv = "DMND_DATA_RD:PF_DATA_RD:PF_LLC_DATA_RD", .ucode = 0x9100, .grpid = 0, }, { .uname = "ANY_RFO", .udesc = "Request: combination of DMND_RFO | PF_RFO | PF_LLC_RFO", .uequiv = "DMND_RFO:PF_RFO:PF_LLC_RFO", .ucode = 0x12200, .grpid = 0, }, { .uname = "ANY_RESPONSE", .udesc = "Response: count any response type", .ucode = 1ULL << (16+8), .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL | INTEL_X86_EXCL_GRP_GT, .grpid = 1, }, { .uname = "NO_SUPP", .udesc = "Supplier: counts number of times supplier information is not available", .ucode = 1ULL << (17+8), .grpid = 1, }, { .uname = "LLC_HITM", .udesc = "Supplier: counts L3 hits in M-state (initial lookup)", .ucode = 1ULL << (18+8), .grpid = 1, }, { .uname = "LLC_HITE", .udesc = "Supplier: counts L3 hits in E-state", .ucode = 1ULL << (19+8), .grpid = 1, }, { .uname = "LLC_HITS", .udesc = "Supplier: counts L3 hits in S-state", .ucode = 1ULL << (20+8), .grpid = 1, }, { .uname = "LLC_HITF", .udesc = "Supplier: counts L3 hits in F-state", .ucode = 1ULL << (21+8), .grpid = 1, }, { .uname = 
"LLC_MISS_LOCAL", .udesc = "Supplier: counts L3 misses to local DRAM", .ucode = 1ULL << (22+8), .uequiv = "LLC_MISS_LOCAL_DRAM", .grpid = 1, }, { .uname = "LLC_MISS_LOCAL_DRAM", .udesc = "Supplier: counts L3 misses to local DRAM", .ucode = 1ULL << (22+8), .grpid = 1, }, { .uname = "LLC_MISS_REMOTE", .udesc = "Supplier: counts L3 misses to remote DRAM", .ucode = 0xffULL << (23+8), .uequiv = "LLC_MISS_REMOTE_DRAM", .grpid = 1, }, { .uname = "LLC_MISS_REMOTE_DRAM", .udesc = "Supplier: counts L3 misses to remote DRAM", .ucode = 0xffULL << (23+8), .grpid = 1, }, { .uname = "LLC_HITMESF", .udesc = "Supplier: counts L3 hits in any state (M, E, S, F)", .ucode = 0xfULL << (18+8), .uequiv = "LLC_HITM:LLC_HITE:LLC_HITS:LLC_HITF", .grpid = 1, }, { .uname = "SNP_NONE", .udesc = "Snoop: counts number of times no snoop-related information is available", .ucode = 1ULL << (31+8), .grpid = 2, }, { .uname = "SNP_NOT_NEEDED", .udesc = "Snoop: counts the number of times no snoop was needed to satisfy the request", .ucode = 1ULL << (32+8), .grpid = 2, }, { .uname = "SNP_MISS", .udesc = "Snoop: counts number of times a snoop was needed and it missed all snooped caches", .ucode = 1ULL << (33+8), .grpid = 2, }, { .uname = "SNP_NO_FWD", .udesc = "Snoop: counts number of times a snoop was needed and it hit in at leas one snooped cache", .ucode = 1ULL << (34+8), .grpid = 2, }, { .uname = "SNP_FWD", .udesc = "Snoop: counts number of times a snoop was needed and data was forwarded from a remote socket", .ucode = 1ULL << (35+8), .grpid = 2, }, { .uname = "HITM", .udesc = "Snoop: counts number of times a snoop was needed and it hitM-ed in local or remote cache", .ucode = 1ULL << (36+8), .grpid = 2, }, { .uname = "NON_DRAM", .udesc = "Snoop: counts number of times target was a non-DRAM system address. 
This includes MMIO transactions", .ucode = 1ULL << (37+8), .grpid = 2, }, { .uname = "SNP_ANY", .udesc = "Snoop: any snoop reason", .ucode = 0x7fULL << (31+8), .uequiv = "SNP_NONE:SNP_NOT_NEEDED:SNP_MISS:SNP_NO_FWD:SNP_FWD:HITM:NON_DRAM", .uflags= INTEL_X86_DFL, .grpid = 2, }, }; static const intel_x86_umask_t snbep_baclears[]={ { .uname = "ANY", .udesc = "Counts the number of times the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_cycle_activity[]={ { .uname = "CYCLES_L2_PENDING", .udesc = "Cycles with pending L2 miss loads", .ucode = 0x0100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags = INTEL_X86_NCOMBO, .ucntmsk= 0xf, }, { .uname = "CYCLES_L1D_PENDING", .udesc = "Cycles with pending L1D load cache misses", .ucode = 0x0200 | (0x2 << INTEL_X86_CMASK_BIT), .ucntmsk= 0x4, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CYCLES_NO_DISPATCH", .udesc = "Cycles of dispatch stalls", .ucode = 0x0400 | (0x4 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_L2_PENDING", .udesc = "Execution stalls due to L2 pending loads", .ucode = 0x0500 | (0x5 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_L1D_PENDING", .udesc = "Execution stalls due to L1D pending loads", .ucode = 0x0600 | (0x6 << INTEL_X86_CMASK_BIT), .ucntmsk= 0x4, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snbep_pe[]={ { .name = "AGU_BYPASS_CANCEL", .desc = "Number of executed load operations with all the following traits: 1. addressing of the format [base + offset], 2. the offset is between 1 and 2047, 3. 
the address specified in the base register is in one page and the address [base+offset] is in another page", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb6, .numasks = LIBPFM_ARRAY_SIZE(snbep_agu_bypass_cancel), .ngrp = 1, .umasks = snbep_agu_bypass_cancel, }, { .name = "ARITH", .desc = "Counts arithmetic multiply operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x14, .numasks = LIBPFM_ARRAY_SIZE(snbep_arith), .ngrp = 1, .umasks = snbep_arith, }, { .name = "BACLEARS", .desc = "Branch resteered", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xe6, .numasks = LIBPFM_ARRAY_SIZE(snbep_baclears), .ngrp = 1, .umasks = snbep_baclears, }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x88, .numasks = LIBPFM_ARRAY_SIZE(snbep_br_inst_exec), .ngrp = 1, .umasks = snbep_br_inst_exec, }, { .name = "BR_INST_RETIRED", .desc = "Retired branch instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc4, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_br_inst_retired), .ngrp = 1, .umasks = snbep_br_inst_retired, }, { .name = "BR_MISP_EXEC", .desc = "Mispredicted branches executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x89, .numasks = LIBPFM_ARRAY_SIZE(snbep_br_misp_exec), .ngrp = 1, .umasks = snbep_br_misp_exec, }, { .name = "BR_MISP_RETIRED", .desc = "Mispredicted retired branches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc5, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_br_misp_retired), .ngrp = 1, .umasks = snbep_br_misp_retired, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Count branch instructions at retirement. Specifically, this event counts the retirement of the last micro-op of a branch instruction", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_INST_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc4, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Count mispredicted branch instructions at retirement. 
Specifically, this event counts at retirement of the last micro-op of a branch instruction in the architectural path of the execution and experienced misprediction in the branch prediction hardware", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_MISP_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc5, }, { .name = "LOCK_CYCLES", .desc = "Locked cycles in L1D and L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(snbep_lock_cycles), .ngrp = 1, .umasks = snbep_lock_cycles, }, { .name = "CPL_CYCLES", .desc = "Unhalted core cycles at a specific ring level", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5c, .numasks = LIBPFM_ARRAY_SIZE(snbep_cpl_cycles), .ngrp = 1, .umasks = snbep_cpl_cycles, }, { .name = "CPU_CLK_UNHALTED", .desc = "Cycles when processor is not in halted state", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(snbep_cpu_clk_unhalted), .ngrp = 1, .umasks = snbep_cpu_clk_unhalted, }, { .name = "DSB2MITE_SWITCHES", .desc = "Number of DSB to MITE switches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xab, .numasks = LIBPFM_ARRAY_SIZE(snbep_dsb2mite_switches), .ngrp = 1, .umasks = snbep_dsb2mite_switches, }, { .name = "DSB_FILL", .desc = "DSB fills", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xac, .numasks = LIBPFM_ARRAY_SIZE(snbep_dsb_fill), .ngrp = 1, .umasks = snbep_dsb_fill, }, { .name = "DTLB_LOAD_MISSES", .desc = "Data TLB load misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(snbep_dtlb_load_misses), .ngrp = 1, .umasks = snbep_dtlb_load_misses, }, { .name = "DTLB_STORE_MISSES", .desc = "Data TLB store misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x49, .numasks = LIBPFM_ARRAY_SIZE(snbep_dtlb_store_misses), .ngrp = 1, .umasks = snbep_dtlb_store_misses, }, { .name = "FP_ASSIST", .desc = "X87 Floating point assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xca, .numasks = 
LIBPFM_ARRAY_SIZE(snbep_fp_assist), .ngrp = 1, .umasks = snbep_fp_assist, }, { .name = "FP_COMP_OPS_EXE", .desc = "Counts number of floating point events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x10, .numasks = LIBPFM_ARRAY_SIZE(snbep_fp_comp_ops_exe), .ngrp = 1, .umasks = snbep_fp_comp_ops_exe, }, { .name = "HW_INTERRUPTS", .desc = "Number of hardware interrupts received by the processor", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(snbep_hw_interrupts), .ngrp = 1, .umasks = snbep_hw_interrupts, }, { .name = "HW_PRE_REQ", .desc = "Hardware prefetch requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x4e, .numasks = LIBPFM_ARRAY_SIZE(snbep_hw_pre_req), .ngrp = 1, .umasks = snbep_hw_pre_req, }, { .name = "ICACHE", .desc = "Instruction Cache accesses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(snbep_icache), .ngrp = 1, .umasks = snbep_icache, }, { .name = "IDQ", .desc = "IDQ operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x79, .numasks = LIBPFM_ARRAY_SIZE(snbep_idq), .ngrp = 1, .umasks = snbep_idq, }, { .name = "IDQ_UOPS_NOT_DELIVERED", .desc = "Uops not delivered", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x9c, .numasks = LIBPFM_ARRAY_SIZE(snbep_idq_uops_not_delivered), .ngrp = 1, .umasks = snbep_idq_uops_not_delivered, }, { .name = "ILD_STALL", .desc = "Instruction Length Decoder stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x87, .numasks = LIBPFM_ARRAY_SIZE(snbep_ild_stall), .ngrp = 1, .umasks = snbep_ild_stall, }, { .name = "INSTS_WRITTEN_TO_IQ", .desc = "Instructions written to IQ", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x17, .numasks = LIBPFM_ARRAY_SIZE(snbep_insts_written_to_iq), .ngrp = 1, .umasks = snbep_insts_written_to_iq, }, { .name = "INST_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc0, .flags= INTEL_X86_PEBS, .numasks = 
LIBPFM_ARRAY_SIZE(snbep_inst_retired), .ngrp = 1, .umasks = snbep_inst_retired, }, { .name = "INSTRUCTION_RETIRED", .desc = "Number of instructions at retirement", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_V3_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INT_MISC", .desc = "Miscellaneous internals", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd, .numasks = LIBPFM_ARRAY_SIZE(snbep_int_misc), .ngrp = 1, .umasks = snbep_int_misc, }, { .name = "ITLB", .desc = "Instruction TLB", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xae, .numasks = LIBPFM_ARRAY_SIZE(snbep_itlb), .ngrp = 1, .umasks = snbep_itlb, }, { .name = "ITLB_MISSES", .desc = "Instruction TLB misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(snbep_dtlb_store_misses), .ngrp = 1, .umasks = snbep_dtlb_store_misses, /* identical to actual umasks list for this event */ }, { .name = "L1D", .desc = "L1D cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x51, .numasks = LIBPFM_ARRAY_SIZE(snbep_l1d), .ngrp = 1, .umasks = snbep_l1d, }, { .name = "L1D_BLOCKS", .desc = "L1D is blocking", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xbf, .numasks = LIBPFM_ARRAY_SIZE(snbep_l1d_blocks), .ngrp = 1, .umasks = snbep_l1d_blocks, }, { .name = "L1D_PEND_MISS", .desc = "L1D pending misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x4, .code = 0x48, .numasks = LIBPFM_ARRAY_SIZE(snbep_l1d_pend_miss), .ngrp = 1, .umasks = snbep_l1d_pend_miss, }, { .name = "L2_L1D_WB_RQSTS", .desc = "Writeback requests from L1D to L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(snbep_l2_l1d_wb_rqsts), .ngrp = 1, .umasks = snbep_l2_l1d_wb_rqsts, }, { .name = "L2_LINES_IN", .desc = "L2 lines allocated", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf1, .numasks = 
LIBPFM_ARRAY_SIZE(snbep_l2_lines_in), .ngrp = 1, .umasks = snbep_l2_lines_in, }, { .name = "L2_LINES_OUT", .desc = "L2 lines evicted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf2, .numasks = LIBPFM_ARRAY_SIZE(snbep_l2_lines_out), .ngrp = 1, .umasks = snbep_l2_lines_out, }, { .name = "L2_RQSTS", .desc = "L2 requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(snbep_l2_rqsts), .ngrp = 1, .umasks = snbep_l2_rqsts, }, { .name = "L2_STORE_LOCK_RQSTS", .desc = "L2 store lock requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(snbep_l2_store_lock_rqsts), .ngrp = 1, .umasks = snbep_l2_store_lock_rqsts, }, { .name = "L2_TRANS", .desc = "L2 transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf0, .numasks = LIBPFM_ARRAY_SIZE(snbep_l2_trans), .ngrp = 1, .umasks = snbep_l2_trans, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for L3_LAT_CACHE:MISS", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:MISS", .cntmsk = 0xff, .code = 0x412e, }, { .name = "LLC_MISSES", .desc = "Alias for LAST_LEVEL_CACHE_MISSES", .modmsk = INTEL_V3_ATTRS, .equiv = "LAST_LEVEL_CACHE_MISSES", .cntmsk = 0xff, .code = 0x412e, }, { .name = "LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for L3_LAT_CACHE:REFERENCE", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:REFERENCE", .cntmsk = 0xff, .code = 0x4f2e, }, { .name = "LLC_REFERENCES", .desc = "Alias for LAST_LEVEL_CACHE_REFERENCES", .modmsk = INTEL_V3_ATTRS, .equiv = "LAST_LEVEL_CACHE_REFERENCES", .cntmsk = 0xff, .code = 0x4f2e, }, { .name = "LD_BLOCKS", .desc = "Blocking loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(snbep_ld_blocks), .ngrp = 1, .umasks = snbep_ld_blocks, }, { .name = "LD_BLOCKS_PARTIAL", .desc = "Partial load blocks", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(snbep_ld_blocks_partial), .ngrp = 1, .umasks = 
snbep_ld_blocks_partial, }, { .name = "LOAD_HIT_PRE", .desc = "Load dispatches that hit fill buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x4c, .numasks = LIBPFM_ARRAY_SIZE(snbep_load_hit_pre), .ngrp = 1, .umasks = snbep_load_hit_pre, }, { .name = "L3_LAT_CACHE", .desc = "Core-originated cacheable demand requests to L3", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(snbep_l3_lat_cache), .ngrp = 1, .umasks = snbep_l3_lat_cache, }, { .name = "MACHINE_CLEARS", .desc = "Machine clear asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc3, .numasks = LIBPFM_ARRAY_SIZE(snbep_machine_clears), .ngrp = 1, .umasks = snbep_machine_clears, }, { .name = "MEM_LOAD_UOPS_LLC_HIT_RETIRED", .desc = "Load uops retired which hit the L3 cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_load_uops_llc_hit_retired), .ngrp = 1, .umasks = snbep_mem_load_uops_llc_hit_retired, }, { .name = "MEM_LOAD_LLC_HIT_RETIRED", .desc = "Load uops retired which hit the L3 cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd2, .equiv = "MEM_LOAD_UOPS_LLC_HIT_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_load_uops_llc_hit_retired), .ngrp = 1, .umasks = snbep_mem_load_uops_llc_hit_retired, }, { .name = "MEM_LOAD_UOPS_LLC_MISS_RETIRED", .desc = "Load uops retired which miss the L3 cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd3, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_load_uops_llc_miss_retired), .ngrp = 1, .umasks = snbep_mem_load_uops_llc_miss_retired, }, { .name = "MEM_LOAD_UOPS_MISC_RETIRED", .desc = "Loads and some non simd split loads uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd4, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_load_uops_misc_retired), .ngrp = 1, .umasks = snbep_mem_load_uops_misc_retired, }, { .name = "MEM_LOAD_MISC_RETIRED", .desc = 
"Loads and some non simd split loads uops retired (deprecated use MEM_LOAD_UOPS_MISC_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd4, .equiv = "MEM_LOAD_UOPS_MISC_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_load_uops_misc_retired), .ngrp = 1, .umasks = snbep_mem_load_uops_misc_retired, }, { .name = "MEM_LOAD_UOPS_RETIRED", .desc = "Memory loads uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd1, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_load_uops_retired), .ngrp = 1, .umasks = snbep_mem_load_uops_retired, }, { .name = "MEM_LOAD_RETIRED", .desc = "Memory loads uops retired (deprecated use MEM_LOAD_UOPS_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd1, .equiv = "MEM_LOAD_UOPS_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_load_uops_retired), .ngrp = 1, .umasks = snbep_mem_load_uops_retired, }, { .name = "MEM_TRANS_RETIRED", .desc = "Memory transactions retired", .modmsk = INTEL_V3_ATTRS | _INTEL_X86_ATTR_LDLAT, .cntmsk = 0x8, .code = 0xcd, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_trans_retired), .ngrp = 1, .umasks = snbep_mem_trans_retired, }, { .name = "MEM_UOPS_RETIRED", .desc = "Memory uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_uops_retired), .ngrp = 1, .umasks = snbep_mem_uops_retired, }, { .name = "MEM_UOP_RETIRED", .desc = "Memory uops retired (deprecated use MEM_UOPS_RETIRED)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xd0, .equiv = "MEM_UOPS_RETIRED", .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_mem_uops_retired), .ngrp = 1, .umasks = snbep_mem_uops_retired, }, { .name = "MISALIGN_MEM_REF", .desc = "Misaligned memory references", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(snbep_misalign_mem_ref), .ngrp = 1, .umasks = snbep_misalign_mem_ref, }, { .name = 
"OFFCORE_REQUESTS", .desc = "Offcore requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb0, .numasks = LIBPFM_ARRAY_SIZE(snbep_offcore_requests), .ngrp = 1, .umasks = snbep_offcore_requests, }, { .name = "OFFCORE_REQUESTS_BUFFER", .desc = "Offcore requests buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb2, .numasks = LIBPFM_ARRAY_SIZE(snbep_offcore_requests_buffer), .ngrp = 1, .umasks = snbep_offcore_requests_buffer, }, { .name = "OFFCORE_REQUESTS_OUTSTANDING", .desc = "Outstanding offcore requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(snbep_offcore_requests_outstanding), .ngrp = 1, .umasks = snbep_offcore_requests_outstanding, }, { .name = "OTHER_ASSISTS", .desc = "Count hardware assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc1, .numasks = LIBPFM_ARRAY_SIZE(snbep_other_assists), .ngrp = 1, .umasks = snbep_other_assists, }, { .name = "PARTIAL_RAT_STALLS", .desc = "Partial Register Allocation Table stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x59, .numasks = LIBPFM_ARRAY_SIZE(snbep_partial_rat_stalls), .ngrp = 1, .umasks = snbep_partial_rat_stalls, }, { .name = "RESOURCE_STALLS", .desc = "Resource related stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xa2, .numasks = LIBPFM_ARRAY_SIZE(snbep_resource_stalls), .ngrp = 1, .umasks = snbep_resource_stalls, }, { .name = "RESOURCE_STALLS2", .desc = "Resource related stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5b, .numasks = LIBPFM_ARRAY_SIZE(snbep_resource_stalls2), .ngrp = 1, .umasks = snbep_resource_stalls2, }, { .name = "ROB_MISC_EVENTS", .desc = "Reorder buffer events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(snbep_rob_misc_events), .ngrp = 1, .umasks = snbep_rob_misc_events, }, { .name = "RS_EVENTS", .desc = "Reservation station events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x5e, .numasks = 
LIBPFM_ARRAY_SIZE(snbep_rs_events), .ngrp = 1, .umasks = snbep_rs_events, }, { .name = "SIMD_FP_256", .desc = "Counts 256-bit packed floating point instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0x11, .numasks = LIBPFM_ARRAY_SIZE(snbep_simd_fp_256), .ngrp = 1, .umasks = snbep_simd_fp_256, }, { .name = "SQ_MISC", .desc = "SuperQ events", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xf4, .numasks = LIBPFM_ARRAY_SIZE(snbep_sq_misc), .ngrp = 1, .umasks = snbep_sq_misc, }, { .name = "TLB_FLUSH", .desc = "TLB flushes", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xbd, .numasks = LIBPFM_ARRAY_SIZE(snbep_tlb_flush), .ngrp = 1, .umasks = snbep_tlb_flush, }, { .name = "UNHALTED_CORE_CYCLES", .desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x20000000full, .code = 0x3c, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED3_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "UOPS_DISPATCHED", .desc = "Uops dispatched", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xb1, .numasks = LIBPFM_ARRAY_SIZE(snbep_uops_dispatched), .ngrp = 1, .umasks = snbep_uops_dispatched, }, { .name = "UOPS_DISPATCHED_PORT", .desc = "Uops dispatch to specific ports", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xa1, .numasks = LIBPFM_ARRAY_SIZE(snbep_uops_dispatched_port), .ngrp = 1, .umasks = snbep_uops_dispatched_port, }, { .name = "UOPS_ISSUED", .desc = "Uops issued", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xe, .numasks = LIBPFM_ARRAY_SIZE(snbep_uops_issued), .ngrp = 1, .umasks = snbep_uops_issued, }, { .name = "UOPS_RETIRED", .desc = "Uops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xff, .code = 0xc2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snbep_uops_retired), .ngrp = 1, .umasks = snbep_uops_retired, }, { .name = "CYCLE_ACTIVITY", 
.desc = "Stalled cycles", .modmsk = INTEL_V3_ATTRS & ~_INTEL_X86_ATTR_C, .cntmsk = 0xff, .code = 0xa3, .numasks = LIBPFM_ARRAY_SIZE(snbep_cycle_activity), .ngrp = 1, .umasks = snbep_cycle_activity, }, { .name = "OFFCORE_RESPONSE_0", .desc = "Offcore response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1b7, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(snbep_offcore_response), .ngrp = 3, .umasks = snbep_offcore_response, }, { .name = "OFFCORE_RESPONSE_1", .desc = "Offcore response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1bb, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(snbep_offcore_response), .ngrp = 3, .umasks = snbep_offcore_response, /* identical to actual umasks list for this event */ }, }; papi-5.3.0/src/libpfm4/lib/events/intel_pm_events.h0000600003276200002170000007335712247131123021717 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: pm (Intel Pentium M) */ static const intel_x86_umask_t pm_l2_ifetch[]={ { .uname = "I", .udesc = "Invalid state", .ucode = 0x100, }, { .uname = "S", .udesc = "Shared state", .ucode = 0x200, }, { .uname = "E", .udesc = "Exclusive state", .ucode = 0x400, }, { .uname = "M", .udesc = "Modified state", .ucode = 0x800, }, }; static const intel_x86_umask_t pm_bus_drdy_clocks[]={ { .uname = "SELF", .udesc = "Clocks when processor is driving bus", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "Clocks when any agent is driving bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t pm_mmx_instr_type_exec[]={ { .uname = "MUL", .udesc = "MMX packed multiply instructions executed", .ucode = 0x100, }, { .uname = "SHIFT", .udesc = "MMX packed shift instructions executed", .ucode = 0x200, }, { .uname = "PACK", .udesc = "MMX pack operation instructions executed", .ucode = 0x400, }, { .uname = "UNPACK", .udesc = "MMX unpack operation instructions executed", .ucode = 0x800, }, { .uname = "LOGICAL", .udesc = "MMX packed logical instructions executed", .ucode = 0x1000, }, { .uname = "ARITH", .udesc = "MMX packed arithmetic instructions executed", .ucode = 0x2000, }, }; static const intel_x86_umask_t pm_fp_mmx_trans[]={ { .uname = "TO_FP", .udesc = "From MMX instructions to floating-point instructions", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TO_MMX", .udesc = "From floating-point instructions to MMX instructions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const 
intel_x86_umask_t pm_seg_rename_stalls[]={ { .uname = "ES", .udesc = "Segment register ES", .ucode = 0x100, }, { .uname = "DS", .udesc = "Segment register DS", .ucode = 0x200, }, { .uname = "FS", .udesc = "Segment register FS", .ucode = 0x400, }, { .uname = "GS", .udesc = "Segment register GS", .ucode = 0x800, }, }; static const intel_x86_umask_t pm_emon_kni_pref_dispatched[]={ { .uname = "NTA", .udesc = "Prefetch NTA", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "T1", .udesc = "Prefetch T1", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "T2", .udesc = "Prefetch T2", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WEAK", .udesc = "Weakly ordered stores", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t pm_emon_est_trans[]={ { .uname = "ALL", .udesc = "All transitions", .ucode = 0x0, }, { .uname = "FREQ", .udesc = "Only frequency transitions", .ucode = 0x200, }, }; static const intel_x86_umask_t pm_emon_fused_uops_ret[]={ { .uname = "ALL", .udesc = "All fused micro-ops", .ucode = 0x0, }, { .uname = "LD_OP", .udesc = "Only load+Op micro-ops", .ucode = 0x100, }, { .uname = "STD_STA", .udesc = "Only std+sta micro-ops", .ucode = 0x200, }, }; static const intel_x86_umask_t pm_emon_sse_sse2_inst_retired[]={ { .uname = "SSE_PACKED_SCALAR_SINGLE", .udesc = "SSE Packed Single and Scalar Single", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_SCALAR_SINGLE", .udesc = "SSE Scalar Single", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE2_PACKED_DOUBLE", .udesc = "SSE2 Packed Double", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE2_SCALAR_DOUBLE", .udesc = "SSE2 Scalar Double", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t pm_l2_ld[]={ { .uname = "I", .udesc = "Invalid state", .ucode = 0x100, }, { .uname = "S", .udesc = "Shared state", .ucode = 0x200, }, { .uname = "E", .udesc = "Exclusive state", .ucode = 0x400, }, { .uname = "M", .udesc = 
"Modified state", .ucode = 0x800, }, { .uname = "EXCL_HW_PREFETCH", .udesc = "Exclude hardware prefetched lines", .ucode = 0x0, }, { .uname = "ONLY_HW_PREFETCH", .udesc = "Only hardware prefetched lines", .ucode = 0x1000, }, { .uname = "NON_HW_PREFETCH", .udesc = "Non hardware prefetched lines", .ucode = 0x2000, }, }; static const intel_x86_entry_t intel_pm_pe[]={ { .name = "CPU_CLK_UNHALTED", .desc = "Number of cycles during which the processor is not halted and not in a thermal trip", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x79, }, { .name = "INST_RETIRED", .desc = "Number of instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc0, }, { .name = "DATA_MEM_REFS", .desc = "All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores but also internal retries. 80-bit floating point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. Memory accesses are only counted when they are actually performed (such as a load that gets squashed because a previous cache miss is outstanding to the same address, and which finally gets performed, is only counted once). Does not include I/O accesses or other non-memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x43, }, { .name = "DCU_LINES_IN", .desc = "Total lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x45, }, { .name = "DCU_M_LINES_IN", .desc = "Number of M state lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x46, }, { .name = "DCU_M_LINES_OUT", .desc = "Number of M state lines evicted from the DCU. 
This includes evictions via snoop HITM, intervention or replacement", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x47, }, { .name = "DCU_MISS_OUTSTANDING", .desc = "Weighted number of cycles while a DCU miss is outstanding, incremented by the number of cache misses at any particular time. Cacheable read requests only are considered. Uncacheable requests are excluded. Read-for-ownerships are counted, as well as line fills, invalidates, and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x48, }, { .name = "IFU_IFETCH", .desc = "Number of instruction fetches, both cacheable and noncacheable including UC fetches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x80, }, { .name = "IFU_IFETCH_MISS", .desc = "Number of instruction fetch misses. All instruction fetches that do not hit the IFU (i.e., that produce memory requests). Includes UC accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x81, }, { .name = "ITLB_MISS", .desc = "Number of ITLB misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x85, }, { .name = "IFU_MEM_STALL", .desc = "Number of cycles instruction fetch is stalled for any reason. Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x86, }, { .name = "ILD_STALL", .desc = "Number of cycles that the instruction length decoder is stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x87, }, { .name = "L2_IFETCH", .desc = "Number of L2 instruction fetches. This event indicates that a normal instruction fetch was received by the L2. The count includes only L2 cacheable instruction fetches: it does not include UC instruction fetches. It does not include ITLB miss accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(pm_l2_ifetch), .ngrp = 1, .umasks = pm_l2_ifetch, }, { .name = "L2_ST", .desc = "Number of L2 data stores. 
This event indicates that a normal, unlocked, store memory access was received by the L2. Specifically, it indicates that the DCU sent a read-for-ownership request to the L2. It also includes Invalid to Modified requests sent by the DCU to the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT memory accesses. It does include L2 cacheable TLB miss memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(pm_l2_ifetch), .ngrp = 1, .umasks = pm_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_M_LINES_INM", .desc = "Number of modified lines allocated in the L2", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x25, }, { .name = "L2_RQSTS", .desc = "Total number of L2 requests", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(pm_l2_ifetch), .ngrp = 1, .umasks = pm_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_ADS", .desc = "Number of L2 address strobes", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x21, }, { .name = "L2_DBUS_BUSY", .desc = "Number of cycles during which the L2 cache data bus was busy", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x22, }, { .name = "L2_DBUS_BUSY_RD", .desc = "Number of cycles during which the data bus was busy transferring read data from L2 to the processor", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x23, }, { .name = "BUS_DRDY_CLOCKS", .desc = "Number of clocks during which DRDY# is asserted. 
Utilization of the external system data bus during data transfers", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, }, { .name = "BUS_LOCK_CLOCKS", .desc = "Number of clocks during which LOCK# is asserted on the external system bus", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_REQ_OUTSTANDING", .desc = "Number of bus requests outstanding. This counter is incremented by the number of cacheable read bus requests outstanding in any given cycle", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x60, }, { .name = "BUS_TRANS_BRD", .desc = "Number of burst read transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_RFO", .desc = "Number of completed read for ownership transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_WB", .desc = "Number of completed write back transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_IFETCH", .desc = "Number of completed instruction fetch transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_INVAL", .desc = "Number of completed invalidate 
transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_PWR", .desc = "Number of completed partial write transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6a, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_P", .desc = "Number of completed partial transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6b, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IO", .desc = "Number of completed I/O transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_DEF", .desc = "Number of completed deferred transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_BURST", .desc = "Number of completed burst transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6e, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_ANY", .desc = "Number of all completed bus transactions. Address bus utilization can be calculated knowing the minimum address bus occupancy. 
Includes special cycles, etc.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x70, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_MEM", .desc = "Number of completed memory transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6f, .numasks = LIBPFM_ARRAY_SIZE(pm_bus_drdy_clocks), .ngrp = 1, .umasks = pm_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_DATA_RECV", .desc = "Number of bus clock cycles during which this processor is receiving data", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x64, }, { .name = "BUS_BNR_DRV", .desc = "Number of bus clock cycles during which this processor is driving the BNR# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x61, }, { .name = "BUS_HIT_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HIT# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7a, }, { .name = "BUS_HITM_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HITM# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7b, }, { .name = "BUS_SNOOP_STALL", .desc = "Number of clock cycles during which the bus is snoop stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7e, }, { .name = "FLOPS", .desc = "Number of computational floating-point operations retired. Excludes floating-point computational operations that cause traps or assists. Includes internal sub-operations for complex floating-point instructions like transcendentals. Excludes floating point loads and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0xc1, }, { .name = "FP_COMP_OPS_EXE", .desc = "Number of computational floating-point operations executed. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. 
This number does not include the number of cycles, but the number of operations. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x10, }, { .name = "FP_ASSIST", .desc = "Number of floating-point exception cases handled by microcode.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x11, }, { .name = "MUL", .desc = "Number of multiplies. This count includes integer as well as FP multiplies and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x12, }, { .name = "DIV", .desc = "Number of divides. This count includes integer as well as FP divides and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x13, }, { .name = "CYCLES_DIV_BUSY", .desc = "Number of cycles during which the divider is busy, and cannot accept new divides. This includes integer and FP divides, FPREM, FPSQRT, etc. and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x14, }, { .name = "LD_BLOCKS", .desc = "Number of load operations delayed due to store buffer blocks. Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3, }, { .name = "SB_DRAINS", .desc = "Number of store buffer drain cycles. Incremented every cycle the store buffer is draining. Draining is caused by serializing operations like CPUID, synchronizing operations like XCHG, interrupt acknowledgment, as well as other conditions (such as cache flushing).", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4, }, { .name = "MISALIGN_MEM_REF", .desc = "Number of misaligned data memory references. 
Incremented by 1 every cycle during which either the processor's load or store pipeline dispatches a misaligned micro-op. Counting is performed if it is the first or second half or if it is blocked, squashed, or missed. In this context, misaligned means crossing a 64-bit boundary", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x5, }, { .name = "UOPS_RETIRED", .desc = "Number of micro-ops retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc2, }, { .name = "INST_DECODED", .desc = "Number of instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd0, }, { .name = "HW_INT_RX", .desc = "Number of hardware interrupts received", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc8, }, { .name = "CYCLES_INT_MASKED", .desc = "Number of processor cycles for which interrupts are disabled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc6, }, { .name = "CYCLES_INT_PENDING_AND_MASKED", .desc = "Number of processor cycles for which interrupts are disabled and interrupts are pending.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc7, }, { .name = "BR_INST_RETIRED", .desc = "Number of branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc4, }, { .name = "BR_MISS_PRED_RETIRED", .desc = "Number of mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc5, }, { .name = "BR_TAKEN_RETIRED", .desc = "Number of taken branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc9, }, { .name = "BR_MISS_PRED_TAKEN_RET", .desc = "Number of taken mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xca, }, { .name = "BR_INST_DECODED", .desc = "Number of branch instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe0, }, { .name = "BTB_MISSES", .desc = "Number of branches for which the BTB did not produce a prediction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe2, }, { .name = "BR_BOGUS", .desc = "Number of bogus 
branches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe4, }, { .name = "BACLEARS", .desc = "Number of times BACLEAR is asserted. This is the number of times that a static branch prediction was made, in which the branch decoder decided to make a branch prediction because the BTB did not", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe6, }, { .name = "RESOURCE_STALLS", .desc = "Incremented by 1 during every cycle for which there is a resource related stall. Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc. In addition to resource related stalls, this event counts some other events. Includes stalls arising during branch misprediction recovery, such as if retirement of the mispredicted branch is delayed and stalls arising while store buffer is draining from synchronizing operations", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xa2, }, { .name = "PARTIAL_RAT_STALLS", .desc = "Number of cycles or events for partial stalls. 
This includes flag partial stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd2, }, { .name = "SEGMENT_REG_LOADS", .desc = "Number of segment register loads.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6, }, { .name = "MMX_SAT_INSTR_EXEC", .desc = "Number of MMX saturating instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb1, }, { .name = "MMX_UOPS_EXEC", .desc = "Number of MMX micro-ops executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb2, }, { .name = "MMX_INSTR_TYPE_EXEC", .desc = "Number of MMX instructions executed by type", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb3, .numasks = LIBPFM_ARRAY_SIZE(pm_mmx_instr_type_exec), .ngrp = 1, .umasks = pm_mmx_instr_type_exec, }, { .name = "FP_MMX_TRANS", .desc = "Number of MMX transitions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(pm_fp_mmx_trans), .ngrp = 1, .umasks = pm_fp_mmx_trans, }, { .name = "MMX_ASSIST", .desc = "Number of MMX micro-ops executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcd, }, { .name = "SEG_RENAME_STALLS", .desc = "Number of Segment Register Renaming Stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd4, .numasks = LIBPFM_ARRAY_SIZE(pm_seg_rename_stalls), .ngrp = 1, .umasks = pm_seg_rename_stalls, }, { .name = "SEG_REG_RENAMES", .desc = "Number of Segment Register Renames", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd5, .numasks = LIBPFM_ARRAY_SIZE(pm_seg_rename_stalls), .ngrp = 1, .umasks = pm_seg_rename_stalls, /* identical to actual umasks list for this event */ }, { .name = "RET_SEG_RENAMES", .desc = "Number of segment register rename events retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd6, }, { .name = "EMON_KNI_PREF_DISPATCHED", .desc = "Number of Streaming SIMD extensions prefetch/weakly-ordered instructions dispatched (speculative prefetches are included in counting). 
Pentium III and later", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(pm_emon_kni_pref_dispatched), .ngrp = 1, .umasks = pm_emon_kni_pref_dispatched, }, { .name = "EMON_KNI_PREF_MISS", .desc = "Number of prefetch/weakly-ordered instructions that miss all caches. Pentium III and later", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(pm_emon_kni_pref_dispatched), .ngrp = 1, .umasks = pm_emon_kni_pref_dispatched, /* identical to actual umasks list for this event */ }, { .name = "EMON_EST_TRANS", .desc = "Number of Enhanced Intel SpeedStep technology transitions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x58, .numasks = LIBPFM_ARRAY_SIZE(pm_emon_est_trans), .ngrp = 1, .umasks = pm_emon_est_trans, }, { .name = "EMON_THERMAL_TRIP", .desc = "Duration/occurrences in thermal trip; to count the number of thermal trips; edge detect must be used", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x59, }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed (not necessarily retired)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x88, }, { .name = "BR_MISSP_EXEC", .desc = "Branch instructions executed that were mispredicted at execution", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x89, }, { .name = "BR_BAC_MISSP_EXEC", .desc = "Branch instructions executed that were mispredicted at Front End (BAC)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8a, }, { .name = "BR_CND_EXEC", .desc = "Conditional branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8b, }, { .name = "BR_CND_MISSP_EXEC", .desc = "Conditional branch instructions executed that were mispredicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8c, }, { .name = "BR_IND_EXEC", .desc = "Indirect branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8d, }, { .name = "BR_IND_MISSP_EXEC", .desc = "Indirect branch instructions executed that were 
mispredicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8e, }, { .name = "BR_RET_EXEC", .desc = "Return branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8f, }, { .name = "BR_RET_MISSP_EXEC", .desc = "Return branch instructions executed that were mispredicted at Execution", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x90, }, { .name = "BR_RET_BAC_MISSP_EXEC", .desc = "Return branch instructions executed that were mispredicted at Front End (BAC)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x91, }, { .name = "BR_CALL_EXEC", .desc = "CALL instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x92, }, { .name = "BR_CALL_MISSP_EXEC", .desc = "CALL instructions executed that were mispredicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x93, }, { .name = "BR_IND_CALL_EXEC", .desc = "Indirect CALL instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x94, }, { .name = "EMON_SIMD_INSTR_RETIRED", .desc = "Number of retired MMX instructions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xce, }, { .name = "EMON_SYNCH_UOPS", .desc = "Sync micro-ops", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd3, }, { .name = "EMON_ESP_UOPS", .desc = "Total number of micro-ops", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd7, }, { .name = "EMON_FUSED_UOPS_RET", .desc = "Total number of micro-ops", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xda, .numasks = LIBPFM_ARRAY_SIZE(pm_emon_fused_uops_ret), .ngrp = 1, .umasks = pm_emon_fused_uops_ret, }, { .name = "EMON_UNFUSION", .desc = "Number of unfusion events in the ROB, happened on a FP exception to a fused micro-op", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xdb, }, { .name = "EMON_PREF_RQSTS_UP", .desc = "Number of upward prefetches issued", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf0, }, { .name = "EMON_PREF_RQSTS_DN", .desc = "Number of downward prefetches issued", .modmsk = 
INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf8, }, { .name = "EMON_SSE_SSE2_INST_RETIRED", .desc = "Streaming SIMD extensions instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd8, .numasks = LIBPFM_ARRAY_SIZE(pm_emon_sse_sse2_inst_retired), .ngrp = 1, .umasks = pm_emon_sse_sse2_inst_retired, }, { .name = "EMON_SSE_SSE2_COMP_INST_RETIRED", .desc = "Computational SSE instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd9, .numasks = LIBPFM_ARRAY_SIZE(pm_emon_sse_sse2_inst_retired), .ngrp = 1, .umasks = pm_emon_sse_sse2_inst_retired, /* identical to actual umasks list for this event */ }, { .name = "L2_LD", .desc = "Number of L2 data loads", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(pm_l2_ld), .ngrp = 1, .umasks = pm_l2_ld, }, { .name = "L2_LINES_IN", .desc = "Number of L2 lines allocated", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(pm_l2_ld), .ngrp = 1, .umasks = pm_l2_ld, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_OUT", .desc = "Number of L2 lines evicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x26, .numasks = LIBPFM_ARRAY_SIZE(pm_l2_ld), .ngrp = 1, .umasks = pm_l2_ld, /* identical to actual umasks list for this event */ }, { .name = "L2_M_LINES_OUT", .desc = "Number of L2 M-state lines evicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(pm_l2_ld), .ngrp = 1, .umasks = pm_l2_ld, /* identical to actual umasks list for this event */ }, }; /* File: papi-5.3.0/src/libpfm4/lib/events/intel_pii_events.h */ /* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation 
the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: pii (Intel Pentium II) */ static const intel_x86_umask_t pii_l2_ifetch[]={ { .uname = "I", .udesc = "Invalid state", .ucode = 0x100, }, { .uname = "S", .udesc = "Shared state", .ucode = 0x200, }, { .uname = "E", .udesc = "Exclusive state", .ucode = 0x400, }, { .uname = "M", .udesc = "Modified state", .ucode = 0x800, }, }; static const intel_x86_umask_t pii_bus_drdy_clocks[]={ { .uname = "SELF", .udesc = "Clocks when processor is driving bus", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "Clocks when any agent is driving bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t pii_mmx_instr_type_exec[]={ { .uname = "MUL", .udesc = "MMX packed multiply instructions executed", .ucode = 0x100, }, { .uname = "SHIFT", .udesc = "MMX packed shift instructions executed", .ucode = 0x200, }, { .uname = "PACK", .udesc = "MMX pack operation instructions executed", .ucode = 0x400, }, { .uname = "UNPACK", .udesc = "MMX unpack operation 
instructions executed", .ucode = 0x800, }, { .uname = "LOGICAL", .udesc = "MMX packed logical instructions executed", .ucode = 0x1000, }, { .uname = "ARITH", .udesc = "MMX packed arithmetic instructions executed", .ucode = 0x2000, }, }; static const intel_x86_umask_t pii_fp_mmx_trans[]={ { .uname = "TO_FP", .udesc = "From MMX instructions to floating-point instructions", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TO_MMX", .udesc = "From floating-point instructions to MMX instructions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t pii_seg_rename_stalls[]={ { .uname = "ES", .udesc = "Segment register ES", .ucode = 0x100, }, { .uname = "DS", .udesc = "Segment register DS", .ucode = 0x200, }, { .uname = "FS", .udesc = "Segment register FS", .ucode = 0x400, }, { .uname = "GS", .udesc = "Segment register GS", .ucode = 0x800, }, }; static const intel_x86_entry_t intel_pii_pe[]={ { .name = "CPU_CLK_UNHALTED", .desc = "Number of cycles during which the processor is not halted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x79, }, { .name = "INST_RETIRED", .desc = "Number of instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc0, }, { .name = "DATA_MEM_REFS", .desc = "All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores but also internal retries. 80-bit floating point accesses are double counted, since they are decomposed into a 16-bit exponent load and a 64-bit mantissa load. Memory accesses are only counted when they are actually performed (such as a load that gets squashed because a previous cache miss is outstanding to the same address, and which finally gets performed, is only counted once). 
Does not include I/O accesses or other non-memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x43, }, { .name = "DCU_LINES_IN", .desc = "Total lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x45, }, { .name = "DCU_M_LINES_IN", .desc = "Number of M state lines allocated in the DCU", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x46, }, { .name = "DCU_M_LINES_OUT", .desc = "Number of M state lines evicted from the DCU. This includes evictions via snoop HITM, intervention or replacement", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x47, }, { .name = "DCU_MISS_OUTSTANDING", .desc = "Weighted number of cycles while a DCU miss is outstanding, incremented by the number of cache misses at any particular time. Cacheable read requests only are considered. Uncacheable requests are excluded. Read-for-ownerships are counted, as well as line fills, invalidates, and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x48, }, { .name = "IFU_IFETCH", .desc = "Number of instruction fetches, both cacheable and noncacheable including UC fetches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x80, }, { .name = "IFU_IFETCH_MISS", .desc = "Number of instruction fetch misses. All instruction fetches that do not hit the IFU (i.e., that produce memory requests). Includes UC accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x81, }, { .name = "ITLB_MISS", .desc = "Number of ITLB misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x85, }, { .name = "IFU_MEM_STALL", .desc = "Number of cycles instruction fetch is stalled for any reason. Includes IFU cache misses, ITLB misses, ITLB faults, and other minor stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x86, }, { .name = "ILD_STALL", .desc = "Number of cycles that the instruction length decoder is stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x87, }, { .name = "L2_IFETCH", .desc = "Number of L2 instruction fetches. 
This event indicates that a normal instruction fetch was received by the L2. The count includes only L2 cacheable instruction fetches: it does not include UC instruction fetches. It does not include ITLB miss accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(pii_l2_ifetch), .ngrp = 1, .umasks = pii_l2_ifetch, }, { .name = "L2_ST", .desc = "Number of L2 data stores. This event indicates that a normal, unlocked, store memory access was received by the L2. Specifically, it indicates that the DCU sent a read-for-ownership request to the L2. It also includes Invalid to Modified requests sent by the DCU to the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT memory accesses. It does include L2 cacheable TLB miss memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(pii_l2_ifetch), .ngrp = 1, .umasks = pii_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_M_LINES_INM", .desc = "Number of modified lines allocated in the L2", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x25, }, { .name = "L2_RQSTS", .desc = "Total number of L2 requests", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(pii_l2_ifetch), .ngrp = 1, .umasks = pii_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_ADS", .desc = "Number of L2 address strobes", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x21, }, { .name = "L2_DBUS_BUSY", .desc = "Number of cycles during which the L2 cache data bus was busy", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x22, }, { .name = "L2_DBUS_BUSY_RD", .desc = "Number of cycles during which the data bus was busy transferring read data from L2 to the processor", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x23, }, { .name = "BUS_DRDY_CLOCKS", .desc = "Number of clocks 
during which DRDY# is asserted. Utilization of the external system data bus during data transfers", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, }, { .name = "BUS_LOCK_CLOCKS", .desc = "Number of clocks during which LOCK# is asserted on the external system bus", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_REQ_OUTSTANDING", .desc = "Number of bus requests outstanding. This counter is incremented by the number of cacheable read bus requests outstanding in any given cycle", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x60, }, { .name = "BUS_TRANS_BRD", .desc = "Number of burst read transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_RFO", .desc = "Number of completed read for ownership transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_WB", .desc = "Number of completed write back transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_IFETCH", .desc = "Number of completed instruction fetch transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_INVAL", 
.desc = "Number of completed invalidate transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_PWR", .desc = "Number of completed partial write transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6a, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_P", .desc = "Number of completed partial transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6b, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IO", .desc = "Number of completed I/O transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_DEF", .desc = "Number of completed deferred transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_BURST", .desc = "Number of completed burst transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6e, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_ANY", .desc = "Number of all completed bus transactions. Address bus utilization can be calculated knowing the minimum address bus occupancy. 
Includes special cycles, etc.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x70, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRAN_MEM", .desc = "Number of completed memory transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6f, .numasks = LIBPFM_ARRAY_SIZE(pii_bus_drdy_clocks), .ngrp = 1, .umasks = pii_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_DATA_RECV", .desc = "Number of bus clock cycles during which this processor is receiving data", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x64, }, { .name = "BUS_BNR_DRV", .desc = "Number of bus clock cycles during which this processor is driving the BNR# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x61, }, { .name = "BUS_HIT_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HIT# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7a, }, { .name = "BUS_HITM_DRV", .desc = "Number of bus clock cycles during which this processor is driving the HITM# pin", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7b, }, { .name = "BUS_SNOOP_STALL", .desc = "Number of clock cycles during which the bus is snoop stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7e, }, { .name = "FLOPS", .desc = "Number of computational floating-point operations retired. Excludes floating-point computational operations that cause traps or assists. Includes internal sub-operations for complex floating-point instructions like transcendentals. Excludes floating point loads and stores", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0xc1, }, { .name = "FP_COMP_OPS_EXE", .desc = "Number of computational floating-point operations executed. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. 
This number does not include the number of cycles, but the number of operations. This event does not distinguish an FADD used in the middle of a transcendental flow from a separate FADD instruction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x10, }, { .name = "FP_ASSIST", .desc = "Number of floating-point exception cases handled by microcode.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x11, }, { .name = "MUL", .desc = "Number of multiplies. This count includes integer as well as FP multiplies and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x12, }, { .name = "DIV", .desc = "Number of divides. This count includes integer as well as FP divides and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x13, }, { .name = "CYCLES_DIV_BUSY", .desc = "Number of cycles during which the divider is busy, and cannot accept new divides. This includes integer and FP divides, FPREM, FPSQRT, etc. and is speculative", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x14, }, { .name = "LD_BLOCKS", .desc = "Number of load operations delayed due to store buffer blocks. Includes counts caused by preceding stores whose addresses are unknown, preceding stores whose addresses are known but whose data is unknown, and preceding stores that conflict with the load but which incompletely overlap the load", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3, }, { .name = "SB_DRAINS", .desc = "Number of store buffer drain cycles. Incremented every cycle the store buffer is draining. Draining is caused by serializing operations like CPUID, synchronizing operations like XCHG, interrupt acknowledgment, as well as other conditions (such as cache flushing).", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4, }, { .name = "MISALIGN_MEM_REF", .desc = "Number of misaligned data memory references. 
Incremented by 1 every cycle during which either the processor's load or store pipeline dispatches a misaligned micro-op. Counting is performed if it is the first or second half or if it is blocked, squashed, or missed. In this context, misaligned means crossing a 64-bit boundary", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x5, }, { .name = "UOPS_RETIRED", .desc = "Number of micro-ops retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc2, }, { .name = "INST_DECODED", .desc = "Number of instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd0, }, { .name = "HW_INT_RX", .desc = "Number of hardware interrupts received", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc8, }, { .name = "CYCLES_INT_MASKED", .desc = "Number of processor cycles for which interrupts are disabled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc6, }, { .name = "CYCLES_INT_PENDING_AND_MASKED", .desc = "Number of processor cycles for which interrupts are disabled and interrupts are pending.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc7, }, { .name = "BR_INST_RETIRED", .desc = "Number of branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc4, }, { .name = "BR_MISS_PRED_RETIRED", .desc = "Number of mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc5, }, { .name = "BR_TAKEN_RETIRED", .desc = "Number of taken branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc9, }, { .name = "BR_MISS_PRED_TAKEN_RET", .desc = "Number of taken mispredicted branches retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xca, }, { .name = "BR_INST_DECODED", .desc = "Number of branch instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe0, }, { .name = "BTB_MISSES", .desc = "Number of branches for which the BTB did not produce a prediction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe2, }, { .name = "BR_BOGUS", .desc = "Number of bogus 
branches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe4, }, { .name = "BACLEARS", .desc = "Number of times BACLEAR is asserted. This is the number of times that a static branch prediction was made, in which the branch decoder decided to make a branch prediction because the BTB did not", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe6, }, { .name = "RESOURCE_STALLS", .desc = "Incremented by 1 during every cycle for which there is a resource related stall. Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc. In addition to resource related stalls, this event counts some other events. Includes stalls arising during branch misprediction recovery, such as if retirement of the mispredicted branch is delayed and stalls arising while store buffer is draining from synchronizing operations", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xa2, }, { .name = "PARTIAL_RAT_STALLS", .desc = "Number of cycles or events for partial stalls. 
This includes flag partial stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd2, }, { .name = "SEGMENT_REG_LOADS", .desc = "Number of segment register loads.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6, }, { .name = "MMX_INSTR_EXEC", .desc = "Number of MMX instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb0, }, { .name = "MMX_INSTR_RET", .desc = "Number of MMX instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xce, }, { .name = "MMX_SAT_INSTR_EXEC", .desc = "Number of MMX saturating instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb1, }, { .name = "MMX_UOPS_EXEC", .desc = "Number of MMX micro-ops executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb2, }, { .name = "MMX_INSTR_TYPE_EXEC", .desc = "Number of MMX instructions executed by type", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb3, .numasks = LIBPFM_ARRAY_SIZE(pii_mmx_instr_type_exec), .ngrp = 1, .umasks = pii_mmx_instr_type_exec, }, { .name = "FP_MMX_TRANS", .desc = "Number of MMX transitions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(pii_fp_mmx_trans), .ngrp = 1, .umasks = pii_fp_mmx_trans, }, { .name = "MMX_ASSIST", .desc = "Number of MMX assists, that is, the number of EMMS instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcd, }, { .name = "SEG_RENAME_STALLS", .desc = "Number of Segment Register Renaming Stalls", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd4, .numasks = LIBPFM_ARRAY_SIZE(pii_seg_rename_stalls), .ngrp = 1, .umasks = pii_seg_rename_stalls, }, { .name = "SEG_REG_RENAMES", .desc = "Number of Segment Register Renames", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd5, .numasks = LIBPFM_ARRAY_SIZE(pii_seg_rename_stalls), .ngrp = 1, .umasks = pii_seg_rename_stalls, /* identical to actual umasks list for this event */ }, { .name = "RET_SEG_RENAMES", .desc = "Number of segment register rename events retired", .modmsk = INTEL_X86_ATTRS, 
.cntmsk = 0x3, .code = 0xd6, }, { .name = "L2_LD", .desc = "Number of L2 data loads. This event indicates that a normal, unlocked, load memory access was received by the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other non-memory accesses, or memory accesses such as UC/WT memory accesses. It does include L2 cacheable TLB miss memory accesses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(pii_l2_ifetch), .ngrp = 1, .umasks = pii_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_IN", .desc = "Number of lines allocated in the L2", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x24, }, { .name = "L2_LINES_OUT", .desc = "Number of lines removed from the L2 for any reason", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x26, }, { .name = "L2_M_LINES_OUTM", .desc = "Number of modified lines removed from the L2 for any reason", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x27, }, }; papi-5.3.0/src/libpfm4/lib/events/montecito_events.h /* * Copyright (c) 2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ /* * This file is generated automatically * !! DO NOT CHANGE !! */ static pme_mont_entry_t montecito_pe []={ #define PME_MONT_ALAT_CAPACITY_MISS_ALL 0 { "ALAT_CAPACITY_MISS_ALL", {0x30058}, 0xfff0, 2, {0xffff0007}, "ALAT Entry Replaced -- both integer and floating point instructions"}, #define PME_MONT_ALAT_CAPACITY_MISS_FP 1 { "ALAT_CAPACITY_MISS_FP", {0x20058}, 0xfff0, 2, {0xffff0007}, "ALAT Entry Replaced -- only floating point instructions"}, #define PME_MONT_ALAT_CAPACITY_MISS_INT 2 { "ALAT_CAPACITY_MISS_INT", {0x10058}, 0xfff0, 2, {0xffff0007}, "ALAT Entry Replaced -- only integer instructions"}, #define PME_MONT_BACK_END_BUBBLE_ALL 3 { "BACK_END_BUBBLE_ALL", {0x0}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe -- Front-end, RSE, EXE, FPU/L1D stall or a pipeline flush due to an exception/branch misprediction"}, #define PME_MONT_BACK_END_BUBBLE_FE 4 { "BACK_END_BUBBLE_FE", {0x10000}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe -- front-end"}, #define PME_MONT_BACK_END_BUBBLE_L1D_FPU_RSE 5 { "BACK_END_BUBBLE_L1D_FPU_RSE", {0x20000}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe -- L1D_FPU or RSE."}, #define PME_MONT_BE_BR_MISPRED_DETAIL_ANY 6 { "BE_BR_MISPRED_DETAIL_ANY", {0x61}, 0xfff0, 1, {0xffff0003}, "BE Branch Misprediction Detail -- any back-end (be) mispredictions"}, #define PME_MONT_BE_BR_MISPRED_DETAIL_PFS 7 { "BE_BR_MISPRED_DETAIL_PFS", {0x30061}, 
0xfff0, 1, {0xffff0003}, "BE Branch Misprediction Detail -- only back-end pfs mispredictions for taken branches"}, #define PME_MONT_BE_BR_MISPRED_DETAIL_ROT 8 { "BE_BR_MISPRED_DETAIL_ROT", {0x20061}, 0xfff0, 1, {0xffff0003}, "BE Branch Misprediction Detail -- only back-end rotate mispredictions"}, #define PME_MONT_BE_BR_MISPRED_DETAIL_STG 9 { "BE_BR_MISPRED_DETAIL_STG", {0x10061}, 0xfff0, 1, {0xffff0003}, "BE Branch Misprediction Detail -- only back-end stage mispredictions"}, #define PME_MONT_BE_EXE_BUBBLE_ALL 10 { "BE_EXE_BUBBLE_ALL", {0x2}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe"}, #define PME_MONT_BE_EXE_BUBBLE_ARCR 11 { "BE_EXE_BUBBLE_ARCR", {0x40002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to AR or CR dependency"}, #define PME_MONT_BE_EXE_BUBBLE_ARCR_PR_CANCEL_BANK 12 { "BE_EXE_BUBBLE_ARCR_PR_CANCEL_BANK", {0x80002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- ARCR, PR, CANCEL or BANK_SWITCH"}, #define PME_MONT_BE_EXE_BUBBLE_BANK_SWITCH 13 { "BE_EXE_BUBBLE_BANK_SWITCH", {0x70002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to bank switching."}, #define PME_MONT_BE_EXE_BUBBLE_CANCEL 14 { "BE_EXE_BUBBLE_CANCEL", {0x60002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to a canceled load"}, #define PME_MONT_BE_EXE_BUBBLE_FRALL 15 { "BE_EXE_BUBBLE_FRALL", {0x20002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to FR/FR or FR/load dependency"}, #define PME_MONT_BE_EXE_BUBBLE_GRALL 16 { "BE_EXE_BUBBLE_GRALL", {0x10002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due 
to GR/GR or GR/load dependency"}, #define PME_MONT_BE_EXE_BUBBLE_GRGR 17 { "BE_EXE_BUBBLE_GRGR", {0x50002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to GR/GR dependency"}, #define PME_MONT_BE_EXE_BUBBLE_PR 18 { "BE_EXE_BUBBLE_PR", {0x30002}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to PR dependency"}, #define PME_MONT_BE_FLUSH_BUBBLE_ALL 19 { "BE_FLUSH_BUBBLE_ALL", {0x4}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Flushes. -- Back-end was stalled due to either an exception/interruption or branch misprediction flush"}, #define PME_MONT_BE_FLUSH_BUBBLE_BRU 20 { "BE_FLUSH_BUBBLE_BRU", {0x10004}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Flushes. -- Back-end was stalled due to a branch misprediction flush"}, #define PME_MONT_BE_FLUSH_BUBBLE_XPN 21 { "BE_FLUSH_BUBBLE_XPN", {0x20004}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to Flushes. -- Back-end was stalled due to an exception/interruption flush"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_ALL 22 { "BE_L1D_FPU_BUBBLE_ALL", {0xca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D or FPU"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_FPU 23 { "BE_L1D_FPU_BUBBLE_FPU", {0x100ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by FPU."}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D 24 { "BE_L1D_FPU_BUBBLE_L1D", {0x200ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D. 
This includes all stalls caused by the L1 pipeline (created in the L1D stage of the L1 pipeline which corresponds to the DET stage of the main pipe)."}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_AR_CR 25 { "BE_L1D_FPU_BUBBLE_L1D_AR_CR", {0x800ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to ar/cr requiring a stall"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_FILLCONF 26 { "BE_L1D_FPU_BUBBLE_L1D_FILLCONF", {0x700ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due a store in conflict with a returning fill."}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_FULLSTBUF 27 { "BE_L1D_FPU_BUBBLE_L1D_FULLSTBUF", {0x300ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to store buffer being full"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_HPW 28 { "BE_L1D_FPU_BUBBLE_L1D_HPW", {0x500ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to Hardware Page Walker"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_L2BPRESS 29 { "BE_L1D_FPU_BUBBLE_L1D_L2BPRESS", {0x900ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to L2 Back Pressure"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_LDCHK 30 { "BE_L1D_FPU_BUBBLE_L1D_LDCHK", {0xc00ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to architectural ordering conflict"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_LDCONF 31 { "BE_L1D_FPU_BUBBLE_L1D_LDCONF", {0xb00ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to architectural ordering conflict"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_NAT 32 { "BE_L1D_FPU_BUBBLE_L1D_NAT", {0xd00ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due 
to FPU or L1D Cache -- Back-end was stalled by L1D due to L1D data return needing recirculated NaT generation."}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_NATCONF 33 { "BE_L1D_FPU_BUBBLE_L1D_NATCONF", {0xf00ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to ld8.fill conflict with st8.spill not written to unat."}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_PIPE_RECIRC 34 { "BE_L1D_FPU_BUBBLE_L1D_PIPE_RECIRC", {0x400ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to recirculate"}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_STBUFRECIR 35 { "BE_L1D_FPU_BUBBLE_L1D_STBUFRECIR", {0xe00ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to store buffer cancel needing recirculate."}, #define PME_MONT_BE_L1D_FPU_BUBBLE_L1D_TLB 36 { "BE_L1D_FPU_BUBBLE_L1D_TLB", {0xa00ca}, 0xfff0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to L2DTLB to L1DTLB transfer"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_ALL 37 { "BE_LOST_BW_DUE_TO_FE_ALL", {0x72}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- count regardless of cause"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_BI 38 { "BE_LOST_BW_DUE_TO_FE_BI", {0x90072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by branch initialization stall"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_BRQ 39 { "BE_LOST_BW_DUE_TO_FE_BRQ", {0xa0072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by branch retirement queue stall"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_BR_ILOCK 40 { "BE_LOST_BW_DUE_TO_FE_BR_ILOCK", {0xc0072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. 
-- only if caused by branch interlock stall"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_BUBBLE 41 { "BE_LOST_BW_DUE_TO_FE_BUBBLE", {0xd0072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by branch resteer bubble stall"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_FEFLUSH 42 { "BE_LOST_BW_DUE_TO_FE_FEFLUSH", {0x10072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by a front-end flush"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_FILL_RECIRC 43 { "BE_LOST_BW_DUE_TO_FE_FILL_RECIRC", {0x80072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by a recirculate for a cache line fill operation"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_IBFULL 44 { "BE_LOST_BW_DUE_TO_FE_IBFULL", {0x50072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- (* meaningless for this event *)"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_IMISS 45 { "BE_LOST_BW_DUE_TO_FE_IMISS", {0x60072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by instruction cache miss stall"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_PLP 46 { "BE_LOST_BW_DUE_TO_FE_PLP", {0xb0072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by perfect loop prediction stall"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_TLBMISS 47 { "BE_LOST_BW_DUE_TO_FE_TLBMISS", {0x70072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by TLB stall"}, #define PME_MONT_BE_LOST_BW_DUE_TO_FE_UNREACHED 48 { "BE_LOST_BW_DUE_TO_FE_UNREACHED", {0x40072}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles if BE Not Stalled for Other Reasons. 
-- only if caused by unreachable bundle"}, #define PME_MONT_BE_RSE_BUBBLE_ALL 49 { "BE_RSE_BUBBLE_ALL", {0x1}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE"}, #define PME_MONT_BE_RSE_BUBBLE_AR_DEP 50 { "BE_RSE_BUBBLE_AR_DEP", {0x20001}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to AR dependencies"}, #define PME_MONT_BE_RSE_BUBBLE_BANK_SWITCH 51 { "BE_RSE_BUBBLE_BANK_SWITCH", {0x10001}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to bank switching"}, #define PME_MONT_BE_RSE_BUBBLE_LOADRS 52 { "BE_RSE_BUBBLE_LOADRS", {0x50001}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to loadrs calculations"}, #define PME_MONT_BE_RSE_BUBBLE_OVERFLOW 53 { "BE_RSE_BUBBLE_OVERFLOW", {0x30001}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to need to spill"}, #define PME_MONT_BE_RSE_BUBBLE_UNDERFLOW 54 { "BE_RSE_BUBBLE_UNDERFLOW", {0x40001}, 0xfff0, 1, {0xffff0000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to need to fill"}, #define PME_MONT_BR_MISPRED_DETAIL_ALL_ALL_PRED 55 { "BR_MISPRED_DETAIL_ALL_ALL_PRED", {0x5b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- All branch types regardless of prediction result"}, #define PME_MONT_BR_MISPRED_DETAIL_ALL_CORRECT_PRED 56 { "BR_MISPRED_DETAIL_ALL_CORRECT_PRED", {0x1005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- All branch types, correctly predicted branches (outcome and target)"}, #define PME_MONT_BR_MISPRED_DETAIL_ALL_WRONG_PATH 57 { "BR_MISPRED_DETAIL_ALL_WRONG_PATH", {0x2005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- All branch types, mispredicted branches due to wrong branch direction"}, #define 
PME_MONT_BR_MISPRED_DETAIL_ALL_WRONG_TARGET 58 { "BR_MISPRED_DETAIL_ALL_WRONG_TARGET", {0x3005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- All branch types, mispredicted branches due to wrong target for taken branches"}, #define PME_MONT_BR_MISPRED_DETAIL_IPREL_ALL_PRED 59 { "BR_MISPRED_DETAIL_IPREL_ALL_PRED", {0x4005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only IP relative branches, regardless of prediction result"}, #define PME_MONT_BR_MISPRED_DETAIL_IPREL_CORRECT_PRED 60 { "BR_MISPRED_DETAIL_IPREL_CORRECT_PRED", {0x5005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only IP relative branches, correctly predicted branches (outcome and target)"}, #define PME_MONT_BR_MISPRED_DETAIL_IPREL_WRONG_PATH 61 { "BR_MISPRED_DETAIL_IPREL_WRONG_PATH", {0x6005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only IP relative branches, mispredicted branches due to wrong branch direction"}, #define PME_MONT_BR_MISPRED_DETAIL_IPREL_WRONG_TARGET 62 { "BR_MISPRED_DETAIL_IPREL_WRONG_TARGET", {0x7005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only IP relative branches, mispredicted branches due to wrong target for taken branches"}, #define PME_MONT_BR_MISPRED_DETAIL_NRETIND_ALL_PRED 63 { "BR_MISPRED_DETAIL_NRETIND_ALL_PRED", {0xc005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, regardless of prediction result"}, #define PME_MONT_BR_MISPRED_DETAIL_NRETIND_CORRECT_PRED 64 { "BR_MISPRED_DETAIL_NRETIND_CORRECT_PRED", {0xd005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, correctly predicted branches (outcome and target)"}, #define PME_MONT_BR_MISPRED_DETAIL_NRETIND_WRONG_PATH 65 { "BR_MISPRED_DETAIL_NRETIND_WRONG_PATH", {0xe005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, mispredicted branches due to wrong branch direction"}, #define 
PME_MONT_BR_MISPRED_DETAIL_NRETIND_WRONG_TARGET 66 { "BR_MISPRED_DETAIL_NRETIND_WRONG_TARGET", {0xf005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, mispredicted branches due to wrong target for taken branches"}, #define PME_MONT_BR_MISPRED_DETAIL_RETURN_ALL_PRED 67 { "BR_MISPRED_DETAIL_RETURN_ALL_PRED", {0x8005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only return type branches, regardless of prediction result"}, #define PME_MONT_BR_MISPRED_DETAIL_RETURN_CORRECT_PRED 68 { "BR_MISPRED_DETAIL_RETURN_CORRECT_PRED", {0x9005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only return type branches, correctly predicted branches (outcome and target)"}, #define PME_MONT_BR_MISPRED_DETAIL_RETURN_WRONG_PATH 69 { "BR_MISPRED_DETAIL_RETURN_WRONG_PATH", {0xa005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only return type branches, mispredicted branches due to wrong branch direction"}, #define PME_MONT_BR_MISPRED_DETAIL_RETURN_WRONG_TARGET 70 { "BR_MISPRED_DETAIL_RETURN_WRONG_TARGET", {0xb005b}, 0xfff0, 3, {0xffff0003}, "FE Branch Mispredict Detail -- Only return type branches, mispredicted branches due to wrong target for taken branches"}, #define PME_MONT_BR_MISPRED_DETAIL2_ALL_ALL_UNKNOWN_PRED 71 { "BR_MISPRED_DETAIL2_ALL_ALL_UNKNOWN_PRED", {0x68}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- All branch types, branches with unknown path prediction"}, #define PME_MONT_BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_CORRECT_PRED 72 { "BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_CORRECT_PRED", {0x10068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- All branch types, branches with unknown path prediction and correctly predicted branch (outcome & target)"}, #define PME_MONT_BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_WRONG_PATH 73 { "BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_WRONG_PATH", {0x20068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict 
Detail (Unknown Path Component) -- All branch types, branches with unknown path prediction and wrong branch direction"}, #define PME_MONT_BR_MISPRED_DETAIL2_IPREL_ALL_UNKNOWN_PRED 74 { "BR_MISPRED_DETAIL2_IPREL_ALL_UNKNOWN_PRED", {0x40068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only IP relative branches, branches with unknown path prediction"}, #define PME_MONT_BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_CORRECT_PRED 75 { "BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_CORRECT_PRED", {0x50068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only IP relative branches, branches with unknown path prediction and correctly predicted branch (outcome & target)"}, #define PME_MONT_BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_WRONG_PATH 76 { "BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_WRONG_PATH", {0x60068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only IP relative branches, branches with unknown path prediction and wrong branch direction"}, #define PME_MONT_BR_MISPRED_DETAIL2_NRETIND_ALL_UNKNOWN_PRED 77 { "BR_MISPRED_DETAIL2_NRETIND_ALL_UNKNOWN_PRED", {0xc0068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only non-return indirect branches, branches with unknown path prediction"}, #define PME_MONT_BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_CORRECT_PRED 78 { "BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_CORRECT_PRED", {0xd0068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only non-return indirect branches, branches with unknown path prediction and correctly predicted branch (outcome & target)"}, #define PME_MONT_BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_WRONG_PATH 79 { "BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_WRONG_PATH", {0xe0068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only non-return indirect branches, branches with unknown path prediction and wrong branch direction"}, #define 
PME_MONT_BR_MISPRED_DETAIL2_RETURN_ALL_UNKNOWN_PRED 80 { "BR_MISPRED_DETAIL2_RETURN_ALL_UNKNOWN_PRED", {0x80068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only return type branches, branches with unknown path prediction"}, #define PME_MONT_BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_CORRECT_PRED 81 { "BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_CORRECT_PRED", {0x90068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only return type branches, branches with unknown path prediction and correctly predicted branch (outcome & target)"}, #define PME_MONT_BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_WRONG_PATH 82 { "BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_WRONG_PATH", {0xa0068}, 0xfff0, 2, {0xffff0003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only return type branches, branches with unknown path prediction and wrong branch direction"}, #define PME_MONT_BR_PATH_PRED_ALL_MISPRED_NOTTAKEN 83 { "BR_PATH_PRED_ALL_MISPRED_NOTTAKEN", {0x54}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- All branch types, incorrectly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_ALL_MISPRED_TAKEN 84 { "BR_PATH_PRED_ALL_MISPRED_TAKEN", {0x10054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- All branch types, incorrectly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED_ALL_OKPRED_NOTTAKEN 85 { "BR_PATH_PRED_ALL_OKPRED_NOTTAKEN", {0x20054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- All branch types, correctly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_ALL_OKPRED_TAKEN 86 { "BR_PATH_PRED_ALL_OKPRED_TAKEN", {0x30054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- All branch types, correctly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED_IPREL_MISPRED_NOTTAKEN 87 { "BR_PATH_PRED_IPREL_MISPRED_NOTTAKEN", {0x40054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only 
IP relative branches, incorrectly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_IPREL_MISPRED_TAKEN 88 { "BR_PATH_PRED_IPREL_MISPRED_TAKEN", {0x50054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only IP relative branches, incorrectly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED_IPREL_OKPRED_NOTTAKEN 89 { "BR_PATH_PRED_IPREL_OKPRED_NOTTAKEN", {0x60054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only IP relative branches, correctly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_IPREL_OKPRED_TAKEN 90 { "BR_PATH_PRED_IPREL_OKPRED_TAKEN", {0x70054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only IP relative branches, correctly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED_NRETIND_MISPRED_NOTTAKEN 91 { "BR_PATH_PRED_NRETIND_MISPRED_NOTTAKEN", {0xc0054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, incorrectly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_NRETIND_MISPRED_TAKEN 92 { "BR_PATH_PRED_NRETIND_MISPRED_TAKEN", {0xd0054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, incorrectly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED_NRETIND_OKPRED_NOTTAKEN 93 { "BR_PATH_PRED_NRETIND_OKPRED_NOTTAKEN", {0xe0054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, correctly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_NRETIND_OKPRED_TAKEN 94 { "BR_PATH_PRED_NRETIND_OKPRED_TAKEN", {0xf0054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, correctly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED_RETURN_MISPRED_NOTTAKEN 95 { "BR_PATH_PRED_RETURN_MISPRED_NOTTAKEN", {0x80054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only return type branches, 
incorrectly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_RETURN_MISPRED_TAKEN 96 { "BR_PATH_PRED_RETURN_MISPRED_TAKEN", {0x90054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only return type branches, incorrectly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED_RETURN_OKPRED_NOTTAKEN 97 { "BR_PATH_PRED_RETURN_OKPRED_NOTTAKEN", {0xa0054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only return type branches, correctly predicted path and not taken branch"}, #define PME_MONT_BR_PATH_PRED_RETURN_OKPRED_TAKEN 98 { "BR_PATH_PRED_RETURN_OKPRED_TAKEN", {0xb0054}, 0xfff0, 3, {0xffff0003}, "FE Branch Path Prediction Detail -- Only return type branches, correctly predicted path and taken branch"}, #define PME_MONT_BR_PATH_PRED2_ALL_UNKNOWNPRED_NOTTAKEN 99 { "BR_PATH_PRED2_ALL_UNKNOWNPRED_NOTTAKEN", {0x6a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- All branch types, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_MONT_BR_PATH_PRED2_ALL_UNKNOWNPRED_TAKEN 100 { "BR_PATH_PRED2_ALL_UNKNOWNPRED_TAKEN", {0x1006a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- All branch types, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define PME_MONT_BR_PATH_PRED2_IPREL_UNKNOWNPRED_NOTTAKEN 101 { "BR_PATH_PRED2_IPREL_UNKNOWNPRED_NOTTAKEN", {0x4006a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only IP relative branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_MONT_BR_PATH_PRED2_IPREL_UNKNOWNPRED_TAKEN 102 { "BR_PATH_PRED2_IPREL_UNKNOWNPRED_TAKEN", {0x5006a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only IP relative branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define 
PME_MONT_BR_PATH_PRED2_NRETIND_UNKNOWNPRED_NOTTAKEN 103 { "BR_PATH_PRED2_NRETIND_UNKNOWNPRED_NOTTAKEN", {0xc006a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only non-return indirect branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_MONT_BR_PATH_PRED2_NRETIND_UNKNOWNPRED_TAKEN 104 { "BR_PATH_PRED2_NRETIND_UNKNOWNPRED_TAKEN", {0xd006a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only non-return indirect branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define PME_MONT_BR_PATH_PRED2_RETURN_UNKNOWNPRED_NOTTAKEN 105 { "BR_PATH_PRED2_RETURN_UNKNOWNPRED_NOTTAKEN", {0x8006a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only return type branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_MONT_BR_PATH_PRED2_RETURN_UNKNOWNPRED_TAKEN 106 { "BR_PATH_PRED2_RETURN_UNKNOWNPRED_TAKEN", {0x9006a}, 0xfff0, 2, {0xffff0003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only return type branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define PME_MONT_BUS_ALL_ANY 107 { "BUS_ALL_ANY", {0x31887}, 0x03f0, 1, {0xffff0000}, "Bus Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_ALL_EITHER 108 { "BUS_ALL_EITHER", {0x1887}, 0x03f0, 1, {0xffff0000}, "Bus Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_ALL_IO 109 { "BUS_ALL_IO", {0x11887}, 0x03f0, 1, {0xffff0000}, "Bus Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_ALL_SELF 110 { "BUS_ALL_SELF", {0x21887}, 0x03f0, 1, {0xffff0000}, "Bus Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_B2B_DATA_CYCLES_ANY 111 { "BUS_B2B_DATA_CYCLES_ANY", {0x31093}, 0x03f0, 1, {0xffff0000}, "Back to Back Data Cycles on 
the Bus -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_B2B_DATA_CYCLES_EITHER 112 { "BUS_B2B_DATA_CYCLES_EITHER", {0x1093}, 0x03f0, 1, {0xffff0000}, "Back to Back Data Cycles on the Bus -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_B2B_DATA_CYCLES_IO 113 { "BUS_B2B_DATA_CYCLES_IO", {0x11093}, 0x03f0, 1, {0xffff0000}, "Back to Back Data Cycles on the Bus -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_B2B_DATA_CYCLES_SELF 114 { "BUS_B2B_DATA_CYCLES_SELF", {0x21093}, 0x03f0, 1, {0xffff0000}, "Back to Back Data Cycles on the Bus -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_DATA_CYCLE_ANY 115 { "BUS_DATA_CYCLE_ANY", {0x31088}, 0x03f0, 1, {0xffff0000}, "Valid Data Cycle on the Bus -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_DATA_CYCLE_EITHER 116 { "BUS_DATA_CYCLE_EITHER", {0x1088}, 0x03f0, 1, {0xffff0000}, "Valid Data Cycle on the Bus -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_DATA_CYCLE_IO 117 { "BUS_DATA_CYCLE_IO", {0x11088}, 0x03f0, 1, {0xffff0000}, "Valid Data Cycle on the Bus -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_DATA_CYCLE_SELF 118 { "BUS_DATA_CYCLE_SELF", {0x21088}, 0x03f0, 1, {0xffff0000}, "Valid Data Cycle on the Bus -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_HITM_ANY 119 { "BUS_HITM_ANY", {0x31884}, 0x03f0, 1, {0xffff0000}, "Bus Hit Modified Line Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_HITM_EITHER 120 { "BUS_HITM_EITHER", {0x1884}, 0x03f0, 1, {0xffff0000}, "Bus Hit Modified Line Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_HITM_IO 121 { "BUS_HITM_IO", {0x11884}, 0x03f0, 1, {0xffff0000}, "Bus Hit Modified Line Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_HITM_SELF 122 { "BUS_HITM_SELF", {0x21884}, 0x03f0, 1, {0xffff0000}, "Bus Hit Modified Line 
Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_IO_ANY 123 { "BUS_IO_ANY", {0x31890}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Bus Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_IO_EITHER 124 { "BUS_IO_EITHER", {0x1890}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Bus Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_IO_IO 125 { "BUS_IO_IO", {0x11890}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Bus Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_IO_SELF 126 { "BUS_IO_SELF", {0x21890}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Bus Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_MEMORY_ALL_ANY 127 { "BUS_MEMORY_ALL_ANY", {0xf188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- All bus transactions from CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEMORY_ALL_EITHER 128 { "BUS_MEMORY_ALL_EITHER", {0xc188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- All bus transactions from non-CPU priority agents"}, #define PME_MONT_BUS_MEMORY_ALL_IO 129 { "BUS_MEMORY_ALL_IO", {0xd188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- All bus transactions from 'this' local processor"}, #define PME_MONT_BUS_MEMORY_ALL_SELF 130 { "BUS_MEMORY_ALL_SELF", {0xe188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- All bus transactions from CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEMORY_EQ_128BYTE_ANY 131 { "BUS_MEMORY_EQ_128BYTE_ANY", {0x7188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- number of less than full cache line transactions (BRP, BWP, BIL) from either local processor"}, #define PME_MONT_BUS_MEMORY_EQ_128BYTE_EITHER 132 { "BUS_MEMORY_EQ_128BYTE_EITHER", {0x4188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- number of full cache line transactions (BRL, BRIL, BWL, BRC, BCR, BCCL) from non-CPU priority agents"}, #define 
PME_MONT_BUS_MEMORY_EQ_128BYTE_IO 133 { "BUS_MEMORY_EQ_128BYTE_IO", {0x5188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- number of full cache line transactions (BRL, BRIL, BWL, BRC, BCR, BCCL) from 'this' processor"}, #define PME_MONT_BUS_MEMORY_EQ_128BYTE_SELF 134 { "BUS_MEMORY_EQ_128BYTE_SELF", {0x6188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- number of full cache line transactions (BRL, BRIL, BWL, BRC, BCR, BCCL) from CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEMORY_LT_128BYTE_ANY 135 { "BUS_MEMORY_LT_128BYTE_ANY", {0xb188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- All bus transactions from either local processor"}, #define PME_MONT_BUS_MEMORY_LT_128BYTE_EITHER 136 { "BUS_MEMORY_LT_128BYTE_EITHER", {0x8188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- number of less than full cache line transactions (BRP, BWP, BIL) from non-CPU priority agents"}, #define PME_MONT_BUS_MEMORY_LT_128BYTE_IO 137 { "BUS_MEMORY_LT_128BYTE_IO", {0x9188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- number of less than full cache line transactions (BRP, BWP, BIL) from 'this' processor"}, #define PME_MONT_BUS_MEMORY_LT_128BYTE_SELF 138 { "BUS_MEMORY_LT_128BYTE_SELF", {0xa188a}, 0x03f0, 1, {0xffff0000}, "Bus Memory Transactions -- number of less than full cache line transactions (BRP, BWP, BIL) CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEM_READ_ALL_ANY 139 { "BUS_MEM_READ_ALL_ANY", {0xf188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- All memory read transactions from CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEM_READ_ALL_EITHER 140 { "BUS_MEM_READ_ALL_EITHER", {0xc188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- All memory read transactions from either local processor"}, #define PME_MONT_BUS_MEM_READ_ALL_IO 141 { "BUS_MEM_READ_ALL_IO", {0xd188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory 
RD, RD Invalidate, and BRIL -- All memory read transactions from non-CPU priority agents"}, #define PME_MONT_BUS_MEM_READ_ALL_SELF 142 { "BUS_MEM_READ_ALL_SELF", {0xe188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- All memory read transactions from local processor"}, #define PME_MONT_BUS_MEM_READ_BIL_ANY 143 { "BUS_MEM_READ_BIL_ANY", {0x3188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of BIL 0-byte memory read invalidate transactions from CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEM_READ_BIL_EITHER 144 { "BUS_MEM_READ_BIL_EITHER", {0x188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of BIL 0-byte memory read invalidate transactions from either local processor"}, #define PME_MONT_BUS_MEM_READ_BIL_IO 145 { "BUS_MEM_READ_BIL_IO", {0x1188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of BIL 0-byte memory read invalidate transactions from non-CPU priority agents"}, #define PME_MONT_BUS_MEM_READ_BIL_SELF 146 { "BUS_MEM_READ_BIL_SELF", {0x2188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of BIL 0-byte memory read invalidate transactions from local processor"}, #define PME_MONT_BUS_MEM_READ_BRIL_ANY 147 { "BUS_MEM_READ_BRIL_ANY", {0xb188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read invalidate transactions from CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEM_READ_BRIL_EITHER 148 { "BUS_MEM_READ_BRIL_EITHER", {0x8188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read invalidate transactions from either local processor"}, #define PME_MONT_BUS_MEM_READ_BRIL_IO 149 { "BUS_MEM_READ_BRIL_IO", {0x9188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD 
Invalidate, and BRIL -- Number of full cache line memory read invalidate transactions from non-CPU priority agents"}, #define PME_MONT_BUS_MEM_READ_BRIL_SELF 150 { "BUS_MEM_READ_BRIL_SELF", {0xa188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read invalidate transactions from local processor"}, #define PME_MONT_BUS_MEM_READ_BRL_ANY 151 { "BUS_MEM_READ_BRL_ANY", {0x7188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read transactions from CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_MEM_READ_BRL_EITHER 152 { "BUS_MEM_READ_BRL_EITHER", {0x4188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read transactions from either local processor"}, #define PME_MONT_BUS_MEM_READ_BRL_IO 153 { "BUS_MEM_READ_BRL_IO", {0x5188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read transactions from non-CPU priority agents"}, #define PME_MONT_BUS_MEM_READ_BRL_SELF 154 { "BUS_MEM_READ_BRL_SELF", {0x6188b}, 0x03f0, 1, {0xffff0000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read transactions from local processor"}, #define PME_MONT_BUS_RD_DATA_ANY 155 { "BUS_RD_DATA_ANY", {0x3188c}, 0x03f0, 1, {0xffff0000}, "Bus Read Data Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_RD_DATA_EITHER 156 { "BUS_RD_DATA_EITHER", {0x188c}, 0x03f0, 1, {0xffff0000}, "Bus Read Data Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_RD_DATA_IO 157 { "BUS_RD_DATA_IO", {0x1188c}, 0x03f0, 1, {0xffff0000}, "Bus Read Data Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_RD_DATA_SELF 158 { "BUS_RD_DATA_SELF", {0x2188c}, 0x03f0, 1, {0xffff0000}, "Bus Read Data 
Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_RD_HIT_ANY 159 { "BUS_RD_HIT_ANY", {0x31880}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Clean Non-local Cache Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_RD_HIT_EITHER 160 { "BUS_RD_HIT_EITHER", {0x1880}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Clean Non-local Cache Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_RD_HIT_IO 161 { "BUS_RD_HIT_IO", {0x11880}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Clean Non-local Cache Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_RD_HIT_SELF 162 { "BUS_RD_HIT_SELF", {0x21880}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Clean Non-local Cache Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_RD_HITM_ANY 163 { "BUS_RD_HITM_ANY", {0x31881}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Modified Non-local Cache Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_RD_HITM_EITHER 164 { "BUS_RD_HITM_EITHER", {0x1881}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Modified Non-local Cache Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_RD_HITM_IO 165 { "BUS_RD_HITM_IO", {0x11881}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Modified Non-local Cache Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_RD_HITM_SELF 166 { "BUS_RD_HITM_SELF", {0x21881}, 0x03f0, 1, {0xffff0000}, "Bus Read Hit Modified Non-local Cache Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_RD_INVAL_BST_HITM_ANY 167 { "BUS_RD_INVAL_BST_HITM_ANY", {0x31883}, 0x03f0, 1, {0xffff0000}, "Bus BRIL Transaction Results in HITM -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_RD_INVAL_BST_HITM_EITHER 168 { "BUS_RD_INVAL_BST_HITM_EITHER", {0x1883}, 0x03f0, 1, {0xffff0000}, "Bus BRIL Transaction Results in HITM -- transactions initiated by either cpu 
core"}, #define PME_MONT_BUS_RD_INVAL_BST_HITM_IO 169 { "BUS_RD_INVAL_BST_HITM_IO", {0x11883}, 0x03f0, 1, {0xffff0000}, "Bus BRIL Transaction Results in HITM -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_RD_INVAL_BST_HITM_SELF 170 { "BUS_RD_INVAL_BST_HITM_SELF", {0x21883}, 0x03f0, 1, {0xffff0000}, "Bus BRIL Transaction Results in HITM -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_RD_INVAL_HITM_ANY 171 { "BUS_RD_INVAL_HITM_ANY", {0x31882}, 0x03f0, 1, {0xffff0000}, "Bus BIL Transaction Results in HITM -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_RD_INVAL_HITM_EITHER 172 { "BUS_RD_INVAL_HITM_EITHER", {0x1882}, 0x03f0, 1, {0xffff0000}, "Bus BIL Transaction Results in HITM -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_RD_INVAL_HITM_IO 173 { "BUS_RD_INVAL_HITM_IO", {0x11882}, 0x03f0, 1, {0xffff0000}, "Bus BIL Transaction Results in HITM -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_RD_INVAL_HITM_SELF 174 { "BUS_RD_INVAL_HITM_SELF", {0x21882}, 0x03f0, 1, {0xffff0000}, "Bus BIL Transaction Results in HITM -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_RD_IO_ANY 175 { "BUS_RD_IO_ANY", {0x31891}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Read Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_RD_IO_EITHER 176 { "BUS_RD_IO_EITHER", {0x1891}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Read Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_RD_IO_IO 177 { "BUS_RD_IO_IO", {0x11891}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Read Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_RD_IO_SELF 178 { "BUS_RD_IO_SELF", {0x21891}, 0x03f0, 1, {0xffff0000}, "IA-32 Compatible IO Read Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_RD_PRTL_ANY 179 { "BUS_RD_PRTL_ANY", {0x3188d}, 0x03f0, 1, {0xffff0000}, "Bus 
Read Partial Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_RD_PRTL_EITHER 180 { "BUS_RD_PRTL_EITHER", {0x188d}, 0x03f0, 1, {0xffff0000}, "Bus Read Partial Transactions -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_RD_PRTL_IO 181 { "BUS_RD_PRTL_IO", {0x1188d}, 0x03f0, 1, {0xffff0000}, "Bus Read Partial Transactions -- transactions initiated by non-CPU priority agents"}, #define PME_MONT_BUS_RD_PRTL_SELF 182 { "BUS_RD_PRTL_SELF", {0x2188d}, 0x03f0, 1, {0xffff0000}, "Bus Read Partial Transactions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_BUS_SNOOP_STALL_CYCLES_ANY 183 { "BUS_SNOOP_STALL_CYCLES_ANY", {0x3188f}, 0x03f0, 1, {0xffff0000}, "Bus Snoop Stall Cycles (from any agent) -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_SNOOP_STALL_CYCLES_EITHER 184 { "BUS_SNOOP_STALL_CYCLES_EITHER", {0x188f}, 0x03f0, 1, {0xffff0000}, "Bus Snoop Stall Cycles (from any agent) -- transactions initiated by either cpu core"}, #define PME_MONT_BUS_SNOOP_STALL_CYCLES_SELF 185 { "BUS_SNOOP_STALL_CYCLES_SELF", {0x2188f}, 0x03f0, 1, {0xffff0000}, "Bus Snoop Stall Cycles (from any agent) -- local processor"}, #define PME_MONT_BUS_WR_WB_ALL_ANY 186 { "BUS_WR_WB_ALL_ANY", {0xf1892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- CPU or non-CPU (all transactions)."}, #define PME_MONT_BUS_WR_WB_ALL_IO 187 { "BUS_WR_WB_ALL_IO", {0xd1892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- non-CPU priority agents"}, #define PME_MONT_BUS_WR_WB_ALL_SELF 188 { "BUS_WR_WB_ALL_SELF", {0xe1892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- 'this' processor"}, #define PME_MONT_BUS_WR_WB_CCASTOUT_ANY 189 { "BUS_WR_WB_CCASTOUT_ANY", {0xb1892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- CPU or non-CPU (all transactions)/Only 0-byte transactions with write back attribute (clean cast outs) will be counted"}, #define PME_MONT_BUS_WR_WB_CCASTOUT_SELF 190 { "BUS_WR_WB_CCASTOUT_SELF", 
{0xa1892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- 'this' processor/Only 0-byte transactions with write back attribute (clean cast outs) will be counted"}, #define PME_MONT_BUS_WR_WB_EQ_128BYTE_ANY 191 { "BUS_WR_WB_EQ_128BYTE_ANY", {0x71892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- CPU or non-CPU (all transactions)./Only cache line transactions with write back or write coalesce attributes will be counted."}, #define PME_MONT_BUS_WR_WB_EQ_128BYTE_IO 192 { "BUS_WR_WB_EQ_128BYTE_IO", {0x51892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- non-CPU priority agents/Only cache line transactions with write back or write coalesce attributes will be counted."}, #define PME_MONT_BUS_WR_WB_EQ_128BYTE_SELF 193 { "BUS_WR_WB_EQ_128BYTE_SELF", {0x61892}, 0x03f0, 1, {0xffff0000}, "Bus Write Back Transactions -- 'this' processor/Only cache line transactions with write back or write coalesce attributes will be counted."}, #define PME_MONT_CPU_CPL_CHANGES_ALL 194 { "CPU_CPL_CHANGES_ALL", {0xf0013}, 0xfff0, 1, {0xffff0000}, "Privilege Level Changes -- All changes in cpl counted"}, #define PME_MONT_CPU_CPL_CHANGES_LVL0 195 { "CPU_CPL_CHANGES_LVL0", {0x10013}, 0xfff0, 1, {0xffff0000}, "Privilege Level Changes -- All changes to/from privilege level0 are counted"}, #define PME_MONT_CPU_CPL_CHANGES_LVL1 196 { "CPU_CPL_CHANGES_LVL1", {0x20013}, 0xfff0, 1, {0xffff0000}, "Privilege Level Changes -- All changes to/from privilege level1 are counted"}, #define PME_MONT_CPU_CPL_CHANGES_LVL2 197 { "CPU_CPL_CHANGES_LVL2", {0x40013}, 0xfff0, 1, {0xffff0000}, "Privilege Level Changes -- All changes to/from privilege level2 are counted"}, #define PME_MONT_CPU_CPL_CHANGES_LVL3 198 { "CPU_CPL_CHANGES_LVL3", {0x80013}, 0xfff0, 1, {0xffff0000}, "Privilege Level Changes -- All changes to/from privilege level3 are counted"}, #define PME_MONT_CPU_OP_CYCLES_ALL 199 { "CPU_OP_CYCLES_ALL", {0x1012}, 0xfff0, 1, {0xffff0000}, "CPU Operating Cycles -- All CPU cycles 
counted"}, #define PME_MONT_CPU_OP_CYCLES_QUAL 200 { "CPU_OP_CYCLES_QUAL", {0x11012}, 0xfff0, 1, {0xffff0003}, "CPU Operating Cycles -- Qualified cycles only"}, #define PME_MONT_CPU_OP_CYCLES_HALTED 201 { "CPU_OP_CYCLES_HALTED", {0x1018}, 0x0400, 7, {0xffff0000}, "CPU Operating Cycles Halted"}, #define PME_MONT_DATA_DEBUG_REGISTER_FAULT 202 { "DATA_DEBUG_REGISTER_FAULT", {0x52}, 0xfff0, 1, {0xffff0000}, "Fault Due to Data Debug Reg. Match to Load/Store Instruction"}, #define PME_MONT_DATA_DEBUG_REGISTER_MATCHES 203 { "DATA_DEBUG_REGISTER_MATCHES", {0xc6}, 0xfff0, 1, {0xffff0007}, "Data Debug Register Matches Data Address of Memory Reference."}, #define PME_MONT_DATA_EAR_ALAT 204 { "DATA_EAR_ALAT", {0xec8}, 0xfff0, 1, {0xffff0007}, "Data EAR ALAT"}, #define PME_MONT_DATA_EAR_CACHE_LAT1024 205 { "DATA_EAR_CACHE_LAT1024", {0x80dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 1024 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT128 206 { "DATA_EAR_CACHE_LAT128", {0x50dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 128 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT16 207 { "DATA_EAR_CACHE_LAT16", {0x20dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 16 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT2048 208 { "DATA_EAR_CACHE_LAT2048", {0x90dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 2048 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT256 209 { "DATA_EAR_CACHE_LAT256", {0x60dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 256 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT32 210 { "DATA_EAR_CACHE_LAT32", {0x30dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 32 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT4 211 { "DATA_EAR_CACHE_LAT4", {0xdc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 4 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT4096 212 { "DATA_EAR_CACHE_LAT4096", {0xa0dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 4096 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT512 213 { "DATA_EAR_CACHE_LAT512", {0x70dc8}, 0xfff0, 1, {0xffff0007}, 
"Data EAR Cache -- >= 512 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT64 214 { "DATA_EAR_CACHE_LAT64", {0x40dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 64 Cycles"}, #define PME_MONT_DATA_EAR_CACHE_LAT8 215 { "DATA_EAR_CACHE_LAT8", {0x10dc8}, 0xfff0, 1, {0xffff0007}, "Data EAR Cache -- >= 8 Cycles"}, #define PME_MONT_DATA_EAR_EVENTS 216 { "DATA_EAR_EVENTS", {0x8c8}, 0xfff0, 1, {0xffff0007}, "L1 Data Cache EAR Events"}, #define PME_MONT_DATA_EAR_TLB_ALL 217 { "DATA_EAR_TLB_ALL", {0xe0cc8}, 0xfff0, 1, {0xffff0007}, "Data EAR TLB -- All L1 DTLB Misses"}, #define PME_MONT_DATA_EAR_TLB_FAULT 218 { "DATA_EAR_TLB_FAULT", {0x80cc8}, 0xfff0, 1, {0xffff0007}, "Data EAR TLB -- DTLB Misses which produce a software fault"}, #define PME_MONT_DATA_EAR_TLB_L2DTLB 219 { "DATA_EAR_TLB_L2DTLB", {0x20cc8}, 0xfff0, 1, {0xffff0007}, "Data EAR TLB -- L1 DTLB Misses which hit L2 DTLB"}, #define PME_MONT_DATA_EAR_TLB_L2DTLB_OR_FAULT 220 { "DATA_EAR_TLB_L2DTLB_OR_FAULT", {0xa0cc8}, 0xfff0, 1, {0xffff0007}, "Data EAR TLB -- L1 DTLB Misses which hit L2 DTLB or produce a software fault"}, #define PME_MONT_DATA_EAR_TLB_L2DTLB_OR_VHPT 221 { "DATA_EAR_TLB_L2DTLB_OR_VHPT", {0x60cc8}, 0xfff0, 1, {0xffff0007}, "Data EAR TLB -- L1 DTLB Misses which hit L2 DTLB or VHPT"}, #define PME_MONT_DATA_EAR_TLB_VHPT 222 { "DATA_EAR_TLB_VHPT", {0x40cc8}, 0xfff0, 1, {0xffff0007}, "Data EAR TLB -- L1 DTLB Misses which hit VHPT"}, #define PME_MONT_DATA_EAR_TLB_VHPT_OR_FAULT 223 { "DATA_EAR_TLB_VHPT_OR_FAULT", {0xc0cc8}, 0xfff0, 1, {0xffff0007}, "Data EAR TLB -- L1 DTLB Misses which hit VHPT or produce a software fault"}, #define PME_MONT_DATA_REFERENCES_SET0 224 { "DATA_REFERENCES_SET0", {0xc3}, 0xfff0, 4, {0x5010007}, "Data Memory References Issued to Memory Pipeline"}, #define PME_MONT_DATA_REFERENCES_SET1 225 { "DATA_REFERENCES_SET1", {0xc5}, 0xfff0, 4, {0x5110007}, "Data Memory References Issued to Memory Pipeline"}, #define PME_MONT_DISP_STALLED 226 { "DISP_STALLED", {0x49}, 0xfff0, 1, {0xffff0000}, 
"Number of Cycles Dispersal Stalled"}, #define PME_MONT_DTLB_INSERTS_HPW 227 { "DTLB_INSERTS_HPW", {0x8c9}, 0xfff0, 4, {0xffff0000}, "Hardware Page Walker Installs to DTLB"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL_ALL_PRED 228 { "ENCBR_MISPRED_DETAIL_ALL_ALL_PRED", {0x63}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- All encoded branches regardless of prediction result"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL_CORRECT_PRED 229 { "ENCBR_MISPRED_DETAIL_ALL_CORRECT_PRED", {0x10063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- All encoded branches, correctly predicted branches (outcome and target)"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL_WRONG_PATH 230 { "ENCBR_MISPRED_DETAIL_ALL_WRONG_PATH", {0x20063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- All encoded branches, mispredicted branches due to wrong branch direction"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL_WRONG_TARGET 231 { "ENCBR_MISPRED_DETAIL_ALL_WRONG_TARGET", {0x30063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- All encoded branches, mispredicted branches due to wrong target for taken branches"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL2_ALL_PRED 232 { "ENCBR_MISPRED_DETAIL_ALL2_ALL_PRED", {0xc0063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, regardless of prediction result"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL2_CORRECT_PRED 233 { "ENCBR_MISPRED_DETAIL_ALL2_CORRECT_PRED", {0xd0063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, correctly predicted branches (outcome and target)"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL2_WRONG_PATH 234 { "ENCBR_MISPRED_DETAIL_ALL2_WRONG_PATH", {0xe0063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, mispredicted branches due to wrong branch direction"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_ALL2_WRONG_TARGET 235 { 
"ENCBR_MISPRED_DETAIL_ALL2_WRONG_TARGET", {0xf0063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, mispredicted branches due to wrong target for taken branches"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_OVERSUB_ALL_PRED 236 { "ENCBR_MISPRED_DETAIL_OVERSUB_ALL_PRED", {0x80063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only return type branches, regardless of prediction result"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_OVERSUB_CORRECT_PRED 237 { "ENCBR_MISPRED_DETAIL_OVERSUB_CORRECT_PRED", {0x90063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only return type branches, correctly predicted branches (outcome and target)"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_PATH 238 { "ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_PATH", {0xa0063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only return type branches, mispredicted branches due to wrong branch direction"}, #define PME_MONT_ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_TARGET 239 { "ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_TARGET", {0xb0063}, 0xfff0, 3, {0xffff0003}, "Number of Encoded Branches Retired -- Only return type branches, mispredicted branches due to wrong target for taken branches"}, #define PME_MONT_ER_BKSNP_ME_ACCEPTED 240 { "ER_BKSNP_ME_ACCEPTED", {0x10bb}, 0x03f0, 2, {0xffff0000}, "Backsnoop Me Accepted"}, #define PME_MONT_ER_BRQ_LIVE_REQ_HI 241 { "ER_BRQ_LIVE_REQ_HI", {0x10b8}, 0x03f0, 2, {0xffff0000}, "BRQ Live Requests (upper 2 bits)"}, #define PME_MONT_ER_BRQ_LIVE_REQ_LO 242 { "ER_BRQ_LIVE_REQ_LO", {0x10b9}, 0x03f0, 7, {0xffff0000}, "BRQ Live Requests (lower 3 bits)"}, #define PME_MONT_ER_BRQ_REQ_INSERTED 243 { "ER_BRQ_REQ_INSERTED", {0x8ba}, 0x03f0, 1, {0xffff0000}, "BRQ Requests Inserted"}, #define PME_MONT_ER_MEM_READ_OUT_HI 244 { "ER_MEM_READ_OUT_HI", {0x8b4}, 0x03f0, 2, {0xffff0000}, "Outstanding Memory Read Transactions (upper 2 bits)"}, #define PME_MONT_ER_MEM_READ_OUT_LO 245 { 
"ER_MEM_READ_OUT_LO", {0x8b5}, 0x03f0, 7, {0xffff0000}, "Outstanding Memory Read Transactions (lower 3 bits)"}, #define PME_MONT_ER_REJECT_ALL_L1D_REQ 246 { "ER_REJECT_ALL_L1D_REQ", {0x10bd}, 0x03f0, 1, {0xffff0000}, "Reject All L1D Requests"}, #define PME_MONT_ER_REJECT_ALL_L1I_REQ 247 { "ER_REJECT_ALL_L1I_REQ", {0x10be}, 0x03f0, 1, {0xffff0000}, "Reject All L1I Requests"}, #define PME_MONT_ER_REJECT_ALL_L1_REQ 248 { "ER_REJECT_ALL_L1_REQ", {0x10bc}, 0x03f0, 1, {0xffff0000}, "Reject All L1 Requests"}, #define PME_MONT_ER_SNOOPQ_REQ_HI 249 { "ER_SNOOPQ_REQ_HI", {0x10b6}, 0x03f0, 2, {0xffff0000}, "Outstanding Snoops (upper bit)"}, #define PME_MONT_ER_SNOOPQ_REQ_LO 250 { "ER_SNOOPQ_REQ_LO", {0x10b7}, 0x03f0, 7, {0xffff0000}, "Outstanding Snoops (lower 3 bits)"}, #define PME_MONT_ETB_EVENT 251 { "ETB_EVENT", {0x111}, 0xfff0, 1, {0xffff0003}, "Execution Trace Buffer Event Captured"}, #define PME_MONT_FE_BUBBLE_ALL 252 { "FE_BUBBLE_ALL", {0x71}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- count regardless of cause"}, #define PME_MONT_FE_BUBBLE_ALLBUT_FEFLUSH_BUBBLE 253 { "FE_BUBBLE_ALLBUT_FEFLUSH_BUBBLE", {0xb0071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- ALL except FEFLUSH and BUBBLE"}, #define PME_MONT_FE_BUBBLE_ALLBUT_IBFULL 254 { "FE_BUBBLE_ALLBUT_IBFULL", {0xc0071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- ALL except IBFULl"}, #define PME_MONT_FE_BUBBLE_BRANCH 255 { "FE_BUBBLE_BRANCH", {0x90071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- only if caused by any of 4 branch recirculates"}, #define PME_MONT_FE_BUBBLE_BUBBLE 256 { "FE_BUBBLE_BUBBLE", {0xd0071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- only if caused by branch bubble stall"}, #define PME_MONT_FE_BUBBLE_FEFLUSH 257 { "FE_BUBBLE_FEFLUSH", {0x10071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- only if caused by a front-end flush"}, #define PME_MONT_FE_BUBBLE_FILL_RECIRC 258 { "FE_BUBBLE_FILL_RECIRC", {0x80071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- only if 
caused by a recirculate for a cache line fill operation"}, #define PME_MONT_FE_BUBBLE_GROUP1 259 { "FE_BUBBLE_GROUP1", {0x30071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- BUBBLE or BRANCH"}, #define PME_MONT_FE_BUBBLE_GROUP2 260 { "FE_BUBBLE_GROUP2", {0x40071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- IMISS or TLBMISS"}, #define PME_MONT_FE_BUBBLE_GROUP3 261 { "FE_BUBBLE_GROUP3", {0xa0071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- FILL_RECIRC or BRANCH"}, #define PME_MONT_FE_BUBBLE_IBFULL 262 { "FE_BUBBLE_IBFULL", {0x50071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- only if caused by instruction buffer full stall"}, #define PME_MONT_FE_BUBBLE_IMISS 263 { "FE_BUBBLE_IMISS", {0x60071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- only if caused by instruction cache miss stall"}, #define PME_MONT_FE_BUBBLE_TLBMISS 264 { "FE_BUBBLE_TLBMISS", {0x70071}, 0xfff0, 1, {0xffff0000}, "Bubbles Seen by FE -- only if caused by TLB stall"}, #define PME_MONT_FE_LOST_BW_ALL 265 { "FE_LOST_BW_ALL", {0x70}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- count regardless of cause"}, #define PME_MONT_FE_LOST_BW_BI 266 { "FE_LOST_BW_BI", {0x90070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch initialization stall"}, #define PME_MONT_FE_LOST_BW_BRQ 267 { "FE_LOST_BW_BRQ", {0xa0070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch retirement queue stall"}, #define PME_MONT_FE_LOST_BW_BR_ILOCK 268 { "FE_LOST_BW_BR_ILOCK", {0xc0070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch interlock stall"}, #define PME_MONT_FE_LOST_BW_BUBBLE 269 { "FE_LOST_BW_BUBBLE", {0xd0070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch resteer bubble stall"}, #define PME_MONT_FE_LOST_BW_FEFLUSH 270 { "FE_LOST_BW_FEFLUSH", {0x10070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the 
Entrance to IB -- only if caused by a front-end flush"}, #define PME_MONT_FE_LOST_BW_FILL_RECIRC 271 { "FE_LOST_BW_FILL_RECIRC", {0x80070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by a recirculate for a cache line fill operation"}, #define PME_MONT_FE_LOST_BW_IBFULL 272 { "FE_LOST_BW_IBFULL", {0x50070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by instruction buffer full stall"}, #define PME_MONT_FE_LOST_BW_IMISS 273 { "FE_LOST_BW_IMISS", {0x60070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by instruction cache miss stall"}, #define PME_MONT_FE_LOST_BW_PLP 274 { "FE_LOST_BW_PLP", {0xb0070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by perfect loop prediction stall"}, #define PME_MONT_FE_LOST_BW_TLBMISS 275 { "FE_LOST_BW_TLBMISS", {0x70070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by TLB stall"}, #define PME_MONT_FE_LOST_BW_UNREACHED 276 { "FE_LOST_BW_UNREACHED", {0x40070}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Entrance to IB -- only if caused by unreachable bundle"}, #define PME_MONT_FP_FAILED_FCHKF 277 { "FP_FAILED_FCHKF", {0x6}, 0xfff0, 1, {0xffff0001}, "Failed fchkf"}, #define PME_MONT_FP_FALSE_SIRSTALL 278 { "FP_FALSE_SIRSTALL", {0x5}, 0xfff0, 1, {0xffff0001}, "SIR Stall Without a Trap"}, #define PME_MONT_FP_FLUSH_TO_ZERO_FTZ_POSS 279 { "FP_FLUSH_TO_ZERO_FTZ_POSS", {0x1000b}, 0xfff0, 2, {0xffff0001}, "FP Result Flushed to Zero -- "}, #define PME_MONT_FP_FLUSH_TO_ZERO_FTZ_REAL 280 { "FP_FLUSH_TO_ZERO_FTZ_REAL", {0xb}, 0xfff0, 2, {0xffff0001}, "FP Result Flushed to Zero -- Times FTZ"}, #define PME_MONT_FP_OPS_RETIRED 281 { "FP_OPS_RETIRED", {0x9}, 0xfff0, 6, {0xffff0001}, "Retired FP Operations"}, #define PME_MONT_FP_TRUE_SIRSTALL 282 { "FP_TRUE_SIRSTALL", {0x3}, 0xfff0, 1, {0xffff0001}, "SIR stall asserted and leads to a trap"}, #define 
PME_MONT_HPW_DATA_REFERENCES 283 { "HPW_DATA_REFERENCES", {0x2d}, 0xfff0, 4, {0xffff0000}, "Data Memory References to VHPT"}, #define PME_MONT_IA64_INST_RETIRED_THIS 284 { "IA64_INST_RETIRED_THIS", {0x8}, 0xfff0, 6, {0xffff0003}, "Retired IA-64 Instructions -- Retired IA-64 Instructions"}, #define PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP0_PMC32_33 285 { "IA64_TAGGED_INST_RETIRED_IBRP0_PMC32_33", {0x8}, 0xfff0, 6, {0xffff0003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 0 and the opcode matcher pair PMC32 and PMC33."}, #define PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP1_PMC34_35 286 { "IA64_TAGGED_INST_RETIRED_IBRP1_PMC34_35", {0x10008}, 0xfff0, 6, {0xffff0003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 1 and the opcode matcher pair PMC34 and PMC35."}, #define PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP2_PMC32_33 287 { "IA64_TAGGED_INST_RETIRED_IBRP2_PMC32_33", {0x20008}, 0xfff0, 6, {0xffff0003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 2 and the opcode matcher pair PMC32 and PMC33."}, #define PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP3_PMC34_35 288 { "IA64_TAGGED_INST_RETIRED_IBRP3_PMC34_35", {0x30008}, 0xfff0, 6, {0xffff0003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 3 and the opcode matcher pair PMC34 and PMC35."}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_ALL 289 { "IDEAL_BE_LOST_BW_DUE_TO_FE_ALL", {0x73}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- count regardless of cause"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_BI 290 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BI", {0x90073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by branch initialization stall"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_BRQ 291 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BRQ", {0xa0073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by branch retirement queue stall"}, 
#define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_BR_ILOCK 292 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BR_ILOCK", {0xc0073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by branch interlock stall"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_BUBBLE 293 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BUBBLE", {0xd0073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by branch resteer bubble stall"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_FEFLUSH 294 { "IDEAL_BE_LOST_BW_DUE_TO_FE_FEFLUSH", {0x10073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by a front-end flush"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_FILL_RECIRC 295 { "IDEAL_BE_LOST_BW_DUE_TO_FE_FILL_RECIRC", {0x80073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by a recirculate for a cache line fill operation"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_IBFULL 296 { "IDEAL_BE_LOST_BW_DUE_TO_FE_IBFULL", {0x50073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- (* meaningless for this event *)"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_IMISS 297 { "IDEAL_BE_LOST_BW_DUE_TO_FE_IMISS", {0x60073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by instruction cache miss stall"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_PLP 298 { "IDEAL_BE_LOST_BW_DUE_TO_FE_PLP", {0xb0073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by perfect loop prediction stall"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_TLBMISS 299 { "IDEAL_BE_LOST_BW_DUE_TO_FE_TLBMISS", {0x70073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by TLB stall"}, #define PME_MONT_IDEAL_BE_LOST_BW_DUE_TO_FE_UNREACHED 300 { "IDEAL_BE_LOST_BW_DUE_TO_FE_UNREACHED", {0x40073}, 0xfff0, 2, {0xffff0000}, "Invalid Bundles at the Exit from IB -- only if caused by unreachable bundle"}, #define PME_MONT_INST_CHKA_LDC_ALAT_ALL 301 { 
"INST_CHKA_LDC_ALAT_ALL", {0x30056}, 0xfff0, 2, {0xffff0007}, "Retired chk.a and ld.c Instructions -- both integer and floating point instructions"}, #define PME_MONT_INST_CHKA_LDC_ALAT_FP 302 { "INST_CHKA_LDC_ALAT_FP", {0x20056}, 0xfff0, 2, {0xffff0007}, "Retired chk.a and ld.c Instructions -- only floating point instructions"}, #define PME_MONT_INST_CHKA_LDC_ALAT_INT 303 { "INST_CHKA_LDC_ALAT_INT", {0x10056}, 0xfff0, 2, {0xffff0007}, "Retired chk.a and ld.c Instructions -- only integer instructions"}, #define PME_MONT_INST_DISPERSED 304 { "INST_DISPERSED", {0x4d}, 0xfff0, 6, {0xffff0001}, "Syllables Dispersed from REN to REG stage"}, #define PME_MONT_INST_FAILED_CHKA_LDC_ALAT_ALL 305 { "INST_FAILED_CHKA_LDC_ALAT_ALL", {0x30057}, 0xfff0, 1, {0xffff0007}, "Failed chk.a and ld.c Instructions -- both integer and floating point instructions"}, #define PME_MONT_INST_FAILED_CHKA_LDC_ALAT_FP 306 { "INST_FAILED_CHKA_LDC_ALAT_FP", {0x20057}, 0xfff0, 1, {0xffff0007}, "Failed chk.a and ld.c Instructions -- only floating point instructions"}, #define PME_MONT_INST_FAILED_CHKA_LDC_ALAT_INT 307 { "INST_FAILED_CHKA_LDC_ALAT_INT", {0x10057}, 0xfff0, 1, {0xffff0007}, "Failed chk.a and ld.c Instructions -- only integer instructions"}, #define PME_MONT_INST_FAILED_CHKS_RETIRED_ALL 308 { "INST_FAILED_CHKS_RETIRED_ALL", {0x30055}, 0xfff0, 1, {0xffff0000}, "Failed chk.s Instructions -- both integer and floating point instructions"}, #define PME_MONT_INST_FAILED_CHKS_RETIRED_FP 309 { "INST_FAILED_CHKS_RETIRED_FP", {0x20055}, 0xfff0, 1, {0xffff0000}, "Failed chk.s Instructions -- only floating point instructions"}, #define PME_MONT_INST_FAILED_CHKS_RETIRED_INT 310 { "INST_FAILED_CHKS_RETIRED_INT", {0x10055}, 0xfff0, 1, {0xffff0000}, "Failed chk.s Instructions -- only integer instructions"}, #define PME_MONT_ISB_BUNPAIRS_IN 311 { "ISB_BUNPAIRS_IN", {0x46}, 0xfff0, 1, {0xffff0001}, "Bundle Pairs Written from L2I into FE"}, #define PME_MONT_ITLB_MISSES_FETCH_ALL 312 { 
"ITLB_MISSES_FETCH_ALL", {0x30047}, 0xfff0, 1, {0xffff0001}, "ITLB Misses Demand Fetch -- All tlb misses will be counted. Note that this is not equal to sum of the L1ITLB and L2ITLB umasks because any access could be a miss in L1ITLB and L2ITLB."}, #define PME_MONT_ITLB_MISSES_FETCH_L1ITLB 313 { "ITLB_MISSES_FETCH_L1ITLB", {0x10047}, 0xfff0, 1, {0xffff0001}, "ITLB Misses Demand Fetch -- All misses in L1ITLB will be counted. even if L1ITLB is not updated for an access (Uncacheable/nat page/not present page/faulting/some flushed), it will be counted here."}, #define PME_MONT_ITLB_MISSES_FETCH_L2ITLB 314 { "ITLB_MISSES_FETCH_L2ITLB", {0x20047}, 0xfff0, 1, {0xffff0001}, "ITLB Misses Demand Fetch -- All misses in L1ITLB which also missed in L2ITLB will be counted."}, #define PME_MONT_L1DTLB_TRANSFER 315 { "L1DTLB_TRANSFER", {0xc0}, 0xfff0, 1, {0x5010007}, "L1DTLB Misses That Hit in the L2DTLB for Accesses Counted in L1D_READS"}, #define PME_MONT_L1D_READS_SET0 316 { "L1D_READS_SET0", {0xc2}, 0xfff0, 2, {0x5010007}, "L1 Data Cache Reads"}, #define PME_MONT_L1D_READS_SET1 317 { "L1D_READS_SET1", {0xc4}, 0xfff0, 2, {0x5110007}, "L1 Data Cache Reads"}, #define PME_MONT_L1D_READ_MISSES_ALL 318 { "L1D_READ_MISSES_ALL", {0xc7}, 0xfff0, 2, {0x5110007}, "L1 Data Cache Read Misses -- all L1D read misses will be counted."}, #define PME_MONT_L1D_READ_MISSES_RSE_FILL 319 { "L1D_READ_MISSES_RSE_FILL", {0x100c7}, 0xfff0, 2, {0x5110007}, "L1 Data Cache Read Misses -- only L1D read misses caused by RSE fills will be counted"}, #define PME_MONT_L1ITLB_INSERTS_HPW 320 { "L1ITLB_INSERTS_HPW", {0x48}, 0xfff0, 1, {0xffff0001}, "L1ITLB Hardware Page Walker Inserts"}, #define PME_MONT_L1I_EAR_CACHE_LAT0 321 { "L1I_EAR_CACHE_LAT0", {0x400b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- > 0 Cycles (All L1 Misses)"}, #define PME_MONT_L1I_EAR_CACHE_LAT1024 322 { "L1I_EAR_CACHE_LAT1024", {0xc00b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 1024 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_LAT128 
323 { "L1I_EAR_CACHE_LAT128", {0xf00b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 128 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_LAT16 324 { "L1I_EAR_CACHE_LAT16", {0xfc0b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 16 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_LAT256 325 { "L1I_EAR_CACHE_LAT256", {0xe00b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 256 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_LAT32 326 { "L1I_EAR_CACHE_LAT32", {0xf80b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 32 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_LAT4 327 { "L1I_EAR_CACHE_LAT4", {0xff0b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 4 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_LAT4096 328 { "L1I_EAR_CACHE_LAT4096", {0x800b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 4096 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_LAT8 329 { "L1I_EAR_CACHE_LAT8", {0xfe0b43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- >= 8 Cycles"}, #define PME_MONT_L1I_EAR_CACHE_RAB 330 { "L1I_EAR_CACHE_RAB", {0xb43}, 0xfff0, 1, {0xffff0001}, "L1I EAR Cache -- RAB HIT"}, #define PME_MONT_L1I_EAR_EVENTS 331 { "L1I_EAR_EVENTS", {0x843}, 0xfff0, 1, {0xffff0001}, "Instruction EAR Events"}, #define PME_MONT_L1I_EAR_TLB_ALL 332 { "L1I_EAR_TLB_ALL", {0x70a43}, 0xfff0, 1, {0xffff0001}, "L1I EAR TLB -- All L1 ITLB Misses"}, #define PME_MONT_L1I_EAR_TLB_FAULT 333 { "L1I_EAR_TLB_FAULT", {0x40a43}, 0xfff0, 1, {0xffff0001}, "L1I EAR TLB -- ITLB Misses which produced a fault"}, #define PME_MONT_L1I_EAR_TLB_L2TLB 334 { "L1I_EAR_TLB_L2TLB", {0x10a43}, 0xfff0, 1, {0xffff0001}, "L1I EAR TLB -- L1 ITLB Misses which hit L2 ITLB"}, #define PME_MONT_L1I_EAR_TLB_L2TLB_OR_FAULT 335 { "L1I_EAR_TLB_L2TLB_OR_FAULT", {0x50a43}, 0xfff0, 1, {0xffff0001}, "L1I EAR TLB -- L1 ITLB Misses which hit L2 ITLB or produce a software fault"}, #define PME_MONT_L1I_EAR_TLB_L2TLB_OR_VHPT 336 { "L1I_EAR_TLB_L2TLB_OR_VHPT", {0x30a43}, 0xfff0, 1, {0xffff0001}, "L1I EAR TLB -- L1 ITLB Misses which hit L2 ITLB or VHPT"}, #define 
PME_MONT_L1I_EAR_TLB_VHPT 337 { "L1I_EAR_TLB_VHPT", {0x20a43}, 0xfff0, 1, {0xffff0001}, "L1I EAR TLB -- L1 ITLB Misses which hit VHPT"}, #define PME_MONT_L1I_EAR_TLB_VHPT_OR_FAULT 338 { "L1I_EAR_TLB_VHPT_OR_FAULT", {0x60a43}, 0xfff0, 1, {0xffff0001}, "L1I EAR TLB -- L1 ITLB Misses which hit VHPT or produce a software fault"}, #define PME_MONT_L1I_FETCH_ISB_HIT 339 { "L1I_FETCH_ISB_HIT", {0x66}, 0xfff0, 1, {0xffff0001}, "\"Just-In-Time\" Instruction Fetch Hitting in and Being Bypassed from ISB"}, #define PME_MONT_L1I_FETCH_RAB_HIT 340 { "L1I_FETCH_RAB_HIT", {0x65}, 0xfff0, 1, {0xffff0001}, "Instruction Fetch Hitting in RAB"}, #define PME_MONT_L1I_FILLS 341 { "L1I_FILLS", {0x841}, 0xfff0, 1, {0xffff0001}, "L1 Instruction Cache Fills"}, #define PME_MONT_L1I_PREFETCHES 342 { "L1I_PREFETCHES", {0x44}, 0xfff0, 1, {0xffff0001}, "L1 Instruction Prefetch Requests"}, #define PME_MONT_L1I_PREFETCH_STALL_ALL 343 { "L1I_PREFETCH_STALL_ALL", {0x30067}, 0xfff0, 1, {0xffff0000}, "Prefetch Pipeline Stalls -- Number of clocks prefetch pipeline is stalled"}, #define PME_MONT_L1I_PREFETCH_STALL_FLOW 344 { "L1I_PREFETCH_STALL_FLOW", {0x20067}, 0xfff0, 1, {0xffff0000}, "Prefetch Pipeline Stalls -- Asserted when the streaming prefetcher is working close to the instructions being fetched for demand reads, and is not asserted when the streaming prefetcher is ranging way ahead of the demand reads."}, #define PME_MONT_L1I_PURGE 345 { "L1I_PURGE", {0x104b}, 0xfff0, 1, {0xffff0001}, "L1ITLB Purges Handled by L1I"}, #define PME_MONT_L1I_PVAB_OVERFLOW 346 { "L1I_PVAB_OVERFLOW", {0x69}, 0xfff0, 1, {0xffff0000}, "PVAB Overflow"}, #define PME_MONT_L1I_RAB_ALMOST_FULL 347 { "L1I_RAB_ALMOST_FULL", {0x1064}, 0xfff0, 1, {0xffff0000}, "Is RAB Almost Full?"}, #define PME_MONT_L1I_RAB_FULL 348 { "L1I_RAB_FULL", {0x1060}, 0xfff0, 1, {0xffff0000}, "Is RAB Full?"}, #define PME_MONT_L1I_READS 349 { "L1I_READS", {0x40}, 0xfff0, 1, {0xffff0001}, "L1 Instruction Cache Reads"}, #define PME_MONT_L1I_SNOOP 350 { 
"L1I_SNOOP", {0x104a}, 0xfff0, 1, {0xffff0007}, "Snoop Requests Handled by L1I"}, #define PME_MONT_L1I_STRM_PREFETCHES 351 { "L1I_STRM_PREFETCHES", {0x5f}, 0xfff0, 1, {0xffff0001}, "L1 Instruction Cache Line Prefetch Requests"}, #define PME_MONT_L2DTLB_MISSES 352 { "L2DTLB_MISSES", {0xc1}, 0xfff0, 4, {0x5010007}, "L2DTLB Misses"}, #define PME_MONT_L2D_BAD_LINES_SELECTED_ANY 353 { "L2D_BAD_LINES_SELECTED_ANY", {0x8ec}, 0xfff0, 4, {0x4520007}, "Valid Line Replaced When Invalid Line Is Available -- Valid line replaced when invalid line is available"}, #define PME_MONT_L2D_BYPASS_L2_DATA1 354 { "L2D_BYPASS_L2_DATA1", {0x8e4}, 0xfff0, 1, {0x4120007}, "Count L2D Bypasses -- Count only L2D data bypasses (L1D to L2A)"}, #define PME_MONT_L2D_BYPASS_L2_DATA2 355 { "L2D_BYPASS_L2_DATA2", {0x108e4}, 0xfff0, 1, {0x4120007}, "Count L2D Bypasses -- Count only L2D data bypasses (L1W to L2I)"}, #define PME_MONT_L2D_BYPASS_L3_DATA1 356 { "L2D_BYPASS_L3_DATA1", {0x208e4}, 0xfff0, 1, {0x4120007}, "Count L2D Bypasses -- Count only L3 data bypasses (L1D to L2A)"}, #define PME_MONT_L2D_FILLB_FULL_THIS 357 { "L2D_FILLB_FULL_THIS", {0x8f1}, 0xfff0, 1, {0x4720000}, "L2D Fill Buffer Is Full -- L2D Fill buffer is full"}, #define PME_MONT_L2D_FILL_MESI_STATE_E 358 { "L2D_FILL_MESI_STATE_E", {0x108f2}, 0xfff0, 1, {0x4820000}, "L2D Cache Fills with MESI state -- "}, #define PME_MONT_L2D_FILL_MESI_STATE_I 359 { "L2D_FILL_MESI_STATE_I", {0x308f2}, 0xfff0, 1, {0x4820000}, "L2D Cache Fills with MESI state -- "}, #define PME_MONT_L2D_FILL_MESI_STATE_M 360 { "L2D_FILL_MESI_STATE_M", {0x8f2}, 0xfff0, 1, {0x4820000}, "L2D Cache Fills with MESI state -- "}, #define PME_MONT_L2D_FILL_MESI_STATE_P 361 { "L2D_FILL_MESI_STATE_P", {0x408f2}, 0xfff0, 1, {0x4820000}, "L2D Cache Fills with MESI state -- "}, #define PME_MONT_L2D_FILL_MESI_STATE_S 362 { "L2D_FILL_MESI_STATE_S", {0x208f2}, 0xfff0, 1, {0x4820000}, "L2D Cache Fills with MESI state -- "}, #define PME_MONT_L2D_FORCE_RECIRC_FILL_HIT 363 { 
"L2D_FORCE_RECIRC_FILL_HIT", {0x808ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Count only those caused by an L2D miss which hit in the fill buffer."}, #define PME_MONT_L2D_FORCE_RECIRC_FRC_RECIRC 364 { "L2D_FORCE_RECIRC_FRC_RECIRC", {0x908ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Caused by an L2D miss when a force recirculate already existed in the Ozq."}, #define PME_MONT_L2D_FORCE_RECIRC_L1W 365 { "L2D_FORCE_RECIRC_L1W", {0xc08ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Count only those caused by a L2D miss one cycle ahead of the current op."}, #define PME_MONT_L2D_FORCE_RECIRC_LIMBO 366 { "L2D_FORCE_RECIRC_LIMBO", {0x108ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Count operations that went into the LIMBO Ozq state. This state is entered when the the op sees a FILL_HIT or OZQ_MISS event."}, #define PME_MONT_L2D_FORCE_RECIRC_OZQ_MISS 367 { "L2D_FORCE_RECIRC_OZQ_MISS", {0xb08ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Caused by an L2D miss when an L2D miss was already in the OZQ."}, #define PME_MONT_L2D_FORCE_RECIRC_RECIRC 368 { "L2D_FORCE_RECIRC_RECIRC", {0x8ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Counts inserts into OzQ due to a recirculate. 
The recirculate is due to secondary misses or various other conflicts"}, #define PME_MONT_L2D_FORCE_RECIRC_SAME_INDEX 369 { "L2D_FORCE_RECIRC_SAME_INDEX", {0xa08ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Caused by an L2D miss when a miss to the same index was in the same issue group."}, #define PME_MONT_L2D_FORCE_RECIRC_SECONDARY_ALL 370 { "L2D_FORCE_RECIRC_SECONDARY_ALL", {0xf08ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Caused by any L2D op that saw a miss to the same address in OZQ, L2 fill buffer, or one cycle ahead in the main pipeline."}, #define PME_MONT_L2D_FORCE_RECIRC_SECONDARY_READ 371 { "L2D_FORCE_RECIRC_SECONDARY_READ", {0xd08ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Caused by L2D read op that saw a miss to the same address in OZQ, L2 fill buffer, or one cycle ahead in the main pipeline."}, #define PME_MONT_L2D_FORCE_RECIRC_SECONDARY_WRITE 372 { "L2D_FORCE_RECIRC_SECONDARY_WRITE", {0xe08ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Caused by L2D write op that saw a miss to the same address in OZQ, L2 fill buffer, or one cycle ahead in the main pipeline."}, #define PME_MONT_L2D_FORCE_RECIRC_SNP_OR_L3 373 { "L2D_FORCE_RECIRC_SNP_OR_L3", {0x608ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Count only those caused by a snoop or L3 issue."}, #define PME_MONT_L2D_FORCE_RECIRC_TAG_NOTOK 374 { "L2D_FORCE_RECIRC_TAG_NOTOK", {0x408ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Count only those caused by L2D hits caused by in flight snoops, stores with a sibling miss to the same index, sibling probe to the same line or a pending mf.a instruction. This count can usually be ignored since its events are rare, unpredictable, and/or show up in one of the other events."}, #define PME_MONT_L2D_FORCE_RECIRC_TAG_OK 375 { "L2D_FORCE_RECIRC_TAG_OK", {0x708ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Count operations that inserted to Ozq as a hit. Thus it was NOT forced to recirculate. 
Likely identical to L2D_INSERT_HITS."}, #define PME_MONT_L2D_FORCE_RECIRC_TRAN_PREF 376 { "L2D_FORCE_RECIRC_TRAN_PREF", {0x508ea}, 0xfff0, 4, {0x4420007}, "Forced Recirculates -- Count only those caused by L2D miss requests that transformed to prefetches"}, #define PME_MONT_L2D_INSERT_HITS 377 { "L2D_INSERT_HITS", {0x8b1}, 0xfff0, 4, {0xffff0007}, "Count Number of Times an Inserting Data Request Hit in the L2D."}, #define PME_MONT_L2D_INSERT_MISSES 378 { "L2D_INSERT_MISSES", {0x8b0}, 0xfff0, 4, {0xffff0007}, "Count Number of Times an Inserting Data Request Missed the L2D."}, #define PME_MONT_L2D_ISSUED_RECIRC_OZQ_ACC 379 { "L2D_ISSUED_RECIRC_OZQ_ACC", {0x8eb}, 0xfff0, 1, {0x4420007}, "Count Number of Times a Recirculate Issue Was Attempted and Not Preempted"}, #define PME_MONT_L2D_L3ACCESS_CANCEL_ANY 380 { "L2D_L3ACCESS_CANCEL_ANY", {0x208e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- count cancels due to any reason. This umask will count more than the sum of all the other umasks. It will count things that weren't committed accesses when they reached L1w, but the L2D attempted to bypass them to the L3 anyway (speculatively). This will include accesses made repeatedly while the main pipeline is stalled and the L1D is attempting to recirculate an access down the L1D pipeline. Thus, an access could get counted many times before it really does get bypassed to the L3. 
It is a measure of how many times we asserted a request to the L3 but didn't confirm it."}, #define PME_MONT_L2D_L3ACCESS_CANCEL_ER_REJECT 381 { "L2D_L3ACCESS_CANCEL_ER_REJECT", {0x308e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- Count only requests that were rejected by ER"}, #define PME_MONT_L2D_L3ACCESS_CANCEL_INV_L3_BYP 382 { "L2D_L3ACCESS_CANCEL_INV_L3_BYP", {0x8e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- L2D cancelled a bypass because it did not commit, or was not a valid opcode to bypass, or was not a true miss of L2D (either hit,recirc,or limbo)."}, #define PME_MONT_L2D_L3ACCESS_CANCEL_P2_COV_SNP_FILL_NOSNP 383 { "L2D_L3ACCESS_CANCEL_P2_COV_SNP_FILL_NOSNP", {0x608e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- A snoop and a fill to the same address reached the L2D within a 3 cycle window of each other or a snoop hit a nosnoops entry in Ozq."}, #define PME_MONT_L2D_L3ACCESS_CANCEL_P2_COV_SNP_TEM 384 { "L2D_L3ACCESS_CANCEL_P2_COV_SNP_TEM", {0x408e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- A snoop saw an L2D tag error and missed/"}, #define PME_MONT_L2D_L3ACCESS_CANCEL_P2_COV_SNP_VIC 385 { "L2D_L3ACCESS_CANCEL_P2_COV_SNP_VIC", {0x508e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- A snoop hit in the L1D victim buffer"}, #define PME_MONT_L2D_L3ACCESS_CANCEL_SPEC_L3_BYP 386 { "L2D_L3ACCESS_CANCEL_SPEC_L3_BYP", {0x108e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- L2D cancelled speculative L3 bypasses because it was not a WB memory attribute or it was an effective release."}, #define PME_MONT_L2D_L3ACCESS_CANCEL_TAIL_TRANS_DIS 387 { "L2D_L3ACCESS_CANCEL_TAIL_TRANS_DIS", {0x708e8}, 0xfff0, 1, {0x4320007}, "L2D Access Cancelled by L2D -- Count the number of cycles that either transform to prefetches or Ozq tail collapse have been dynamically disabled. 
This would indicate that memory contention has led the L2D to throttle requests to prevent livelock scenarios."}, #define PME_MONT_L2D_MISSES 388 { "L2D_MISSES", {0x8cb}, 0xfff0, 1, {0xffff0007}, "L2 Misses"}, #define PME_MONT_L2D_OPS_ISSUED_FP_LOAD 389 { "L2D_OPS_ISSUED_FP_LOAD", {0x108f0}, 0xfff0, 4, {0xffff0007}, "Operations Issued By L2D -- Count only valid floating-point loads"}, #define PME_MONT_L2D_OPS_ISSUED_INT_LOAD 390 { "L2D_OPS_ISSUED_INT_LOAD", {0x8f0}, 0xfff0, 4, {0xffff0007}, "Operations Issued By L2D -- Count only valid integer loads, including ld16."}, #define PME_MONT_L2D_OPS_ISSUED_LFETCH 391 { "L2D_OPS_ISSUED_LFETCH", {0x408f0}, 0xfff0, 4, {0xffff0007}, "Operations Issued By L2D -- Count only lfetch operations."}, #define PME_MONT_L2D_OPS_ISSUED_OTHER 392 { "L2D_OPS_ISSUED_OTHER", {0x508f0}, 0xfff0, 4, {0xffff0007}, "Operations Issued By L2D -- Count only valid non-load, no-store accesses that are not in any of the above sections."}, #define PME_MONT_L2D_OPS_ISSUED_RMW 393 { "L2D_OPS_ISSUED_RMW", {0x208f0}, 0xfff0, 4, {0xffff0007}, "Operations Issued By L2D -- Count only valid read_modify_write stores and semaphores including cmp8xchg16."}, #define PME_MONT_L2D_OPS_ISSUED_STORE 394 { "L2D_OPS_ISSUED_STORE", {0x308f0}, 0xfff0, 4, {0xffff0007}, "Operations Issued By L2D -- Count only valid non-read_modify_write stores, including st16."}, #define PME_MONT_L2D_OZDB_FULL_THIS 395 { "L2D_OZDB_FULL_THIS", {0x8e9}, 0xfff0, 1, {0x4320000}, "L2D OZ Data Buffer Is Full -- L2 OZ Data Buffer is full"}, #define PME_MONT_L2D_OZQ_ACQUIRE 396 { "L2D_OZQ_ACQUIRE", {0x8ef}, 0xfff0, 1, {0x4620000}, "Acquire Ordering Attribute Exists in L2D OZQ"}, #define PME_MONT_L2D_OZQ_CANCELS0_ACQ 397 { "L2D_OZQ_CANCELS0_ACQ", {0x608e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- caused by an acquire somewhere in Ozq or ER."}, #define PME_MONT_L2D_OZQ_CANCELS0_BANK_CONF 398 { "L2D_OZQ_CANCELS0_BANK_CONF", {0x808e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ 
Cancels (Specific Reason Set 0) -- a bypassed L2D hit operation had a bank conflict with an older sibling bypass or an older operation in the L2D pipeline."}, #define PME_MONT_L2D_OZQ_CANCELS0_CANC_L2M_TO_L2C_ST 399 { "L2D_OZQ_CANCELS0_CANC_L2M_TO_L2C_ST", {0x108e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- caused by a canceled store in L2M,L2D or L2C. This is the combination of the following subevents that were available separately in Itanium2: CANC_L2M_ST=caused by canceled store in L2M, CANC_L2D_ST=caused by canceled store in L2D, CANC_L2C_ST=caused by canceled store in L2C"}, #define PME_MONT_L2D_OZQ_CANCELS0_FILL_ST_CONF 400 { "L2D_OZQ_CANCELS0_FILL_ST_CONF", {0xe08e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- an OZQ store conflicted with a returning L2D fill"}, #define PME_MONT_L2D_OZQ_CANCELS0_L2A_ST_MAT 401 { "L2D_OZQ_CANCELS0_L2A_ST_MAT", {0x208e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- canceled due to an uncanceled store match in L2A"}, #define PME_MONT_L2D_OZQ_CANCELS0_L2C_ST_MAT 402 { "L2D_OZQ_CANCELS0_L2C_ST_MAT", {0x508e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- canceled due to an uncanceled store match in L2C"}, #define PME_MONT_L2D_OZQ_CANCELS0_L2D_ST_MAT 403 { "L2D_OZQ_CANCELS0_L2D_ST_MAT", {0x408e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- canceled due to an uncanceled store match in L2D"}, #define PME_MONT_L2D_OZQ_CANCELS0_L2M_ST_MAT 404 { "L2D_OZQ_CANCELS0_L2M_ST_MAT", {0x308e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- canceled due to an uncanceled store match in L2M"}, #define PME_MONT_L2D_OZQ_CANCELS0_MISC_ORDER 405 { "L2D_OZQ_CANCELS0_MISC_ORDER", {0xd08e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- a sync.i or mf.a. 
This is the combination of following subevents that were available separately in Itanium2: SYNC=caused by sync.i, MFA=a memory fence instruction"}, #define PME_MONT_L2D_OZQ_CANCELS0_OVER_SUB 406 { "L2D_OZQ_CANCELS0_OVER_SUB", {0xa08e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- a high Ozq issue rate resulted in the L2D having to cancel due to hardware restrictions. This is the combination of following subevents that were available separately in Itanium2: OVER_SUB=oversubscription, L1DF_L2M=L1D fill in L2M"}, #define PME_MONT_L2D_OZQ_CANCELS0_OZDATA_CONF 407 { "L2D_OZQ_CANCELS0_OZDATA_CONF", {0xf08e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- an OZQ operation that needed to read the OZQ data buffer conflicted with a fill return that needed to do the same."}, #define PME_MONT_L2D_OZQ_CANCELS0_OZQ_PREEMPT 408 { "L2D_OZQ_CANCELS0_OZQ_PREEMPT", {0xb08e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- an L2D fill return conflicted with, and cancelled, an ozq request for various reasons. Formerly known as L1_FILL_CONF."}, #define PME_MONT_L2D_OZQ_CANCELS0_RECIRC 409 { "L2D_OZQ_CANCELS0_RECIRC", {0x8e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- a recirculate was cancelled due to h/w limitations on recirculate issue rate. 
This is the combination of following subevents that were available separately in Itanium2: RECIRC_OVER_SUB=caused by a recirculate oversubscription, DIDNT_RECIRC=caused because it did not recirculate, WEIRD=counts the cancels caused by attempted 5-cycle bypasses for non-aligned accesses and bypasses blocking recirculates for too long"}, #define PME_MONT_L2D_OZQ_CANCELS0_REL 410 { "L2D_OZQ_CANCELS0_REL", {0x708e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- a release was cancelled due to some other operation"}, #define PME_MONT_L2D_OZQ_CANCELS0_SEMA 411 { "L2D_OZQ_CANCELS0_SEMA", {0x908e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- a semaphore op was cancelled for various ordering or h/w restriction reasons. This is the combination of following subevents that were available separately in Itanium 2: SEM=a semaphore, CCV=a CCV"}, #define PME_MONT_L2D_OZQ_CANCELS0_WB_CONF 412 { "L2D_OZQ_CANCELS0_WB_CONF", {0xc08e0}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Specific Reason Set 0) -- an OZQ request conflicted with an L2D data array read for a writeback. 
This is the combination of following subevents that were available separately in Itanium2: READ_WB_CONF=a write back conflict, ST_FILL_CONF=a store fill conflict"}, #define PME_MONT_L2D_OZQ_CANCELS1_ANY 413 { "L2D_OZQ_CANCELS1_ANY", {0x8e2}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Late or Any) -- counts the total OZ Queue cancels"}, #define PME_MONT_L2D_OZQ_CANCELS1_LATE_BYP_EFFRELEASE 414 { "L2D_OZQ_CANCELS1_LATE_BYP_EFFRELEASE", {0x308e2}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Late or Any) -- counts the late cancels caused by L1D to L2A bypass effective releases"}, #define PME_MONT_L2D_OZQ_CANCELS1_LATE_SPEC_BYP 415 { "L2D_OZQ_CANCELS1_LATE_SPEC_BYP", {0x108e2}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Late or Any) -- counts the late cancels caused by speculative bypasses"}, #define PME_MONT_L2D_OZQ_CANCELS1_SIBLING_ACQ_REL 416 { "L2D_OZQ_CANCELS1_SIBLING_ACQ_REL", {0x208e2}, 0xfff0, 4, {0x4020007}, "L2D OZQ Cancels (Late or Any) -- counts the late cancels caused by releases and acquires in the same issue group. 
This is the combination of following subevents that were available separately in Itanium2: LATE_ACQUIRE=late cancels caused by acquires, LATE_RELEASE=late cancels caused by releases"}, #define PME_MONT_L2D_OZQ_FULL_THIS 417 { "L2D_OZQ_FULL_THIS", {0x8bc}, 0xfff0, 1, {0x4520000}, "L2D OZQ Is Full -- L2D OZQ is full"}, #define PME_MONT_L2D_OZQ_RELEASE 418 { "L2D_OZQ_RELEASE", {0x8e5}, 0xfff0, 1, {0x4120000}, "Release Ordering Attribute Exists in L2D OZQ"}, #define PME_MONT_L2D_REFERENCES_ALL 419 { "L2D_REFERENCES_ALL", {0x308e6}, 0xfff0, 4, {0x4220007}, "Data Read/Write Access to L2D -- count both read and write operations (semaphores will count as 2)"}, #define PME_MONT_L2D_REFERENCES_READS 420 { "L2D_REFERENCES_READS", {0x108e6}, 0xfff0, 4, {0x4220007}, "Data Read/Write Access to L2D -- count only data read and semaphore operations."}, #define PME_MONT_L2D_REFERENCES_WRITES 421 { "L2D_REFERENCES_WRITES", {0x208e6}, 0xfff0, 4, {0x4220007}, "Data Read/Write Access to L2D -- count only data write and semaphore operations"}, #define PME_MONT_L2D_STORE_HIT_SHARED_ANY 422 { "L2D_STORE_HIT_SHARED_ANY", {0x8ed}, 0xfff0, 2, {0x4520007}, "Store Hit a Shared Line -- Store hit a shared line"}, #define PME_MONT_L2D_VICTIMB_FULL_THIS 423 { "L2D_VICTIMB_FULL_THIS", {0x8f3}, 0xfff0, 1, {0x4820000}, "L2D Victim Buffer Is Full -- L2D victim buffer is full"}, #define PME_MONT_L2I_DEMAND_READS 424 { "L2I_DEMAND_READS", {0x42}, 0xfff0, 1, {0xffff0001}, "L2 Instruction Demand Fetch Requests"}, #define PME_MONT_L2I_HIT_CONFLICTS_ALL_ALL 425 { "L2I_HIT_CONFLICTS_ALL_ALL", {0xf087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- All fetches that reference L2I are counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_ALL_DMND 426 { "L2I_HIT_CONFLICTS_ALL_DMND", {0xd087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- Only demand fetches that reference L2I are counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_ALL_PFTCH 427 { "L2I_HIT_CONFLICTS_ALL_PFTCH", {0xe087d}, 0xfff0, 1, {0xffff0001}, "L2I hit 
conflicts -- Only prefetches that reference L2I are counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_HIT_ALL 428 { "L2I_HIT_CONFLICTS_HIT_ALL", {0x7087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- All fetches that hit in L2I counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_HIT_DMND 429 { "L2I_HIT_CONFLICTS_HIT_DMND", {0x5087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- Only demand fetches that hit in L2I are counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_HIT_PFTCH 430 { "L2I_HIT_CONFLICTS_HIT_PFTCH", {0x6087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- Only prefetches that hit in L2I are counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_MISS_ALL 431 { "L2I_HIT_CONFLICTS_MISS_ALL", {0xb087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- All fetches that miss in L2I are counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_MISS_DMND 432 { "L2I_HIT_CONFLICTS_MISS_DMND", {0x9087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- Only demand fetches that miss in L2I are counted"}, #define PME_MONT_L2I_HIT_CONFLICTS_MISS_PFTCH 433 { "L2I_HIT_CONFLICTS_MISS_PFTCH", {0xa087d}, 0xfff0, 1, {0xffff0001}, "L2I hit conflicts -- Only prefetches that miss in L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_ALL_ALL 434 { "L2I_L3_REJECTS_ALL_ALL", {0xf087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- All fetches that reference L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_ALL_DMND 435 { "L2I_L3_REJECTS_ALL_DMND", {0xd087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- Only demand fetches that reference L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_ALL_PFTCH 436 { "L2I_L3_REJECTS_ALL_PFTCH", {0xe087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- Only prefetches that reference L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_HIT_ALL 437 { "L2I_L3_REJECTS_HIT_ALL", {0x7087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- All fetches that hit in L2I counted"}, #define PME_MONT_L2I_L3_REJECTS_HIT_DMND 438 { "L2I_L3_REJECTS_HIT_DMND", {0x5087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- Only 
demand fetches that hit in L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_HIT_PFTCH 439 { "L2I_L3_REJECTS_HIT_PFTCH", {0x6087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- Only prefetches that hit in L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_MISS_ALL 440 { "L2I_L3_REJECTS_MISS_ALL", {0xb087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- All fetches that miss in L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_MISS_DMND 441 { "L2I_L3_REJECTS_MISS_DMND", {0x9087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- Only demand fetches that miss in L2I are counted"}, #define PME_MONT_L2I_L3_REJECTS_MISS_PFTCH 442 { "L2I_L3_REJECTS_MISS_PFTCH", {0xa087c}, 0xfff0, 1, {0xffff0001}, "L3 rejects -- Only prefetches that miss in L2I are counted"}, #define PME_MONT_L2I_PREFETCHES 443 { "L2I_PREFETCHES", {0x45}, 0xfff0, 1, {0xffff0001}, "L2 Instruction Prefetch Requests"}, #define PME_MONT_L2I_READS_ALL_ALL 444 { "L2I_READS_ALL_ALL", {0xf0878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- All fetches that reference L2I are counted"}, #define PME_MONT_L2I_READS_ALL_DMND 445 { "L2I_READS_ALL_DMND", {0xd0878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- Only demand fetches that reference L2I are counted"}, #define PME_MONT_L2I_READS_ALL_PFTCH 446 { "L2I_READS_ALL_PFTCH", {0xe0878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- Only prefetches that reference L2I are counted"}, #define PME_MONT_L2I_READS_HIT_ALL 447 { "L2I_READS_HIT_ALL", {0x70878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- All fetches that hit in L2I counted"}, #define PME_MONT_L2I_READS_HIT_DMND 448 { "L2I_READS_HIT_DMND", {0x50878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- Only demand fetches that hit in L2I are counted"}, #define PME_MONT_L2I_READS_HIT_PFTCH 449 { "L2I_READS_HIT_PFTCH", {0x60878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- Only prefetches that hit in L2I are counted"}, #define PME_MONT_L2I_READS_MISS_ALL 450 { "L2I_READS_MISS_ALL", {0xb0878}, 0xfff0, 1, 
{0xffff0001}, "L2I Cacheable Reads -- All fetches that miss in L2I are counted"}, #define PME_MONT_L2I_READS_MISS_DMND 451 { "L2I_READS_MISS_DMND", {0x90878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- Only demand fetches that miss in L2I are counted"}, #define PME_MONT_L2I_READS_MISS_PFTCH 452 { "L2I_READS_MISS_PFTCH", {0xa0878}, 0xfff0, 1, {0xffff0001}, "L2I Cacheable Reads -- Only prefetches that miss in L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_ALL_ALL 453 { "L2I_RECIRCULATES_ALL_ALL", {0xf087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- All fetches that reference L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_ALL_DMND 454 { "L2I_RECIRCULATES_ALL_DMND", {0xd087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- Only demand fetches that reference L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_ALL_PFTCH 455 { "L2I_RECIRCULATES_ALL_PFTCH", {0xe087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- Only prefetches that reference L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_HIT_ALL 456 { "L2I_RECIRCULATES_HIT_ALL", {0x7087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- All fetches that hit in L2I counted"}, #define PME_MONT_L2I_RECIRCULATES_HIT_DMND 457 { "L2I_RECIRCULATES_HIT_DMND", {0x5087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- Only demand fetches that hit in L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_HIT_PFTCH 458 { "L2I_RECIRCULATES_HIT_PFTCH", {0x6087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- Only prefetches that hit in L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_MISS_ALL 459 { "L2I_RECIRCULATES_MISS_ALL", {0xb087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- All fetches that miss in L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_MISS_DMND 460 { "L2I_RECIRCULATES_MISS_DMND", {0x9087b}, 0xfff0, 1, {0xffff0001}, "L2I recirculates -- Only demand fetches that miss in L2I are counted"}, #define PME_MONT_L2I_RECIRCULATES_MISS_PFTCH 461 { "L2I_RECIRCULATES_MISS_PFTCH", {0xa087b}, 
0xfff0, 1, {0xffff0001}, "L2I recirculates -- Only prefetches that miss in L2I are counted"}, #define PME_MONT_L2I_SNOOP_HITS 462 { "L2I_SNOOP_HITS", {0x107f}, 0xfff0, 1, {0xffff0000}, "L2I snoop hits"}, #define PME_MONT_L2I_SPEC_ABORTS 463 { "L2I_SPEC_ABORTS", {0x87e}, 0xfff0, 1, {0xffff0001}, "L2I speculative aborts"}, #define PME_MONT_L2I_UC_READS_ALL_ALL 464 { "L2I_UC_READS_ALL_ALL", {0xf0879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- All fetches that reference L2I are counted"}, #define PME_MONT_L2I_UC_READS_ALL_DMND 465 { "L2I_UC_READS_ALL_DMND", {0xd0879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- Only demand fetches that reference L2I are counted"}, #define PME_MONT_L2I_UC_READS_ALL_PFTCH 466 { "L2I_UC_READS_ALL_PFTCH", {0xe0879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- Only prefetches that reference L2I are counted"}, #define PME_MONT_L2I_UC_READS_HIT_ALL 467 { "L2I_UC_READS_HIT_ALL", {0x70879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- All fetches that hit in L2I counted"}, #define PME_MONT_L2I_UC_READS_HIT_DMND 468 { "L2I_UC_READS_HIT_DMND", {0x50879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- Only demand fetches that hit in L2I are counted"}, #define PME_MONT_L2I_UC_READS_HIT_PFTCH 469 { "L2I_UC_READS_HIT_PFTCH", {0x60879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- Only prefetches that hit in L2I are counted"}, #define PME_MONT_L2I_UC_READS_MISS_ALL 470 { "L2I_UC_READS_MISS_ALL", {0xb0879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- All fetches that miss in L2I are counted"}, #define PME_MONT_L2I_UC_READS_MISS_DMND 471 { "L2I_UC_READS_MISS_DMND", {0x90879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- Only demand fetches that miss in L2I are counted"}, #define PME_MONT_L2I_UC_READS_MISS_PFTCH 472 { "L2I_UC_READS_MISS_PFTCH", {0xa0879}, 0xfff0, 1, {0xffff0001}, "L2I Uncacheable reads -- Only prefetches that miss in L2I are counted"}, #define PME_MONT_L2I_VICTIMIZATION 473 { 
"L2I_VICTIMIZATION", {0x87a}, 0xfff0, 1, {0xffff0001}, "L2I victimizations"}, #define PME_MONT_L3_INSERTS 474 { "L3_INSERTS", {0x8da}, 0xfff0, 1, {0xffff0017}, "L3 Cache Lines inserts"}, #define PME_MONT_L3_LINES_REPLACED 475 { "L3_LINES_REPLACED", {0x8df}, 0xfff0, 1, {0xffff0010}, "L3 Cache Lines Replaced"}, #define PME_MONT_L3_MISSES 476 { "L3_MISSES", {0x8dc}, 0xfff0, 1, {0xffff0007}, "L3 Misses"}, #define PME_MONT_L3_READS_ALL_ALL 477 { "L3_READS_ALL_ALL", {0xf08dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Read References"}, #define PME_MONT_L3_READS_ALL_HIT 478 { "L3_READS_ALL_HIT", {0xd08dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Read Hits"}, #define PME_MONT_L3_READS_ALL_MISS 479 { "L3_READS_ALL_MISS", {0xe08dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Read Misses"}, #define PME_MONT_L3_READS_DATA_READ_ALL 480 { "L3_READS_DATA_READ_ALL", {0xb08dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Load References (excludes reads for ownership used to satisfy stores)"}, #define PME_MONT_L3_READS_DATA_READ_HIT 481 { "L3_READS_DATA_READ_HIT", {0x908dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Load Hits (excludes reads for ownership used to satisfy stores)"}, #define PME_MONT_L3_READS_DATA_READ_MISS 482 { "L3_READS_DATA_READ_MISS", {0xa08dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Load Misses (excludes reads for ownership used to satisfy stores)"}, #define PME_MONT_L3_READS_DINST_FETCH_ALL 483 { "L3_READS_DINST_FETCH_ALL", {0x308dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Demand Instruction References"}, #define PME_MONT_L3_READS_DINST_FETCH_HIT 484 { "L3_READS_DINST_FETCH_HIT", {0x108dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Demand Instruction Fetch Hits"}, #define PME_MONT_L3_READS_DINST_FETCH_MISS 485 { "L3_READS_DINST_FETCH_MISS", {0x208dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Demand Instruction Fetch Misses"}, #define PME_MONT_L3_READS_INST_FETCH_ALL 486 { "L3_READS_INST_FETCH_ALL", {0x708dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Instruction 
Fetch and Prefetch References"}, #define PME_MONT_L3_READS_INST_FETCH_HIT 487 { "L3_READS_INST_FETCH_HIT", {0x508dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Instruction Fetch and Prefetch Hits"}, #define PME_MONT_L3_READS_INST_FETCH_MISS 488 { "L3_READS_INST_FETCH_MISS", {0x608dd}, 0xfff0, 1, {0xffff0017}, "L3 Reads -- L3 Instruction Fetch and Prefetch Misses"}, #define PME_MONT_L3_REFERENCES 489 { "L3_REFERENCES", {0x8db}, 0xfff0, 1, {0xffff0007}, "L3 References"}, #define PME_MONT_L3_WRITES_ALL_ALL 490 { "L3_WRITES_ALL_ALL", {0xf08de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L3 Write References"}, #define PME_MONT_L3_WRITES_ALL_HIT 491 { "L3_WRITES_ALL_HIT", {0xd08de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L3 Write Hits"}, #define PME_MONT_L3_WRITES_ALL_MISS 492 { "L3_WRITES_ALL_MISS", {0xe08de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L3 Write Misses"}, #define PME_MONT_L3_WRITES_DATA_WRITE_ALL 493 { "L3_WRITES_DATA_WRITE_ALL", {0x708de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L3 Store References (excludes L2 write backs, includes L3 read for ownership requests that satisfy stores)"}, #define PME_MONT_L3_WRITES_DATA_WRITE_HIT 494 { "L3_WRITES_DATA_WRITE_HIT", {0x508de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L3 Store Hits (excludes L2 write backs, includes L3 read for ownership requests that satisfy stores)"}, #define PME_MONT_L3_WRITES_DATA_WRITE_MISS 495 { "L3_WRITES_DATA_WRITE_MISS", {0x608de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L3 Store Misses (excludes L2 write backs, includes L3 read for ownership requests that satisfy stores)"}, #define PME_MONT_L3_WRITES_L2_WB_ALL 496 { "L3_WRITES_L2_WB_ALL", {0xb08de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L2 Write Back References"}, #define PME_MONT_L3_WRITES_L2_WB_HIT 497 { "L3_WRITES_L2_WB_HIT", {0x908de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L2 Write Back Hits"}, #define PME_MONT_L3_WRITES_L2_WB_MISS 498 { "L3_WRITES_L2_WB_MISS", {0xa08de}, 0xfff0, 1, {0xffff0017}, "L3 Writes -- L2 Write Back Misses"}, 
#define PME_MONT_LOADS_RETIRED 499 { "LOADS_RETIRED", {0xcd}, 0xfff0, 4, {0x5310007}, "Retired Loads"}, #define PME_MONT_LOADS_RETIRED_INTG 500 { "LOADS_RETIRED_INTG", {0xd8}, 0xfff0, 2, {0x5610007}, "Integer loads retired"}, #define PME_MONT_MEM_READ_CURRENT_ANY 501 { "MEM_READ_CURRENT_ANY", {0x31089}, 0xfff0, 1, {0xffff0000}, "Current Mem Read Transactions On Bus -- CPU or non-CPU (all transactions)."}, #define PME_MONT_MEM_READ_CURRENT_IO 502 { "MEM_READ_CURRENT_IO", {0x11089}, 0xfff0, 1, {0xffff0000}, "Current Mem Read Transactions On Bus -- non-CPU priority agents"}, #define PME_MONT_MISALIGNED_LOADS_RETIRED 503 { "MISALIGNED_LOADS_RETIRED", {0xce}, 0xfff0, 4, {0x5310007}, "Retired Misaligned Load Instructions"}, #define PME_MONT_MISALIGNED_STORES_RETIRED 504 { "MISALIGNED_STORES_RETIRED", {0xd2}, 0xfff0, 2, {0x5410007}, "Retired Misaligned Store Instructions"}, #define PME_MONT_NOPS_RETIRED 505 { "NOPS_RETIRED", {0x50}, 0xfff0, 6, {0xffff0003}, "Retired NOP Instructions"}, #define PME_MONT_PREDICATE_SQUASHED_RETIRED 506 { "PREDICATE_SQUASHED_RETIRED", {0x51}, 0xfff0, 6, {0xffff0003}, "Instructions Squashed Due to Predicate Off"}, #define PME_MONT_RSE_CURRENT_REGS_2_TO_0 507 { "RSE_CURRENT_REGS_2_TO_0", {0x2b}, 0xfff0, 7, {0xffff0000}, "Current RSE Registers (Bits 2:0)"}, #define PME_MONT_RSE_CURRENT_REGS_5_TO_3 508 { "RSE_CURRENT_REGS_5_TO_3", {0x2a}, 0xfff0, 7, {0xffff0000}, "Current RSE Registers (Bits 5:3)"}, #define PME_MONT_RSE_CURRENT_REGS_6 509 { "RSE_CURRENT_REGS_6", {0x26}, 0xfff0, 1, {0xffff0000}, "Current RSE Registers (Bit 6)"}, #define PME_MONT_RSE_DIRTY_REGS_2_TO_0 510 { "RSE_DIRTY_REGS_2_TO_0", {0x29}, 0xfff0, 7, {0xffff0000}, "Dirty RSE Registers (Bits 2:0)"}, #define PME_MONT_RSE_DIRTY_REGS_5_TO_3 511 { "RSE_DIRTY_REGS_5_TO_3", {0x28}, 0xfff0, 7, {0xffff0000}, "Dirty RSE Registers (Bits 5:3)"}, #define PME_MONT_RSE_DIRTY_REGS_6 512 { "RSE_DIRTY_REGS_6", {0x24}, 0xfff0, 1, {0xffff0000}, "Dirty RSE Registers (Bit 6)"}, #define 
PME_MONT_RSE_EVENT_RETIRED 513 { "RSE_EVENT_RETIRED", {0x32}, 0xfff0, 1, {0xffff0000}, "Retired RSE operations"}, #define PME_MONT_RSE_REFERENCES_RETIRED_ALL 514 { "RSE_REFERENCES_RETIRED_ALL", {0x30020}, 0xfff0, 2, {0xffff0007}, "RSE Accesses -- Both RSE loads and stores will be counted."}, #define PME_MONT_RSE_REFERENCES_RETIRED_LOAD 515 { "RSE_REFERENCES_RETIRED_LOAD", {0x10020}, 0xfff0, 2, {0xffff0007}, "RSE Accesses -- Only RSE loads will be counted."}, #define PME_MONT_RSE_REFERENCES_RETIRED_STORE 516 { "RSE_REFERENCES_RETIRED_STORE", {0x20020}, 0xfff0, 2, {0xffff0007}, "RSE Accesses -- Only RSE stores will be counted."}, #define PME_MONT_SERIALIZATION_EVENTS 517 { "SERIALIZATION_EVENTS", {0x53}, 0xfff0, 1, {0xffff0000}, "Number of srlz.i Instructions"}, #define PME_MONT_SI_CCQ_COLLISIONS_EITHER 518 { "SI_CCQ_COLLISIONS_EITHER", {0x10a8}, 0xfff0, 2, {0xffff0000}, "Clean Castout Queue Collisions -- transactions initiated by either cpu core"}, #define PME_MONT_SI_CCQ_COLLISIONS_SELF 519 { "SI_CCQ_COLLISIONS_SELF", {0x110a8}, 0xfff0, 2, {0xffff0000}, "Clean Castout Queue Collisions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_CCQ_INSERTS_EITHER 520 { "SI_CCQ_INSERTS_EITHER", {0x18a5}, 0xfff0, 2, {0xffff0000}, "Clean Castout Queue Insertions -- transactions initiated by either cpu core"}, #define PME_MONT_SI_CCQ_INSERTS_SELF 521 { "SI_CCQ_INSERTS_SELF", {0x118a5}, 0xfff0, 2, {0xffff0000}, "Clean Castout Queue Insertions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_CCQ_LIVE_REQ_HI_EITHER 522 { "SI_CCQ_LIVE_REQ_HI_EITHER", {0x10a7}, 0xfff0, 1, {0xffff0000}, "Clean Castout Queue Requests (upper bit) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_CCQ_LIVE_REQ_HI_SELF 523 { "SI_CCQ_LIVE_REQ_HI_SELF", {0x110a7}, 0xfff0, 1, {0xffff0000}, "Clean Castout Queue Requests (upper bit) -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_CCQ_LIVE_REQ_LO_EITHER 524 { "SI_CCQ_LIVE_REQ_LO_EITHER", 
{0x10a6}, 0xfff0, 7, {0xffff0000}, "Clean Castout Queue Requests (lower three bits) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_CCQ_LIVE_REQ_LO_SELF 525 { "SI_CCQ_LIVE_REQ_LO_SELF", {0x110a6}, 0xfff0, 7, {0xffff0000}, "Clean Castout Queue Requests (lower three bits) -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_CYCLES 526 { "SI_CYCLES", {0x108e}, 0xfff0, 1, {0xffff0000}, "SI Cycles"}, #define PME_MONT_SI_IOQ_COLLISIONS 527 { "SI_IOQ_COLLISIONS", {0x10aa}, 0xfff0, 2, {0xffff0000}, "In Order Queue Collisions"}, #define PME_MONT_SI_IOQ_LIVE_REQ_HI 528 { "SI_IOQ_LIVE_REQ_HI", {0x1098}, 0xfff0, 2, {0xffff0000}, "Inorder Bus Queue Requests (upper bit)"}, #define PME_MONT_SI_IOQ_LIVE_REQ_LO 529 { "SI_IOQ_LIVE_REQ_LO", {0x1097}, 0xfff0, 3, {0xffff0000}, "Inorder Bus Queue Requests (lower three bits)"}, #define PME_MONT_SI_RQ_INSERTS_EITHER 530 { "SI_RQ_INSERTS_EITHER", {0x189e}, 0xfff0, 2, {0xffff0000}, "Request Queue Insertions -- transactions initiated by either cpu core"}, #define PME_MONT_SI_RQ_INSERTS_SELF 531 { "SI_RQ_INSERTS_SELF", {0x1189e}, 0xfff0, 2, {0xffff0000}, "Request Queue Insertions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_RQ_LIVE_REQ_HI_EITHER 532 { "SI_RQ_LIVE_REQ_HI_EITHER", {0x10a0}, 0xfff0, 1, {0xffff0000}, "Request Queue Requests (upper bit) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_RQ_LIVE_REQ_HI_SELF 533 { "SI_RQ_LIVE_REQ_HI_SELF", {0x110a0}, 0xfff0, 1, {0xffff0000}, "Request Queue Requests (upper bit) -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_RQ_LIVE_REQ_LO_EITHER 534 { "SI_RQ_LIVE_REQ_LO_EITHER", {0x109f}, 0xfff0, 7, {0xffff0000}, "Request Queue Requests (lower three bits) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_RQ_LIVE_REQ_LO_SELF 535 { "SI_RQ_LIVE_REQ_LO_SELF", {0x1109f}, 0xfff0, 7, {0xffff0000}, "Request Queue Requests (lower three bits) -- transactions initiated by 'this' cpu core"}, #define 
PME_MONT_SI_SCB_INSERTS_ALL_EITHER 536 { "SI_SCB_INSERTS_ALL_EITHER", {0xc10ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count all snoop signoffs (plus backsnoop inserts) from either cpu core"}, #define PME_MONT_SI_SCB_INSERTS_ALL_SELF 537 { "SI_SCB_INSERTS_ALL_SELF", {0xd10ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count all snoop signoffs (plus backsnoop inserts) from 'this' cpu core"}, #define PME_MONT_SI_SCB_INSERTS_HIT_EITHER 538 { "SI_SCB_INSERTS_HIT_EITHER", {0x410ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count HIT snoop signoffs from either cpu core"}, #define PME_MONT_SI_SCB_INSERTS_HIT_SELF 539 { "SI_SCB_INSERTS_HIT_SELF", {0x510ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count HIT snoop signoffs from 'this' cpu core"}, #define PME_MONT_SI_SCB_INSERTS_HITM_EITHER 540 { "SI_SCB_INSERTS_HITM_EITHER", {0x810ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count HITM snoop signoffs from either cpu core"}, #define PME_MONT_SI_SCB_INSERTS_HITM_SELF 541 { "SI_SCB_INSERTS_HITM_SELF", {0x910ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count HITM snoop signoffs from 'this' cpu core"}, #define PME_MONT_SI_SCB_INSERTS_MISS_EITHER 542 { "SI_SCB_INSERTS_MISS_EITHER", {0x10ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count MISS snoop signoffs (plus backsnoop inserts) from either cpu core"}, #define PME_MONT_SI_SCB_INSERTS_MISS_SELF 543 { "SI_SCB_INSERTS_MISS_SELF", {0x110ab}, 0xfff0, 4, {0xffff0000}, "Snoop Coalescing Buffer Insertions -- count MISS snoop signoffs (plus backsnoop inserts) from 'this' cpu core"}, #define PME_MONT_SI_SCB_LIVE_REQ_HI_EITHER 544 { "SI_SCB_LIVE_REQ_HI_EITHER", {0x10ad}, 0xfff0, 1, {0xffff0000}, "Snoop Coalescing Buffer Requests (upper bit) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_SCB_LIVE_REQ_HI_SELF 545 { "SI_SCB_LIVE_REQ_HI_SELF", {0x110ad}, 
0xfff0, 1, {0xffff0000}, "Snoop Coalescing Buffer Requests (upper bit) -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_SCB_LIVE_REQ_LO_EITHER 546 { "SI_SCB_LIVE_REQ_LO_EITHER", {0x10ac}, 0xfff0, 7, {0xffff0000}, "Snoop Coalescing Buffer Requests (lower three bits) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_SCB_LIVE_REQ_LO_SELF 547 { "SI_SCB_LIVE_REQ_LO_SELF", {0x110ac}, 0xfff0, 7, {0xffff0000}, "Snoop Coalescing Buffer Requests (lower three bits) -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_SCB_SIGNOFFS_ALL 548 { "SI_SCB_SIGNOFFS_ALL", {0xc10ae}, 0xfff0, 1, {0xffff0000}, "Snoop Coalescing Buffer Coherency Signoffs -- count all snoop signoffs"}, #define PME_MONT_SI_SCB_SIGNOFFS_HIT 549 { "SI_SCB_SIGNOFFS_HIT", {0x410ae}, 0xfff0, 1, {0xffff0000}, "Snoop Coalescing Buffer Coherency Signoffs -- count HIT snoop signoffs"}, #define PME_MONT_SI_SCB_SIGNOFFS_HITM 550 { "SI_SCB_SIGNOFFS_HITM", {0x810ae}, 0xfff0, 1, {0xffff0000}, "Snoop Coalescing Buffer Coherency Signoffs -- count HITM snoop signoffs"}, #define PME_MONT_SI_SCB_SIGNOFFS_MISS 551 { "SI_SCB_SIGNOFFS_MISS", {0x10ae}, 0xfff0, 1, {0xffff0000}, "Snoop Coalescing Buffer Coherency Signoffs -- count MISS snoop signoffs"}, #define PME_MONT_SI_WAQ_COLLISIONS_EITHER 552 { "SI_WAQ_COLLISIONS_EITHER", {0x10a4}, 0xfff0, 1, {0xffff0000}, "Write Address Queue Collisions -- transactions initiated by either cpu core"}, #define PME_MONT_SI_WAQ_COLLISIONS_SELF 553 { "SI_WAQ_COLLISIONS_SELF", {0x110a4}, 0xfff0, 1, {0xffff0000}, "Write Address Queue Collisions -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_WDQ_ECC_ERRORS_ALL_EITHER 554 { "SI_WDQ_ECC_ERRORS_ALL_EITHER", {0x810af}, 0xfff0, 2, {0xffff0000}, "Write Data Queue ECC Errors -- count all ECC errors from either cpu core"}, #define PME_MONT_SI_WDQ_ECC_ERRORS_ALL_SELF 555 { "SI_WDQ_ECC_ERRORS_ALL_SELF", {0x910af}, 0xfff0, 2, {0xffff0000}, "Write Data Queue ECC Errors -- count all ECC 
errors from 'this' cpu core"}, #define PME_MONT_SI_WDQ_ECC_ERRORS_DBL_EITHER 556 { "SI_WDQ_ECC_ERRORS_DBL_EITHER", {0x410af}, 0xfff0, 2, {0xffff0000}, "Write Data Queue ECC Errors -- count double-bit ECC errors from either cpu core"}, #define PME_MONT_SI_WDQ_ECC_ERRORS_DBL_SELF 557 { "SI_WDQ_ECC_ERRORS_DBL_SELF", {0x510af}, 0xfff0, 2, {0xffff0000}, "Write Data Queue ECC Errors -- count double-bit ECC errors from 'this' cpu core"}, #define PME_MONT_SI_WDQ_ECC_ERRORS_SGL_EITHER 558 { "SI_WDQ_ECC_ERRORS_SGL_EITHER", {0x10af}, 0xfff0, 2, {0xffff0000}, "Write Data Queue ECC Errors -- count single-bit ECC errors from either cpu core"}, #define PME_MONT_SI_WDQ_ECC_ERRORS_SGL_SELF 559 { "SI_WDQ_ECC_ERRORS_SGL_SELF", {0x110af}, 0xfff0, 2, {0xffff0000}, "Write Data Queue ECC Errors -- count single-bit ECC errors from 'this' cpu core"}, #define PME_MONT_SI_WRITEQ_INSERTS_ALL_EITHER 560 { "SI_WRITEQ_INSERTS_ALL_EITHER", {0x18a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_ALL_SELF 561 { "SI_WRITEQ_INSERTS_ALL_SELF", {0x118a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_EWB_EITHER 562 { "SI_WRITEQ_INSERTS_EWB_EITHER", {0x418a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_EWB_SELF 563 { "SI_WRITEQ_INSERTS_EWB_SELF", {0x518a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_IWB_EITHER 564 { "SI_WRITEQ_INSERTS_IWB_EITHER", {0x218a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_IWB_SELF 565 { "SI_WRITEQ_INSERTS_IWB_SELF", {0x318a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_NEWB_EITHER 566 { "SI_WRITEQ_INSERTS_NEWB_EITHER", {0xc18a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_NEWB_SELF 567 { "SI_WRITEQ_INSERTS_NEWB_SELF", {0xd18a1}, 0xfff0, 2, 
{0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC16_EITHER 568 { "SI_WRITEQ_INSERTS_WC16_EITHER", {0x818a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC16_SELF 569 { "SI_WRITEQ_INSERTS_WC16_SELF", {0x918a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC1_8A_EITHER 570 { "SI_WRITEQ_INSERTS_WC1_8A_EITHER", {0x618a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC1_8A_SELF 571 { "SI_WRITEQ_INSERTS_WC1_8A_SELF", {0x718a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC1_8B_EITHER 572 { "SI_WRITEQ_INSERTS_WC1_8B_EITHER", {0xe18a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC1_8B_SELF 573 { "SI_WRITEQ_INSERTS_WC1_8B_SELF", {0xf18a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC32_EITHER 574 { "SI_WRITEQ_INSERTS_WC32_EITHER", {0xa18a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_INSERTS_WC32_SELF 575 { "SI_WRITEQ_INSERTS_WC32_SELF", {0xb18a1}, 0xfff0, 2, {0xffff0000}, "Write Queue Insertions -- "}, #define PME_MONT_SI_WRITEQ_LIVE_REQ_HI_EITHER 576 { "SI_WRITEQ_LIVE_REQ_HI_EITHER", {0x10a3}, 0xfff0, 1, {0xffff0000}, "Write Queue Requests (upper bit) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_WRITEQ_LIVE_REQ_HI_SELF 577 { "SI_WRITEQ_LIVE_REQ_HI_SELF", {0x110a3}, 0xfff0, 1, {0xffff0000}, "Write Queue Requests (upper bit) -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SI_WRITEQ_LIVE_REQ_LO_EITHER 578 { "SI_WRITEQ_LIVE_REQ_LO_EITHER", {0x10a2}, 0xfff0, 7, {0xffff0000}, "Write Queue Requests (lower three bits) -- transactions initiated by either cpu core"}, #define PME_MONT_SI_WRITEQ_LIVE_REQ_LO_SELF 579 { "SI_WRITEQ_LIVE_REQ_LO_SELF", {0x110a2}, 0xfff0, 7, {0xffff0000}, "Write 
Queue Requests (lower three bits) -- transactions initiated by 'this' cpu core"}, #define PME_MONT_SPEC_LOADS_NATTED_ALL 580 { "SPEC_LOADS_NATTED_ALL", {0xd9}, 0xfff0, 2, {0xffff0005}, "Number of speculative inter loads that are NaTd -- Count all NaT'd loads"}, #define PME_MONT_SPEC_LOADS_NATTED_DEF_PSR_ED 581 { "SPEC_LOADS_NATTED_DEF_PSR_ED", {0x500d9}, 0xfff0, 2, {0xffff0005}, "Number of speculative inter loads that are NaTd -- Only loads NaT'd due to effect of PSR.ed"}, #define PME_MONT_SPEC_LOADS_NATTED_DEF_TLB_FAULT 582 { "SPEC_LOADS_NATTED_DEF_TLB_FAULT", {0x300d9}, 0xfff0, 2, {0xffff0005}, "Number of speculative inter loads that are NaTd -- Only loads NaT'd due to deferred TLB faults"}, #define PME_MONT_SPEC_LOADS_NATTED_DEF_TLB_MISS 583 { "SPEC_LOADS_NATTED_DEF_TLB_MISS", {0x200d9}, 0xfff0, 2, {0xffff0005}, "Number of speculative inter loads that are NaTd -- Only loads NaT'd due to deferred TLB misses"}, #define PME_MONT_SPEC_LOADS_NATTED_NAT_CNSM 584 { "SPEC_LOADS_NATTED_NAT_CNSM", {0x400d9}, 0xfff0, 2, {0xffff0005}, "Number of speculative inter loads that are NaTd -- Only loads NaT'd due to NaT consumption"}, #define PME_MONT_SPEC_LOADS_NATTED_VHPT_MISS 585 { "SPEC_LOADS_NATTED_VHPT_MISS", {0x100d9}, 0xfff0, 2, {0xffff0005}, "Number of speculative inter loads that are NaTd -- Only loads NaT'd due to VHPT miss"}, #define PME_MONT_STORES_RETIRED 586 { "STORES_RETIRED", {0xd1}, 0xfff0, 2, {0x5410007}, "Retired Stores"}, #define PME_MONT_SYLL_NOT_DISPERSED_ALL 587 { "SYLL_NOT_DISPERSED_ALL", {0xf004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Counts all syllables not dispersed. NOTE: Any combination of b0000-b1111 is valid."}, #define PME_MONT_SYLL_NOT_DISPERSED_EXPL 588 { "SYLL_NOT_DISPERSED_EXPL", {0x1004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits. These consist of programmer specified architected S-bit and templates 1 and 5. 
Dispersal takes a 6-syllable (3-syllable) hit for every template 1/5 in bundle 0(1). Dispersal takes a 3-syllable (0 syllable) hit for every S-bit in bundle 0(1)"}, #define PME_MONT_SYLL_NOT_DISPERSED_EXPL_OR_FE 589 { "SYLL_NOT_DISPERSED_EXPL_OR_FE", {0x5004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits or front-end not providing valid bundles or providing valid illegal templates."}, #define PME_MONT_SYLL_NOT_DISPERSED_EXPL_OR_FE_OR_MLX 590 { "SYLL_NOT_DISPERSED_EXPL_OR_FE_OR_MLX", {0xd004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits or due to front-end not providing valid bundles or providing valid illegal templates or due to MLX bundle and resteers to non-0 syllable."}, #define PME_MONT_SYLL_NOT_DISPERSED_EXPL_OR_IMPL 591 { "SYLL_NOT_DISPERSED_EXPL_OR_IMPL", {0x3004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit/implicit stop bits."}, #define PME_MONT_SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_FE 592 { "SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_FE", {0x7004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit or implicit stop bits or due to front-end not providing valid bundles or providing valid illegal template."}, #define PME_MONT_SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_MLX 593 { "SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_MLX", {0xb004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit or implicit stop bits or due to MLX bundle and resteers to non-0 syllable."}, #define PME_MONT_SYLL_NOT_DISPERSED_EXPL_OR_MLX 594 { "SYLL_NOT_DISPERSED_EXPL_OR_MLX", {0x9004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits or to MLX bundle and resteers to non-0 syllable."}, #define PME_MONT_SYLL_NOT_DISPERSED_FE 595 { "SYLL_NOT_DISPERSED_FE", {0x4004e}, 0xfff0, 
5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to front-end not providing valid bundles or providing valid illegal templates. Dispersal takes a 3-syllable hit for every invalid bundle or valid illegal template from front-end. Bundle 1 with a front-end fault is counted here (3-syllable hit)."}, #define PME_MONT_SYLL_NOT_DISPERSED_FE_OR_MLX 596 { "SYLL_NOT_DISPERSED_FE_OR_MLX", {0xc004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to MLX bundle and resteers to non-0 syllable or due to front-end not providing valid bundles or providing valid illegal templates."}, #define PME_MONT_SYLL_NOT_DISPERSED_IMPL 597 { "SYLL_NOT_DISPERSED_IMPL", {0x2004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits. These consist of all of the non-architected stop bits (asymmetry, oversubscription, implicit). Dispersal takes a 6-syllable (3-syllable) hit for every implicit stop bit in bundle 0(1)."}, #define PME_MONT_SYLL_NOT_DISPERSED_IMPL_OR_FE 598 { "SYLL_NOT_DISPERSED_IMPL_OR_FE", {0x6004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits or to front-end not providing valid bundles or providing valid illegal templates."}, #define PME_MONT_SYLL_NOT_DISPERSED_IMPL_OR_FE_OR_MLX 599 { "SYLL_NOT_DISPERSED_IMPL_OR_FE_OR_MLX", {0xe004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits or due to front-end not providing valid bundles or providing valid illegal templates or due to MLX bundle and resteers to non-0 syllable."}, #define PME_MONT_SYLL_NOT_DISPERSED_IMPL_OR_MLX 600 { "SYLL_NOT_DISPERSED_IMPL_OR_MLX", {0xa004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits or to MLX bundle and resteers to non-0 syllable."}, #define PME_MONT_SYLL_NOT_DISPERSED_MLX 601 { 
"SYLL_NOT_DISPERSED_MLX", {0x8004e}, 0xfff0, 5, {0xffff0001}, "Syllables Not Dispersed -- Count syllables not dispersed due to MLX bundle and resteers to non-0 syllable. Dispersal takes a 1 syllable hit for each MLX bundle . Dispersal could take 0-2 syllable hit depending on which syllable we resteer to. Bundle 1 with front-end fault which is split, is counted here (0-2 syllable hit)."}, #define PME_MONT_SYLL_OVERCOUNT_ALL 602 { "SYLL_OVERCOUNT_ALL", {0x3004f}, 0xfff0, 2, {0xffff0001}, "Syllables Overcounted -- syllables overcounted in implicit & explicit bucket"}, #define PME_MONT_SYLL_OVERCOUNT_EXPL 603 { "SYLL_OVERCOUNT_EXPL", {0x1004f}, 0xfff0, 2, {0xffff0001}, "Syllables Overcounted -- Only syllables overcounted in the explicit bucket"}, #define PME_MONT_SYLL_OVERCOUNT_IMPL 604 { "SYLL_OVERCOUNT_IMPL", {0x2004f}, 0xfff0, 2, {0xffff0001}, "Syllables Overcounted -- Only syllables overcounted in the implicit bucket"}, #define PME_MONT_THREAD_SWITCH_CYCLE_ALL_GATED 605 { "THREAD_SWITCH_CYCLE_ALL_GATED", {0x6000e}, 0xfff0, 1, {0xffff0000}, "Thread switch overhead cycles. -- Cycles TSs are gated due to any reason"}, #define PME_MONT_THREAD_SWITCH_CYCLE_ANYSTALL 606 { "THREAD_SWITCH_CYCLE_ANYSTALL", {0x3000e}, 0xfff0, 1, {0xffff0000}, "Thread switch overhead cycles. -- Cycles TSs are stalled due to any reason"}, #define PME_MONT_THREAD_SWITCH_CYCLE_CRAB 607 { "THREAD_SWITCH_CYCLE_CRAB", {0x1000e}, 0xfff0, 1, {0xffff0000}, "Thread switch overhead cycles. -- Cycles TSs are stalled due to CRAB operation"}, #define PME_MONT_THREAD_SWITCH_CYCLE_L2D 608 { "THREAD_SWITCH_CYCLE_L2D", {0x2000e}, 0xfff0, 1, {0xffff0000}, "Thread switch overhead cycles. -- Cycles TSs are stalled due to L2D return operation"}, #define PME_MONT_THREAD_SWITCH_CYCLE_PCR 609 { "THREAD_SWITCH_CYCLE_PCR", {0x4000e}, 0xfff0, 1, {0xffff0000}, "Thread switch overhead cycles. 
-- Cycles we run with PCR.sd set"}, #define PME_MONT_THREAD_SWITCH_CYCLE_TOTAL 610 { "THREAD_SWITCH_CYCLE_TOTAL", {0x7000e}, 0xfff0, 1, {0xffff0000}, "Thread switch overhead cycles. -- Total time from TS opportunity is seized to TS happens."}, #define PME_MONT_THREAD_SWITCH_EVENTS_ALL 611 { "THREAD_SWITCH_EVENTS_ALL", {0x7000c}, 0xfff0, 1, {0xffff0000}, "Thread switch events. -- All taken TSs"}, #define PME_MONT_THREAD_SWITCH_EVENTS_DBG 612 { "THREAD_SWITCH_EVENTS_DBG", {0x5000c}, 0xfff0, 1, {0xffff0000}, "Thread switch events. -- TSs due to debug operations"}, #define PME_MONT_THREAD_SWITCH_EVENTS_HINT 613 { "THREAD_SWITCH_EVENTS_HINT", {0x3000c}, 0xfff0, 1, {0xffff0000}, "Thread switch events. -- TSs due to hint instruction"}, #define PME_MONT_THREAD_SWITCH_EVENTS_L3MISS 614 { "THREAD_SWITCH_EVENTS_L3MISS", {0x1000c}, 0xfff0, 1, {0xffff0000}, "Thread switch events. -- TSs due to L3 miss"}, #define PME_MONT_THREAD_SWITCH_EVENTS_LP 615 { "THREAD_SWITCH_EVENTS_LP", {0x4000c}, 0xfff0, 1, {0xffff0000}, "Thread switch events. -- TSs due to low power operation"}, #define PME_MONT_THREAD_SWITCH_EVENTS_MISSED 616 { "THREAD_SWITCH_EVENTS_MISSED", {0xc}, 0xfff0, 1, {0xffff0000}, "Thread switch events. -- TS opportunities missed"}, #define PME_MONT_THREAD_SWITCH_EVENTS_TIMER 617 { "THREAD_SWITCH_EVENTS_TIMER", {0x2000c}, 0xfff0, 1, {0xffff0000}, "Thread switch events. 
-- TSs due to time out"}, #define PME_MONT_THREAD_SWITCH_GATED_ALL 618 { "THREAD_SWITCH_GATED_ALL", {0x7000d}, 0xfff0, 1, {0xffff0000}, "Thread switches gated -- TSs gated for any reason"}, #define PME_MONT_THREAD_SWITCH_GATED_FWDPRO 619 { "THREAD_SWITCH_GATED_FWDPRO", {0x5000d}, 0xfff0, 1, {0xffff0000}, "Thread switches gated -- Gated due to forward progress reasons"}, #define PME_MONT_THREAD_SWITCH_GATED_LP 620 { "THREAD_SWITCH_GATED_LP", {0x1000d}, 0xfff0, 1, {0xffff0000}, "Thread switches gated -- TSs gated due to LP"}, #define PME_MONT_THREAD_SWITCH_GATED_PIPE 621 { "THREAD_SWITCH_GATED_PIPE", {0x4000d}, 0xfff0, 1, {0xffff0000}, "Thread switches gated -- Gated due to pipeline operations"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_1024 622 { "THREAD_SWITCH_STALL_GTE_1024", {0x8000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 1024 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_128 623 { "THREAD_SWITCH_STALL_GTE_128", {0x5000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 128 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_16 624 { "THREAD_SWITCH_STALL_GTE_16", {0x2000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 16 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_2048 625 { "THREAD_SWITCH_STALL_GTE_2048", {0x9000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 2048 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_256 626 { "THREAD_SWITCH_STALL_GTE_256", {0x6000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 256 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_32 627 { "THREAD_SWITCH_STALL_GTE_32", {0x3000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 32 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_4 628 { "THREAD_SWITCH_STALL_GTE_4", {0xf}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 4 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_4096 629 { 
"THREAD_SWITCH_STALL_GTE_4096", {0xa000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 4096 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_512 630 { "THREAD_SWITCH_STALL_GTE_512", {0x7000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 512 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_64 631 { "THREAD_SWITCH_STALL_GTE_64", {0x4000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 64 cycles"}, #define PME_MONT_THREAD_SWITCH_STALL_GTE_8 632 { "THREAD_SWITCH_STALL_GTE_8", {0x1000f}, 0xfff0, 1, {0xffff0000}, "Thread switch stall -- Thread switch stall >= 8 cycles"}, #define PME_MONT_UC_LOADS_RETIRED 633 { "UC_LOADS_RETIRED", {0xcf}, 0xfff0, 4, {0x5310007}, "Retired Uncacheable Loads"}, #define PME_MONT_UC_STORES_RETIRED 634 { "UC_STORES_RETIRED", {0xd0}, 0xfff0, 2, {0x5410007}, "Retired Uncacheable Stores"}, #define PME_MONT_IA64_INST_RETIRED 635 { "IA64_INST_RETIRED", {0x8}, 0xfff0, 6, {0xffff0003}, "Retired IA-64 Instructions -- Retired IA-64 Instructions -- Alias to IA64_INST_RETIRED_THIS"}, #define PME_MONT_BRANCH_EVENT 636 { "BRANCH_EVENT", {0x111}, 0xfff0, 1, {0xffff0003}, "Execution Trace Buffer Event Captured. Alias to ETB_EVENT"}, }; #define PME_MONT_EVENT_COUNT (sizeof(montecito_pe)/sizeof(pme_mont_entry_t)) papi-5.3.0/src/libpfm4/lib/events/itanium2_events.h0000600003276200002170000027467012247131123021641 0ustar ralphundrgrad/* * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ /* * This file is generated automatically * !! DO NOT CHANGE !! 
*/ static pme_ita2_entry_t itanium2_pe []={ #define PME_ITA2_ALAT_CAPACITY_MISS_ALL 0 { "ALAT_CAPACITY_MISS_ALL", {0x30058}, 0xf0, 2, {0xf00007}, "ALAT Entry Replaced -- both integer and floating point instructions"}, #define PME_ITA2_ALAT_CAPACITY_MISS_FP 1 { "ALAT_CAPACITY_MISS_FP", {0x20058}, 0xf0, 2, {0xf00007}, "ALAT Entry Replaced -- only floating point instructions"}, #define PME_ITA2_ALAT_CAPACITY_MISS_INT 2 { "ALAT_CAPACITY_MISS_INT", {0x10058}, 0xf0, 2, {0xf00007}, "ALAT Entry Replaced -- only integer instructions"}, #define PME_ITA2_BACK_END_BUBBLE_ALL 3 { "BACK_END_BUBBLE_ALL", {0x0}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe -- Front-end, RSE, EXE, FPU/L1D stall or a pipeline flush due to an exception/branch misprediction"}, #define PME_ITA2_BACK_END_BUBBLE_FE 4 { "BACK_END_BUBBLE_FE", {0x10000}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe -- front-end"}, #define PME_ITA2_BACK_END_BUBBLE_L1D_FPU_RSE 5 { "BACK_END_BUBBLE_L1D_FPU_RSE", {0x20000}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe -- L1D_FPU or RSE."}, #define PME_ITA2_BE_BR_MISPRED_DETAIL_ANY 6 { "BE_BR_MISPRED_DETAIL_ANY", {0x61}, 0xf0, 1, {0xf00003}, "BE Branch Misprediction Detail -- any back-end (be) mispredictions"}, #define PME_ITA2_BE_BR_MISPRED_DETAIL_PFS 7 { "BE_BR_MISPRED_DETAIL_PFS", {0x30061}, 0xf0, 1, {0xf00003}, "BE Branch Misprediction Detail -- only back-end pfs mispredictions for taken branches"}, #define PME_ITA2_BE_BR_MISPRED_DETAIL_ROT 8 { "BE_BR_MISPRED_DETAIL_ROT", {0x20061}, 0xf0, 1, {0xf00003}, "BE Branch Misprediction Detail -- only back-end rotate mispredictions"}, #define PME_ITA2_BE_BR_MISPRED_DETAIL_STG 9 { "BE_BR_MISPRED_DETAIL_STG", {0x10061}, 0xf0, 1, {0xf00003}, "BE Branch Misprediction Detail -- only back-end stage mispredictions"}, #define PME_ITA2_BE_EXE_BUBBLE_ALL 10 { "BE_EXE_BUBBLE_ALL", {0x2}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe"}, #define 
PME_ITA2_BE_EXE_BUBBLE_ARCR 11 { "BE_EXE_BUBBLE_ARCR", {0x40002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to AR or CR dependency"}, #define PME_ITA2_BE_EXE_BUBBLE_ARCR_PR_CANCEL_BANK 12 { "BE_EXE_BUBBLE_ARCR_PR_CANCEL_BANK", {0x80002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- ARCR, PR, CANCEL or BANK_SWITCH"}, #define PME_ITA2_BE_EXE_BUBBLE_BANK_SWITCH 13 { "BE_EXE_BUBBLE_BANK_SWITCH", {0x70002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to bank switching."}, #define PME_ITA2_BE_EXE_BUBBLE_CANCEL 14 { "BE_EXE_BUBBLE_CANCEL", {0x60002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to a canceled load"}, #define PME_ITA2_BE_EXE_BUBBLE_FRALL 15 { "BE_EXE_BUBBLE_FRALL", {0x20002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to FR/FR or FR/load dependency"}, #define PME_ITA2_BE_EXE_BUBBLE_GRALL 16 { "BE_EXE_BUBBLE_GRALL", {0x10002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to GR/GR or GR/load dependency"}, #define PME_ITA2_BE_EXE_BUBBLE_GRGR 17 { "BE_EXE_BUBBLE_GRGR", {0x50002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to GR/GR dependency"}, #define PME_ITA2_BE_EXE_BUBBLE_PR 18 { "BE_EXE_BUBBLE_PR", {0x30002}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls -- Back-end was stalled by exe due to PR dependency"}, #define PME_ITA2_BE_FLUSH_BUBBLE_ALL 19 { "BE_FLUSH_BUBBLE_ALL", {0x4}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Flushes. 
-- Back-end was stalled due to either an exception/interruption or branch misprediction flush"}, #define PME_ITA2_BE_FLUSH_BUBBLE_BRU 20 { "BE_FLUSH_BUBBLE_BRU", {0x10004}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Flushes. -- Back-end was stalled due to a branch misprediction flush"}, #define PME_ITA2_BE_FLUSH_BUBBLE_XPN 21 { "BE_FLUSH_BUBBLE_XPN", {0x20004}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to Flushes. -- Back-end was stalled due to an exception/interruption flush"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_ALL 22 { "BE_L1D_FPU_BUBBLE_ALL", {0xca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D or FPU"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_FPU 23 { "BE_L1D_FPU_BUBBLE_FPU", {0x100ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by FPU."}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D 24 { "BE_L1D_FPU_BUBBLE_L1D", {0x200ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D. 
This includes all stalls caused by the L1 pipeline (created in the L1D stage of the L1 pipeline which corresponds to the DET stage of the main pipe)."}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_DCS 25 { "BE_L1D_FPU_BUBBLE_L1D_DCS", {0x800ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to DCS requiring a stall"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_DCURECIR 26 { "BE_L1D_FPU_BUBBLE_L1D_DCURECIR", {0x400ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to DCU recirculating"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_FILLCONF 27 { "BE_L1D_FPU_BUBBLE_L1D_FILLCONF", {0x700ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to a store in conflict with a returning fill."}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_FULLSTBUF 28 { "BE_L1D_FPU_BUBBLE_L1D_FULLSTBUF", {0x300ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to store buffer being full"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_HPW 29 { "BE_L1D_FPU_BUBBLE_L1D_HPW", {0x500ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to Hardware Page Walker"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_L2BPRESS 30 { "BE_L1D_FPU_BUBBLE_L1D_L2BPRESS", {0x900ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to L2 Back Pressure"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_LDCHK 31 { "BE_L1D_FPU_BUBBLE_L1D_LDCHK", {0xc00ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to architectural ordering conflict"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_LDCONF 32 { "BE_L1D_FPU_BUBBLE_L1D_LDCONF", {0xb00ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- 
Back-end was stalled by L1D due to architectural ordering conflict"}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_NAT 33 { "BE_L1D_FPU_BUBBLE_L1D_NAT", {0xd00ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to L1D data return needing recirculated NaT generation."}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_NATCONF 34 { "BE_L1D_FPU_BUBBLE_L1D_NATCONF", {0xf00ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to ld8.fill conflict with st8.spill not written to unat."}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_STBUFRECIR 35 { "BE_L1D_FPU_BUBBLE_L1D_STBUFRECIR", {0xe00ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to store buffer cancel needing recirculate."}, #define PME_ITA2_BE_L1D_FPU_BUBBLE_L1D_TLB 36 { "BE_L1D_FPU_BUBBLE_L1D_TLB", {0xa00ca}, 0xf0, 1, {0x5210000}, "Full Pipe Bubbles in Main Pipe due to FPU or L1D Cache -- Back-end was stalled by L1D due to L2DTLB to L1DTLB transfer"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_ALL 37 { "BE_LOST_BW_DUE_TO_FE_ALL", {0x72}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- count regardless of cause"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_BI 38 { "BE_LOST_BW_DUE_TO_FE_BI", {0x90072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by branch initialization stall"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_BRQ 39 { "BE_LOST_BW_DUE_TO_FE_BRQ", {0xa0072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by branch retirement queue stall"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_BR_ILOCK 40 { "BE_LOST_BW_DUE_TO_FE_BR_ILOCK", {0xc0072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. 
-- only if caused by branch interlock stall"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_BUBBLE 41 { "BE_LOST_BW_DUE_TO_FE_BUBBLE", {0xd0072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by branch resteer bubble stall"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_FEFLUSH 42 { "BE_LOST_BW_DUE_TO_FE_FEFLUSH", {0x10072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by a front-end flush"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_FILL_RECIRC 43 { "BE_LOST_BW_DUE_TO_FE_FILL_RECIRC", {0x80072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by a recirculate for a cache line fill operation"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_IBFULL 44 { "BE_LOST_BW_DUE_TO_FE_IBFULL", {0x50072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- (* meaningless for this event *)"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_IMISS 45 { "BE_LOST_BW_DUE_TO_FE_IMISS", {0x60072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by instruction cache miss stall"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_PLP 46 { "BE_LOST_BW_DUE_TO_FE_PLP", {0xb0072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by perfect loop prediction stall"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_TLBMISS 47 { "BE_LOST_BW_DUE_TO_FE_TLBMISS", {0x70072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. -- only if caused by TLB stall"}, #define PME_ITA2_BE_LOST_BW_DUE_TO_FE_UNREACHED 48 { "BE_LOST_BW_DUE_TO_FE_UNREACHED", {0x40072}, 0xf0, 2, {0xf00000}, "Invalid Bundles if BE Not Stalled for Other Reasons. 
-- only if caused by unreachable bundle"}, #define PME_ITA2_BE_RSE_BUBBLE_ALL 49 { "BE_RSE_BUBBLE_ALL", {0x1}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE"}, #define PME_ITA2_BE_RSE_BUBBLE_AR_DEP 50 { "BE_RSE_BUBBLE_AR_DEP", {0x20001}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to AR dependencies"}, #define PME_ITA2_BE_RSE_BUBBLE_BANK_SWITCH 51 { "BE_RSE_BUBBLE_BANK_SWITCH", {0x10001}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to bank switching"}, #define PME_ITA2_BE_RSE_BUBBLE_LOADRS 52 { "BE_RSE_BUBBLE_LOADRS", {0x50001}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to loadrs calculations"}, #define PME_ITA2_BE_RSE_BUBBLE_OVERFLOW 53 { "BE_RSE_BUBBLE_OVERFLOW", {0x30001}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to need to spill"}, #define PME_ITA2_BE_RSE_BUBBLE_UNDERFLOW 54 { "BE_RSE_BUBBLE_UNDERFLOW", {0x40001}, 0xf0, 1, {0xf00000}, "Full Pipe Bubbles in Main Pipe due to RSE Stalls -- Back-end was stalled by RSE due to need to fill"}, #define PME_ITA2_BRANCH_EVENT 55 { "BRANCH_EVENT", {0x111}, 0xf0, 1, {0xf00003}, "Branch Event Captured"}, #define PME_ITA2_BR_MISPRED_DETAIL_ALL_ALL_PRED 56 { "BR_MISPRED_DETAIL_ALL_ALL_PRED", {0x5b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- All branch types regardless of prediction result"}, #define PME_ITA2_BR_MISPRED_DETAIL_ALL_CORRECT_PRED 57 { "BR_MISPRED_DETAIL_ALL_CORRECT_PRED", {0x1005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- All branch types, correctly predicted branches (outcome and target)"}, #define PME_ITA2_BR_MISPRED_DETAIL_ALL_WRONG_PATH 58 { "BR_MISPRED_DETAIL_ALL_WRONG_PATH", {0x2005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- All branch types, mispredicted branches due 
to wrong branch direction"}, #define PME_ITA2_BR_MISPRED_DETAIL_ALL_WRONG_TARGET 59 { "BR_MISPRED_DETAIL_ALL_WRONG_TARGET", {0x3005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- All branch types, mispredicted branches due to wrong target for taken branches"}, #define PME_ITA2_BR_MISPRED_DETAIL_IPREL_ALL_PRED 60 { "BR_MISPRED_DETAIL_IPREL_ALL_PRED", {0x4005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only IP relative branches, regardless of prediction result"}, #define PME_ITA2_BR_MISPRED_DETAIL_IPREL_CORRECT_PRED 61 { "BR_MISPRED_DETAIL_IPREL_CORRECT_PRED", {0x5005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only IP relative branches, correctly predicted branches (outcome and target)"}, #define PME_ITA2_BR_MISPRED_DETAIL_IPREL_WRONG_PATH 62 { "BR_MISPRED_DETAIL_IPREL_WRONG_PATH", {0x6005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only IP relative branches, mispredicted branches due to wrong branch direction"}, #define PME_ITA2_BR_MISPRED_DETAIL_IPREL_WRONG_TARGET 63 { "BR_MISPRED_DETAIL_IPREL_WRONG_TARGET", {0x7005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only IP relative branches, mispredicted branches due to wrong target for taken branches"}, #define PME_ITA2_BR_MISPRED_DETAIL_NTRETIND_ALL_PRED 64 { "BR_MISPRED_DETAIL_NTRETIND_ALL_PRED", {0xc005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, regardless of prediction result"}, #define PME_ITA2_BR_MISPRED_DETAIL_NTRETIND_CORRECT_PRED 65 { "BR_MISPRED_DETAIL_NTRETIND_CORRECT_PRED", {0xd005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, correctly predicted branches (outcome and target)"}, #define PME_ITA2_BR_MISPRED_DETAIL_NTRETIND_WRONG_PATH 66 { "BR_MISPRED_DETAIL_NTRETIND_WRONG_PATH", {0xe005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, mispredicted branches due to wrong branch direction"}, #define 
PME_ITA2_BR_MISPRED_DETAIL_NTRETIND_WRONG_TARGET 67 { "BR_MISPRED_DETAIL_NTRETIND_WRONG_TARGET", {0xf005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only non-return indirect branches, mispredicted branches due to wrong target for taken branches"}, #define PME_ITA2_BR_MISPRED_DETAIL_RETURN_ALL_PRED 68 { "BR_MISPRED_DETAIL_RETURN_ALL_PRED", {0x8005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only return type branches, regardless of prediction result"}, #define PME_ITA2_BR_MISPRED_DETAIL_RETURN_CORRECT_PRED 69 { "BR_MISPRED_DETAIL_RETURN_CORRECT_PRED", {0x9005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only return type branches, correctly predicted branches (outcome and target)"}, #define PME_ITA2_BR_MISPRED_DETAIL_RETURN_WRONG_PATH 70 { "BR_MISPRED_DETAIL_RETURN_WRONG_PATH", {0xa005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only return type branches, mispredicted branches due to wrong branch direction"}, #define PME_ITA2_BR_MISPRED_DETAIL_RETURN_WRONG_TARGET 71 { "BR_MISPRED_DETAIL_RETURN_WRONG_TARGET", {0xb005b}, 0xf0, 3, {0xf00003}, "FE Branch Mispredict Detail -- Only return type branches, mispredicted branches due to wrong target for taken branches"}, #define PME_ITA2_BR_MISPRED_DETAIL2_ALL_ALL_UNKNOWN_PRED 72 { "BR_MISPRED_DETAIL2_ALL_ALL_UNKNOWN_PRED", {0x68}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- All branch types, branches with unknown path prediction"}, #define PME_ITA2_BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_CORRECT_PRED 73 { "BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_CORRECT_PRED", {0x10068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- All branch types, branches with unknown path prediction and correctly predicted branch (outcome & target)"}, #define PME_ITA2_BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_WRONG_PATH 74 { "BR_MISPRED_DETAIL2_ALL_UNKNOWN_PATH_WRONG_PATH", {0x20068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) 
-- All branch types, branches with unknown path prediction and wrong branch direction"}, #define PME_ITA2_BR_MISPRED_DETAIL2_IPREL_ALL_UNKNOWN_PRED 75 { "BR_MISPRED_DETAIL2_IPREL_ALL_UNKNOWN_PRED", {0x40068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only IP relative branches, branches with unknown path prediction"}, #define PME_ITA2_BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_CORRECT_PRED 76 { "BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_CORRECT_PRED", {0x50068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only IP relative branches, branches with unknown path prediction and correct predicted branch (outcome & target)"}, #define PME_ITA2_BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_WRONG_PATH 77 { "BR_MISPRED_DETAIL2_IPREL_UNKNOWN_PATH_WRONG_PATH", {0x60068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only IP relative branches, branches with unknown path prediction and wrong branch direction"}, #define PME_ITA2_BR_MISPRED_DETAIL2_NRETIND_ALL_UNKNOWN_PRED 78 { "BR_MISPRED_DETAIL2_NRETIND_ALL_UNKNOWN_PRED", {0xc0068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only non-return indirect branches, branches with unknown path prediction"}, #define PME_ITA2_BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_CORRECT_PRED 79 { "BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_CORRECT_PRED", {0xd0068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only non-return indirect branches, branches with unknown path prediction and correct predicted branch (outcome & target)"}, #define PME_ITA2_BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_WRONG_PATH 80 { "BR_MISPRED_DETAIL2_NRETIND_UNKNOWN_PATH_WRONG_PATH", {0xe0068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only non-return indirect branches, branches with unknown path prediction and wrong branch direction"}, #define PME_ITA2_BR_MISPRED_DETAIL2_RETURN_ALL_UNKNOWN_PRED 81 { 
"BR_MISPRED_DETAIL2_RETURN_ALL_UNKNOWN_PRED", {0x80068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only return type branches, branches with unknown path prediction"}, #define PME_ITA2_BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_CORRECT_PRED 82 { "BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_CORRECT_PRED", {0x90068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only return type branches, branches with unknown path prediction and correct predicted branch (outcome & target)"}, #define PME_ITA2_BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_WRONG_PATH 83 { "BR_MISPRED_DETAIL2_RETURN_UNKNOWN_PATH_WRONG_PATH", {0xa0068}, 0xf0, 2, {0xf00003}, "FE Branch Mispredict Detail (Unknown Path Component) -- Only return type branches, branches with unknown path prediction and wrong branch direction"}, #define PME_ITA2_BR_PATH_PRED_ALL_MISPRED_NOTTAKEN 84 { "BR_PATH_PRED_ALL_MISPRED_NOTTAKEN", {0x54}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- All branch types, incorrectly predicted path and not taken branch"}, #define PME_ITA2_BR_PATH_PRED_ALL_MISPRED_TAKEN 85 { "BR_PATH_PRED_ALL_MISPRED_TAKEN", {0x10054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- All branch types, incorrectly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED_ALL_OKPRED_NOTTAKEN 86 { "BR_PATH_PRED_ALL_OKPRED_NOTTAKEN", {0x20054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- All branch types, correctly predicted path and not taken branch"}, #define PME_ITA2_BR_PATH_PRED_ALL_OKPRED_TAKEN 87 { "BR_PATH_PRED_ALL_OKPRED_TAKEN", {0x30054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- All branch types, correctly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED_IPREL_MISPRED_NOTTAKEN 88 { "BR_PATH_PRED_IPREL_MISPRED_NOTTAKEN", {0x40054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only IP relative branches, incorrectly predicted path and not taken branch"}, #define 
PME_ITA2_BR_PATH_PRED_IPREL_MISPRED_TAKEN 89 { "BR_PATH_PRED_IPREL_MISPRED_TAKEN", {0x50054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only IP relative branches, incorrectly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED_IPREL_OKPRED_NOTTAKEN 90 { "BR_PATH_PRED_IPREL_OKPRED_NOTTAKEN", {0x60054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only IP relative branches, correctly predicted path and not taken branch"}, #define PME_ITA2_BR_PATH_PRED_IPREL_OKPRED_TAKEN 91 { "BR_PATH_PRED_IPREL_OKPRED_TAKEN", {0x70054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only IP relative branches, correctly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED_NRETIND_MISPRED_NOTTAKEN 92 { "BR_PATH_PRED_NRETIND_MISPRED_NOTTAKEN", {0xc0054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, incorrectly predicted path and not taken branch"}, #define PME_ITA2_BR_PATH_PRED_NRETIND_MISPRED_TAKEN 93 { "BR_PATH_PRED_NRETIND_MISPRED_TAKEN", {0xd0054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, incorrectly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED_NRETIND_OKPRED_NOTTAKEN 94 { "BR_PATH_PRED_NRETIND_OKPRED_NOTTAKEN", {0xe0054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, correctly predicted path and not taken branch"}, #define PME_ITA2_BR_PATH_PRED_NRETIND_OKPRED_TAKEN 95 { "BR_PATH_PRED_NRETIND_OKPRED_TAKEN", {0xf0054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only non-return indirect branches, correctly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED_RETURN_MISPRED_NOTTAKEN 96 { "BR_PATH_PRED_RETURN_MISPRED_NOTTAKEN", {0x80054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only return type branches, incorrectly predicted path and not taken branch"}, #define PME_ITA2_BR_PATH_PRED_RETURN_MISPRED_TAKEN 97 { 
"BR_PATH_PRED_RETURN_MISPRED_TAKEN", {0x90054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only return type branches, incorrectly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED_RETURN_OKPRED_NOTTAKEN 98 { "BR_PATH_PRED_RETURN_OKPRED_NOTTAKEN", {0xa0054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only return type branches, correctly predicted path and not taken branch"}, #define PME_ITA2_BR_PATH_PRED_RETURN_OKPRED_TAKEN 99 { "BR_PATH_PRED_RETURN_OKPRED_TAKEN", {0xb0054}, 0xf0, 3, {0xf00003}, "FE Branch Path Prediction Detail -- Only return type branches, correctly predicted path and taken branch"}, #define PME_ITA2_BR_PATH_PRED2_ALL_UNKNOWNPRED_NOTTAKEN 100 { "BR_PATH_PRED2_ALL_UNKNOWNPRED_NOTTAKEN", {0x6a}, 0xf0, 2, {0xf00003}, "FE Branch Path Prediction Detail (Unknown pred component) -- All branch types, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_ITA2_BR_PATH_PRED2_ALL_UNKNOWNPRED_TAKEN 101 { "BR_PATH_PRED2_ALL_UNKNOWNPRED_TAKEN", {0x1006a}, 0xf0, 2, {0xf00003}, "FE Branch Path Prediction Detail (Unknown pred component) -- All branch types, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define PME_ITA2_BR_PATH_PRED2_IPREL_UNKNOWNPRED_NOTTAKEN 102 { "BR_PATH_PRED2_IPREL_UNKNOWNPRED_NOTTAKEN", {0x4006a}, 0xf0, 2, {0xf00003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only IP relative branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_ITA2_BR_PATH_PRED2_IPREL_UNKNOWNPRED_TAKEN 103 { "BR_PATH_PRED2_IPREL_UNKNOWNPRED_TAKEN", {0x5006a}, 0xf0, 2, {0xf00003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only IP relative branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define PME_ITA2_BR_PATH_PRED2_NRETIND_UNKNOWNPRED_NOTTAKEN 104 { "BR_PATH_PRED2_NRETIND_UNKNOWNPRED_NOTTAKEN", {0xc006a}, 0xf0, 2, {0xf00003}, "FE Branch Path 
Prediction Detail (Unknown pred component) -- Only non-return indirect branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_ITA2_BR_PATH_PRED2_NRETIND_UNKNOWNPRED_TAKEN 105 { "BR_PATH_PRED2_NRETIND_UNKNOWNPRED_TAKEN", {0xd006a}, 0xf0, 2, {0xf00003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only non-return indirect branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define PME_ITA2_BR_PATH_PRED2_RETURN_UNKNOWNPRED_NOTTAKEN 106 { "BR_PATH_PRED2_RETURN_UNKNOWNPRED_NOTTAKEN", {0x8006a}, 0xf0, 2, {0xf00003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only return type branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)"}, #define PME_ITA2_BR_PATH_PRED2_RETURN_UNKNOWNPRED_TAKEN 107 { "BR_PATH_PRED2_RETURN_UNKNOWNPRED_TAKEN", {0x9006a}, 0xf0, 2, {0xf00003}, "FE Branch Path Prediction Detail (Unknown pred component) -- Only return type branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)"}, #define PME_ITA2_BUS_ALL_ANY 108 { "BUS_ALL_ANY", {0x30087}, 0xf0, 1, {0xf00000}, "Bus Transactions -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_ALL_IO 109 { "BUS_ALL_IO", {0x10087}, 0xf0, 1, {0xf00000}, "Bus Transactions -- non-CPU priority agents"}, #define PME_ITA2_BUS_ALL_SELF 110 { "BUS_ALL_SELF", {0x20087}, 0xf0, 1, {0xf00000}, "Bus Transactions -- local processor"}, #define PME_ITA2_BUS_BACKSNP_REQ_THIS 111 { "BUS_BACKSNP_REQ_THIS", {0x1008e}, 0xf0, 1, {0xf00000}, "Bus Back Snoop Requests -- Counts the number of bus back snoop me requests"}, #define PME_ITA2_BUS_BRQ_LIVE_REQ_HI 112 { "BUS_BRQ_LIVE_REQ_HI", {0x9c}, 0xf0, 2, {0xf00000}, "BRQ Live Requests (upper 2 bits)"}, #define PME_ITA2_BUS_BRQ_LIVE_REQ_LO 113 { "BUS_BRQ_LIVE_REQ_LO", {0x9b}, 0xf0, 7, {0xf00000}, "BRQ Live Requests (lower 3 bits)"}, #define PME_ITA2_BUS_BRQ_REQ_INSERTED 114 { "BUS_BRQ_REQ_INSERTED", {0x9d}, 0xf0, 1, 
{0xf00000}, "BRQ Requests Inserted"}, #define PME_ITA2_BUS_DATA_CYCLE 115 { "BUS_DATA_CYCLE", {0x88}, 0xf0, 1, {0xf00000}, "Valid Data Cycle on the Bus"}, #define PME_ITA2_BUS_HITM 116 { "BUS_HITM", {0x84}, 0xf0, 1, {0xf00000}, "Bus Hit Modified Line Transactions"}, #define PME_ITA2_BUS_IO_ANY 117 { "BUS_IO_ANY", {0x30090}, 0xf0, 1, {0xf00000}, "IA-32 Compatible IO Bus Transactions -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_IO_IO 118 { "BUS_IO_IO", {0x10090}, 0xf0, 1, {0xf00000}, "IA-32 Compatible IO Bus Transactions -- non-CPU priority agents"}, #define PME_ITA2_BUS_IO_SELF 119 { "BUS_IO_SELF", {0x20090}, 0xf0, 1, {0xf00000}, "IA-32 Compatible IO Bus Transactions -- local processor"}, #define PME_ITA2_BUS_IOQ_LIVE_REQ_HI 120 { "BUS_IOQ_LIVE_REQ_HI", {0x98}, 0xf0, 2, {0xf00000}, "Inorder Bus Queue Requests (upper 2 bits)"}, #define PME_ITA2_BUS_IOQ_LIVE_REQ_LO 121 { "BUS_IOQ_LIVE_REQ_LO", {0x97}, 0xf0, 3, {0xf00000}, "Inorder Bus Queue Requests (lower 2 bits)"}, #define PME_ITA2_BUS_LOCK_ANY 122 { "BUS_LOCK_ANY", {0x30093}, 0xf0, 1, {0xf00000}, "IA-32 Compatible Bus Lock Transactions -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_LOCK_SELF 123 { "BUS_LOCK_SELF", {0x20093}, 0xf0, 1, {0xf00000}, "IA-32 Compatible Bus Lock Transactions -- local processor"}, #define PME_ITA2_BUS_MEMORY_ALL_ANY 124 { "BUS_MEMORY_ALL_ANY", {0xf008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- All bus transactions from CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_MEMORY_ALL_IO 125 { "BUS_MEMORY_ALL_IO", {0xd008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- All bus transactions from non-CPU priority agents"}, #define PME_ITA2_BUS_MEMORY_ALL_SELF 126 { "BUS_MEMORY_ALL_SELF", {0xe008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- All bus transactions from local processor"}, #define PME_ITA2_BUS_MEMORY_EQ_128BYTE_ANY 127 { "BUS_MEMORY_EQ_128BYTE_ANY", {0x7008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- number of full 
cache line transactions (BRL, BRIL, BWL) from CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_MEMORY_EQ_128BYTE_IO 128 { "BUS_MEMORY_EQ_128BYTE_IO", {0x5008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- number of full cache line transactions (BRL, BRIL, BWL) from non-CPU priority agents"}, #define PME_ITA2_BUS_MEMORY_EQ_128BYTE_SELF 129 { "BUS_MEMORY_EQ_128BYTE_SELF", {0x6008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- number of full cache line transactions (BRL, BRIL, BWL) from local processor"}, #define PME_ITA2_BUS_MEMORY_LT_128BYTE_ANY 130 { "BUS_MEMORY_LT_128BYTE_ANY", {0xb008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- number of less than full cache line transactions (BRP, BWP) from CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_MEMORY_LT_128BYTE_IO 131 { "BUS_MEMORY_LT_128BYTE_IO", {0x9008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- number of less than full cache line transactions (BRP, BWP) from non-CPU priority agents"}, #define PME_ITA2_BUS_MEMORY_LT_128BYTE_SELF 132 { "BUS_MEMORY_LT_128BYTE_SELF", {0xa008a}, 0xf0, 1, {0xf00000}, "Bus Memory Transactions -- number of less than full cache line transactions (BRP, BWP) from local processor"}, #define PME_ITA2_BUS_MEM_READ_ALL_ANY 133 { "BUS_MEM_READ_ALL_ANY", {0xf008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- All memory read transactions from CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_MEM_READ_ALL_IO 134 { "BUS_MEM_READ_ALL_IO", {0xd008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- All memory read transactions from non-CPU priority agents"}, #define PME_ITA2_BUS_MEM_READ_ALL_SELF 135 { "BUS_MEM_READ_ALL_SELF", {0xe008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- All memory read transactions from local processor"}, #define PME_ITA2_BUS_MEM_READ_BIL_ANY 136 { "BUS_MEM_READ_BIL_ANY", {0x3008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I 
Memory RD, RD Invalidate, and BRIL -- Number of BIL 0-byte memory read invalidate transactions from CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_MEM_READ_BIL_IO 137 { "BUS_MEM_READ_BIL_IO", {0x1008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of BIL 0-byte memory read invalidate transactions from non-CPU priority agents"}, #define PME_ITA2_BUS_MEM_READ_BIL_SELF 138 { "BUS_MEM_READ_BIL_SELF", {0x2008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of BIL 0-byte memory read invalidate transactions from local processor"}, #define PME_ITA2_BUS_MEM_READ_BRIL_ANY 139 { "BUS_MEM_READ_BRIL_ANY", {0xb008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read invalidate transactions from CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_MEM_READ_BRIL_IO 140 { "BUS_MEM_READ_BRIL_IO", {0x9008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read invalidate transactions from non-CPU priority agents"}, #define PME_ITA2_BUS_MEM_READ_BRIL_SELF 141 { "BUS_MEM_READ_BRIL_SELF", {0xa008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read invalidate transactions from local processor"}, #define PME_ITA2_BUS_MEM_READ_BRL_ANY 142 { "BUS_MEM_READ_BRL_ANY", {0x7008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read transactions from CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_MEM_READ_BRL_IO 143 { "BUS_MEM_READ_BRL_IO", {0x5008b}, 0xf0, 1, {0xf00000}, "Full Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read transactions from non-CPU priority agents"}, #define PME_ITA2_BUS_MEM_READ_BRL_SELF 144 { "BUS_MEM_READ_BRL_SELF", {0x6008b}, 0xf0, 1, {0xf00000}, "Full 
Cache Line D/I Memory RD, RD Invalidate, and BRIL -- Number of full cache line memory read transactions from local processor"}, #define PME_ITA2_BUS_MEM_READ_OUT_HI 145 { "BUS_MEM_READ_OUT_HI", {0x94}, 0xf0, 2, {0xf00000}, "Outstanding Memory Read Transactions (upper 2 bits)"}, #define PME_ITA2_BUS_MEM_READ_OUT_LO 146 { "BUS_MEM_READ_OUT_LO", {0x95}, 0xf0, 7, {0xf00000}, "Outstanding Memory Read Transactions (lower 3 bits)"}, #define PME_ITA2_BUS_OOQ_LIVE_REQ_HI 147 { "BUS_OOQ_LIVE_REQ_HI", {0x9a}, 0xf0, 2, {0xf00000}, "Out-of-order Bus Queue Requests (upper 2 bits)"}, #define PME_ITA2_BUS_OOQ_LIVE_REQ_LO 148 { "BUS_OOQ_LIVE_REQ_LO", {0x99}, 0xf0, 7, {0xf00000}, "Out-of-order Bus Queue Requests (lower 3 bits)"}, #define PME_ITA2_BUS_RD_DATA_ANY 149 { "BUS_RD_DATA_ANY", {0x3008c}, 0xf0, 1, {0xf00000}, "Bus Read Data Transactions -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_RD_DATA_IO 150 { "BUS_RD_DATA_IO", {0x1008c}, 0xf0, 1, {0xf00000}, "Bus Read Data Transactions -- non-CPU priority agents"}, #define PME_ITA2_BUS_RD_DATA_SELF 151 { "BUS_RD_DATA_SELF", {0x2008c}, 0xf0, 1, {0xf00000}, "Bus Read Data Transactions -- local processor"}, #define PME_ITA2_BUS_RD_HIT 152 { "BUS_RD_HIT", {0x80}, 0xf0, 1, {0xf00000}, "Bus Read Hit Clean Non-local Cache Transactions"}, #define PME_ITA2_BUS_RD_HITM 153 { "BUS_RD_HITM", {0x81}, 0xf0, 1, {0xf00000}, "Bus Read Hit Modified Non-local Cache Transactions"}, #define PME_ITA2_BUS_RD_INVAL_ALL_HITM 154 { "BUS_RD_INVAL_ALL_HITM", {0x83}, 0xf0, 1, {0xf00000}, "Bus BRIL Burst Transaction Results in HITM"}, #define PME_ITA2_BUS_RD_INVAL_HITM 155 { "BUS_RD_INVAL_HITM", {0x82}, 0xf0, 1, {0xf00000}, "Bus BIL Transaction Results in HITM"}, #define PME_ITA2_BUS_RD_IO_ANY 156 { "BUS_RD_IO_ANY", {0x30091}, 0xf0, 1, {0xf00000}, "IA-32 Compatible IO Read Transactions -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_RD_IO_IO 157 { "BUS_RD_IO_IO", {0x10091}, 0xf0, 1, {0xf00000}, "IA-32 Compatible IO Read Transactions 
-- non-CPU priority agents"}, #define PME_ITA2_BUS_RD_IO_SELF 158 { "BUS_RD_IO_SELF", {0x20091}, 0xf0, 1, {0xf00000}, "IA-32 Compatible IO Read Transactions -- local processor"}, #define PME_ITA2_BUS_RD_PRTL_ANY 159 { "BUS_RD_PRTL_ANY", {0x3008d}, 0xf0, 1, {0xf00000}, "Bus Read Partial Transactions -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_RD_PRTL_IO 160 { "BUS_RD_PRTL_IO", {0x1008d}, 0xf0, 1, {0xf00000}, "Bus Read Partial Transactions -- non-CPU priority agents"}, #define PME_ITA2_BUS_RD_PRTL_SELF 161 { "BUS_RD_PRTL_SELF", {0x2008d}, 0xf0, 1, {0xf00000}, "Bus Read Partial Transactions -- local processor"}, #define PME_ITA2_BUS_SNOOPQ_REQ 162 { "BUS_SNOOPQ_REQ", {0x96}, 0xf0, 7, {0xf00000}, "Bus Snoop Queue Requests"}, #define PME_ITA2_BUS_SNOOPS_ANY 163 { "BUS_SNOOPS_ANY", {0x30086}, 0xf0, 1, {0xf00000}, "Bus Snoops Total -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_SNOOPS_IO 164 { "BUS_SNOOPS_IO", {0x10086}, 0xf0, 1, {0xf00000}, "Bus Snoops Total -- non-CPU priority agents"}, #define PME_ITA2_BUS_SNOOPS_SELF 165 { "BUS_SNOOPS_SELF", {0x20086}, 0xf0, 1, {0xf00000}, "Bus Snoops Total -- local processor"}, #define PME_ITA2_BUS_SNOOPS_HITM_ANY 166 { "BUS_SNOOPS_HITM_ANY", {0x30085}, 0xf0, 1, {0xf00000}, "Bus Snoops HIT Modified Cache Line -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_SNOOPS_HITM_SELF 167 { "BUS_SNOOPS_HITM_SELF", {0x20085}, 0xf0, 1, {0xf00000}, "Bus Snoops HIT Modified Cache Line -- local processor"}, #define PME_ITA2_BUS_SNOOP_STALL_CYCLES_ANY 168 { "BUS_SNOOP_STALL_CYCLES_ANY", {0x3008f}, 0xf0, 1, {0xf00000}, "Bus Snoop Stall Cycles (from any agent) -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_BUS_SNOOP_STALL_CYCLES_SELF 169 { "BUS_SNOOP_STALL_CYCLES_SELF", {0x2008f}, 0xf0, 1, {0xf00000}, "Bus Snoop Stall Cycles (from any agent) -- local processor"}, #define PME_ITA2_BUS_WR_WB_ALL_ANY 170 { "BUS_WR_WB_ALL_ANY", {0xf0092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- CPU or 
non-CPU (all transactions)."}, #define PME_ITA2_BUS_WR_WB_ALL_IO 171 { "BUS_WR_WB_ALL_IO", {0xd0092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- non-CPU priority agents"}, #define PME_ITA2_BUS_WR_WB_ALL_SELF 172 { "BUS_WR_WB_ALL_SELF", {0xe0092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- local processor"}, #define PME_ITA2_BUS_WR_WB_CCASTOUT_ANY 173 { "BUS_WR_WB_CCASTOUT_ANY", {0xb0092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- CPU or non-CPU (all transactions)/Only 0-byte transactions with write back attribute (clean cast outs) will be counted"}, #define PME_ITA2_BUS_WR_WB_CCASTOUT_SELF 174 { "BUS_WR_WB_CCASTOUT_SELF", {0xa0092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- local processor/Only 0-byte transactions with write back attribute (clean cast outs) will be counted"}, #define PME_ITA2_BUS_WR_WB_EQ_128BYTE_ANY 175 { "BUS_WR_WB_EQ_128BYTE_ANY", {0x70092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- CPU or non-CPU (all transactions)./Only cache line transactions with write back or write coalesce attributes will be counted."}, #define PME_ITA2_BUS_WR_WB_EQ_128BYTE_IO 176 { "BUS_WR_WB_EQ_128BYTE_IO", {0x50092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- non-CPU priority agents/Only cache line transactions with write back or write coalesce attributes will be counted."}, #define PME_ITA2_BUS_WR_WB_EQ_128BYTE_SELF 177 { "BUS_WR_WB_EQ_128BYTE_SELF", {0x60092}, 0xf0, 1, {0xf00000}, "Bus Write Back Transactions -- local processor/Only cache line transactions with write back or write coalesce attributes will be counted."}, #define PME_ITA2_CPU_CPL_CHANGES 178 { "CPU_CPL_CHANGES", {0x13}, 0xf0, 1, {0xf00000}, "Privilege Level Changes"}, #define PME_ITA2_CPU_CYCLES 179 { "CPU_CYCLES", {0x12}, 0xf0, 1, {0xf00000}, "CPU Cycles"}, #define PME_ITA2_DATA_DEBUG_REGISTER_FAULT 180 { "DATA_DEBUG_REGISTER_FAULT", {0x52}, 0xf0, 1, {0xf00000}, "Fault Due to Data Debug Reg. 
Match to Load/Store Instruction"}, #define PME_ITA2_DATA_DEBUG_REGISTER_MATCHES 181 { "DATA_DEBUG_REGISTER_MATCHES", {0xc6}, 0xf0, 1, {0xf00007}, "Data Debug Register Matches Data Address of Memory Reference."}, #define PME_ITA2_DATA_EAR_ALAT 182 { "DATA_EAR_ALAT", {0x6c8}, 0xf0, 1, {0xf00007}, "Data EAR ALAT"}, #define PME_ITA2_DATA_EAR_CACHE_LAT1024 183 { "DATA_EAR_CACHE_LAT1024", {0x805c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 1024 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT128 184 { "DATA_EAR_CACHE_LAT128", {0x505c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 128 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT16 185 { "DATA_EAR_CACHE_LAT16", {0x205c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 16 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT2048 186 { "DATA_EAR_CACHE_LAT2048", {0x905c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 2048 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT256 187 { "DATA_EAR_CACHE_LAT256", {0x605c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 256 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT32 188 { "DATA_EAR_CACHE_LAT32", {0x305c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 32 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT4 189 { "DATA_EAR_CACHE_LAT4", {0x5c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 4 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT4096 190 { "DATA_EAR_CACHE_LAT4096", {0xa05c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 4096 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT512 191 { "DATA_EAR_CACHE_LAT512", {0x705c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 512 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT64 192 { "DATA_EAR_CACHE_LAT64", {0x405c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 64 Cycles"}, #define PME_ITA2_DATA_EAR_CACHE_LAT8 193 { "DATA_EAR_CACHE_LAT8", {0x105c8}, 0xf0, 1, {0xf00007}, "Data EAR Cache -- >= 8 Cycles"}, #define PME_ITA2_DATA_EAR_EVENTS 194 { "DATA_EAR_EVENTS", {0xc8}, 0xf0, 1, {0xf00007}, "L1 Data Cache EAR Events"}, #define PME_ITA2_DATA_EAR_TLB_ALL 195 { "DATA_EAR_TLB_ALL", 
{0xe04c8}, 0xf0, 1, {0xf00007}, "Data EAR TLB -- All L1 DTLB Misses"}, #define PME_ITA2_DATA_EAR_TLB_FAULT 196 { "DATA_EAR_TLB_FAULT", {0x804c8}, 0xf0, 1, {0xf00007}, "Data EAR TLB -- DTLB Misses which produce a software fault"}, #define PME_ITA2_DATA_EAR_TLB_L2DTLB 197 { "DATA_EAR_TLB_L2DTLB", {0x204c8}, 0xf0, 1, {0xf00007}, "Data EAR TLB -- L1 DTLB Misses which hit L2 DTLB"}, #define PME_ITA2_DATA_EAR_TLB_L2DTLB_OR_FAULT 198 { "DATA_EAR_TLB_L2DTLB_OR_FAULT", {0xa04c8}, 0xf0, 1, {0xf00007}, "Data EAR TLB -- L1 DTLB Misses which hit L2 DTLB or produce a software fault"}, #define PME_ITA2_DATA_EAR_TLB_L2DTLB_OR_VHPT 199 { "DATA_EAR_TLB_L2DTLB_OR_VHPT", {0x604c8}, 0xf0, 1, {0xf00007}, "Data EAR TLB -- L1 DTLB Misses which hit L2 DTLB or VHPT"}, #define PME_ITA2_DATA_EAR_TLB_VHPT 200 { "DATA_EAR_TLB_VHPT", {0x404c8}, 0xf0, 1, {0xf00007}, "Data EAR TLB -- L1 DTLB Misses which hit VHPT"}, #define PME_ITA2_DATA_EAR_TLB_VHPT_OR_FAULT 201 { "DATA_EAR_TLB_VHPT_OR_FAULT", {0xc04c8}, 0xf0, 1, {0xf00007}, "Data EAR TLB -- L1 DTLB Misses which hit VHPT or produce a software fault"}, #define PME_ITA2_DATA_REFERENCES_SET0 202 { "DATA_REFERENCES_SET0", {0xc3}, 0xf0, 4, {0x5010007}, "Data Memory References Issued to Memory Pipeline"}, #define PME_ITA2_DATA_REFERENCES_SET1 203 { "DATA_REFERENCES_SET1", {0xc5}, 0xf0, 4, {0x5110007}, "Data Memory References Issued to Memory Pipeline"}, #define PME_ITA2_DISP_STALLED 204 { "DISP_STALLED", {0x49}, 0xf0, 1, {0xf00000}, "Number of Cycles Dispersal Stalled"}, #define PME_ITA2_DTLB_INSERTS_HPW 205 { "DTLB_INSERTS_HPW", {0xc9}, 0xf0, 4, {0xf00007}, "Hardware Page Walker Installs to DTLB"}, #define PME_ITA2_DTLB_INSERTS_HPW_RETIRED 206 { "DTLB_INSERTS_HPW_RETIRED", {0x2c}, 0xf0, 4, {0xf00007}, "VHPT Entries Inserted into DTLB by the Hardware Page Walker"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL_ALL_PRED 207 { "ENCBR_MISPRED_DETAIL_ALL_ALL_PRED", {0x63}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- All encoded branches 
regardless of prediction result"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL_CORRECT_PRED 208 { "ENCBR_MISPRED_DETAIL_ALL_CORRECT_PRED", {0x10063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- All encoded branches, correctly predicted branches (outcome and target)"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL_WRONG_PATH 209 { "ENCBR_MISPRED_DETAIL_ALL_WRONG_PATH", {0x20063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- All encoded branches, mispredicted branches due to wrong branch direction"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL_WRONG_TARGET 210 { "ENCBR_MISPRED_DETAIL_ALL_WRONG_TARGET", {0x30063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- All encoded branches, mispredicted branches due to wrong target for taken branches"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL2_ALL_PRED 211 { "ENCBR_MISPRED_DETAIL_ALL2_ALL_PRED", {0xc0063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, regardless of prediction result"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL2_CORRECT_PRED 212 { "ENCBR_MISPRED_DETAIL_ALL2_CORRECT_PRED", {0xd0063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, correctly predicted branches (outcome and target)"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL2_WRONG_PATH 213 { "ENCBR_MISPRED_DETAIL_ALL2_WRONG_PATH", {0xe0063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, mispredicted branches due to wrong branch direction"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_ALL2_WRONG_TARGET 214 { "ENCBR_MISPRED_DETAIL_ALL2_WRONG_TARGET", {0xf0063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- Only non-return indirect branches, mispredicted branches due to wrong target for taken branches"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_OVERSUB_ALL_PRED 215 { "ENCBR_MISPRED_DETAIL_OVERSUB_ALL_PRED", {0x80063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- 
Only return type branches, regardless of prediction result"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_OVERSUB_CORRECT_PRED 216 { "ENCBR_MISPRED_DETAIL_OVERSUB_CORRECT_PRED", {0x90063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- Only return type branches, correctly predicted branches (outcome and target)"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_PATH 217 { "ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_PATH", {0xa0063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- Only return type branches, mispredicted branches due to wrong branch direction"}, #define PME_ITA2_ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_TARGET 218 { "ENCBR_MISPRED_DETAIL_OVERSUB_WRONG_TARGET", {0xb0063}, 0xf0, 3, {0xf00003}, "Number of Encoded Branches Retired -- Only return type branches, mispredicted branches due to wrong target for taken branches"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_ALL 219 { "EXTERN_DP_PINS_0_TO_3_ALL", {0xf009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN0 220 { "EXTERN_DP_PINS_0_TO_3_PIN0", {0x1009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin0 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN1 221 { "EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN1", {0x3009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin0 or pin1 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN1_OR_PIN2 222 { "EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN1_OR_PIN2", {0x7009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin0 or pin1 or pin2 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN1_OR_PIN3 223 { "EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN1_OR_PIN3", {0xb009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin0 or pin1 or pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN2 224 { "EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN2", {0x5009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin0 or pin2 assertion"}, #define 
PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN2_OR_PIN3 225 { "EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN2_OR_PIN3", {0xd009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin0 or pin2 or pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN3 226 { "EXTERN_DP_PINS_0_TO_3_PIN0_OR_PIN3", {0x9009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin0 or pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN1 227 { "EXTERN_DP_PINS_0_TO_3_PIN1", {0x2009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin1 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN1_OR_PIN2 228 { "EXTERN_DP_PINS_0_TO_3_PIN1_OR_PIN2", {0x6009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin1 or pin2 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN1_OR_PIN2_OR_PIN3 229 { "EXTERN_DP_PINS_0_TO_3_PIN1_OR_PIN2_OR_PIN3", {0xe009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin1 or pin2 or pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN1_OR_PIN3 230 { "EXTERN_DP_PINS_0_TO_3_PIN1_OR_PIN3", {0xa009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin1 or pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN2 231 { "EXTERN_DP_PINS_0_TO_3_PIN2", {0x4009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin2 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN2_OR_PIN3 232 { "EXTERN_DP_PINS_0_TO_3_PIN2_OR_PIN3", {0xc009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin2 or pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_0_TO_3_PIN3 233 { "EXTERN_DP_PINS_0_TO_3_PIN3", {0x8009e}, 0xf0, 1, {0xf00000}, "DP Pins 0-3 Asserted -- include pin3 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_4_TO_5_ALL 234 { "EXTERN_DP_PINS_4_TO_5_ALL", {0x3009f}, 0xf0, 1, {0xf00000}, "DP Pins 4-5 Asserted -- include pin5 assertion"}, #define PME_ITA2_EXTERN_DP_PINS_4_TO_5_PIN4 235 { "EXTERN_DP_PINS_4_TO_5_PIN4", {0x1009f}, 0xf0, 1, {0xf00000}, "DP Pins 4-5 Asserted -- include pin4 assertion"}, #define 
PME_ITA2_EXTERN_DP_PINS_4_TO_5_PIN5 236 { "EXTERN_DP_PINS_4_TO_5_PIN5", {0x2009f}, 0xf0, 1, {0xf00000}, "DP Pins 4-5 Asserted -- include pin5 assertion"}, #define PME_ITA2_FE_BUBBLE_ALL 237 { "FE_BUBBLE_ALL", {0x71}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- count regardless of cause"}, #define PME_ITA2_FE_BUBBLE_ALLBUT_FEFLUSH_BUBBLE 238 { "FE_BUBBLE_ALLBUT_FEFLUSH_BUBBLE", {0xb0071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- ALL except FEFLUSH and BUBBLE"}, #define PME_ITA2_FE_BUBBLE_ALLBUT_IBFULL 239 { "FE_BUBBLE_ALLBUT_IBFULL", {0xc0071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- ALL except IBFULL"}, #define PME_ITA2_FE_BUBBLE_BRANCH 240 { "FE_BUBBLE_BRANCH", {0x90071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- only if caused by any of 4 branch recirculates"}, #define PME_ITA2_FE_BUBBLE_BUBBLE 241 { "FE_BUBBLE_BUBBLE", {0xd0071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- only if caused by branch bubble stall"}, #define PME_ITA2_FE_BUBBLE_FEFLUSH 242 { "FE_BUBBLE_FEFLUSH", {0x10071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- only if caused by a front-end flush"}, #define PME_ITA2_FE_BUBBLE_FILL_RECIRC 243 { "FE_BUBBLE_FILL_RECIRC", {0x80071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- only if caused by a recirculate for a cache line fill operation"}, #define PME_ITA2_FE_BUBBLE_GROUP1 244 { "FE_BUBBLE_GROUP1", {0x30071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- BUBBLE or BRANCH"}, #define PME_ITA2_FE_BUBBLE_GROUP2 245 { "FE_BUBBLE_GROUP2", {0x40071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- IMISS or TLBMISS"}, #define PME_ITA2_FE_BUBBLE_GROUP3 246 { "FE_BUBBLE_GROUP3", {0xa0071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- FILL_RECIRC or BRANCH"}, #define PME_ITA2_FE_BUBBLE_IBFULL 247 { "FE_BUBBLE_IBFULL", {0x50071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- only if caused by instruction buffer full stall"}, #define PME_ITA2_FE_BUBBLE_IMISS 248 { "FE_BUBBLE_IMISS", {0x60071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- only if caused by 
instruction cache miss stall"}, #define PME_ITA2_FE_BUBBLE_TLBMISS 249 { "FE_BUBBLE_TLBMISS", {0x70071}, 0xf0, 1, {0xf00000}, "Bubbles Seen by FE -- only if caused by TLB stall"}, #define PME_ITA2_FE_LOST_BW_ALL 250 { "FE_LOST_BW_ALL", {0x70}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- count regardless of cause"}, #define PME_ITA2_FE_LOST_BW_BI 251 { "FE_LOST_BW_BI", {0x90070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch initialization stall"}, #define PME_ITA2_FE_LOST_BW_BRQ 252 { "FE_LOST_BW_BRQ", {0xa0070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch retirement queue stall"}, #define PME_ITA2_FE_LOST_BW_BR_ILOCK 253 { "FE_LOST_BW_BR_ILOCK", {0xc0070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch interlock stall"}, #define PME_ITA2_FE_LOST_BW_BUBBLE 254 { "FE_LOST_BW_BUBBLE", {0xd0070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by branch resteer bubble stall"}, #define PME_ITA2_FE_LOST_BW_FEFLUSH 255 { "FE_LOST_BW_FEFLUSH", {0x10070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by a front-end flush"}, #define PME_ITA2_FE_LOST_BW_FILL_RECIRC 256 { "FE_LOST_BW_FILL_RECIRC", {0x80070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by a recirculate for a cache line fill operation"}, #define PME_ITA2_FE_LOST_BW_IBFULL 257 { "FE_LOST_BW_IBFULL", {0x50070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by instruction buffer full stall"}, #define PME_ITA2_FE_LOST_BW_IMISS 258 { "FE_LOST_BW_IMISS", {0x60070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by instruction cache miss stall"}, #define PME_ITA2_FE_LOST_BW_PLP 259 { "FE_LOST_BW_PLP", {0xb0070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by perfect loop prediction 
stall"}, #define PME_ITA2_FE_LOST_BW_TLBMISS 260 { "FE_LOST_BW_TLBMISS", {0x70070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by TLB stall"}, #define PME_ITA2_FE_LOST_BW_UNREACHED 261 { "FE_LOST_BW_UNREACHED", {0x40070}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Entrance to IB -- only if caused by unreachable bundle"}, #define PME_ITA2_FP_FAILED_FCHKF 262 { "FP_FAILED_FCHKF", {0x6}, 0xf0, 1, {0xf00001}, "Failed fchkf"}, #define PME_ITA2_FP_FALSE_SIRSTALL 263 { "FP_FALSE_SIRSTALL", {0x5}, 0xf0, 1, {0xf00001}, "SIR Stall Without a Trap"}, #define PME_ITA2_FP_FLUSH_TO_ZERO 264 { "FP_FLUSH_TO_ZERO", {0xb}, 0xf0, 2, {0xf00001}, "FP Result Flushed to Zero"}, #define PME_ITA2_FP_OPS_RETIRED 265 { "FP_OPS_RETIRED", {0x9}, 0xf0, 4, {0xf00001}, "Retired FP Operations"}, #define PME_ITA2_FP_TRUE_SIRSTALL 266 { "FP_TRUE_SIRSTALL", {0x3}, 0xf0, 1, {0xf00001}, "SIR stall asserted and leads to a trap"}, #define PME_ITA2_HPW_DATA_REFERENCES 267 { "HPW_DATA_REFERENCES", {0x2d}, 0xf0, 4, {0xf00007}, "Data Memory References to VHPT"}, #define PME_ITA2_IA32_INST_RETIRED 268 { "IA32_INST_RETIRED", {0x59}, 0xf0, 2, {0xf00000}, "IA-32 Instructions Retired"}, #define PME_ITA2_IA32_ISA_TRANSITIONS 269 { "IA32_ISA_TRANSITIONS", {0x7}, 0xf0, 1, {0xf00000}, "IA-64 to/from IA-32 ISA Transitions"}, #define PME_ITA2_IA64_INST_RETIRED 270 { "IA64_INST_RETIRED", {0x8}, 0xf0, 6, {0xf00003}, "Retired IA-64 Instructions, alias to IA64_INST_RETIRED_THIS"}, #define PME_ITA2_IA64_INST_RETIRED_THIS 271 { "IA64_INST_RETIRED_THIS", {0x8}, 0xf0, 6, {0xf00003}, "Retired IA-64 Instructions -- Retired IA-64 Instructions"}, #define PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP0_PMC8 272 { "IA64_TAGGED_INST_RETIRED_IBRP0_PMC8", {0x8}, 0xf0, 6, {0xf00003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 0 and opcode matcher PMC8. 
Code executed with PSR.is=1 is included."}, #define PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP1_PMC9 273 { "IA64_TAGGED_INST_RETIRED_IBRP1_PMC9", {0x10008}, 0xf0, 6, {0xf00003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 1 and opcode matcher PMC9. Code executed with PSR.is=1 is included."}, #define PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP2_PMC8 274 { "IA64_TAGGED_INST_RETIRED_IBRP2_PMC8", {0x20008}, 0xf0, 6, {0xf00003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 2 and opcode matcher PMC8. Code executed with PSR.is=1 is not included."}, #define PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP3_PMC9 275 { "IA64_TAGGED_INST_RETIRED_IBRP3_PMC9", {0x30008}, 0xf0, 6, {0xf00003}, "Retired Tagged Instructions -- Instruction tagged by Instruction Breakpoint Pair 3 and opcode matcher PMC9. Code executed with PSR.is=1 is not included."}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_ALL 276 { "IDEAL_BE_LOST_BW_DUE_TO_FE_ALL", {0x73}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- count regardless of cause"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_BI 277 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BI", {0x90073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by branch initialization stall"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_BRQ 278 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BRQ", {0xa0073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by branch retirement queue stall"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_BR_ILOCK 279 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BR_ILOCK", {0xc0073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by branch interlock stall"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_BUBBLE 280 { "IDEAL_BE_LOST_BW_DUE_TO_FE_BUBBLE", {0xd0073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by branch resteer bubble stall"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_FEFLUSH 281 { 
"IDEAL_BE_LOST_BW_DUE_TO_FE_FEFLUSH", {0x10073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by a front-end flush"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_FILL_RECIRC 282 { "IDEAL_BE_LOST_BW_DUE_TO_FE_FILL_RECIRC", {0x80073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by a recirculate for a cache line fill operation"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_IBFULL 283 { "IDEAL_BE_LOST_BW_DUE_TO_FE_IBFULL", {0x50073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- (* meaningless for this event *)"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_IMISS 284 { "IDEAL_BE_LOST_BW_DUE_TO_FE_IMISS", {0x60073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by instruction cache miss stall"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_PLP 285 { "IDEAL_BE_LOST_BW_DUE_TO_FE_PLP", {0xb0073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by perfect loop prediction stall"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_TLBMISS 286 { "IDEAL_BE_LOST_BW_DUE_TO_FE_TLBMISS", {0x70073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by TLB stall"}, #define PME_ITA2_IDEAL_BE_LOST_BW_DUE_TO_FE_UNREACHED 287 { "IDEAL_BE_LOST_BW_DUE_TO_FE_UNREACHED", {0x40073}, 0xf0, 2, {0xf00000}, "Invalid Bundles at the Exit from IB -- only if caused by unreachable bundle"}, #define PME_ITA2_INST_CHKA_LDC_ALAT_ALL 288 { "INST_CHKA_LDC_ALAT_ALL", {0x30056}, 0xf0, 2, {0xf00007}, "Retired chk.a and ld.c Instructions -- both integer and floating point instructions"}, #define PME_ITA2_INST_CHKA_LDC_ALAT_FP 289 { "INST_CHKA_LDC_ALAT_FP", {0x20056}, 0xf0, 2, {0xf00007}, "Retired chk.a and ld.c Instructions -- only floating point instructions"}, #define PME_ITA2_INST_CHKA_LDC_ALAT_INT 290 { "INST_CHKA_LDC_ALAT_INT", {0x10056}, 0xf0, 2, {0xf00007}, "Retired chk.a and ld.c Instructions -- only integer instructions"}, #define 
PME_ITA2_INST_DISPERSED 291 { "INST_DISPERSED", {0x4d}, 0xf0, 6, {0xf00001}, "Syllables Dispersed from REN to REG stage"}, #define PME_ITA2_INST_FAILED_CHKA_LDC_ALAT_ALL 292 { "INST_FAILED_CHKA_LDC_ALAT_ALL", {0x30057}, 0xf0, 1, {0xf00007}, "Failed chk.a and ld.c Instructions -- both integer and floating point instructions"}, #define PME_ITA2_INST_FAILED_CHKA_LDC_ALAT_FP 293 { "INST_FAILED_CHKA_LDC_ALAT_FP", {0x20057}, 0xf0, 1, {0xf00007}, "Failed chk.a and ld.c Instructions -- only floating point instructions"}, #define PME_ITA2_INST_FAILED_CHKA_LDC_ALAT_INT 294 { "INST_FAILED_CHKA_LDC_ALAT_INT", {0x10057}, 0xf0, 1, {0xf00007}, "Failed chk.a and ld.c Instructions -- only integer instructions"}, #define PME_ITA2_INST_FAILED_CHKS_RETIRED_ALL 295 { "INST_FAILED_CHKS_RETIRED_ALL", {0x30055}, 0xf0, 1, {0xf00000}, "Failed chk.s Instructions -- both integer and floating point instructions"}, #define PME_ITA2_INST_FAILED_CHKS_RETIRED_FP 296 { "INST_FAILED_CHKS_RETIRED_FP", {0x20055}, 0xf0, 1, {0xf00000}, "Failed chk.s Instructions -- only floating point instructions"}, #define PME_ITA2_INST_FAILED_CHKS_RETIRED_INT 297 { "INST_FAILED_CHKS_RETIRED_INT", {0x10055}, 0xf0, 1, {0xf00000}, "Failed chk.s Instructions -- only integer instructions"}, #define PME_ITA2_ISB_BUNPAIRS_IN 298 { "ISB_BUNPAIRS_IN", {0x46}, 0xf0, 1, {0xf00001}, "Bundle Pairs Written from L2 into FE"}, #define PME_ITA2_ITLB_MISSES_FETCH_ALL 299 { "ITLB_MISSES_FETCH_ALL", {0x30047}, 0xf0, 1, {0xf00001}, "ITLB Misses Demand Fetch -- All tlb misses will be counted. Note that this is not equal to sum of the L1ITLB and L2ITLB umasks because any access could be a miss in L1ITLB and L2ITLB."}, #define PME_ITA2_ITLB_MISSES_FETCH_L1ITLB 300 { "ITLB_MISSES_FETCH_L1ITLB", {0x10047}, 0xf0, 1, {0xf00001}, "ITLB Misses Demand Fetch -- All misses in L1ITLB will be counted. 
Even if L1ITLB is not updated for an access (Uncacheable/nat page/not present page/faulting/some flushed), it will be counted here."}, #define PME_ITA2_ITLB_MISSES_FETCH_L2ITLB 301 { "ITLB_MISSES_FETCH_L2ITLB", {0x20047}, 0xf0, 1, {0xf00001}, "ITLB Misses Demand Fetch -- All misses in L1ITLB which also missed in L2ITLB will be counted."}, #define PME_ITA2_L1DTLB_TRANSFER 302 { "L1DTLB_TRANSFER", {0xc0}, 0xf0, 1, {0x5010007}, "L1DTLB Misses That Hit in the L2DTLB for Accesses Counted in L1D_READS"}, #define PME_ITA2_L1D_READS_SET0 303 { "L1D_READS_SET0", {0xc2}, 0xf0, 2, {0x5010007}, "L1 Data Cache Reads"}, #define PME_ITA2_L1D_READS_SET1 304 { "L1D_READS_SET1", {0xc4}, 0xf0, 2, {0x5110007}, "L1 Data Cache Reads"}, #define PME_ITA2_L1D_READ_MISSES_ALL 305 { "L1D_READ_MISSES_ALL", {0xc7}, 0xf0, 2, {0x5110007}, "L1 Data Cache Read Misses -- all L1D read misses will be counted."}, #define PME_ITA2_L1D_READ_MISSES_RSE_FILL 306 { "L1D_READ_MISSES_RSE_FILL", {0x100c7}, 0xf0, 2, {0x5110007}, "L1 Data Cache Read Misses -- only L1D read misses caused by RSE fills will be counted"}, #define PME_ITA2_L1ITLB_INSERTS_HPW 307 { "L1ITLB_INSERTS_HPW", {0x48}, 0xf0, 1, {0xf00001}, "L1ITLB Hardware Page Walker Inserts"}, #define PME_ITA2_L1I_EAR_CACHE_LAT0 308 { "L1I_EAR_CACHE_LAT0", {0x400343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- > 0 Cycles (All L1 Misses)"}, #define PME_ITA2_L1I_EAR_CACHE_LAT1024 309 { "L1I_EAR_CACHE_LAT1024", {0xc00343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- >= 1024 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_LAT128 310 { "L1I_EAR_CACHE_LAT128", {0xf00343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- >= 128 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_LAT16 311 { "L1I_EAR_CACHE_LAT16", {0xfc0343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- >= 16 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_LAT256 312 { "L1I_EAR_CACHE_LAT256", {0xe00343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- >= 256 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_LAT32 313 { "L1I_EAR_CACHE_LAT32", {0xf80343}, 0xf0, 1, 
{0xf00001}, "L1I EAR Cache -- >= 32 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_LAT4 314 { "L1I_EAR_CACHE_LAT4", {0xff0343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- >= 4 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_LAT4096 315 { "L1I_EAR_CACHE_LAT4096", {0x800343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- >= 4096 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_LAT8 316 { "L1I_EAR_CACHE_LAT8", {0xfe0343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- >= 8 Cycles"}, #define PME_ITA2_L1I_EAR_CACHE_RAB 317 { "L1I_EAR_CACHE_RAB", {0x343}, 0xf0, 1, {0xf00001}, "L1I EAR Cache -- RAB HIT"}, #define PME_ITA2_L1I_EAR_EVENTS 318 { "L1I_EAR_EVENTS", {0x43}, 0xf0, 1, {0xf00001}, "Instruction EAR Events"}, #define PME_ITA2_L1I_EAR_TLB_ALL 319 { "L1I_EAR_TLB_ALL", {0x70243}, 0xf0, 1, {0xf00001}, "L1I EAR TLB -- All L1 ITLB Misses"}, #define PME_ITA2_L1I_EAR_TLB_FAULT 320 { "L1I_EAR_TLB_FAULT", {0x40243}, 0xf0, 1, {0xf00001}, "L1I EAR TLB -- ITLB Misses which produced a fault"}, #define PME_ITA2_L1I_EAR_TLB_L2TLB 321 { "L1I_EAR_TLB_L2TLB", {0x10243}, 0xf0, 1, {0xf00001}, "L1I EAR TLB -- L1 ITLB Misses which hit L2 ITLB"}, #define PME_ITA2_L1I_EAR_TLB_L2TLB_OR_FAULT 322 { "L1I_EAR_TLB_L2TLB_OR_FAULT", {0x50243}, 0xf0, 1, {0xf00001}, "L1I EAR TLB -- L1 ITLB Misses which hit L2 ITLB or produce a software fault"}, #define PME_ITA2_L1I_EAR_TLB_L2TLB_OR_VHPT 323 { "L1I_EAR_TLB_L2TLB_OR_VHPT", {0x30243}, 0xf0, 1, {0xf00001}, "L1I EAR TLB -- L1 ITLB Misses which hit L2 ITLB or VHPT"}, #define PME_ITA2_L1I_EAR_TLB_VHPT 324 { "L1I_EAR_TLB_VHPT", {0x20243}, 0xf0, 1, {0xf00001}, "L1I EAR TLB -- L1 ITLB Misses which hit VHPT"}, #define PME_ITA2_L1I_EAR_TLB_VHPT_OR_FAULT 325 { "L1I_EAR_TLB_VHPT_OR_FAULT", {0x60243}, 0xf0, 1, {0xf00001}, "L1I EAR TLB -- L1 ITLB Misses which hit VHPT or produce a software fault"}, #define PME_ITA2_L1I_FETCH_ISB_HIT 326 { "L1I_FETCH_ISB_HIT", {0x66}, 0xf0, 1, {0xf00001}, "\"Just-In-Time\" Instruction Fetch Hitting in and Being Bypassed from ISB"}, #define PME_ITA2_L1I_FETCH_RAB_HIT 327 { 
"L1I_FETCH_RAB_HIT", {0x65}, 0xf0, 1, {0xf00001}, "Instruction Fetch Hitting in RAB"}, #define PME_ITA2_L1I_FILLS 328 { "L1I_FILLS", {0x41}, 0xf0, 1, {0xf00001}, "L1 Instruction Cache Fills"}, #define PME_ITA2_L1I_PREFETCHES 329 { "L1I_PREFETCHES", {0x44}, 0xf0, 1, {0xf00001}, "L1 Instruction Prefetch Requests"}, #define PME_ITA2_L1I_PREFETCH_STALL_ALL 330 { "L1I_PREFETCH_STALL_ALL", {0x30067}, 0xf0, 1, {0xf00000}, "Prefetch Pipeline Stalls -- Number of clocks prefetch pipeline is stalled"}, #define PME_ITA2_L1I_PREFETCH_STALL_FLOW 331 { "L1I_PREFETCH_STALL_FLOW", {0x20067}, 0xf0, 1, {0xf00000}, "Prefetch Pipeline Stalls -- Number of clocks flow is not asserted"}, #define PME_ITA2_L1I_PURGE 332 { "L1I_PURGE", {0x4b}, 0xf0, 1, {0xf00001}, "L1ITLB Purges Handled by L1I"}, #define PME_ITA2_L1I_PVAB_OVERFLOW 333 { "L1I_PVAB_OVERFLOW", {0x69}, 0xf0, 1, {0xf00000}, "PVAB Overflow"}, #define PME_ITA2_L1I_RAB_ALMOST_FULL 334 { "L1I_RAB_ALMOST_FULL", {0x64}, 0xf0, 1, {0xf00000}, "Is RAB Almost Full?"}, #define PME_ITA2_L1I_RAB_FULL 335 { "L1I_RAB_FULL", {0x60}, 0xf0, 1, {0xf00000}, "Is RAB Full?"}, #define PME_ITA2_L1I_READS 336 { "L1I_READS", {0x40}, 0xf0, 1, {0xf00001}, "L1 Instruction Cache Reads"}, #define PME_ITA2_L1I_SNOOP 337 { "L1I_SNOOP", {0x4a}, 0xf0, 1, {0xf00007}, "Snoop Requests Handled by L1I"}, #define PME_ITA2_L1I_STRM_PREFETCHES 338 { "L1I_STRM_PREFETCHES", {0x5f}, 0xf0, 1, {0xf00001}, "L1 Instruction Cache Line Prefetch Requests"}, #define PME_ITA2_L2DTLB_MISSES 339 { "L2DTLB_MISSES", {0xc1}, 0xf0, 4, {0x5010007}, "L2DTLB Misses"}, #define PME_ITA2_L2_BAD_LINES_SELECTED_ANY 340 { "L2_BAD_LINES_SELECTED_ANY", {0xb9}, 0xf0, 4, {0x4320007}, "Valid Line Replaced When Invalid Line Is Available -- Valid line replaced when invalid line is available"}, #define PME_ITA2_L2_BYPASS_L2_DATA1 341 { "L2_BYPASS_L2_DATA1", {0xb8}, 0xf0, 1, {0x4320007}, "Count L2 Bypasses -- Count only L2 data bypasses (L1D to L2A)"}, #define PME_ITA2_L2_BYPASS_L2_DATA2 342 { 
"L2_BYPASS_L2_DATA2", {0x100b8}, 0xf0, 1, {0x4320007}, "Count L2 Bypasses -- Count only L2 data bypasses (L1W to L2I)"}, #define PME_ITA2_L2_BYPASS_L2_INST1 343 { "L2_BYPASS_L2_INST1", {0x400b8}, 0xf0, 1, {0x4320007}, "Count L2 Bypasses -- Count only L2 instruction bypasses (L1D to L2A)"}, #define PME_ITA2_L2_BYPASS_L2_INST2 344 { "L2_BYPASS_L2_INST2", {0x500b8}, 0xf0, 1, {0x4320007}, "Count L2 Bypasses -- Count only L2 instruction bypasses (L1W to L2I)"}, #define PME_ITA2_L2_BYPASS_L3_DATA1 345 { "L2_BYPASS_L3_DATA1", {0x200b8}, 0xf0, 1, {0x4320007}, "Count L2 Bypasses -- Count only L3 data bypasses (L1D to L2A)"}, #define PME_ITA2_L2_BYPASS_L3_INST1 346 { "L2_BYPASS_L3_INST1", {0x600b8}, 0xf0, 1, {0x4320007}, "Count L2 Bypasses -- Count only L3 instruction bypasses (L1D to L2A)"}, #define PME_ITA2_L2_DATA_REFERENCES_L2_ALL 347 { "L2_DATA_REFERENCES_L2_ALL", {0x300b2}, 0xf0, 4, {0x4120007}, "Data Read/Write Access to L2 -- count both read and write operations (semaphores will count as 2)"}, #define PME_ITA2_L2_DATA_REFERENCES_L2_DATA_READS 348 { "L2_DATA_REFERENCES_L2_DATA_READS", {0x100b2}, 0xf0, 4, {0x4120007}, "Data Read/Write Access to L2 -- count only data read and semaphore operations."}, #define PME_ITA2_L2_DATA_REFERENCES_L2_DATA_WRITES 349 { "L2_DATA_REFERENCES_L2_DATA_WRITES", {0x200b2}, 0xf0, 4, {0x4120007}, "Data Read/Write Access to L2 -- count only data write and semaphore operations"}, #define PME_ITA2_L2_FILLB_FULL_THIS 350 { "L2_FILLB_FULL_THIS", {0xbf}, 0xf0, 1, {0x4520000}, "L2D Fill Buffer Is Full -- L2 Fill buffer is full"}, #define PME_ITA2_L2_FORCE_RECIRC_ANY 351 { "L2_FORCE_RECIRC_ANY", {0xb4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count forced recirculates regardless of cause. 
SMC_HIT, TRAN_PREF & SNP_OR_L3 will not be included here."}, #define PME_ITA2_L2_FORCE_RECIRC_FILL_HIT 352 { "L2_FORCE_RECIRC_FILL_HIT", {0x900b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by an L2 miss which hit in the fill buffer."}, #define PME_ITA2_L2_FORCE_RECIRC_FRC_RECIRC 353 { "L2_FORCE_RECIRC_FRC_RECIRC", {0xe00b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- caused by an L2 miss when a force recirculate already existed"}, #define PME_ITA2_L2_FORCE_RECIRC_IPF_MISS 354 { "L2_FORCE_RECIRC_IPF_MISS", {0xa00b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- caused by L2 miss when instruction prefetch buffer miss already existed"}, #define PME_ITA2_L2_FORCE_RECIRC_L1W 355 { "L2_FORCE_RECIRC_L1W", {0x200b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by forced limbo"}, #define PME_ITA2_L2_FORCE_RECIRC_OZQ_MISS 356 { "L2_FORCE_RECIRC_OZQ_MISS", {0xc00b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- caused by an L2 miss when an OZQ miss already existed"}, #define PME_ITA2_L2_FORCE_RECIRC_SAME_INDEX 357 { "L2_FORCE_RECIRC_SAME_INDEX", {0xd00b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- caused by an L2 miss when a miss to the same index already existed"}, #define PME_ITA2_L2_FORCE_RECIRC_SMC_HIT 358 { "L2_FORCE_RECIRC_SMC_HIT", {0x100b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by SMC hits due to an ifetch and load to same cache line or a pending WT store"}, #define PME_ITA2_L2_FORCE_RECIRC_SNP_OR_L3 359 { "L2_FORCE_RECIRC_SNP_OR_L3", {0x600b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by a snoop or L3 issue"}, #define PME_ITA2_L2_FORCE_RECIRC_TAG_NOTOK 360 { "L2_FORCE_RECIRC_TAG_NOTOK", {0x400b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by L2 hits caused by in flight snoops, stores with a sibling miss to the same index, sibling probe to the same line or pending sync.ia instructions."}, #define 
PME_ITA2_L2_FORCE_RECIRC_TRAN_PREF 361 { "L2_FORCE_RECIRC_TRAN_PREF", {0x500b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by transforms to prefetches"}, #define PME_ITA2_L2_FORCE_RECIRC_VIC_BUF_FULL 362 { "L2_FORCE_RECIRC_VIC_BUF_FULL", {0xb00b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by an L2 miss with victim buffer full"}, #define PME_ITA2_L2_FORCE_RECIRC_VIC_PEND 363 { "L2_FORCE_RECIRC_VIC_PEND", {0x800b4}, 0x10, 4, {0x4220007}, "Forced Recirculates -- count only those caused by an L2 miss with pending victim"}, #define PME_ITA2_L2_GOT_RECIRC_IFETCH_ANY 364 { "L2_GOT_RECIRC_IFETCH_ANY", {0x800ba}, 0xf0, 1, {0x4420007}, "Instruction Fetch Recirculates Received by L2D -- Instruction fetch recirculates received by L2"}, #define PME_ITA2_L2_GOT_RECIRC_OZQ_ACC 365 { "L2_GOT_RECIRC_OZQ_ACC", {0xb6}, 0xf0, 1, {0x4220007}, "Counts Number of OZQ Accesses Recirculated to L1D"}, #define PME_ITA2_L2_IFET_CANCELS_ANY 366 { "L2_IFET_CANCELS_ANY", {0xa1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- total instruction fetch cancels by L2"}, #define PME_ITA2_L2_IFET_CANCELS_BYPASS 367 { "L2_IFET_CANCELS_BYPASS", {0x200a1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- ifetch cancels due to bypassing"}, #define PME_ITA2_L2_IFET_CANCELS_CHG_PRIO 368 { "L2_IFET_CANCELS_CHG_PRIO", {0xc00a1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- ifetch cancels due to change priority"}, #define PME_ITA2_L2_IFET_CANCELS_DATA_RD 369 { "L2_IFET_CANCELS_DATA_RD", {0x700a1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- ifetch/prefetch cancels due to a data read"}, #define PME_ITA2_L2_IFET_CANCELS_DIDNT_RECIR 370 { "L2_IFET_CANCELS_DIDNT_RECIR", {0x400a1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- ifetch cancels because it did not recirculate"}, #define PME_ITA2_L2_IFET_CANCELS_IFETCH_BYP 371 { "L2_IFET_CANCELS_IFETCH_BYP", {0xd00a1}, 0xf0, 1, 
{0x4020007}, "Instruction Fetch Cancels by the L2 -- due to ifetch bypass during last clock"}, #define PME_ITA2_L2_IFET_CANCELS_PREEMPT 372 { "L2_IFET_CANCELS_PREEMPT", {0x800a1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- ifetch cancels due to preempts"}, #define PME_ITA2_L2_IFET_CANCELS_RECIR_OVER_SUB 373 { "L2_IFET_CANCELS_RECIR_OVER_SUB", {0x500a1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- ifetch cancels because of recirculate oversubscription"}, #define PME_ITA2_L2_IFET_CANCELS_ST_FILL_WB 374 { "L2_IFET_CANCELS_ST_FILL_WB", {0x600a1}, 0xf0, 1, {0x4020007}, "Instruction Fetch Cancels by the L2 -- ifetch cancels due to a store or fill or write back"}, #define PME_ITA2_L2_INST_DEMAND_READS 375 { "L2_INST_DEMAND_READS", {0x42}, 0xf0, 1, {0xf00001}, "L2 Instruction Demand Fetch Requests"}, #define PME_ITA2_L2_INST_PREFETCHES 376 { "L2_INST_PREFETCHES", {0x45}, 0xf0, 1, {0xf00001}, "L2 Instruction Prefetch Requests"}, #define PME_ITA2_L2_ISSUED_RECIRC_IFETCH_ANY 377 { "L2_ISSUED_RECIRC_IFETCH_ANY", {0x800b9}, 0xf0, 1, {0x4420007}, "Instruction Fetch Recirculates Issued by L2 -- Instruction fetch recirculates issued by L2"}, #define PME_ITA2_L2_ISSUED_RECIRC_OZQ_ACC 378 { "L2_ISSUED_RECIRC_OZQ_ACC", {0xb5}, 0xf0, 1, {0x4220007}, "Count Number of Times a Recirculate Issue Was Attempted and Not Preempted"}, #define PME_ITA2_L2_L3ACCESS_CANCEL_ANY 379 { "L2_L3ACCESS_CANCEL_ANY", {0x900b0}, 0x10, 1, {0x4120007}, "Canceled L3 Accesses -- count cancels due to any reason. This umask will count more than the sum of all the other umasks. It will count things that weren't committed accesses when they reached L1w, but the L2 attempted to bypass them to the L3 anyway (speculatively). This will include accesses made repeatedly while the main pipeline is stalled and the L1d is attempting to recirculate an access down the L1d pipeline. Thus, an access could get counted many times before it really does get bypassed to the L3. 
It is a measure of how many times we asserted a request to the L3 but didn't confirm it."}, #define PME_ITA2_L2_L3ACCESS_CANCEL_DFETCH 380 { "L2_L3ACCESS_CANCEL_DFETCH", {0xa00b0}, 0x10, 1, {0x4120007}, "Canceled L3 Accesses -- data fetches"}, #define PME_ITA2_L2_L3ACCESS_CANCEL_EBL_REJECT 381 { "L2_L3ACCESS_CANCEL_EBL_REJECT", {0x800b0}, 0x10, 1, {0x4120007}, "Canceled L3 Accesses -- ebl rejects"}, #define PME_ITA2_L2_L3ACCESS_CANCEL_FILLD_FULL 382 { "L2_L3ACCESS_CANCEL_FILLD_FULL", {0x200b0}, 0x10, 1, {0x4120007}, "Canceled L3 Accesses -- filld being full"}, #define PME_ITA2_L2_L3ACCESS_CANCEL_IFETCH 383 { "L2_L3ACCESS_CANCEL_IFETCH", {0xb00b0}, 0xf0, 1, {0x4120007}, "Canceled L3 Accesses -- instruction fetches"}, #define PME_ITA2_L2_L3ACCESS_CANCEL_INV_L3_BYP 384 { "L2_L3ACCESS_CANCEL_INV_L3_BYP", {0x600b0}, 0x10, 1, {0x4120007}, "Canceled L3 Accesses -- invalid L3 bypasses"}, #define PME_ITA2_L2_L3ACCESS_CANCEL_SPEC_L3_BYP 385 { "L2_L3ACCESS_CANCEL_SPEC_L3_BYP", {0x100b0}, 0x10, 1, {0x4120007}, "Canceled L3 Accesses -- speculative L3 bypasses"}, #define PME_ITA2_L2_L3ACCESS_CANCEL_UC_BLOCKED 386 { "L2_L3ACCESS_CANCEL_UC_BLOCKED", {0x500b0}, 0x10, 1, {0x4120007}, "Canceled L3 Accesses -- Uncacheable blocked L3 Accesses"}, #define PME_ITA2_L2_MISSES 387 { "L2_MISSES", {0xcb}, 0xf0, 1, {0xf00007}, "L2 Misses"}, #define PME_ITA2_L2_OPS_ISSUED_FP_LOAD 388 { "L2_OPS_ISSUED_FP_LOAD", {0x900b8}, 0xf0, 4, {0x4420007}, "Different Operations Issued by L2D -- Count only valid floating point loads"}, #define PME_ITA2_L2_OPS_ISSUED_INT_LOAD 389 { "L2_OPS_ISSUED_INT_LOAD", {0x800b8}, 0xf0, 4, {0x4420007}, "Different Operations Issued by L2D -- Count only valid integer loads"}, #define PME_ITA2_L2_OPS_ISSUED_NST_NLD 390 { "L2_OPS_ISSUED_NST_NLD", {0xc00b8}, 0xf0, 4, {0x4420007}, "Different Operations Issued by L2D -- Count only valid non-load, no-store accesses"}, #define PME_ITA2_L2_OPS_ISSUED_RMW 391 { "L2_OPS_ISSUED_RMW", {0xa00b8}, 0xf0, 4, {0x4420007}, "Different 
Operations Issued by L2D -- Count only valid read_modify_write stores"}, #define PME_ITA2_L2_OPS_ISSUED_STORE 392 { "L2_OPS_ISSUED_STORE", {0xb00b8}, 0xf0, 4, {0x4420007}, "Different Operations Issued by L2D -- Count only valid non-read_modify_write stores"}, #define PME_ITA2_L2_OZDB_FULL_THIS 393 { "L2_OZDB_FULL_THIS", {0xbd}, 0xf0, 1, {0x4520000}, "L2 OZ Data Buffer Is Full -- L2 OZ Data Buffer is full"}, #define PME_ITA2_L2_OZQ_ACQUIRE 394 { "L2_OZQ_ACQUIRE", {0xa2}, 0xf0, 1, {0x4020000}, "Clocks With Acquire Ordering Attribute Existed in L2 OZQ"}, #define PME_ITA2_L2_OZQ_CANCELS0_ANY 395 { "L2_OZQ_CANCELS0_ANY", {0xa0}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Late or Any) -- counts the total OZ Queue cancels"}, #define PME_ITA2_L2_OZQ_CANCELS0_LATE_ACQUIRE 396 { "L2_OZQ_CANCELS0_LATE_ACQUIRE", {0x300a0}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Late or Any) -- counts the late cancels caused by acquires"}, #define PME_ITA2_L2_OZQ_CANCELS0_LATE_BYP_EFFRELEASE 397 { "L2_OZQ_CANCELS0_LATE_BYP_EFFRELEASE", {0x400a0}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Late or Any) -- counts the late cancels caused by L1D to L2A bypass effective releases"}, #define PME_ITA2_L2_OZQ_CANCELS0_LATE_RELEASE 398 { "L2_OZQ_CANCELS0_LATE_RELEASE", {0x200a0}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Late or Any) -- counts the late cancels caused by releases"}, #define PME_ITA2_L2_OZQ_CANCELS0_LATE_SPEC_BYP 399 { "L2_OZQ_CANCELS0_LATE_SPEC_BYP", {0x100a0}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Late or Any) -- counts the late cancels caused by speculative bypasses"}, #define PME_ITA2_L2_OZQ_CANCELS1_BANK_CONF 400 { "L2_OZQ_CANCELS1_BANK_CONF", {0x100ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- bank conflicts"}, #define PME_ITA2_L2_OZQ_CANCELS1_CANC_L2M_ST 401 { "L2_OZQ_CANCELS1_CANC_L2M_ST", {0x600ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- caused by a canceled store in L2M"}, #define PME_ITA2_L2_OZQ_CANCELS1_CCV 402 { "L2_OZQ_CANCELS1_CCV", 
{0x900ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- a ccv"}, #define PME_ITA2_L2_OZQ_CANCELS1_ECC 403 { "L2_OZQ_CANCELS1_ECC", {0xf00ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- ECC hardware detecting a problem"}, #define PME_ITA2_L2_OZQ_CANCELS1_HPW_IFETCH_CONF 404 { "L2_OZQ_CANCELS1_HPW_IFETCH_CONF", {0x500ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- an ifetch conflict (canceling HPW?)"}, #define PME_ITA2_L2_OZQ_CANCELS1_L1DF_L2M 405 { "L2_OZQ_CANCELS1_L1DF_L2M", {0xe00ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- L1D fill in L2M"}, #define PME_ITA2_L2_OZQ_CANCELS1_L1_FILL_CONF 406 { "L2_OZQ_CANCELS1_L1_FILL_CONF", {0x700ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- an L1 fill conflict"}, #define PME_ITA2_L2_OZQ_CANCELS1_L2A_ST_MAT 407 { "L2_OZQ_CANCELS1_L2A_ST_MAT", {0xd00ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- a store match in L2A"}, #define PME_ITA2_L2_OZQ_CANCELS1_L2D_ST_MAT 408 { "L2_OZQ_CANCELS1_L2D_ST_MAT", {0x200ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- a store match in L2D"}, #define PME_ITA2_L2_OZQ_CANCELS1_L2M_ST_MAT 409 { "L2_OZQ_CANCELS1_L2M_ST_MAT", {0xb00ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- a store match in L2M"}, #define PME_ITA2_L2_OZQ_CANCELS1_MFA 410 { "L2_OZQ_CANCELS1_MFA", {0xc00ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- a memory fence instruction"}, #define PME_ITA2_L2_OZQ_CANCELS1_REL 411 { "L2_OZQ_CANCELS1_REL", {0xac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- caused by release"}, #define PME_ITA2_L2_OZQ_CANCELS1_SEM 412 { "L2_OZQ_CANCELS1_SEM", {0xa00ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- a semaphore"}, #define PME_ITA2_L2_OZQ_CANCELS1_ST_FILL_CONF 413 { "L2_OZQ_CANCELS1_ST_FILL_CONF", {0x800ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason 
Set 1) -- a store fill conflict"}, #define PME_ITA2_L2_OZQ_CANCELS1_SYNC 414 { "L2_OZQ_CANCELS1_SYNC", {0x400ac}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 1) -- caused by sync.i"}, #define PME_ITA2_L2_OZQ_CANCELS2_ACQ 415 { "L2_OZQ_CANCELS2_ACQ", {0x400a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- caused by an acquire"}, #define PME_ITA2_L2_OZQ_CANCELS2_CANC_L2C_ST 416 { "L2_OZQ_CANCELS2_CANC_L2C_ST", {0x100a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- caused by a canceled store in L2C"}, #define PME_ITA2_L2_OZQ_CANCELS2_CANC_L2D_ST 417 { "L2_OZQ_CANCELS2_CANC_L2D_ST", {0xd00a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- caused by a canceled store in L2D"}, #define PME_ITA2_L2_OZQ_CANCELS2_DIDNT_RECIRC 418 { "L2_OZQ_CANCELS2_DIDNT_RECIRC", {0x900a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- caused because it did not recirculate"}, #define PME_ITA2_L2_OZQ_CANCELS2_D_IFET 419 { "L2_OZQ_CANCELS2_D_IFET", {0xf00a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- a demand ifetch"}, #define PME_ITA2_L2_OZQ_CANCELS2_L2C_ST_MAT 420 { "L2_OZQ_CANCELS2_L2C_ST_MAT", {0x200a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- a store match in L2C"}, #define PME_ITA2_L2_OZQ_CANCELS2_L2FILL_ST_CONF 421 { "L2_OZQ_CANCELS2_L2FILL_ST_CONF", {0x800a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- an L2fill and store conflict in L2C"}, #define PME_ITA2_L2_OZQ_CANCELS2_OVER_SUB 422 { "L2_OZQ_CANCELS2_OVER_SUB", {0xc00a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- oversubscription"}, #define PME_ITA2_L2_OZQ_CANCELS2_OZ_DATA_CONF 423 { "L2_OZQ_CANCELS2_OZ_DATA_CONF", {0x600a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- an OZ data conflict"}, #define PME_ITA2_L2_OZQ_CANCELS2_READ_WB_CONF 424 { "L2_OZQ_CANCELS2_READ_WB_CONF", {0x500a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels 
(Specific Reason Set 2) -- a write back conflict (canceling read?)"}, #define PME_ITA2_L2_OZQ_CANCELS2_RECIRC_OVER_SUB 425 { "L2_OZQ_CANCELS2_RECIRC_OVER_SUB", {0xa8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- caused by a recirculate oversubscription"}, #define PME_ITA2_L2_OZQ_CANCELS2_SCRUB 426 { "L2_OZQ_CANCELS2_SCRUB", {0x300a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- 32/64 byte HPW/L2D fill which needs scrub"}, #define PME_ITA2_L2_OZQ_CANCELS2_WEIRD 427 { "L2_OZQ_CANCELS2_WEIRD", {0xa00a8}, 0xf0, 4, {0x4020007}, "L2 OZQ Cancels (Specific Reason Set 2) -- counts the cancels caused by attempted 5-cycle bypasses for non-aligned accesses and bypasses blocking recirculates for too long"}, #define PME_ITA2_L2_OZQ_FULL_THIS 428 { "L2_OZQ_FULL_THIS", {0xbc}, 0xf0, 1, {0x4520000}, "L2D OZQ Is Full -- L2D OZQ is full"}, #define PME_ITA2_L2_OZQ_RELEASE 429 { "L2_OZQ_RELEASE", {0xa3}, 0xf0, 1, {0x4020000}, "Clocks With Release Ordering Attribute Existed in L2 OZQ"}, #define PME_ITA2_L2_REFERENCES 430 { "L2_REFERENCES", {0xb1}, 0xf0, 4, {0x4120007}, "Requests Made To L2"}, #define PME_ITA2_L2_STORE_HIT_SHARED_ANY 431 { "L2_STORE_HIT_SHARED_ANY", {0xba}, 0xf0, 2, {0x4320007}, "Store Hit a Shared Line -- Store hit a shared line"}, #define PME_ITA2_L2_SYNTH_PROBE 432 { "L2_SYNTH_PROBE", {0xb7}, 0xf0, 1, {0x4220007}, "Synthesized Probe"}, #define PME_ITA2_L2_VICTIMB_FULL_THIS 433 { "L2_VICTIMB_FULL_THIS", {0xbe}, 0xf0, 1, {0x4520000}, "L2D Victim Buffer Is Full -- L2D victim buffer is full"}, #define PME_ITA2_L3_LINES_REPLACED 434 { "L3_LINES_REPLACED", {0xdf}, 0xf0, 1, {0xf00000}, "L3 Cache Lines Replaced"}, #define PME_ITA2_L3_MISSES 435 { "L3_MISSES", {0xdc}, 0xf0, 1, {0xf00007}, "L3 Misses"}, #define PME_ITA2_L3_READS_ALL_ALL 436 { "L3_READS_ALL_ALL", {0xf00dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Read References"}, #define PME_ITA2_L3_READS_ALL_HIT 437 { "L3_READS_ALL_HIT", {0xd00dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 
Read Hits"}, #define PME_ITA2_L3_READS_ALL_MISS 438 { "L3_READS_ALL_MISS", {0xe00dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Read Misses"}, #define PME_ITA2_L3_READS_DATA_READ_ALL 439 { "L3_READS_DATA_READ_ALL", {0xb00dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Load References (excludes reads for ownership used to satisfy stores)"}, #define PME_ITA2_L3_READS_DATA_READ_HIT 440 { "L3_READS_DATA_READ_HIT", {0x900dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Load Hits (excludes reads for ownership used to satisfy stores)"}, #define PME_ITA2_L3_READS_DATA_READ_MISS 441 { "L3_READS_DATA_READ_MISS", {0xa00dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Load Misses (excludes reads for ownership used to satisfy stores)"}, #define PME_ITA2_L3_READS_DINST_FETCH_ALL 442 { "L3_READS_DINST_FETCH_ALL", {0x300dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Demand Instruction References"}, #define PME_ITA2_L3_READS_DINST_FETCH_HIT 443 { "L3_READS_DINST_FETCH_HIT", {0x100dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Demand Instruction Fetch Hits"}, #define PME_ITA2_L3_READS_DINST_FETCH_MISS 444 { "L3_READS_DINST_FETCH_MISS", {0x200dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Demand Instruction Fetch Misses"}, #define PME_ITA2_L3_READS_INST_FETCH_ALL 445 { "L3_READS_INST_FETCH_ALL", {0x700dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Instruction Fetch and Prefetch References"}, #define PME_ITA2_L3_READS_INST_FETCH_HIT 446 { "L3_READS_INST_FETCH_HIT", {0x500dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Instruction Fetch and Prefetch Hits"}, #define PME_ITA2_L3_READS_INST_FETCH_MISS 447 { "L3_READS_INST_FETCH_MISS", {0x600dd}, 0xf0, 1, {0xf00007}, "L3 Reads -- L3 Instruction Fetch and Prefetch Misses"}, #define PME_ITA2_L3_REFERENCES 448 { "L3_REFERENCES", {0xdb}, 0xf0, 1, {0xf00007}, "L3 References"}, #define PME_ITA2_L3_WRITES_ALL_ALL 449 { "L3_WRITES_ALL_ALL", {0xf00de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L3 Write References"}, #define PME_ITA2_L3_WRITES_ALL_HIT 450 { "L3_WRITES_ALL_HIT", {0xd00de}, 0xf0, 1, 
{0xf00007}, "L3 Writes -- L3 Write Hits"}, #define PME_ITA2_L3_WRITES_ALL_MISS 451 { "L3_WRITES_ALL_MISS", {0xe00de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L3 Write Misses"}, #define PME_ITA2_L3_WRITES_DATA_WRITE_ALL 452 { "L3_WRITES_DATA_WRITE_ALL", {0x700de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L3 Store References (excludes L2 write backs, includes L3 read for ownership requests that satisfy stores)"}, #define PME_ITA2_L3_WRITES_DATA_WRITE_HIT 453 { "L3_WRITES_DATA_WRITE_HIT", {0x500de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L3 Store Hits (excludes L2 write backs, includes L3 read for ownership requests that satisfy stores)"}, #define PME_ITA2_L3_WRITES_DATA_WRITE_MISS 454 { "L3_WRITES_DATA_WRITE_MISS", {0x600de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L3 Store Misses (excludes L2 write backs, includes L3 read for ownership requests that satisfy stores)"}, #define PME_ITA2_L3_WRITES_L2_WB_ALL 455 { "L3_WRITES_L2_WB_ALL", {0xb00de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L2 Write Back References"}, #define PME_ITA2_L3_WRITES_L2_WB_HIT 456 { "L3_WRITES_L2_WB_HIT", {0x900de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L2 Write Back Hits"}, #define PME_ITA2_L3_WRITES_L2_WB_MISS 457 { "L3_WRITES_L2_WB_MISS", {0xa00de}, 0xf0, 1, {0xf00007}, "L3 Writes -- L2 Write Back Misses"}, #define PME_ITA2_LOADS_RETIRED 458 { "LOADS_RETIRED", {0xcd}, 0xf0, 4, {0x5310007}, "Retired Loads"}, #define PME_ITA2_MEM_READ_CURRENT_ANY 459 { "MEM_READ_CURRENT_ANY", {0x30089}, 0xf0, 1, {0xf00000}, "Current Mem Read Transactions On Bus -- CPU or non-CPU (all transactions)."}, #define PME_ITA2_MEM_READ_CURRENT_IO 460 { "MEM_READ_CURRENT_IO", {0x10089}, 0xf0, 1, {0xf00000}, "Current Mem Read Transactions On Bus -- non-CPU priority agents"}, #define PME_ITA2_MISALIGNED_LOADS_RETIRED 461 { "MISALIGNED_LOADS_RETIRED", {0xce}, 0xf0, 4, {0x5310007}, "Retired Misaligned Load Instructions"}, #define PME_ITA2_MISALIGNED_STORES_RETIRED 462 { "MISALIGNED_STORES_RETIRED", {0xd2}, 0xf0, 2, {0x5410007}, "Retired Misaligned Store 
Instructions"}, #define PME_ITA2_NOPS_RETIRED 463 { "NOPS_RETIRED", {0x50}, 0xf0, 6, {0xf00003}, "Retired NOP Instructions"}, #define PME_ITA2_PREDICATE_SQUASHED_RETIRED 464 { "PREDICATE_SQUASHED_RETIRED", {0x51}, 0xf0, 6, {0xf00003}, "Instructions Squashed Due to Predicate Off"}, #define PME_ITA2_RSE_CURRENT_REGS_2_TO_0 465 { "RSE_CURRENT_REGS_2_TO_0", {0x2b}, 0xf0, 7, {0xf00000}, "Current RSE Registers (Bits 2:0)"}, #define PME_ITA2_RSE_CURRENT_REGS_5_TO_3 466 { "RSE_CURRENT_REGS_5_TO_3", {0x2a}, 0xf0, 7, {0xf00000}, "Current RSE Registers (Bits 5:3)"}, #define PME_ITA2_RSE_CURRENT_REGS_6 467 { "RSE_CURRENT_REGS_6", {0x26}, 0xf0, 1, {0xf00000}, "Current RSE Registers (Bit 6)"}, #define PME_ITA2_RSE_DIRTY_REGS_2_TO_0 468 { "RSE_DIRTY_REGS_2_TO_0", {0x29}, 0xf0, 7, {0xf00000}, "Dirty RSE Registers (Bits 2:0)"}, #define PME_ITA2_RSE_DIRTY_REGS_5_TO_3 469 { "RSE_DIRTY_REGS_5_TO_3", {0x28}, 0xf0, 7, {0xf00000}, "Dirty RSE Registers (Bits 5:3)"}, #define PME_ITA2_RSE_DIRTY_REGS_6 470 { "RSE_DIRTY_REGS_6", {0x24}, 0xf0, 1, {0xf00000}, "Dirty RSE Registers (Bit 6)"}, #define PME_ITA2_RSE_EVENT_RETIRED 471 { "RSE_EVENT_RETIRED", {0x32}, 0xf0, 1, {0xf00000}, "Retired RSE operations"}, #define PME_ITA2_RSE_REFERENCES_RETIRED_ALL 472 { "RSE_REFERENCES_RETIRED_ALL", {0x30020}, 0xf0, 2, {0xf00007}, "RSE Accesses -- Both RSE loads and stores will be counted."}, #define PME_ITA2_RSE_REFERENCES_RETIRED_LOAD 473 { "RSE_REFERENCES_RETIRED_LOAD", {0x10020}, 0xf0, 2, {0xf00007}, "RSE Accesses -- Only RSE loads will be counted."}, #define PME_ITA2_RSE_REFERENCES_RETIRED_STORE 474 { "RSE_REFERENCES_RETIRED_STORE", {0x20020}, 0xf0, 2, {0xf00007}, "RSE Accesses -- Only RSE stores will be counted."}, #define PME_ITA2_SERIALIZATION_EVENTS 475 { "SERIALIZATION_EVENTS", {0x53}, 0xf0, 1, {0xf00000}, "Number of srlz.i Instructions"}, #define PME_ITA2_STORES_RETIRED 476 { "STORES_RETIRED", {0xd1}, 0xf0, 2, {0x5410007}, "Retired Stores"}, #define PME_ITA2_SYLL_NOT_DISPERSED_ALL 477 { 
"SYLL_NOT_DISPERSED_ALL", {0xf004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Counts all syllables not dispersed. NOTE: Any combination of b0000-b1111 is valid."}, #define PME_ITA2_SYLL_NOT_DISPERSED_EXPL 478 { "SYLL_NOT_DISPERSED_EXPL", {0x1004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits. These consist of programmer specified architected S-bit and templates 1 and 5. Dispersal takes a 6-syllable (3-syllable) hit for every template 1/5 in bundle 0(1). Dispersal takes a 3-syllable (0 syllable) hit for every S-bit in bundle 0(1)"}, #define PME_ITA2_SYLL_NOT_DISPERSED_EXPL_OR_FE 479 { "SYLL_NOT_DISPERSED_EXPL_OR_FE", {0x5004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits or front-end not providing valid bundles or providing valid illegal templates."}, #define PME_ITA2_SYLL_NOT_DISPERSED_EXPL_OR_FE_OR_MLI 480 { "SYLL_NOT_DISPERSED_EXPL_OR_FE_OR_MLI", {0xd004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits or due to front-end not providing valid bundles or providing valid illegal templates or due to MLI bundle and resteers to non-0 syllable."}, #define PME_ITA2_SYLL_NOT_DISPERSED_EXPL_OR_IMPL 481 { "SYLL_NOT_DISPERSED_EXPL_OR_IMPL", {0x3004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit/implicit stop bits."}, #define PME_ITA2_SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_FE 482 { "SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_FE", {0x7004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit or implicit stop bits or due to front-end not providing valid bundles or providing valid illegal template."}, #define PME_ITA2_SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_MLI 483 { "SYLL_NOT_DISPERSED_EXPL_OR_IMPL_OR_MLI", {0xb004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit or 
implicit stop bits or due to MLI bundle and resteers to non-0 syllable."}, #define PME_ITA2_SYLL_NOT_DISPERSED_EXPL_OR_MLI 484 { "SYLL_NOT_DISPERSED_EXPL_OR_MLI", {0x9004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to explicit stop bits or to MLI bundle and resteers to non-0 syllable."}, #define PME_ITA2_SYLL_NOT_DISPERSED_FE 485 { "SYLL_NOT_DISPERSED_FE", {0x4004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to front-end not providing valid bundles or providing valid illegal templates. Dispersal takes a 3-syllable hit for every invalid bundle or valid illegal template from front-end. Bundle 1 with front-end fault is counted here (3-syllable hit)."}, #define PME_ITA2_SYLL_NOT_DISPERSED_FE_OR_MLI 486 { "SYLL_NOT_DISPERSED_FE_OR_MLI", {0xc004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to MLI bundle and resteers to non-0 syllable or due to front-end not providing valid bundles or providing valid illegal templates."}, #define PME_ITA2_SYLL_NOT_DISPERSED_IMPL 487 { "SYLL_NOT_DISPERSED_IMPL", {0x2004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits. These consist of all of the non-architected stop bits (asymmetry, oversubscription, implicit). 
Dispersal takes a 6-syllable (3-syllable) hit for every implicit stop bit in bundle 0(1)."}, #define PME_ITA2_SYLL_NOT_DISPERSED_IMPL_OR_FE 488 { "SYLL_NOT_DISPERSED_IMPL_OR_FE", {0x6004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits or to front-end not providing valid bundles or providing valid illegal templates."}, #define PME_ITA2_SYLL_NOT_DISPERSED_IMPL_OR_FE_OR_MLI 489 { "SYLL_NOT_DISPERSED_IMPL_OR_FE_OR_MLI", {0xe004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits or due to front-end not providing valid bundles or providing valid illegal templates or due to MLI bundle and resteers to non-0 syllable."}, #define PME_ITA2_SYLL_NOT_DISPERSED_IMPL_OR_MLI 490 { "SYLL_NOT_DISPERSED_IMPL_OR_MLI", {0xa004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to implicit stop bits or to MLI bundle and resteers to non-0 syllable."}, #define PME_ITA2_SYLL_NOT_DISPERSED_MLI 491 { "SYLL_NOT_DISPERSED_MLI", {0x8004e}, 0xf0, 5, {0xf00001}, "Syllables Not Dispersed -- Count syllables not dispersed due to MLI bundle and resteers to non-0 syllable. Dispersal takes a 1 syllable hit for each MLI bundle. Dispersal could take 0-2 syllable hit depending on which syllable we resteer to. 
Bundle 1 with front-end fault which is split, is counted here (0-2 syllable hit)."}, #define PME_ITA2_SYLL_OVERCOUNT_ALL 492 { "SYLL_OVERCOUNT_ALL", {0x3004f}, 0xf0, 2, {0xf00001}, "Syllables Overcounted -- syllables overcounted in implicit & explicit bucket"}, #define PME_ITA2_SYLL_OVERCOUNT_EXPL 493 { "SYLL_OVERCOUNT_EXPL", {0x1004f}, 0xf0, 2, {0xf00001}, "Syllables Overcounted -- Only syllables overcounted in the explicit bucket"}, #define PME_ITA2_SYLL_OVERCOUNT_IMPL 494 { "SYLL_OVERCOUNT_IMPL", {0x2004f}, 0xf0, 2, {0xf00001}, "Syllables Overcounted -- Only syllables overcounted in the implicit bucket"}, #define PME_ITA2_UC_LOADS_RETIRED 495 { "UC_LOADS_RETIRED", {0xcf}, 0xf0, 4, {0x5310007}, "Retired Uncacheable Loads"}, #define PME_ITA2_UC_STORES_RETIRED 496 { "UC_STORES_RETIRED", {0xd0}, 0xf0, 2, {0x5410007}, "Retired Uncacheable Stores"}, }; #define PME_ITA2_EVENT_COUNT 497 papi-5.3.0/src/libpfm4/lib/events/amd64_events_fam14h.h0000600003276200002170000012457412247131123022161 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: amd64_fam14h (AMD64 Fam14h) */ static const amd64_umask_t amd64_fam14h_dispatched_fpu[]={ { .uname = "PIPE0", .udesc = "Pipe 0 (fadd, imul, mmx) ops", .ucode = 0x1, }, { .uname = "PIPE1", .udesc = "Pipe 1 (fmul, store, mmx) ops", .ucode = 0x2, }, { .uname = "ANY", .udesc = "Pipe 1 and Pipe 0 ops", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_retired_sse_operations[]={ { .uname = "SINGLE_ADD_SUB_OPS", .udesc = "Single precision add/subtract ops", .ucode = 0x1, }, { .uname = "SINGLE_MUL_OPS", .udesc = "Single precision multiply ops", .ucode = 0x2, }, { .uname = "SINGLE_DIV_OPS", .udesc = "Single precision divide/square root ops", .ucode = 0x4, }, { .uname = "DOUBLE_ADD_SUB_OPS", .udesc = "Double precision add/subtract ops", .ucode = 0x8, }, { .uname = "DOUBLE_MUL_OPS", .udesc = "Double precision multiply ops", .ucode = 0x10, }, { .uname = "DOUBLE_DIV_OPS", .udesc = "Double precision divide/square root ops", .ucode = 0x20, }, { .uname = "OP_TYPE", .udesc = "Op type: 0=uops. 
1=FLOPS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_retired_move_ops[]={ { .uname = "ALL_OTHER_MERGING_MOVE_UOPS", .udesc = "All other merging move uops", .ucode = 0x4, }, { .uname = "ALL_OTHER_MOVE_UOPS", .udesc = "All other move uops", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xc, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_retired_serializing_ops[]={ { .uname = "SSE_BOTTOM_EXECUTING_UOPS", .udesc = "SSE bottom-executing uops retired", .ucode = 0x1, }, { .uname = "SSE_BOTTOM_SERIALIZING_UOPS", .udesc = "SSE bottom-serializing uops retired", .ucode = 0x2, }, { .uname = "X87_BOTTOM_EXECUTING_UOPS", .udesc = "X87 bottom-executing uops retired", .ucode = 0x4, }, { .uname = "X87_BOTTOM_SERIALIZING_UOPS", .udesc = "X87 bottom-serializing uops retired", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_retired_x87_fpu_ops[]={ { .uname = "ADD_SUB_OPS", .udesc = "Add/subtract ops", .ucode = 0x1, }, { .uname = "MULT_OPS", .udesc = "Multiply ops", .ucode = 0x2, }, { .uname = "DIV_FSQRT_OPS", .udesc = "Divide and fsqrt ops", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_segment_register_loads[]={ { .uname = "ES", .udesc = "ES", .ucode = 0x1, }, { .uname = "CS", .udesc = "CS", .ucode = 0x2, }, { .uname = "SS", .udesc = "SS", .ucode = 0x4, }, { .uname = "DS", .udesc = "DS", .ucode = 0x8, }, { .uname = "FS", .udesc = "FS", .ucode = 0x10, }, { .uname = "GS", .udesc = "GS", .ucode = 0x20, }, { .uname = "HS", .udesc = "HS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= 
AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_locked_ops[]={ { .uname = "EXECUTED", .udesc = "Number of locked instructions executed", .ucode = 0x1, }, { .uname = "BUS_LOCK", .udesc = "Number of cycles to acquire bus lock", .ucode = 0x2, }, { .uname = "UNLOCK_LINE", .udesc = "Number of cycles to unlock line (not including cache miss)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_cancelled_store_to_load_forward_operations[]={ { .uname = "ADDRESS_MISMATCHES", .udesc = "Address mismatches (starting byte not the same).", .ucode = 0x1, }, { .uname = "STORE_IS_SMALLER_THAN_LOAD", .udesc = "Store is smaller than load.", .ucode = 0x2, }, { .uname = "MISALIGNED", .udesc = "Misaligned.", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_data_cache_refills[]={ { .uname = "UNCACHEABLE", .udesc = "From non-cacheable data", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "From shared lines", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "From exclusive lines", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "From owned lines", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "From modified lines", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_data_cache_refills_from_nb[]={ { .uname = "UNCACHEABLE", .udesc = "Uncacheable data", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | 
AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_data_cache_lines_evicted[]={ { .uname = "PROBE", .udesc = "Eviction from probe", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared eviction", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive eviction", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned eviction", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified eviction", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_dtlb_miss[]={ { .uname = "STORES_L1TLB_MISS", .udesc = "Stores that miss L1TLB", .ucode = 0x1, }, { .uname = "LOADS_L1TLB_MISS", .udesc = "Loads that miss L1TLB", .ucode = 0x2, }, { .uname = "STORES_L2TLB_MISS", .udesc = "Stores that miss L2TLB", .ucode = 0x4, }, { .uname = "LOADS_L2TLB_MISS", .udesc = "Loads that miss L2TLB", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_prefetch_instructions_dispatched[]={ { .uname = "LOAD", .udesc = "Load (Prefetch, PrefetchT0/T1/T2)", .ucode = 0x1, }, { .uname = "STORE", .udesc = "Store (PrefetchW)", .ucode = 0x2, }, { .uname = "NTA", .udesc = "NTA (PrefetchNTA)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_l1_dtlb_hit[]={ { .uname = "L1_4K_TLB_HIT", .udesc = "L1 4K TLB hit", .ucode = 0x1, }, { .uname = "L1_2M_TLB_HIT", .udesc = "L1 2M TLB hit", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_dcache_sw_prefetches[]={ { .uname = "HIT", .udesc = "SW prefetch hit in the data cache", .ucode = 0x1, }, { .uname = "PENDING_FILL", .udesc = "SW prefetch hit a pending fill", .ucode = 0x2, }, 
{ .uname = "NO_MAB", .udesc = "SW prefetch does not get a MAB", .ucode = 0x4, }, { .uname = "L2_HIT", .udesc = "SW prefetch hits L2", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_memory_requests[]={ { .uname = "NON_CACHEABLE", .udesc = "Requests to non-cacheable (UC) memory", .ucode = 0x1, }, { .uname = "WRITE_COMBINING", .udesc = "Requests to write-combining (WC) memory or WC buffer flushes to WB memory", .ucode = 0x2, }, { .uname = "STREAMING_STORE", .udesc = "Streaming store (SS) requests", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x83, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_mab_requests[]={ { .uname = "DC_BUFFER_0", .udesc = "Data cache buffer 0", .ucode = 0x0, .uflags= AMD64_FL_NCOMBO, }, { .uname = "DC_BUFFER_1", .udesc = "Data cache buffer 1", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO, }, { .uname = "DC_BUFFER_2", .udesc = "Data cache buffer 2", .ucode = 0x2, .uflags= AMD64_FL_NCOMBO, }, { .uname = "DC_BUFFER_3", .udesc = "Data cache buffer 3", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO, }, { .uname = "DC_BUFFER_4", .udesc = "Data cache buffer 4", .ucode = 0x4, .uflags= AMD64_FL_NCOMBO, }, { .uname = "DC_BUFFER_5", .udesc = "Data cache buffer 5", .ucode = 0x5, .uflags= AMD64_FL_NCOMBO, }, { .uname = "DC_BUFFER_6", .udesc = "Data cache buffer 6", .ucode = 0x6, .uflags= AMD64_FL_NCOMBO, }, { .uname = "DC_BUFFER_7", .udesc = "Data cache buffer 7", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO, }, { .uname = "IC_BUFFER_0", .udesc = "Instruction cache buffer 0", .ucode = 0x8, .uflags= AMD64_FL_NCOMBO, }, { .uname = "IC_BUFFER_1", .udesc = "Instruction cache buffer 1", .ucode = 0x9, .uflags= AMD64_FL_NCOMBO, }, { .uname = "ANY_IC_BUFFER", .udesc = "Any instruction cache buffer", .ucode = 0xa, .uflags= AMD64_FL_NCOMBO, }, { .uname = "ANY_DC_BUFFER", .udesc = "Any 
data cache buffer", .ucode = 0xb, .uflags= AMD64_FL_NCOMBO, }, }; static const amd64_umask_t amd64_fam14h_system_read_responses[]={ { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x1, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x2, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "DATA_ERROR", .udesc = "Data Error", .ucode = 0x10, }, { .uname = "DIRTY_SUCCESS", .udesc = "Change-to-dirty success", .ucode = 0x20, }, { .uname = "UNCACHEABLE", .udesc = "Uncacheable", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_requests_to_l2[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill", .ucode = 0x2, }, { .uname = "SNOOP", .udesc = "Tag snoop request", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xb, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_l2_cache_miss[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill (includes possible replays, whereas EventSelect 041h does not)", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_l2_fill_writeback[]={ { .uname = "L2_FILLS", .udesc = "L2 fills (victims from L1 caches, TLB page table walks and data prefetches)", .ucode = 0x1, }, { .uname = "L2_WRITEBACKS", .udesc = "L2 Writebacks to system.", .ucode = 0x2, }, { .uname = "IC_ATTR_WRITES_L2_ACCESS", .udesc = "Ic attribute writes which access the L2", .ucode = 0x4, }, { .uname = "IC_ATTR_WRITES_L2_WRITES", .udesc = "Ic attribute writes which store into the L2", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= 
AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_l1_itlb_miss_and_l2_itlb_miss[]={ { .uname = "4K_PAGE_FETCHES", .udesc = "Instruction fetches to a 4K page.", .ucode = 0x1, }, { .uname = "2M_PAGE_FETCHES", .udesc = "Instruction fetches to a 2M page.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_instruction_cache_lines_invalidated[]={ { .uname = "INVALIDATING_LS_PROBE", .udesc = "IC invalidate due to an LS probe", .ucode = 0x1, }, { .uname = "INVALIDATING_BU_PROBE", .udesc = "IC invalidate due to a BU probe", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_retired_floating_point_instructions[]={ { .uname = "X87", .udesc = "X87 or MMX instructions", .ucode = 0x1, }, { .uname = "SSE", .udesc = "SSE (SSE, SSE2, SSE3, MNI) instructions", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_fpu_exceptions[]={ { .uname = "X87_RECLASS_MICROFAULTS", .udesc = "X87 reclass microfaults", .ucode = 0x1, }, { .uname = "SSE_RETYPE_MICROFAULTS", .udesc = "SSE retype microfaults", .ucode = 0x2, }, { .uname = "SSE_RECLASS_MICROFAULTS", .udesc = "SSE reclass microfaults", .ucode = 0x4, }, { .uname = "SSE_AND_X87_MICROTRAPS", .udesc = "SSE and x87 microtraps", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_dram_accesses_page[]={ { .uname = "HIT", .udesc = "DCT0 Page hit", .ucode = 0x1, }, { .uname = "MISS", .udesc = "DCT0 Page Miss", .ucode = 0x2, }, { .uname = "CONFLICT", .udesc = "DCT0 Page Conflict", .ucode = 0x4, }, { .uname = "WRITE_REQUEST", .udesc = "Write 
request", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x47, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_memory_controller_page_table[]={ { .uname = "DCT0_PAGE_TABLE_OVERFLOW", .udesc = "DCT0 Page Table Overflow", .ucode = 0x1, }, { .uname = "DCT0_PAGE_TABLE_STALE_HIT", .udesc = "DCT0 number of stale table entry hits (hit on a page closed too soon)", .ucode = 0x2, }, { .uname = "DCT0_PAGE_TABLE_IDLE_INC", .udesc = "DCT0 page table idle cycle limit incremented", .ucode = 0x4, }, { .uname = "DCT0_PAGE_TABLE_IDLE_DEC", .udesc = "DCT0 page table idle cycle limit decremented", .ucode = 0x8, }, { .uname = "DCT0_PAGE_TABLE_CLOSED", .udesc = "DCT0 page table is closed due to row inactivity", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_memory_controller_slot_misses[]={ { .uname = "DCT0_RBD", .udesc = "DCT0 RBD", .ucode = 0x10, }, { .uname = "DCT0_PREFETCH", .udesc = "DCT0 prefetch", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x50, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_memory_controller_rbd_queue_events[]={ { .uname = "DCQ_BYPASS_MAX", .udesc = "DCQ_BYPASS_MAX counter reached", .ucode = 0x4, }, { .uname = "BANK_CLOSED", .udesc = "Bank is closed due to bank conflict with an outstanding request in the RBD queue", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xc, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_thermal_status[]={ { .uname = "MEMHOT_L", .udesc = "MEMHOT_L assertions", .ucode = 0x1, }, { .uname = "HTC_TRANSITION", .udesc = "Number of times HTC transitions from inactive to active", .ucode = 0x4, }, { .uname = "CLOCKS_HTC_P_STATE_INACTIVE", .udesc = "Number of clocks HTC P-state is inactive.", .ucode = 
0x20, }, { .uname = "CLOCKS_HTC_P_STATE_ACTIVE", .udesc = "Number of clocks HTC P-state is active", .ucode = 0x40, }, { .uname = "PROCHOT_L", .udesc = "PROCHOT_L asserted by an external source and the assertion causes a P-state change", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xc5, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_cpu_io_requests_to_memory_io[]={ { .uname = "I_O_TO_I_O", .udesc = "IO to IO", .ucode = 0x1, }, { .uname = "I_O_TO_MEM", .udesc = "IO to Mem", .ucode = 0x2, }, { .uname = "CPU_TO_I_O", .udesc = "CPU to IO", .ucode = 0x4, }, { .uname = "CPU_TO_MEM", .udesc = "CPU to Mem", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_cache_block[]={ { .uname = "VICTIM_WRITEBACK", .udesc = "Victim Block (Writeback)", .ucode = 0x1, }, { .uname = "DCACHE_LOAD_MISS", .udesc = "Read Block (Dcache load miss refill)", .ucode = 0x4, }, { .uname = "SHARED_ICACHE_REFILL", .udesc = "Read Block Shared (Icache refill)", .ucode = 0x8, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read Block Modified (Dcache store miss refill)", .ucode = 0x10, }, { .uname = "CHANGE_TO_DIRTY", .udesc = "Change-to-Dirty (first store to clean block already in cache)", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3d, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_sized_commands[]={ { .uname = "NON_POSTED_WRITE_BYTE", .udesc = "Non-Posted SzWr Byte (1-32 bytes) Legacy or mapped IO, typically 1-4 bytes", .ucode = 0x1, }, { .uname = "NON_POSTED_WRITE_DWORD", .udesc = "Non-Posted SzWr DW (1-16 dwords) Legacy or mapped IO, typically 1 DWORD", .ucode = 0x2, }, { .uname = "POSTED_WRITE_BYTE", .udesc = "Posted SzWr Byte (1-32 bytes) Subcache-line DMA writes, size varies; also flushes of partially-filled Write Combining 
buffer", .ucode = 0x4, }, { .uname = "POSTED_WRITE_DWORD", .udesc = "Posted SzWr DW (1-16 dwords) Block-oriented DMA writes, often cache-line sized; also processor Write Combining buffer flushes", .ucode = 0x8, }, { .uname = "READ_BYTE_4_BYTES", .udesc = "SzRd Byte (4 bytes) Legacy or mapped IO", .ucode = 0x10, }, { .uname = "READ_DWORD_1_16_DWORDS", .udesc = "SzRd DW (1-16 dwords) Block-oriented DMA reads, typically cache-line size", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_probe[]={ { .uname = "MISS", .udesc = "Probe miss", .ucode = 0x1, }, { .uname = "HIT_CLEAN", .udesc = "Probe hit clean", .ucode = 0x2, }, { .uname = "HIT_DIRTY_NO_MEMORY_CANCEL", .udesc = "Probe hit dirty without memory cancel (probed by Sized Write or Change2Dirty)", .ucode = 0x4, }, { .uname = "HIT_DIRTY_WITH_MEMORY_CANCEL", .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)", .ucode = 0x8, }, { .uname = "UPSTREAM_HIGH_PRIO_READS", .udesc = "Upstream high priority reads", .ucode = 0x10, }, { .uname = "UPSTREAM_LOW_PRIO_READS", .udesc = "Upstream low priority reads", .ucode = 0x20, }, { .uname = "UPSTREAM_LOW_PRIO_WRITES", .udesc = "Upstream non-ISOC writes", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xbf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_dev_events[]={ { .uname = "HIT", .udesc = "DEV hit", .ucode = 0x10, }, { .uname = "MISS", .udesc = "DEV miss", .ucode = 0x20, }, { .uname = "ERROR", .udesc = "DEV error", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x70, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_memory_controller_requests[]={ { .uname = "32_BYTES_WRITES", .udesc = "32 Bytes Sized Writes", .ucode = 0x8, }, { .uname = "64_BYTES_WRITES", .udesc = "64 
Bytes Sized Writes", .ucode = 0x10, }, { .uname = "32_BYTES_READS", .udesc = "32 Bytes Sized Reads", .ucode = 0x20, }, { .uname = "64_BYTES_READS", .udesc = "64 Byte Sized Reads", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x78, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_sideband_signals_special_signals[]={ { .uname = "STOPGRANT", .udesc = "Stopgrant", .ucode = 0x2, }, { .uname = "SHUTDOWN", .udesc = "Shutdown", .ucode = 0x4, }, { .uname = "WBINVD", .udesc = "Wbinvd", .ucode = 0x8, }, { .uname = "INVD", .udesc = "Invd", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1c, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_interrupt_events[]={ { .uname = "FIXED_AND_LPA", .udesc = "Fixed and LPA", .ucode = 0x1, }, { .uname = "LPA", .udesc = "LPA", .ucode = 0x2, }, { .uname = "SMI", .udesc = "SMI", .ucode = 0x4, }, { .uname = "NMI", .udesc = "NMI", .ucode = 0x8, }, { .uname = "INIT", .udesc = "INIT", .ucode = 0x10, }, { .uname = "STARTUP", .udesc = "STARTUP", .ucode = 0x20, }, { .uname = "INT", .udesc = "INT", .ucode = 0x40, }, { .uname = "EOI", .udesc = "EOI", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam14h_pdc_miss[]={ { .uname = "HOST_PDE_LEVEL", .udesc = "Host PDE level", .ucode = 0x1, }, { .uname = "HOST_PDPE_LEVEL", .udesc = "Host PDPE level", .ucode = 0x2, }, { .uname = "HOST_PML4E_LEVEL", .udesc = "Host PML4E level", .ucode = 0x4, }, { .uname = "GUEST_PDE_LEVEL", .udesc = "Guest PDE level", .ucode = 0x10, }, { .uname = "GUEST_PDPE_LEVEL", .udesc = "Guest PDPE level", .ucode = 0x20, }, { .uname = "GUEST_PML4E_LEVEL", .udesc = "Guest PML4E level", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x67, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static 
const amd64_entry_t amd64_fam14h_pe[]={ { .name = "DISPATCHED_FPU", .desc = "Number of uops dispatched to FPU execution pipelines", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_dispatched_fpu), .ngrp = 1, .umasks = amd64_fam14h_dispatched_fpu, }, { .name = "CYCLES_NO_FPU_OPS_RETIRED", .desc = "Cycles in which the FPU is Empty", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1, }, { .name = "DISPATCHED_FPU_OPS_FAST_FLAG", .desc = "Dispatched Fast Flag FPU Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2, }, { .name = "RETIRED_SSE_OPERATIONS", .desc = "Retired SSE Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_retired_sse_operations), .ngrp = 1, .umasks = amd64_fam14h_retired_sse_operations, }, { .name = "RETIRED_MOVE_OPS", .desc = "Retired Move Ops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_retired_move_ops), .ngrp = 1, .umasks = amd64_fam14h_retired_move_ops, }, { .name = "RETIRED_SERIALIZING_OPS", .desc = "Retired Serializing Ops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_retired_serializing_ops), .ngrp = 1, .umasks = amd64_fam14h_retired_serializing_ops, }, { .name = "RETIRED_X87_FPU_OPS", .desc = "Number of x87 floating-point ops that have retired", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x11, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_retired_x87_fpu_ops), .ngrp = 1, .umasks = amd64_fam14h_retired_x87_fpu_ops, }, { .name = "SEGMENT_REGISTER_LOADS", .desc = "Segment Register Loads", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x20, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_segment_register_loads), .ngrp = 1, .umasks = amd64_fam14h_segment_register_loads, }, { .name = "PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE", .desc = "Pipeline Restart Due to Self-Modifying Code", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x21, }, { .name = "PIPELINE_RESTART_DUE_TO_PROBE_HIT", .desc = "Pipeline Restart Due to 
Probe Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x22, }, { .name = "RSQ_FULL", .desc = "Number of cycles that the RSQ holds retired stores. This buffer holds the stores waiting to retire as well as requests that missed the data cache and are waiting on a refill", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x23, }, { .name = "LOCKED_OPS", .desc = "Locked Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_locked_ops), .ngrp = 1, .umasks = amd64_fam14h_locked_ops, }, { .name = "RETIRED_CLFLUSH_INSTRUCTIONS", .desc = "Retired CLFLUSH Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x26, }, { .name = "RETIRED_CPUID_INSTRUCTIONS", .desc = "Retired CPUID Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x27, }, { .name = "CANCELLED_STORE_TO_LOAD_FORWARD_OPERATIONS", .desc = "Cancelled Store to Load Forward Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_cancelled_store_to_load_forward_operations), .ngrp = 1, .umasks = amd64_fam14h_cancelled_store_to_load_forward_operations, }, { .name = "DATA_CACHE_ACCESSES", .desc = "Data Cache Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x40, }, { .name = "DATA_CACHE_MISSES", .desc = "Data Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x41, }, { .name = "DATA_CACHE_REFILLS", .desc = "Data Cache Refills from L2 or Northbridge", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_data_cache_refills), .ngrp = 1, .umasks = amd64_fam14h_data_cache_refills, }, { .name = "DATA_CACHE_REFILLS_FROM_NB", .desc = "Data Cache Refills from the Northbridge", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_data_cache_refills_from_nb), .ngrp = 1, .umasks = amd64_fam14h_data_cache_refills_from_nb, }, { .name = "DATA_CACHE_LINES_EVICTED", .desc = "Data Cache Lines Evicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x44, .numasks = 
LIBPFM_ARRAY_SIZE(amd64_fam14h_data_cache_lines_evicted), .ngrp = 1, .umasks = amd64_fam14h_data_cache_lines_evicted, }, { .name = "L1_DTLB_MISS_AND_L2_DTLB_HIT", .desc = "Number of data cache accesses that miss in the L1 DTLB and hit the L2 DTLB. This is a speculative event", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x45, }, { .name = "DTLB_MISS", .desc = "L1 DTLB and L2 DTLB Miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x46, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_dtlb_miss), .ngrp = 1, .umasks = amd64_fam14h_dtlb_miss, }, { .name = "MISALIGNED_ACCESSES", .desc = "Misaligned Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x47, }, { .name = "PREFETCH_INSTRUCTIONS_DISPATCHED", .desc = "Prefetch Instructions Dispatched", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_prefetch_instructions_dispatched), .ngrp = 1, .umasks = amd64_fam14h_prefetch_instructions_dispatched, }, { .name = "DCACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .desc = "DCACHE Misses by Locked Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4c, }, { .name = "L1_DTLB_HIT", .desc = "L1 DTLB Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_l1_dtlb_hit), .ngrp = 1, .umasks = amd64_fam14h_l1_dtlb_hit, }, { .name = "DCACHE_SW_PREFETCHES", .desc = "Number of software prefetches that do not cause an actual data cache refill", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x52, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_dcache_sw_prefetches), .ngrp = 1, .umasks = amd64_fam14h_dcache_sw_prefetches, }, { .name = "GLOBAL_TLB_FLUSHES", .desc = "Global TLB Flushes", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x54, }, { .name = "MEMORY_REQUESTS", .desc = "Memory Requests by Type", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_memory_requests), .ngrp = 1, .umasks = amd64_fam14h_memory_requests, }, { .name = "MAB_REQUESTS", .desc = "Number of L1 I-cache and D-cache misses per buffer. 
Average latency by combining with MAB_WAIT_CYCLES.", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_mab_requests), .ngrp = 1, .umasks = amd64_fam14h_mab_requests, }, { .name = "MAB_WAIT_CYCLES", .desc = "Latency of L1 I-cache and D-cache misses per buffer. Average latency by combining with MAB_REQUESTS.", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_mab_requests), .ngrp = 1, .umasks = amd64_fam14h_mab_requests, /* identical to actual umasks list for this event */ }, { .name = "SYSTEM_READ_RESPONSES", .desc = "Northbridge Read Responses by Coherency State", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_system_read_responses), .ngrp = 1, .umasks = amd64_fam14h_system_read_responses, }, { .name = "CPU_CLK_UNHALTED", .desc = "CPU Clocks not Halted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x76, }, { .name = "REQUESTS_TO_L2", .desc = "Requests to L2 Cache", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_requests_to_l2), .ngrp = 1, .umasks = amd64_fam14h_requests_to_l2, }, { .name = "L2_CACHE_MISS", .desc = "L2 Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7e, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_l2_cache_miss), .ngrp = 1, .umasks = amd64_fam14h_l2_cache_miss, }, { .name = "L2_FILL_WRITEBACK", .desc = "L2 Fill/Writeback", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_l2_fill_writeback), .ngrp = 1, .umasks = amd64_fam14h_l2_fill_writeback, }, { .name = "INSTRUCTION_CACHE_FETCHES", .desc = "Instruction Cache Fetches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x80, }, { .name = "INSTRUCTION_CACHE_MISSES", .desc = "Instruction Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x81, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_L2", .desc = "Instruction Cache Refills from L2", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x82, }, { .name = 
"INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM", .desc = "Instruction Cache Refills from System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x83, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_MISS", .desc = "L1 ITLB Miss and L2 ITLB Miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_l1_itlb_miss_and_l2_itlb_miss), .ngrp = 1, .umasks = amd64_fam14h_l1_itlb_miss_and_l2_itlb_miss, }, { .name = "INSTRUCTION_FETCH_STALL", .desc = "Instruction Fetch Stall", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x87, }, { .name = "RETURN_STACK_HITS", .desc = "Return Stack Hits", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x88, }, { .name = "RETURN_STACK_OVERFLOWS", .desc = "Return Stack Overflows", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x89, }, { .name = "INSTRUCTION_CACHE_VICTIMS", .desc = "Instruction Cache Victims", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x8b, }, { .name = "INSTRUCTION_CACHE_LINES_INVALIDATED", .desc = "Instruction Cache Lines Invalidated", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x8c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_instruction_cache_lines_invalidated), .ngrp = 1, .umasks = amd64_fam14h_instruction_cache_lines_invalidated, }, { .name = "ITLB_RELOADS", .desc = "ITLB Reloads", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x99, }, { .name = "ITLB_RELOADS_ABORTED", .desc = "ITLB Reloads Aborted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x9a, }, { .name = "RETIRED_INSTRUCTIONS", .desc = "Retired Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc0, }, { .name = "RETIRED_UOPS", .desc = "Retired uops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc1, }, { .name = "RETIRED_BRANCH_INSTRUCTIONS", .desc = "Retired Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc2, }, { .name = "RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", .desc = "Retired Mispredicted Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc3, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS", .desc = "Retired Taken Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 
0xc4, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED", .desc = "Retired Taken Branch Instructions Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc5, }, { .name = "RETIRED_FAR_CONTROL_TRANSFERS", .desc = "Retired Far Control Transfers", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc6, }, { .name = "RETIRED_BRANCH_RESYNCS", .desc = "Retired Branch Resyncs", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc7, }, { .name = "RETIRED_NEAR_RETURNS", .desc = "Retired Near Returns", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc8, }, { .name = "RETIRED_NEAR_RETURNS_MISPREDICTED", .desc = "Retired Near Returns Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc9, }, { .name = "RETIRED_INDIRECT_BRANCHES_MISPREDICTED", .desc = "Retired Indirect Branches Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xca, }, { .name = "RETIRED_FLOATING_POINT_INSTRUCTIONS", .desc = "Retired SSE/MMX/FP Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_retired_floating_point_instructions), .ngrp = 1, .umasks = amd64_fam14h_retired_floating_point_instructions, }, { .name = "INTERRUPTS_MASKED_CYCLES", .desc = "Interrupts-Masked Cycles", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcd, }, { .name = "INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING", .desc = "Interrupts-Masked Cycles with Interrupt Pending", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xce, }, { .name = "INTERRUPTS_TAKEN", .desc = "Interrupts Taken", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcf, }, { .name = "FPU_EXCEPTIONS", .desc = "FPU Exceptions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_fpu_exceptions), .ngrp = 1, .umasks = amd64_fam14h_fpu_exceptions, }, { .name = "DR0_BREAKPOINT_MATCHES", .desc = "DR0 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdc, }, { .name = "DR1_BREAKPOINT_MATCHES", .desc = "DR1 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdd, }, { .name = "DR2_BREAKPOINT_MATCHES", .desc = "DR2 
Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xde, }, { .name = "DR3_BREAKPOINT_MATCHES", .desc = "DR3 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdf, }, { .name = "DRAM_ACCESSES_PAGE", .desc = "DRAM Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_dram_accesses_page), .ngrp = 1, .umasks = amd64_fam14h_dram_accesses_page, }, { .name = "MEMORY_CONTROLLER_PAGE_TABLE", .desc = "Number of page table events in the local DRAM controller", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_memory_controller_page_table), .ngrp = 1, .umasks = amd64_fam14h_memory_controller_page_table, }, { .name = "MEMORY_CONTROLLER_SLOT_MISSES", .desc = "Memory Controller DRAM Command Slots Missed", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe2, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_memory_controller_slot_misses), .ngrp = 1, .umasks = amd64_fam14h_memory_controller_slot_misses, }, { .name = "MEMORY_CONTROLLER_RBD_QUEUE_EVENTS", .desc = "Memory Controller Bypass Counter Saturation", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_memory_controller_rbd_queue_events), .ngrp = 1, .umasks = amd64_fam14h_memory_controller_rbd_queue_events, }, { .name = "THERMAL_STATUS", .desc = "Thermal Status", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe8, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_thermal_status), .ngrp = 1, .umasks = amd64_fam14h_thermal_status, }, { .name = "CPU_IO_REQUESTS_TO_MEMORY_IO", .desc = "CPU/IO Requests to Memory/IO", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_cpu_io_requests_to_memory_io), .ngrp = 1, .umasks = amd64_fam14h_cpu_io_requests_to_memory_io, }, { .name = "CACHE_BLOCK", .desc = "Cache Block Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_cache_block), .ngrp = 1, .umasks = amd64_fam14h_cache_block, }, { .name = "SIZED_COMMANDS", .desc 
= "Sized Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xeb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_sized_commands), .ngrp = 1, .umasks = amd64_fam14h_sized_commands, }, { .name = "PROBE", .desc = "Probe Responses and Upstream Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xec, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_probe), .ngrp = 1, .umasks = amd64_fam14h_probe, }, { .name = "DEV_EVENTS", .desc = "DEV Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xee, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_dev_events), .ngrp = 1, .umasks = amd64_fam14h_dev_events, }, { .name = "MEMORY_CONTROLLER_REQUESTS", .desc = "Memory Controller Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1f0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_memory_controller_requests), .ngrp = 1, .umasks = amd64_fam14h_memory_controller_requests, }, { .name = "SIDEBAND_SIGNALS_SPECIAL_SIGNALS", .desc = "Sideband signals and special cycles", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_sideband_signals_special_signals), .ngrp = 1, .umasks = amd64_fam14h_sideband_signals_special_signals, }, { .name = "INTERRUPT_EVENTS", .desc = "Interrupt events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1ea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_interrupt_events), .ngrp = 1, .umasks = amd64_fam14h_interrupt_events, }, { .name = "PDC_MISS", .desc = "PDC miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x162, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam14h_pdc_miss), .ngrp = 1, .umasks = amd64_fam14h_pdc_miss, }, }; papi-5.3.0/src/libpfm4/lib/events/power5_events.h0000600003276200002170000053001212247131124021314 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __POWER5_EVENTS_H__ #define __POWER5_EVENTS_H__ /* * File: power5_events.h * CVS: * Author: Corey Ashford * cjashfor@us.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2009. All Rights Reserved. 
* Contributed by Corey Ashford * * Note: This code was automatically generated and should not be modified by * hand. * */ #define POWER5_PME_PM_LSU_REJECT_RELOAD_CDF 0 #define POWER5_PME_PM_FPU1_SINGLE 1 #define POWER5_PME_PM_L3SB_REF 2 #define POWER5_PME_PM_THRD_PRIO_DIFF_3or4_CYC 3 #define POWER5_PME_PM_INST_FROM_L275_SHR 4 #define POWER5_PME_PM_MRK_DATA_FROM_L375_MOD 5 #define POWER5_PME_PM_DTLB_MISS_4K 6 #define POWER5_PME_PM_CLB_FULL_CYC 7 #define POWER5_PME_PM_MRK_ST_CMPL 8 #define POWER5_PME_PM_LSU_FLUSH_LRQ_FULL 9 #define POWER5_PME_PM_MRK_DATA_FROM_L275_SHR 10 #define POWER5_PME_PM_1INST_CLB_CYC 11 #define POWER5_PME_PM_MEM_SPEC_RD_CANCEL 12 #define POWER5_PME_PM_MRK_DTLB_MISS_16M 13 #define POWER5_PME_PM_FPU_FDIV 14 #define POWER5_PME_PM_FPU_SINGLE 15 #define POWER5_PME_PM_FPU0_FMA 16 #define POWER5_PME_PM_SLB_MISS 17 #define POWER5_PME_PM_LSU1_FLUSH_LRQ 18 #define POWER5_PME_PM_L2SA_ST_HIT 19 #define POWER5_PME_PM_DTLB_MISS 20 #define POWER5_PME_PM_BR_PRED_TA 21 #define POWER5_PME_PM_MRK_DATA_FROM_L375_MOD_CYC 22 #define POWER5_PME_PM_CMPLU_STALL_FXU 23 #define POWER5_PME_PM_EXT_INT 24 #define POWER5_PME_PM_MRK_LSU1_FLUSH_LRQ 25 #define POWER5_PME_PM_LSU1_LDF 26 #define POWER5_PME_PM_MRK_ST_GPS 27 #define POWER5_PME_PM_FAB_CMD_ISSUED 28 #define POWER5_PME_PM_LSU0_SRQ_STFWD 29 #define POWER5_PME_PM_CR_MAP_FULL_CYC 30 #define POWER5_PME_PM_L2SA_RCST_DISP_FAIL_RC_FULL 31 #define POWER5_PME_PM_MRK_LSU0_FLUSH_ULD 32 #define POWER5_PME_PM_LSU_FLUSH_SRQ_FULL 33 #define POWER5_PME_PM_FLUSH_IMBAL 34 #define POWER5_PME_PM_MEM_RQ_DISP_Q16to19 35 #define POWER5_PME_PM_THRD_PRIO_DIFF_minus3or4_CYC 36 #define POWER5_PME_PM_DATA_FROM_L35_MOD 37 #define POWER5_PME_PM_MEM_HI_PRIO_WR_CMPL 38 #define POWER5_PME_PM_FPU1_FDIV 39 #define POWER5_PME_PM_FPU0_FRSP_FCONV 40 #define POWER5_PME_PM_MEM_RQ_DISP 41 #define POWER5_PME_PM_LWSYNC_HELD 42 #define POWER5_PME_PM_FXU_FIN 43 #define POWER5_PME_PM_DSLB_MISS 44 #define POWER5_PME_PM_FXLS1_FULL_CYC 45 #define 
POWER5_PME_PM_DATA_FROM_L275_SHR 46 #define POWER5_PME_PM_THRD_SEL_T0 47 #define POWER5_PME_PM_PTEG_RELOAD_VALID 48 #define POWER5_PME_PM_LSU_LMQ_LHR_MERGE 49 #define POWER5_PME_PM_MRK_STCX_FAIL 50 #define POWER5_PME_PM_2INST_CLB_CYC 51 #define POWER5_PME_PM_FAB_PNtoVN_DIRECT 52 #define POWER5_PME_PM_PTEG_FROM_L2MISS 53 #define POWER5_PME_PM_CMPLU_STALL_LSU 54 #define POWER5_PME_PM_MRK_DSLB_MISS 55 #define POWER5_PME_PM_LSU_FLUSH_ULD 56 #define POWER5_PME_PM_PTEG_FROM_LMEM 57 #define POWER5_PME_PM_MRK_BRU_FIN 58 #define POWER5_PME_PM_MEM_WQ_DISP_WRITE 59 #define POWER5_PME_PM_MRK_DATA_FROM_L275_MOD_CYC 60 #define POWER5_PME_PM_LSU1_NCLD 61 #define POWER5_PME_PM_L2SA_RCLD_DISP_FAIL_OTHER 62 #define POWER5_PME_PM_SNOOP_PW_RETRY_WQ_PWQ 63 #define POWER5_PME_PM_FPR_MAP_FULL_CYC 64 #define POWER5_PME_PM_FPU1_FULL_CYC 65 #define POWER5_PME_PM_L3SA_ALL_BUSY 66 #define POWER5_PME_PM_3INST_CLB_CYC 67 #define POWER5_PME_PM_MEM_PWQ_DISP_Q2or3 68 #define POWER5_PME_PM_L2SA_SHR_INV 69 #define POWER5_PME_PM_THRESH_TIMEO 70 #define POWER5_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL 71 #define POWER5_PME_PM_THRD_SEL_OVER_GCT_IMBAL 72 #define POWER5_PME_PM_FPU_FSQRT 73 #define POWER5_PME_PM_MRK_LSU0_FLUSH_LRQ 74 #define POWER5_PME_PM_PMC1_OVERFLOW 75 #define POWER5_PME_PM_L3SC_SNOOP_RETRY 76 #define POWER5_PME_PM_DATA_TABLEWALK_CYC 77 #define POWER5_PME_PM_THRD_PRIO_6_CYC 78 #define POWER5_PME_PM_FPU_FEST 79 #define POWER5_PME_PM_FAB_M1toP1_SIDECAR_EMPTY 80 #define POWER5_PME_PM_MRK_DATA_FROM_RMEM 81 #define POWER5_PME_PM_MRK_DATA_FROM_L35_MOD_CYC 82 #define POWER5_PME_PM_MEM_PWQ_DISP 83 #define POWER5_PME_PM_FAB_P1toM1_SIDECAR_EMPTY 84 #define POWER5_PME_PM_LD_MISS_L1_LSU0 85 #define POWER5_PME_PM_SNOOP_PARTIAL_RTRY_QFULL 86 #define POWER5_PME_PM_FPU1_STALL3 87 #define POWER5_PME_PM_GCT_USAGE_80to99_CYC 88 #define POWER5_PME_PM_WORK_HELD 89 #define POWER5_PME_PM_INST_CMPL 90 #define POWER5_PME_PM_LSU1_FLUSH_UST 91 #define POWER5_PME_PM_FXU_IDLE 92 #define POWER5_PME_PM_LSU0_FLUSH_ULD 93 
#define POWER5_PME_PM_LSU1_REJECT_LMQ_FULL 94 #define POWER5_PME_PM_GRP_DISP_REJECT 95 #define POWER5_PME_PM_L2SA_MOD_INV 96 #define POWER5_PME_PM_PTEG_FROM_L25_SHR 97 #define POWER5_PME_PM_FAB_CMD_RETRIED 98 #define POWER5_PME_PM_L3SA_SHR_INV 99 #define POWER5_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL 100 #define POWER5_PME_PM_L2SA_RCST_DISP_FAIL_ADDR 101 #define POWER5_PME_PM_L2SA_RCLD_DISP_FAIL_RC_FULL 102 #define POWER5_PME_PM_PTEG_FROM_L375_MOD 103 #define POWER5_PME_PM_MRK_LSU1_FLUSH_UST 104 #define POWER5_PME_PM_BR_ISSUED 105 #define POWER5_PME_PM_MRK_GRP_BR_REDIR 106 #define POWER5_PME_PM_EE_OFF 107 #define POWER5_PME_PM_MEM_RQ_DISP_Q4to7 108 #define POWER5_PME_PM_MEM_FAST_PATH_RD_DISP 109 #define POWER5_PME_PM_INST_FROM_L3 110 #define POWER5_PME_PM_ITLB_MISS 111 #define POWER5_PME_PM_FXU1_BUSY_FXU0_IDLE 112 #define POWER5_PME_PM_FXLS_FULL_CYC 113 #define POWER5_PME_PM_DTLB_REF_4K 114 #define POWER5_PME_PM_GRP_DISP_VALID 115 #define POWER5_PME_PM_LSU_FLUSH_UST 116 #define POWER5_PME_PM_FXU1_FIN 117 #define POWER5_PME_PM_THRD_PRIO_4_CYC 118 #define POWER5_PME_PM_MRK_DATA_FROM_L35_MOD 119 #define POWER5_PME_PM_4INST_CLB_CYC 120 #define POWER5_PME_PM_MRK_DTLB_REF_16M 121 #define POWER5_PME_PM_INST_FROM_L375_MOD 122 #define POWER5_PME_PM_L2SC_RCST_DISP_FAIL_ADDR 123 #define POWER5_PME_PM_GRP_CMPL 124 #define POWER5_PME_PM_FPU1_1FLOP 125 #define POWER5_PME_PM_FPU_FRSP_FCONV 126 #define POWER5_PME_PM_5INST_CLB_CYC 127 #define POWER5_PME_PM_L3SC_REF 128 #define POWER5_PME_PM_THRD_L2MISS_BOTH_CYC 129 #define POWER5_PME_PM_MEM_PW_GATH 130 #define POWER5_PME_PM_FAB_PNtoNN_SIDECAR 131 #define POWER5_PME_PM_FAB_DCLAIM_ISSUED 132 #define POWER5_PME_PM_GRP_IC_MISS 133 #define POWER5_PME_PM_INST_FROM_L35_SHR 134 #define POWER5_PME_PM_LSU_LMQ_FULL_CYC 135 #define POWER5_PME_PM_MRK_DATA_FROM_L2_CYC 136 #define POWER5_PME_PM_LSU_SRQ_SYNC_CYC 137 #define POWER5_PME_PM_LSU0_BUSY_REJECT 138 #define POWER5_PME_PM_LSU_REJECT_ERAT_MISS 139 #define POWER5_PME_PM_MRK_DATA_FROM_RMEM_CYC 
140 #define POWER5_PME_PM_DATA_FROM_L375_SHR 141 #define POWER5_PME_PM_FPU0_FMOV_FEST 142 #define POWER5_PME_PM_PTEG_FROM_L25_MOD 143 #define POWER5_PME_PM_LD_REF_L1_LSU0 144 #define POWER5_PME_PM_THRD_PRIO_7_CYC 145 #define POWER5_PME_PM_LSU1_FLUSH_SRQ 146 #define POWER5_PME_PM_L2SC_RCST_DISP 147 #define POWER5_PME_PM_CMPLU_STALL_DIV 148 #define POWER5_PME_PM_MEM_RQ_DISP_Q12to15 149 #define POWER5_PME_PM_INST_FROM_L375_SHR 150 #define POWER5_PME_PM_ST_REF_L1 151 #define POWER5_PME_PM_L3SB_ALL_BUSY 152 #define POWER5_PME_PM_FAB_P1toVNorNN_SIDECAR_EMPTY 153 #define POWER5_PME_PM_MRK_DATA_FROM_L275_SHR_CYC 154 #define POWER5_PME_PM_FAB_HOLDtoNN_EMPTY 155 #define POWER5_PME_PM_DATA_FROM_LMEM 156 #define POWER5_PME_PM_RUN_CYC 157 #define POWER5_PME_PM_PTEG_FROM_RMEM 158 #define POWER5_PME_PM_L2SC_RCLD_DISP 159 #define POWER5_PME_PM_LSU0_LDF 160 #define POWER5_PME_PM_LSU_LRQ_S0_VALID 161 #define POWER5_PME_PM_PMC3_OVERFLOW 162 #define POWER5_PME_PM_MRK_IMR_RELOAD 163 #define POWER5_PME_PM_MRK_GRP_TIMEO 164 #define POWER5_PME_PM_ST_MISS_L1 165 #define POWER5_PME_PM_STOP_COMPLETION 166 #define POWER5_PME_PM_LSU_BUSY_REJECT 167 #define POWER5_PME_PM_ISLB_MISS 168 #define POWER5_PME_PM_CYC 169 #define POWER5_PME_PM_THRD_ONE_RUN_CYC 170 #define POWER5_PME_PM_GRP_BR_REDIR_NONSPEC 171 #define POWER5_PME_PM_LSU1_SRQ_STFWD 172 #define POWER5_PME_PM_L3SC_MOD_INV 173 #define POWER5_PME_PM_L2_PREF 174 #define POWER5_PME_PM_GCT_NOSLOT_BR_MPRED 175 #define POWER5_PME_PM_MRK_DATA_FROM_L25_MOD 176 #define POWER5_PME_PM_L2SB_MOD_INV 177 #define POWER5_PME_PM_L2SB_ST_REQ 178 #define POWER5_PME_PM_MRK_L1_RELOAD_VALID 179 #define POWER5_PME_PM_L3SB_HIT 180 #define POWER5_PME_PM_L2SB_SHR_MOD 181 #define POWER5_PME_PM_EE_OFF_EXT_INT 182 #define POWER5_PME_PM_1PLUS_PPC_CMPL 183 #define POWER5_PME_PM_L2SC_SHR_MOD 184 #define POWER5_PME_PM_PMC6_OVERFLOW 185 #define POWER5_PME_PM_LSU_LRQ_FULL_CYC 186 #define POWER5_PME_PM_IC_PREF_INSTALL 187 #define POWER5_PME_PM_TLB_MISS 188 #define 
POWER5_PME_PM_GCT_FULL_CYC 189 #define POWER5_PME_PM_FXU_BUSY 190 #define POWER5_PME_PM_MRK_DATA_FROM_L3_CYC 191 #define POWER5_PME_PM_LSU_REJECT_LMQ_FULL 192 #define POWER5_PME_PM_LSU_SRQ_S0_ALLOC 193 #define POWER5_PME_PM_GRP_MRK 194 #define POWER5_PME_PM_INST_FROM_L25_SHR 195 #define POWER5_PME_PM_FPU1_FIN 196 #define POWER5_PME_PM_DC_PREF_STREAM_ALLOC 197 #define POWER5_PME_PM_BR_MPRED_TA 198 #define POWER5_PME_PM_CRQ_FULL_CYC 199 #define POWER5_PME_PM_L2SA_RCLD_DISP 200 #define POWER5_PME_PM_SNOOP_WR_RETRY_QFULL 201 #define POWER5_PME_PM_MRK_DTLB_REF_4K 202 #define POWER5_PME_PM_LSU_SRQ_S0_VALID 203 #define POWER5_PME_PM_LSU0_FLUSH_LRQ 204 #define POWER5_PME_PM_INST_FROM_L275_MOD 205 #define POWER5_PME_PM_GCT_EMPTY_CYC 206 #define POWER5_PME_PM_LARX_LSU0 207 #define POWER5_PME_PM_THRD_PRIO_DIFF_5or6_CYC 208 #define POWER5_PME_PM_SNOOP_RETRY_1AHEAD 209 #define POWER5_PME_PM_FPU1_FSQRT 210 #define POWER5_PME_PM_MRK_LD_MISS_L1_LSU1 211 #define POWER5_PME_PM_MRK_FPU_FIN 212 #define POWER5_PME_PM_THRD_PRIO_5_CYC 213 #define POWER5_PME_PM_MRK_DATA_FROM_LMEM 214 #define POWER5_PME_PM_FPU1_FRSP_FCONV 215 #define POWER5_PME_PM_SNOOP_TLBIE 216 #define POWER5_PME_PM_L3SB_SNOOP_RETRY 217 #define POWER5_PME_PM_FAB_VBYPASS_EMPTY 218 #define POWER5_PME_PM_MRK_DATA_FROM_L275_MOD 219 #define POWER5_PME_PM_6INST_CLB_CYC 220 #define POWER5_PME_PM_L2SB_RCST_DISP 221 #define POWER5_PME_PM_FLUSH 222 #define POWER5_PME_PM_L2SC_MOD_INV 223 #define POWER5_PME_PM_FPU_DENORM 224 #define POWER5_PME_PM_L3SC_HIT 225 #define POWER5_PME_PM_SNOOP_WR_RETRY_RQ 226 #define POWER5_PME_PM_LSU1_REJECT_SRQ 227 #define POWER5_PME_PM_IC_PREF_REQ 228 #define POWER5_PME_PM_L3SC_ALL_BUSY 229 #define POWER5_PME_PM_MRK_GRP_IC_MISS 230 #define POWER5_PME_PM_GCT_NOSLOT_IC_MISS 231 #define POWER5_PME_PM_MRK_DATA_FROM_L3 232 #define POWER5_PME_PM_GCT_NOSLOT_SRQ_FULL 233 #define POWER5_PME_PM_THRD_SEL_OVER_ISU_HOLD 234 #define POWER5_PME_PM_CMPLU_STALL_DCACHE_MISS 235 #define POWER5_PME_PM_L3SA_MOD_INV 236 
#define POWER5_PME_PM_LSU_FLUSH_LRQ 237 #define POWER5_PME_PM_THRD_PRIO_2_CYC 238 #define POWER5_PME_PM_LSU_FLUSH_SRQ 239 #define POWER5_PME_PM_MRK_LSU_SRQ_INST_VALID 240 #define POWER5_PME_PM_L3SA_REF 241 #define POWER5_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL 242 #define POWER5_PME_PM_FPU0_STALL3 243 #define POWER5_PME_PM_GPR_MAP_FULL_CYC 244 #define POWER5_PME_PM_TB_BIT_TRANS 245 #define POWER5_PME_PM_MRK_LSU_FLUSH_LRQ 246 #define POWER5_PME_PM_FPU0_STF 247 #define POWER5_PME_PM_MRK_DTLB_MISS 248 #define POWER5_PME_PM_FPU1_FMA 249 #define POWER5_PME_PM_L2SA_MOD_TAG 250 #define POWER5_PME_PM_LSU1_FLUSH_ULD 251 #define POWER5_PME_PM_MRK_LSU0_FLUSH_UST 252 #define POWER5_PME_PM_MRK_INST_FIN 253 #define POWER5_PME_PM_FPU0_FULL_CYC 254 #define POWER5_PME_PM_LSU_LRQ_S0_ALLOC 255 #define POWER5_PME_PM_MRK_LSU1_FLUSH_ULD 256 #define POWER5_PME_PM_MRK_DTLB_REF 257 #define POWER5_PME_PM_BR_UNCOND 258 #define POWER5_PME_PM_THRD_SEL_OVER_L2MISS 259 #define POWER5_PME_PM_L2SB_SHR_INV 260 #define POWER5_PME_PM_MEM_LO_PRIO_WR_CMPL 261 #define POWER5_PME_PM_L3SC_MOD_TAG 262 #define POWER5_PME_PM_MRK_ST_MISS_L1 263 #define POWER5_PME_PM_GRP_DISP_SUCCESS 264 #define POWER5_PME_PM_THRD_PRIO_DIFF_1or2_CYC 265 #define POWER5_PME_PM_IC_DEMAND_L2_BHT_REDIRECT 266 #define POWER5_PME_PM_MEM_WQ_DISP_Q8to15 267 #define POWER5_PME_PM_FPU0_SINGLE 268 #define POWER5_PME_PM_LSU_DERAT_MISS 269 #define POWER5_PME_PM_THRD_PRIO_1_CYC 270 #define POWER5_PME_PM_L2SC_RCST_DISP_FAIL_OTHER 271 #define POWER5_PME_PM_FPU1_FEST 272 #define POWER5_PME_PM_FAB_HOLDtoVN_EMPTY 273 #define POWER5_PME_PM_SNOOP_RD_RETRY_RQ 274 #define POWER5_PME_PM_SNOOP_DCLAIM_RETRY_QFULL 275 #define POWER5_PME_PM_MRK_DATA_FROM_L25_SHR_CYC 276 #define POWER5_PME_PM_MRK_ST_CMPL_INT 277 #define POWER5_PME_PM_FLUSH_BR_MPRED 278 #define POWER5_PME_PM_L2SB_RCLD_DISP_FAIL_ADDR 279 #define POWER5_PME_PM_FPU_STF 280 #define POWER5_PME_PM_CMPLU_STALL_FPU 281 #define POWER5_PME_PM_THRD_PRIO_DIFF_minus1or2_CYC 282 #define 
POWER5_PME_PM_GCT_NOSLOT_CYC 283 #define POWER5_PME_PM_FXU0_BUSY_FXU1_IDLE 284 #define POWER5_PME_PM_PTEG_FROM_L35_SHR 285 #define POWER5_PME_PM_MRK_LSU_FLUSH_UST 286 #define POWER5_PME_PM_L3SA_HIT 287 #define POWER5_PME_PM_MRK_DATA_FROM_L25_SHR 288 #define POWER5_PME_PM_L2SB_RCST_DISP_FAIL_ADDR 289 #define POWER5_PME_PM_MRK_DATA_FROM_L35_SHR 290 #define POWER5_PME_PM_IERAT_XLATE_WR 291 #define POWER5_PME_PM_L2SA_ST_REQ 292 #define POWER5_PME_PM_THRD_SEL_T1 293 #define POWER5_PME_PM_IC_DEMAND_L2_BR_REDIRECT 294 #define POWER5_PME_PM_INST_FROM_LMEM 295 #define POWER5_PME_PM_FPU0_1FLOP 296 #define POWER5_PME_PM_MRK_DATA_FROM_L35_SHR_CYC 297 #define POWER5_PME_PM_PTEG_FROM_L2 298 #define POWER5_PME_PM_MEM_PW_CMPL 299 #define POWER5_PME_PM_THRD_PRIO_DIFF_minus5or6_CYC 300 #define POWER5_PME_PM_L2SB_RCLD_DISP_FAIL_OTHER 301 #define POWER5_PME_PM_FPU0_FIN 302 #define POWER5_PME_PM_MRK_DTLB_MISS_4K 303 #define POWER5_PME_PM_L3SC_SHR_INV 304 #define POWER5_PME_PM_GRP_BR_REDIR 305 #define POWER5_PME_PM_L2SC_RCLD_DISP_FAIL_RC_FULL 306 #define POWER5_PME_PM_MRK_LSU_FLUSH_SRQ 307 #define POWER5_PME_PM_PTEG_FROM_L275_SHR 308 #define POWER5_PME_PM_L2SB_RCLD_DISP_FAIL_RC_FULL 309 #define POWER5_PME_PM_SNOOP_RD_RETRY_WQ 310 #define POWER5_PME_PM_LSU0_NCLD 311 #define POWER5_PME_PM_FAB_DCLAIM_RETRIED 312 #define POWER5_PME_PM_LSU1_BUSY_REJECT 313 #define POWER5_PME_PM_FXLS0_FULL_CYC 314 #define POWER5_PME_PM_FPU0_FEST 315 #define POWER5_PME_PM_DTLB_REF_16M 316 #define POWER5_PME_PM_L2SC_RCLD_DISP_FAIL_ADDR 317 #define POWER5_PME_PM_LSU0_REJECT_ERAT_MISS 318 #define POWER5_PME_PM_DATA_FROM_L25_MOD 319 #define POWER5_PME_PM_GCT_USAGE_60to79_CYC 320 #define POWER5_PME_PM_DATA_FROM_L375_MOD 321 #define POWER5_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC 322 #define POWER5_PME_PM_LSU0_REJECT_RELOAD_CDF 323 #define POWER5_PME_PM_0INST_FETCH 324 #define POWER5_PME_PM_LSU1_REJECT_RELOAD_CDF 325 #define POWER5_PME_PM_L1_PREF 326 #define POWER5_PME_PM_MEM_WQ_DISP_Q0to7 327 #define 
POWER5_PME_PM_MRK_DATA_FROM_LMEM_CYC 328 #define POWER5_PME_PM_BRQ_FULL_CYC 329 #define POWER5_PME_PM_GRP_IC_MISS_NONSPEC 330 #define POWER5_PME_PM_PTEG_FROM_L275_MOD 331 #define POWER5_PME_PM_MRK_LD_MISS_L1_LSU0 332 #define POWER5_PME_PM_MRK_DATA_FROM_L375_SHR_CYC 333 #define POWER5_PME_PM_LSU_FLUSH 334 #define POWER5_PME_PM_DATA_FROM_L3 335 #define POWER5_PME_PM_INST_FROM_L2 336 #define POWER5_PME_PM_PMC2_OVERFLOW 337 #define POWER5_PME_PM_FPU0_DENORM 338 #define POWER5_PME_PM_FPU1_FMOV_FEST 339 #define POWER5_PME_PM_INST_FETCH_CYC 340 #define POWER5_PME_PM_LSU_LDF 341 #define POWER5_PME_PM_INST_DISP 342 #define POWER5_PME_PM_DATA_FROM_L25_SHR 343 #define POWER5_PME_PM_L1_DCACHE_RELOAD_VALID 344 #define POWER5_PME_PM_MEM_WQ_DISP_DCLAIM 345 #define POWER5_PME_PM_FPU_FULL_CYC 346 #define POWER5_PME_PM_MRK_GRP_ISSUED 347 #define POWER5_PME_PM_THRD_PRIO_3_CYC 348 #define POWER5_PME_PM_FPU_FMA 349 #define POWER5_PME_PM_INST_FROM_L35_MOD 350 #define POWER5_PME_PM_MRK_CRU_FIN 351 #define POWER5_PME_PM_SNOOP_WR_RETRY_WQ 352 #define POWER5_PME_PM_CMPLU_STALL_REJECT 353 #define POWER5_PME_PM_LSU1_REJECT_ERAT_MISS 354 #define POWER5_PME_PM_MRK_FXU_FIN 355 #define POWER5_PME_PM_L2SB_RCST_DISP_FAIL_OTHER 356 #define POWER5_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY 357 #define POWER5_PME_PM_PMC4_OVERFLOW 358 #define POWER5_PME_PM_L3SA_SNOOP_RETRY 359 #define POWER5_PME_PM_PTEG_FROM_L35_MOD 360 #define POWER5_PME_PM_INST_FROM_L25_MOD 361 #define POWER5_PME_PM_THRD_SMT_HANG 362 #define POWER5_PME_PM_CMPLU_STALL_ERAT_MISS 363 #define POWER5_PME_PM_L3SA_MOD_TAG 364 #define POWER5_PME_PM_FLUSH_SYNC 365 #define POWER5_PME_PM_INST_FROM_L2MISS 366 #define POWER5_PME_PM_L2SC_ST_HIT 367 #define POWER5_PME_PM_MEM_RQ_DISP_Q8to11 368 #define POWER5_PME_PM_MRK_GRP_DISP 369 #define POWER5_PME_PM_L2SB_MOD_TAG 370 #define POWER5_PME_PM_CLB_EMPTY_CYC 371 #define POWER5_PME_PM_L2SB_ST_HIT 372 #define POWER5_PME_PM_MEM_NONSPEC_RD_CANCEL 373 #define POWER5_PME_PM_BR_PRED_CR_TA 374 #define 
POWER5_PME_PM_MRK_LSU0_FLUSH_SRQ 375 #define POWER5_PME_PM_MRK_LSU_FLUSH_ULD 376 #define POWER5_PME_PM_INST_DISP_ATTEMPT 377 #define POWER5_PME_PM_INST_FROM_RMEM 378 #define POWER5_PME_PM_ST_REF_L1_LSU0 379 #define POWER5_PME_PM_LSU0_DERAT_MISS 380 #define POWER5_PME_PM_L2SB_RCLD_DISP 381 #define POWER5_PME_PM_FPU_STALL3 382 #define POWER5_PME_PM_BR_PRED_CR 383 #define POWER5_PME_PM_MRK_DATA_FROM_L2 384 #define POWER5_PME_PM_LSU0_FLUSH_SRQ 385 #define POWER5_PME_PM_FAB_PNtoNN_DIRECT 386 #define POWER5_PME_PM_IOPS_CMPL 387 #define POWER5_PME_PM_L2SC_SHR_INV 388 #define POWER5_PME_PM_L2SA_RCST_DISP_FAIL_OTHER 389 #define POWER5_PME_PM_L2SA_RCST_DISP 390 #define POWER5_PME_PM_SNOOP_RETRY_AB_COLLISION 391 #define POWER5_PME_PM_FAB_PNtoVN_SIDECAR 392 #define POWER5_PME_PM_LSU_LMQ_S0_ALLOC 393 #define POWER5_PME_PM_LSU0_REJECT_LMQ_FULL 394 #define POWER5_PME_PM_SNOOP_PW_RETRY_RQ 395 #define POWER5_PME_PM_DTLB_REF 396 #define POWER5_PME_PM_PTEG_FROM_L3 397 #define POWER5_PME_PM_FAB_M1toVNorNN_SIDECAR_EMPTY 398 #define POWER5_PME_PM_LSU_SRQ_EMPTY_CYC 399 #define POWER5_PME_PM_FPU1_STF 400 #define POWER5_PME_PM_LSU_LMQ_S0_VALID 401 #define POWER5_PME_PM_GCT_USAGE_00to59_CYC 402 #define POWER5_PME_PM_DATA_FROM_L2MISS 403 #define POWER5_PME_PM_GRP_DISP_BLK_SB_CYC 404 #define POWER5_PME_PM_FPU_FMOV_FEST 405 #define POWER5_PME_PM_XER_MAP_FULL_CYC 406 #define POWER5_PME_PM_FLUSH_SB 407 #define POWER5_PME_PM_MRK_DATA_FROM_L375_SHR 408 #define POWER5_PME_PM_MRK_GRP_CMPL 409 #define POWER5_PME_PM_SUSPENDED 410 #define POWER5_PME_PM_GRP_IC_MISS_BR_REDIR_NONSPEC 411 #define POWER5_PME_PM_SNOOP_RD_RETRY_QFULL 412 #define POWER5_PME_PM_L3SB_MOD_INV 413 #define POWER5_PME_PM_DATA_FROM_L35_SHR 414 #define POWER5_PME_PM_LD_MISS_L1_LSU1 415 #define POWER5_PME_PM_STCX_FAIL 416 #define POWER5_PME_PM_DC_PREF_DST 417 #define POWER5_PME_PM_GRP_DISP 418 #define POWER5_PME_PM_L2SA_RCLD_DISP_FAIL_ADDR 419 #define POWER5_PME_PM_FPU0_FPSCR 420 #define POWER5_PME_PM_DATA_FROM_L2 421 #define 
POWER5_PME_PM_FPU1_DENORM 422 #define POWER5_PME_PM_FPU_1FLOP 423 #define POWER5_PME_PM_L2SC_RCLD_DISP_FAIL_OTHER 424 #define POWER5_PME_PM_L2SC_RCST_DISP_FAIL_RC_FULL 425 #define POWER5_PME_PM_FPU0_FSQRT 426 #define POWER5_PME_PM_LD_REF_L1 427 #define POWER5_PME_PM_INST_FROM_L1 428 #define POWER5_PME_PM_TLBIE_HELD 429 #define POWER5_PME_PM_DC_PREF_OUT_OF_STREAMS 430 #define POWER5_PME_PM_MRK_DATA_FROM_L25_MOD_CYC 431 #define POWER5_PME_PM_MRK_LSU1_FLUSH_SRQ 432 #define POWER5_PME_PM_MEM_RQ_DISP_Q0to3 433 #define POWER5_PME_PM_ST_REF_L1_LSU1 434 #define POWER5_PME_PM_MRK_LD_MISS_L1 435 #define POWER5_PME_PM_L1_WRITE_CYC 436 #define POWER5_PME_PM_L2SC_ST_REQ 437 #define POWER5_PME_PM_CMPLU_STALL_FDIV 438 #define POWER5_PME_PM_THRD_SEL_OVER_CLB_EMPTY 439 #define POWER5_PME_PM_BR_MPRED_CR 440 #define POWER5_PME_PM_L3SB_MOD_TAG 441 #define POWER5_PME_PM_MRK_DATA_FROM_L2MISS 442 #define POWER5_PME_PM_LSU_REJECT_SRQ 443 #define POWER5_PME_PM_LD_MISS_L1 444 #define POWER5_PME_PM_INST_FROM_PREF 445 #define POWER5_PME_PM_DC_INV_L2 446 #define POWER5_PME_PM_STCX_PASS 447 #define POWER5_PME_PM_LSU_SRQ_FULL_CYC 448 #define POWER5_PME_PM_FPU_FIN 449 #define POWER5_PME_PM_L2SA_SHR_MOD 450 #define POWER5_PME_PM_LSU_SRQ_STFWD 451 #define POWER5_PME_PM_0INST_CLB_CYC 452 #define POWER5_PME_PM_FXU0_FIN 453 #define POWER5_PME_PM_L2SB_RCST_DISP_FAIL_RC_FULL 454 #define POWER5_PME_PM_THRD_GRP_CMPL_BOTH_CYC 455 #define POWER5_PME_PM_PMC5_OVERFLOW 456 #define POWER5_PME_PM_FPU0_FDIV 457 #define POWER5_PME_PM_PTEG_FROM_L375_SHR 458 #define POWER5_PME_PM_LD_REF_L1_LSU1 459 #define POWER5_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY 460 #define POWER5_PME_PM_HV_CYC 461 #define POWER5_PME_PM_THRD_PRIO_DIFF_0_CYC 462 #define POWER5_PME_PM_LR_CTR_MAP_FULL_CYC 463 #define POWER5_PME_PM_L3SB_SHR_INV 464 #define POWER5_PME_PM_DATA_FROM_RMEM 465 #define POWER5_PME_PM_DATA_FROM_L275_MOD 466 #define POWER5_PME_PM_LSU0_REJECT_SRQ 467 #define POWER5_PME_PM_LSU1_DERAT_MISS 468 #define POWER5_PME_PM_MRK_LSU_FIN 469 
#define POWER5_PME_PM_DTLB_MISS_16M 470 #define POWER5_PME_PM_LSU0_FLUSH_UST 471 #define POWER5_PME_PM_L2SC_MOD_TAG 472 #define POWER5_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY 473 static const pme_power_entry_t power5_pe[] = { [ POWER5_PME_PM_LSU_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU_REJECT_RELOAD_CDF", .pme_code = 0x2c6090, .pme_short_desc = "LSU reject due to reload CDF or tag update collision", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions because of Critical Data Forward. When critical data arrives from the storage system it is formatted and immediately forwarded, bypassing the data cache, to the destination register using the result bus. Any instruction that requires the result bus in the same cycle is rejected. Tag update rejects are caused when an instruction requires access to the Dcache directory or ERAT in the same system when they are being updated. Combined Unit 0 + 1.", }, [ POWER5_PME_PM_FPU1_SINGLE ] = { .pme_name = "PM_FPU1_SINGLE", .pme_code = 0x20e7, .pme_short_desc = "FPU1 executed single precision instruction", .pme_long_desc = "FPU1 has executed a single precision instruction.", }, [ POWER5_PME_PM_L3SB_REF ] = { .pme_name = "PM_L3SB_REF", .pme_code = 0x701c4, .pme_short_desc = "L3 slice B references", .pme_long_desc = "Number of attempts made by this chip's cores to find data in the L3. Reported per L3 slice ", }, [ POWER5_PME_PM_THRD_PRIO_DIFF_3or4_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_3or4_CYC", .pme_code = 0x430e5, .pme_short_desc = "Cycles thread priority difference is 3 or 4", .pme_long_desc = "Cycles when this thread's priority is higher than the other thread's priority by 3 or 4.", }, [ POWER5_PME_PM_INST_FROM_L275_SHR ] = { .pme_name = "PM_INST_FROM_L275_SHR", .pme_code = 0x322096, .pme_short_desc = "Instruction fetched from L2.75 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (T) data from the L2 on a different module than this processor is located. 
Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_MRK_DATA_FROM_L375_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L375_MOD", .pme_code = 0x1c70a7, .pme_short_desc = "Marked data loaded from L3.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on a different module than this processor is located due to a marked load.", }, [ POWER5_PME_PM_DTLB_MISS_4K ] = { .pme_name = "PM_DTLB_MISS_4K", .pme_code = 0xc40c0, .pme_short_desc = "Data TLB miss for 4K page", .pme_long_desc = "Data TLB references to 4KB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER5_PME_PM_CLB_FULL_CYC ] = { .pme_name = "PM_CLB_FULL_CYC", .pme_code = 0x220e5, .pme_short_desc = "Cycles CLB full", .pme_long_desc = "Cycles when both threads' CLBs are full.", }, [ POWER5_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x100003, .pme_short_desc = "Marked store instruction completed", .pme_long_desc = "A sampled store has completed (data home)", }, [ POWER5_PME_PM_LSU_FLUSH_LRQ_FULL ] = { .pme_name = "PM_LSU_FLUSH_LRQ_FULL", .pme_code = 0x320e7, .pme_short_desc = "Flush caused by LRQ full", .pme_long_desc = "This thread was flushed at dispatch because its Load Request Queue was full. This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L275_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L275_SHR", .pme_code = 0x3c7097, .pme_short_desc = "Marked data loaded from L2.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T) data from the L2 on a different module than this processor is located due to a marked load.", }, [ POWER5_PME_PM_1INST_CLB_CYC ] = { .pme_name = "PM_1INST_CLB_CYC", .pme_code = 0x400c1, .pme_short_desc = "Cycles 1 instruction in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. 
Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5_PME_PM_MEM_SPEC_RD_CANCEL ] = { .pme_name = "PM_MEM_SPEC_RD_CANCEL", .pme_code = 0x721e6, .pme_short_desc = "Speculative memory read cancelled", .pme_long_desc = "Speculative memory read cancelled (i.e. cresp = sourced by L2/L3)", }, [ POWER5_PME_PM_MRK_DTLB_MISS_16M ] = { .pme_name = "PM_MRK_DTLB_MISS_16M", .pme_code = 0xc40c5, .pme_short_desc = "Marked Data TLB misses for 16M page", .pme_long_desc = "Marked Data TLB misses for 16M page", }, [ POWER5_PME_PM_FPU_FDIV ] = { .pme_name = "PM_FPU_FDIV", .pme_code = 0x100088, .pme_short_desc = "FPU executed FDIV instruction", .pme_long_desc = "The floating point unit has executed a divide instruction. This could be fdiv, fdivs, fdiv., fdivs.. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_FPU_SINGLE ] = { .pme_name = "PM_FPU_SINGLE", .pme_code = 0x102090, .pme_short_desc = "FPU executed single precision instruction", .pme_long_desc = "FPU is executing single precision instruction. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_FPU0_FMA ] = { .pme_name = "PM_FPU0_FMA", .pme_code = 0xc1, .pme_short_desc = "FPU0 executed multiply-add instruction", .pme_long_desc = "The floating point unit has executed a multiply-add kind of instruction. 
This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5_PME_PM_SLB_MISS ] = { .pme_name = "PM_SLB_MISS", .pme_code = 0x280088, .pme_short_desc = "SLB misses", .pme_long_desc = "Total of all Segment Lookaside Buffer (SLB) misses, Instructions + Data.", }, [ POWER5_PME_PM_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_LSU1_FLUSH_LRQ", .pme_code = 0xc00c6, .pme_short_desc = "LSU1 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5_PME_PM_L2SA_ST_HIT ] = { .pme_name = "PM_L2SA_ST_HIT", .pme_code = 0x733e0, .pme_short_desc = "L2 slice A store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. This event is provided on each of the three L2 slices A, B, and C.", }, [ POWER5_PME_PM_DTLB_MISS ] = { .pme_name = "PM_DTLB_MISS", .pme_code = 0x800c4, .pme_short_desc = "Data TLB misses", .pme_long_desc = "Data TLB misses, all page sizes.", }, [ POWER5_PME_PM_BR_PRED_TA ] = { .pme_name = "PM_BR_PRED_TA", .pme_code = 0x230e3, .pme_short_desc = "A conditional branch was predicted, target prediction", .pme_long_desc = "The target address of a branch instruction was predicted.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L375_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L375_MOD_CYC", .pme_code = 0x4c70a7, .pme_short_desc = "Marked load latency from L3.75 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_CMPLU_STALL_FXU ] = { .pme_name = "PM_CMPLU_STALL_FXU", .pme_code = 0x211099, .pme_short_desc = "Completion stall caused by FXU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a fixed point instruction.", }, [ POWER5_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x400003, .pme_short_desc = "External interrupts", .pme_long_desc = "An interrupt due to an external exception occurred", }, [ POWER5_PME_PM_MRK_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_LRQ", .pme_code = 0x810c6, .pme_short_desc = "LSU1 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5_PME_PM_LSU1_LDF ] = { .pme_name = "PM_LSU1_LDF", .pme_code = 0xc50c4, .pme_short_desc = "LSU1 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed by LSU1", }, [ POWER5_PME_PM_MRK_ST_GPS ] = { .pme_name = "PM_MRK_ST_GPS", .pme_code = 0x200003, .pme_short_desc = "Marked store sent to GPS", .pme_long_desc = "A sampled store has been sent to the memory subsystem", }, [ POWER5_PME_PM_FAB_CMD_ISSUED ] = { .pme_name = "PM_FAB_CMD_ISSUED", .pme_code = 0x700c7, .pme_short_desc = "Fabric command issued", .pme_long_desc = "Incremented when a chip issues a command on its SnoopA address bus. Each of the two address busses (SnoopA and SnoopB) is capable of one transaction per fabric cycle (one fabric cycle = 2 cpu cycles in normal 2:1 mode), but each chip can only drive the SnoopA bus, and can only drive one transaction every two fabric cycles (i.e., every four cpu cycles). 
In MCM-based systems, two chips interleave their accesses to each of the two fabric busses (SnoopA, SnoopB) to reach a peak capability of one transaction per cpu clock cycle. The two chips that drive SnoopB are wired so that the chips refer to the bus as SnoopA but it is connected to the other two chips as SnoopB. Note that this event will only be recorded by the FBC on the chip that sourced the operation. The signal is delivered at FBC speed and the count must be scaled.", }, [ POWER5_PME_PM_LSU0_SRQ_STFWD ] = { .pme_name = "PM_LSU0_SRQ_STFWD", .pme_code = 0xc20e0, .pme_short_desc = "LSU0 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 0. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. If a load hits L1 but becomes a store forward, then it is not treated as a load miss.", }, [ POWER5_PME_PM_CR_MAP_FULL_CYC ] = { .pme_name = "PM_CR_MAP_FULL_CYC", .pme_code = 0x100c4, .pme_short_desc = "Cycles CR logical operation mapper full", .pme_long_desc = "The Conditional Register mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. 
This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_L2SA_RCST_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SA_RCST_DISP_FAIL_RC_FULL", .pme_code = 0x722e0, .pme_short_desc = "L2 slice A RC store dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a store failed because all RC machines are busy.", }, [ POWER5_PME_PM_MRK_LSU0_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU0_FLUSH_ULD", .pme_code = 0x810c0, .pme_short_desc = "LSU0 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 0 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ POWER5_PME_PM_LSU_FLUSH_SRQ_FULL ] = { .pme_name = "PM_LSU_FLUSH_SRQ_FULL", .pme_code = 0x330e0, .pme_short_desc = "Flush caused by SRQ full", .pme_long_desc = "This thread was flushed at dispatch because its Store Request Queue was full. This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5_PME_PM_FLUSH_IMBAL ] = { .pme_name = "PM_FLUSH_IMBAL", .pme_code = 0x330e3, .pme_short_desc = "Flush caused by thread GCT imbalance", .pme_long_desc = "This thread has been flushed at dispatch because it is stalled and a GCT imbalance exists. GCT thresholds are set in the TSCR register. This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5_PME_PM_MEM_RQ_DISP_Q16to19 ] = { .pme_name = "PM_MEM_RQ_DISP_Q16to19", .pme_code = 0x727e6, .pme_short_desc = "Memory read queue dispatched to queues 16-19", .pme_long_desc = "A memory operation was dispatched to read queue 16,17,18 or 19. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_THRD_PRIO_DIFF_minus3or4_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus3or4_CYC", .pme_code = 0x430e1, .pme_short_desc = "Cycles thread priority difference is -3 or -4", .pme_long_desc = "Cycles when this thread's priority is lower than the other thread's priority by 3 or 4.", }, [ POWER5_PME_PM_DATA_FROM_L35_MOD ] = { .pme_name = "PM_DATA_FROM_L35_MOD", .pme_code = 0x2c309e, .pme_short_desc = "Data loaded from L3.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5_PME_PM_MEM_HI_PRIO_WR_CMPL ] = { .pme_name = "PM_MEM_HI_PRIO_WR_CMPL", .pme_code = 0x726e6, .pme_short_desc = "High priority write completed", .pme_long_desc = "A memory write, which was upgraded to high priority, completed. Writes can be upgraded to high priority to ensure that read traffic does not lock out writes. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_FPU1_FDIV ] = { .pme_name = "PM_FPU1_FDIV", .pme_code = 0xc4, .pme_short_desc = "FPU1 executed FDIV instruction", .pme_long_desc = "FPU1 has executed a divide instruction. This could be fdiv, fdivs, fdiv., fdivs.", }, [ POWER5_PME_PM_FPU0_FRSP_FCONV ] = { .pme_name = "PM_FPU0_FRSP_FCONV", .pme_code = 0x10c1, .pme_short_desc = "FPU0 executed FRSP or FCONV instructions", .pme_long_desc = "FPU0 has executed a frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5_PME_PM_MEM_RQ_DISP ] = { .pme_name = "PM_MEM_RQ_DISP", .pme_code = 0x701c6, .pme_short_desc = "Memory read queue dispatched", .pme_long_desc = "A memory read was dispatched. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_LWSYNC_HELD ] = { .pme_name = "PM_LWSYNC_HELD", .pme_code = 0x130e0, .pme_short_desc = "LWSYNC held at dispatch", .pme_long_desc = "Cycles a LWSYNC instruction was held at dispatch. LWSYNC instructions are held at dispatch until all previous loads are done and all previous stores have issued. LWSYNC enters the Store Request Queue and is sent to the storage subsystem but does not wait for a response.", }, [ POWER5_PME_PM_FXU_FIN ] = { .pme_name = "PM_FXU_FIN", .pme_code = 0x313088, .pme_short_desc = "FXU produced a result", .pme_long_desc = "The fixed point unit (Unit 0 + Unit 1) finished an instruction. Instructions that finish may not necessarily complete.", }, [ POWER5_PME_PM_DSLB_MISS ] = { .pme_name = "PM_DSLB_MISS", .pme_code = 0x800c5, .pme_short_desc = "Data SLB misses", .pme_long_desc = "A SLB miss for a data request occurred. SLB misses trap to the operating system to resolve.", }, [ POWER5_PME_PM_FXLS1_FULL_CYC ] = { .pme_name = "PM_FXLS1_FULL_CYC", .pme_code = 0x110c4, .pme_short_desc = "Cycles FXU1/LS1 queue full", .pme_long_desc = "The issue queue that feeds the Fixed Point unit 1 / Load Store Unit 1 is full. This condition will prevent dispatch groups from being dispatched. This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_DATA_FROM_L275_SHR ] = { .pme_name = "PM_DATA_FROM_L275_SHR", .pme_code = 0x3c3097, .pme_short_desc = "Data loaded from L2.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T) data from the L2 on a different module than this processor is located due to a demand load. 
", }, [ POWER5_PME_PM_THRD_SEL_T0 ] = { .pme_name = "PM_THRD_SEL_T0", .pme_code = 0x410c0, .pme_short_desc = "Decode selected thread 0", .pme_long_desc = "Thread selection picked thread 0 for decode.", }, [ POWER5_PME_PM_PTEG_RELOAD_VALID ] = { .pme_name = "PM_PTEG_RELOAD_VALID", .pme_code = 0x830e4, .pme_short_desc = "PTEG reload valid", .pme_long_desc = "A Page Table Entry was loaded into the TLB.", }, [ POWER5_PME_PM_LSU_LMQ_LHR_MERGE ] = { .pme_name = "PM_LSU_LMQ_LHR_MERGE", .pme_code = 0xc70e5, .pme_short_desc = "LMQ LHR merges", .pme_long_desc = "A data cache miss occurred for the same real cache line address as an earlier request already in the Load Miss Queue and was merged into the LMQ entry.", }, [ POWER5_PME_PM_MRK_STCX_FAIL ] = { .pme_name = "PM_MRK_STCX_FAIL", .pme_code = 0x820e6, .pme_short_desc = "Marked STCX failed", .pme_long_desc = "A marked stcx (stwcx or stdcx) failed", }, [ POWER5_PME_PM_2INST_CLB_CYC ] = { .pme_name = "PM_2INST_CLB_CYC", .pme_code = 0x400c2, .pme_short_desc = "Cycles 2 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5_PME_PM_FAB_PNtoVN_DIRECT ] = { .pme_name = "PM_FAB_PNtoVN_DIRECT", .pme_code = 0x723e7, .pme_short_desc = "PN to VN beat went straight to its destination", .pme_long_desc = "Fabric Data beats that the base chip takes the inbound PN data and passes it through to the outbound VN bus without going into a sidecar. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_PTEG_FROM_L2MISS ] = { .pme_name = "PM_PTEG_FROM_L2MISS", .pme_code = 0x38309b, .pme_short_desc = "PTEG loaded from L2 miss", .pme_long_desc = "A Page Table Entry was loaded into the TLB but not from the local L2.", }, [ POWER5_PME_PM_CMPLU_STALL_LSU ] = { .pme_name = "PM_CMPLU_STALL_LSU", .pme_code = 0x211098, .pme_short_desc = "Completion stall caused by LSU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a load/store instruction.", }, [ POWER5_PME_PM_MRK_DSLB_MISS ] = { .pme_name = "PM_MRK_DSLB_MISS", .pme_code = 0xc50c7, .pme_short_desc = "Marked Data SLB misses", .pme_long_desc = "A Data SLB miss was caused by a marked instruction.", }, [ POWER5_PME_PM_LSU_FLUSH_ULD ] = { .pme_name = "PM_LSU_FLUSH_ULD", .pme_code = 0x1c0088, .pme_short_desc = "LRQ unaligned load flushes", .pme_long_desc = "A load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1). Combined Unit 0 + 1.", }, [ POWER5_PME_PM_PTEG_FROM_LMEM ] = { .pme_name = "PM_PTEG_FROM_LMEM", .pme_code = 0x283087, .pme_short_desc = "PTEG loaded from local memory", .pme_long_desc = "A Page Table Entry was loaded into the TLB from memory attached to the same module this processor is located on.", }, [ POWER5_PME_PM_MRK_BRU_FIN ] = { .pme_name = "PM_MRK_BRU_FIN", .pme_code = 0x200005, .pme_short_desc = "Marked instruction BRU processing finished", .pme_long_desc = "The branch unit finished a marked instruction. Instructions that finish may not necessarily complete.", }, [ POWER5_PME_PM_MEM_WQ_DISP_WRITE ] = { .pme_name = "PM_MEM_WQ_DISP_WRITE", .pme_code = 0x703c6, .pme_short_desc = "Memory write queue dispatched due to write", .pme_long_desc = "A memory write was dispatched to a write queue. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L275_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L275_MOD_CYC", .pme_code = 0x4c70a3, .pme_short_desc = "Marked load latency from L2.75 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_LSU1_NCLD ] = { .pme_name = "PM_LSU1_NCLD", .pme_code = 0xc50c5, .pme_short_desc = "LSU1 non-cacheable loads", .pme_long_desc = "A non-cacheable load was executed by Unit 1.", }, [ POWER5_PME_PM_L2SA_RCLD_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SA_RCLD_DISP_FAIL_OTHER", .pme_code = 0x731e0, .pme_short_desc = "L2 slice A RC load dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a load failed for some reason other than Full or Collision conditions.", }, [ POWER5_PME_PM_SNOOP_PW_RETRY_WQ_PWQ ] = { .pme_name = "PM_SNOOP_PW_RETRY_WQ_PWQ", .pme_code = 0x717c6, .pme_short_desc = "Snoop partial-write retry due to collision with active write or partial-write queue", .pme_long_desc = "A snoop request for a partial write to memory was retried because it matched the cache line of an active write or partial write. When this happens the snoop request is retried and the active write is changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_FPR_MAP_FULL_CYC ] = { .pme_name = "PM_FPR_MAP_FULL_CYC", .pme_code = 0x100c1, .pme_short_desc = "Cycles FPR mapper full", .pme_long_desc = "Cycles when the Floating Point Register (FPR) mapper was full and could not accept any more groups. 
", }, [ POWER5_PME_PM_FPU1_FULL_CYC ] = { .pme_name = "PM_FPU1_FULL_CYC", .pme_code = 0x100c7, .pme_short_desc = "Cycles FPU1 issue queue full", .pme_long_desc = "The issue queue for FPU1 cannot accept any more instructions. Dispatch to this issue queue is stopped", }, [ POWER5_PME_PM_L3SA_ALL_BUSY ] = { .pme_name = "PM_L3SA_ALL_BUSY", .pme_code = 0x721e3, .pme_short_desc = "L3 slice A active for every cycle all CI/CO machines busy", .pme_long_desc = "Cycles All Castin/Castout machines are busy.", }, [ POWER5_PME_PM_3INST_CLB_CYC ] = { .pme_name = "PM_3INST_CLB_CYC", .pme_code = 0x400c3, .pme_short_desc = "Cycles 3 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5_PME_PM_MEM_PWQ_DISP_Q2or3 ] = { .pme_name = "PM_MEM_PWQ_DISP_Q2or3", .pme_code = 0x734e6, .pme_short_desc = "Memory partial-write queue dispatched to Write Queue 2 or 3", .pme_long_desc = "Memory partial-write queue dispatched to Write Queue 2 or 3. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_L2SA_SHR_INV ] = { .pme_name = "PM_L2SA_SHR_INV", .pme_code = 0x710c0, .pme_short_desc = "L2 slice A transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A, B, and C. 
NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER5_PME_PM_THRESH_TIMEO ] = { .pme_name = "PM_THRESH_TIMEO", .pme_code = 0x30000b, .pme_short_desc = "Threshold timeout", .pme_long_desc = "The threshold timer expired", }, [ POWER5_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL ] = { .pme_name = "PM_L2SA_RC_DISP_FAIL_CO_BUSY_ALL", .pme_code = 0x713c0, .pme_short_desc = "L2 slice A RC dispatch attempt failed due to all CO busy", .pme_long_desc = "A Read/Claim dispatch was rejected because all Castout machines were busy.", }, [ POWER5_PME_PM_THRD_SEL_OVER_GCT_IMBAL ] = { .pme_name = "PM_THRD_SEL_OVER_GCT_IMBAL", .pme_code = 0x410c4, .pme_short_desc = "Thread selection overrides caused by GCT imbalance", .pme_long_desc = "Thread selection was overridden because of a GCT imbalance.", }, [ POWER5_PME_PM_FPU_FSQRT ] = { .pme_name = "PM_FPU_FSQRT", .pme_code = 0x200090, .pme_short_desc = "FPU executed FSQRT instruction", .pme_long_desc = "The floating point unit has executed a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_MRK_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_LRQ", .pme_code = 0x810c2, .pme_short_desc = "LSU0 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 0 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5_PME_PM_PMC1_OVERFLOW ] = { .pme_name = "PM_PMC1_OVERFLOW", .pme_code = 0x20000a, .pme_short_desc = "PMC1 Overflow", .pme_long_desc = "Overflows from PMC1 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5_PME_PM_L3SC_SNOOP_RETRY ] = { .pme_name = "PM_L3SC_SNOOP_RETRY", .pme_code = 0x731e5, .pme_short_desc = "L3 slice C snoop retries", .pme_long_desc = "Number of times an L3 retried a snoop because it got two in at the same time (one on snp_a, one on snp_b)", }, [ POWER5_PME_PM_DATA_TABLEWALK_CYC ] = { .pme_name = "PM_DATA_TABLEWALK_CYC", .pme_code = 0x800c7, .pme_short_desc = "Cycles doing data tablewalks", .pme_long_desc = "Cycles a translation tablewalk is active. While a tablewalk is active any request attempting to access the TLB will be rejected and retried.", }, [ POWER5_PME_PM_THRD_PRIO_6_CYC ] = { .pme_name = "PM_THRD_PRIO_6_CYC", .pme_code = 0x420e5, .pme_short_desc = "Cycles thread running at priority level 6", .pme_long_desc = "Cycles this thread was running at priority level 6.", }, [ POWER5_PME_PM_FPU_FEST ] = { .pme_name = "PM_FPU_FEST", .pme_code = 0x401090, .pme_short_desc = "FPU executed FEST instruction", .pme_long_desc = "The floating point unit has executed an estimate instruction. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_FAB_M1toP1_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_M1toP1_SIDECAR_EMPTY", .pme_code = 0x702c7, .pme_short_desc = "M1 to P1 sidecar empty", .pme_long_desc = "Fabric cycles when the Minus-1 hip/hop sidecars (sidecars for chip to chip data transfer) are empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_DATA_FROM_RMEM ] = { .pme_name = "PM_MRK_DATA_FROM_RMEM", .pme_code = 0x1c70a1, .pme_short_desc = "Marked data loaded from remote memory", .pme_long_desc = "The processor's Data Cache was reloaded due to a marked load from memory attached to a different module than this processor is located on.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L35_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L35_MOD_CYC", .pme_code = 0x4c70a6, .pme_short_desc = "Marked load latency from L3.5 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_MEM_PWQ_DISP ] = { .pme_name = "PM_MEM_PWQ_DISP", .pme_code = 0x704c6, .pme_short_desc = "Memory partial-write queue dispatched", .pme_long_desc = "Number of Partial Writes dispatched. The MC provides resources to gather partial cacheline writes (Partial line DMA writes & CI-stores) to up to four different cachelines at a time. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_FAB_P1toM1_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_P1toM1_SIDECAR_EMPTY", .pme_code = 0x701c7, .pme_short_desc = "P1 to M1 sidecar empty", .pme_long_desc = "Fabric cycles when the Plus-1 hip/hop sidecars (sidecars for chip to chip data transfer) are empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_LD_MISS_L1_LSU0", .pme_code = 0xc10c2, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by unit 0.", }, [ POWER5_PME_PM_SNOOP_PARTIAL_RTRY_QFULL ] = { .pme_name = "PM_SNOOP_PARTIAL_RTRY_QFULL", .pme_code = 0x730e6, .pme_short_desc = "Snoop partial write retry due to partial-write queues full", .pme_long_desc = "A snoop request for a partial write to memory was retried because the write queues that handle partial writes were full. When this happens the active writes are changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_FPU1_STALL3 ] = { .pme_name = "PM_FPU1_STALL3", .pme_code = 0x20e5, .pme_short_desc = "FPU1 stalled in pipe3", .pme_long_desc = "FPU1 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always).", }, [ POWER5_PME_PM_GCT_USAGE_80to99_CYC ] = { .pme_name = "PM_GCT_USAGE_80to99_CYC", .pme_code = 0x30001f, .pme_short_desc = "Cycles GCT 80-99% full", .pme_long_desc = "Cycles when the Global Completion Table has between 80% and 99% of its slots used. The GCT has 20 entries shared between threads", }, [ POWER5_PME_PM_WORK_HELD ] = { .pme_name = "PM_WORK_HELD", .pme_code = 0x40000c, .pme_short_desc = "Work held", .pme_long_desc = "RAS Unit has signaled completion to stop and there are groups waiting to complete", }, [ POWER5_PME_PM_INST_CMPL ] = { .pme_name = "PM_INST_CMPL", .pme_code = 0x100009, .pme_short_desc = "Instructions completed", .pme_long_desc = "Number of PowerPC instructions that completed. 
", }, [ POWER5_PME_PM_LSU1_FLUSH_UST ] = { .pme_name = "PM_LSU1_FLUSH_UST", .pme_code = 0xc00c5, .pme_short_desc = "LSU1 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 1 because it was unaligned (crossed a 4K boundary)", }, [ POWER5_PME_PM_FXU_IDLE ] = { .pme_name = "PM_FXU_IDLE", .pme_code = 0x100012, .pme_short_desc = "FXU idle", .pme_long_desc = "FXU0 and FXU1 are both idle.", }, [ POWER5_PME_PM_LSU0_FLUSH_ULD ] = { .pme_name = "PM_LSU0_FLUSH_ULD", .pme_code = 0xc00c0, .pme_short_desc = "LSU0 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 0 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1)", }, [ POWER5_PME_PM_LSU1_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU1_REJECT_LMQ_FULL", .pme_code = 0xc60e5, .pme_short_desc = "LSU1 reject due to LMQ full or missed data coming", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. If all eight entries are full, subsequent load instructions are rejected.", }, [ POWER5_PME_PM_GRP_DISP_REJECT ] = { .pme_name = "PM_GRP_DISP_REJECT", .pme_code = 0x120e4, .pme_short_desc = "Group dispatch rejected", .pme_long_desc = "A group that previously attempted dispatch was rejected.", }, [ POWER5_PME_PM_L2SA_MOD_INV ] = { .pme_name = "PM_L2SA_MOD_INV", .pme_code = 0x730e0, .pme_short_desc = "L2 slice A transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_PTEG_FROM_L25_SHR ] = { .pme_name = "PM_PTEG_FROM_L25_SHR", .pme_code = 0x183097, .pme_short_desc = "PTEG loaded from L2.5 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (T or SL) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5_PME_PM_FAB_CMD_RETRIED ] = { .pme_name = "PM_FAB_CMD_RETRIED", .pme_code = 0x710c7, .pme_short_desc = "Fabric command retried", .pme_long_desc = "Incremented when a command issued by a chip on its SnoopA address bus is retried for any reason. The overwhelming majority of retries are due to running out of memory controller queues but retries can also be caused by trying to reference addresses that are in a transient cache state -- e.g. a line is transient after issuing a DCLAIM instruction to a shared line but before the associated store completes. Each chip reports its own counts. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_L3SA_SHR_INV ] = { .pme_name = "PM_L3SA_SHR_INV", .pme_code = 0x710c3, .pme_short_desc = "L3 slice A transition from shared to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is Sx in this L3(i.e. invalidate hit SX and dispatched).", }, [ POWER5_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL ] = { .pme_name = "PM_L2SB_RC_DISP_FAIL_CO_BUSY_ALL", .pme_code = 0x713c1, .pme_short_desc = "L2 slice B RC dispatch attempt failed due to all CO busy", .pme_long_desc = "A Read/Claim dispatch was rejected because all Castout machines were busy.", }, [ POWER5_PME_PM_L2SA_RCST_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SA_RCST_DISP_FAIL_ADDR", .pme_code = 0x712c0, .pme_short_desc = "L2 slice A RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a store failed because of an address conflict. 
Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5_PME_PM_L2SA_RCLD_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SA_RCLD_DISP_FAIL_RC_FULL", .pme_code = 0x721e0, .pme_short_desc = "L2 slice A RC load dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a load failed because all RC machines are busy.", }, [ POWER5_PME_PM_PTEG_FROM_L375_MOD ] = { .pme_name = "PM_PTEG_FROM_L375_MOD", .pme_code = 0x1830a7, .pme_short_desc = "PTEG loaded from L3.75 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L3 of a chip on a different module than this processor is located, due to a demand load.", }, [ POWER5_PME_PM_MRK_LSU1_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU1_FLUSH_UST", .pme_code = 0x810c5, .pme_short_desc = "LSU1 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", }, [ POWER5_PME_PM_BR_ISSUED ] = { .pme_name = "PM_BR_ISSUED", .pme_code = 0x230e4, .pme_short_desc = "Branches issued", .pme_long_desc = "A branch instruction was issued to the branch unit. 
A branch that was incorrectly predicted may issue and execute multiple times.", }, [ POWER5_PME_PM_MRK_GRP_BR_REDIR ] = { .pme_name = "PM_MRK_GRP_BR_REDIR", .pme_code = 0x212091, .pme_short_desc = "Group experienced marked branch redirect", .pme_long_desc = "A group containing a marked (sampled) instruction experienced a branch redirect.", }, [ POWER5_PME_PM_EE_OFF ] = { .pme_name = "PM_EE_OFF", .pme_code = 0x130e3, .pme_short_desc = "Cycles MSR(EE) bit off", .pme_long_desc = "Cycles MSR(EE) bit was off indicating that interrupts due to external exceptions were masked.", }, [ POWER5_PME_PM_MEM_RQ_DISP_Q4to7 ] = { .pme_name = "PM_MEM_RQ_DISP_Q4to7", .pme_code = 0x712c6, .pme_short_desc = "Memory read queue dispatched to queues 4-7", .pme_long_desc = "A memory operation was dispatched to read queue 4,5,6 or 7. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_MEM_FAST_PATH_RD_DISP ] = { .pme_name = "PM_MEM_FAST_PATH_RD_DISP", .pme_code = 0x713e6, .pme_short_desc = "Fast path memory read dispatched", .pme_long_desc = "Fast path memory read dispatched", }, [ POWER5_PME_PM_INST_FROM_L3 ] = { .pme_name = "PM_INST_FROM_L3", .pme_code = 0x12208d, .pme_short_desc = "Instruction fetched from L3", .pme_long_desc = "An instruction fetch group was fetched from the local L3. 
Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_ITLB_MISS ] = { .pme_name = "PM_ITLB_MISS", .pme_code = 0x800c0, .pme_short_desc = "Instruction TLB misses", .pme_long_desc = "A TLB miss for an Instruction Fetch has occurred", }, [ POWER5_PME_PM_FXU1_BUSY_FXU0_IDLE ] = { .pme_name = "PM_FXU1_BUSY_FXU0_IDLE", .pme_code = 0x400012, .pme_short_desc = "FXU1 busy FXU0 idle", .pme_long_desc = "FXU0 was idle while FXU1 was busy.", }, [ POWER5_PME_PM_FXLS_FULL_CYC ] = { .pme_name = "PM_FXLS_FULL_CYC", .pme_code = 0x411090, .pme_short_desc = "Cycles FXLS queue is full", .pme_long_desc = "Cycles when the issue queues for one or both FXU/LSU units is full. Use with caution since this is the sum of cycles when Unit 0 was full plus Unit 1 full. It does not indicate when both units were full.", }, [ POWER5_PME_PM_DTLB_REF_4K ] = { .pme_name = "PM_DTLB_REF_4K", .pme_code = 0xc40c2, .pme_short_desc = "Data TLB reference for 4K page", .pme_long_desc = "Data TLB references for 4KB pages. Includes hits + misses.", }, [ POWER5_PME_PM_GRP_DISP_VALID ] = { .pme_name = "PM_GRP_DISP_VALID", .pme_code = 0x120e3, .pme_short_desc = "Group dispatch valid", .pme_long_desc = "A group is available for dispatch. This does not mean it was successfully dispatched.", }, [ POWER5_PME_PM_LSU_FLUSH_UST ] = { .pme_name = "PM_LSU_FLUSH_UST", .pme_code = 0x2c0088, .pme_short_desc = "SRQ unaligned store flushes", .pme_long_desc = "A store was flushed because it was unaligned (crossed a 4K boundary). Combined Unit 0 + 1.", }, [ POWER5_PME_PM_FXU1_FIN ] = { .pme_name = "PM_FXU1_FIN", .pme_code = 0x130e6, .pme_short_desc = "FXU1 produced a result", .pme_long_desc = "The Fixed Point unit 1 finished an instruction and produced a result. 
Instructions that finish may not necessarily complete.", }, [ POWER5_PME_PM_THRD_PRIO_4_CYC ] = { .pme_name = "PM_THRD_PRIO_4_CYC", .pme_code = 0x420e3, .pme_short_desc = "Cycles thread running at priority level 4", .pme_long_desc = "Cycles this thread was running at priority level 4.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L35_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L35_MOD", .pme_code = 0x2c709e, .pme_short_desc = "Marked data loaded from L3.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5_PME_PM_4INST_CLB_CYC ] = { .pme_name = "PM_4INST_CLB_CYC", .pme_code = 0x400c4, .pme_short_desc = "Cycles 4 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5_PME_PM_MRK_DTLB_REF_16M ] = { .pme_name = "PM_MRK_DTLB_REF_16M", .pme_code = 0xc40c7, .pme_short_desc = "Marked Data TLB reference for 16M page", .pme_long_desc = "Data TLB references by a marked instruction for 16MB pages.", }, [ POWER5_PME_PM_INST_FROM_L375_MOD ] = { .pme_name = "PM_INST_FROM_L375_MOD", .pme_code = 0x42209d, .pme_short_desc = "Instruction fetched from L3.75 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L3 of a chip on a different module than this processor is located. 
Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_L2SC_RCST_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SC_RCST_DISP_FAIL_ADDR", .pme_code = 0x712c2, .pme_short_desc = "L2 slice C RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a store failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5_PME_PM_GRP_CMPL ] = { .pme_name = "PM_GRP_CMPL", .pme_code = 0x300013, .pme_short_desc = "Group completed", .pme_long_desc = "A group completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER5_PME_PM_FPU1_1FLOP ] = { .pme_name = "PM_FPU1_1FLOP", .pme_code = 0xc7, .pme_short_desc = "FPU1 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "The floating point unit has executed an add, mult, sub, compare, fsel, fneg, fabs, fnabs, fres, or frsqrte kind of instruction. These are single FLOP operations.", }, [ POWER5_PME_PM_FPU_FRSP_FCONV ] = { .pme_name = "PM_FPU_FRSP_FCONV", .pme_code = 0x301090, .pme_short_desc = "FPU executed FRSP or FCONV instructions", .pme_long_desc = "The floating point unit has executed a frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_5INST_CLB_CYC ] = { .pme_name = "PM_5INST_CLB_CYC", .pme_code = 0x400c5, .pme_short_desc = "Cycles 5 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. 
Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5_PME_PM_L3SC_REF ] = { .pme_name = "PM_L3SC_REF", .pme_code = 0x701c5, .pme_short_desc = "L3 slice C references", .pme_long_desc = "Number of attempts made by this chip cores to find data in the L3. Reported per L3 slice.", }, [ POWER5_PME_PM_THRD_L2MISS_BOTH_CYC ] = { .pme_name = "PM_THRD_L2MISS_BOTH_CYC", .pme_code = 0x410c7, .pme_short_desc = "Cycles both threads in L2 misses", .pme_long_desc = "Cycles that both threads have L2 miss pending. If only one thread has a L2 miss pending the other thread is given priority at decode. If both threads have L2 miss pending decode priority is determined by the number of GCT entries used.", }, [ POWER5_PME_PM_MEM_PW_GATH ] = { .pme_name = "PM_MEM_PW_GATH", .pme_code = 0x714c6, .pme_short_desc = "Memory partial-write gathered", .pme_long_desc = "Two or more partial-writes have been merged into a single memory write. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_FAB_PNtoNN_SIDECAR ] = { .pme_name = "PM_FAB_PNtoNN_SIDECAR", .pme_code = 0x713c7, .pme_short_desc = "PN to NN beat went to sidecar first", .pme_long_desc = "Fabric Data beats that the base chip takes the inbound PN data and forwards it on to the outbound NN data bus after going into a sidecar first. The signal is delivered at FBC speed and the count must be scaled.", }, [ POWER5_PME_PM_FAB_DCLAIM_ISSUED ] = { .pme_name = "PM_FAB_DCLAIM_ISSUED", .pme_code = 0x720e7, .pme_short_desc = "dclaim issued", .pme_long_desc = "A DCLAIM command was issued. Each chip reports its own counts. The signal is delivered at FBC speed and the count must be scaled accordingly. ", }, [ POWER5_PME_PM_GRP_IC_MISS ] = { .pme_name = "PM_GRP_IC_MISS", .pme_code = 0x120e7, .pme_short_desc = "Group experienced I cache miss", .pme_long_desc = "Number of groups, counted at dispatch, that have encountered an icache miss redirect. 
Every group constructed from a fetch group that missed the instruction cache will count.", }, [ POWER5_PME_PM_INST_FROM_L35_SHR ] = { .pme_name = "PM_INST_FROM_L35_SHR", .pme_code = 0x12209d, .pme_short_desc = "Instruction fetched from L3.5 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (S) data from the L3 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_LSU_LMQ_FULL_CYC ] = { .pme_name = "PM_LSU_LMQ_FULL_CYC", .pme_code = 0xc30e7, .pme_short_desc = "Cycles LMQ full", .pme_long_desc = "The Load Miss Queue was full.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L2_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L2_CYC", .pme_code = 0x2c70a0, .pme_short_desc = "Marked load latency from L2", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_LSU_SRQ_SYNC_CYC ] = { .pme_name = "PM_LSU_SRQ_SYNC_CYC", .pme_code = 0x830e5, .pme_short_desc = "SRQ sync duration", .pme_long_desc = "Cycles that a sync instruction is active in the Store Request Queue.", }, [ POWER5_PME_PM_LSU0_BUSY_REJECT ] = { .pme_name = "PM_LSU0_BUSY_REJECT", .pme_code = 0xc20e3, .pme_short_desc = "LSU0 busy due to reject", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions. ", }, [ POWER5_PME_PM_LSU_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU_REJECT_ERAT_MISS", .pme_code = 0x1c6090, .pme_short_desc = "LSU reject due to ERAT miss", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions due to an ERAT miss. Combined unit 0 + 1. 
Requests that miss the Derat are rejected and retried until the request hits in the Erat.", }, [ POWER5_PME_PM_MRK_DATA_FROM_RMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_RMEM_CYC", .pme_code = 0x4c70a1, .pme_short_desc = "Marked load latency from remote memory", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_DATA_FROM_L375_SHR ] = { .pme_name = "PM_DATA_FROM_L375_SHR", .pme_code = 0x3c309e, .pme_short_desc = "Data loaded from L3.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on a different module than this processor is located due to a demand load.", }, [ POWER5_PME_PM_FPU0_FMOV_FEST ] = { .pme_name = "PM_FPU0_FMOV_FEST", .pme_code = 0x10c0, .pme_short_desc = "FPU0 executed FMOV or FEST instructions", .pme_long_desc = "FPU0 has executed a move kind of instruction or one of the estimate instructions. 
This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ.", }, [ POWER5_PME_PM_PTEG_FROM_L25_MOD ] = { .pme_name = "PM_PTEG_FROM_L25_MOD", .pme_code = 0x283097, .pme_short_desc = "PTEG loaded from L2.5 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5_PME_PM_LD_REF_L1_LSU0 ] = { .pme_name = "PM_LD_REF_L1_LSU0", .pme_code = 0xc10c0, .pme_short_desc = "LSU0 L1 D cache load references", .pme_long_desc = "Load references to Level 1 Data Cache, by unit 0.", }, [ POWER5_PME_PM_THRD_PRIO_7_CYC ] = { .pme_name = "PM_THRD_PRIO_7_CYC", .pme_code = 0x420e6, .pme_short_desc = "Cycles thread running at priority level 7", .pme_long_desc = "Cycles this thread was running at priority level 7.", }, [ POWER5_PME_PM_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_LSU1_FLUSH_SRQ", .pme_code = 0xc00c7, .pme_short_desc = "LSU1 SRQ lhs flushes", .pme_long_desc = "A store was flushed because a younger load hits an older store that is already in the SRQ or in the same group. ", }, [ POWER5_PME_PM_L2SC_RCST_DISP ] = { .pme_name = "PM_L2SC_RCST_DISP", .pme_code = 0x702c2, .pme_short_desc = "L2 slice C RC store dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Store was attempted.", }, [ POWER5_PME_PM_CMPLU_STALL_DIV ] = { .pme_name = "PM_CMPLU_STALL_DIV", .pme_code = 0x411099, .pme_short_desc = "Completion stall caused by DIV instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a fixed point divide instruction. This is a subset of PM_CMPLU_STALL_FXU.", }, [ POWER5_PME_PM_MEM_RQ_DISP_Q12to15 ] = { .pme_name = "PM_MEM_RQ_DISP_Q12to15", .pme_code = 0x732e6, .pme_short_desc = "Memory read queue dispatched to queues 12-15", .pme_long_desc = "A memory operation was dispatched to read queue 12,13,14 or 15. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_INST_FROM_L375_SHR ] = { .pme_name = "PM_INST_FROM_L375_SHR", .pme_code = 0x32209d, .pme_short_desc = "Instruction fetched from L3.75 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (S) data from the L3 of a chip on a different module than this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_ST_REF_L1 ] = { .pme_name = "PM_ST_REF_L1", .pme_code = 0x3c1090, .pme_short_desc = "L1 D cache store references", .pme_long_desc = "Store references to the Data Cache. Combined Unit 0 + 1.", }, [ POWER5_PME_PM_L3SB_ALL_BUSY ] = { .pme_name = "PM_L3SB_ALL_BUSY", .pme_code = 0x721e4, .pme_short_desc = "L3 slice B active for every cycle all CI/CO machines busy", .pme_long_desc = "Cycles All Castin/Castout machines are busy.", }, [ POWER5_PME_PM_FAB_P1toVNorNN_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_P1toVNorNN_SIDECAR_EMPTY", .pme_code = 0x711c7, .pme_short_desc = "P1 to VN/NN sidecar empty", .pme_long_desc = "Fabric cycles when the Plus-1 jump sidecar (sidecars for mcm to mcm data transfer) is empty. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L275_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L275_SHR_CYC", .pme_code = 0x2c70a3, .pme_short_desc = "Marked load latency from L2.75 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_FAB_HOLDtoNN_EMPTY ] = { .pme_name = "PM_FAB_HOLDtoNN_EMPTY", .pme_code = 0x722e7, .pme_short_desc = "Hold buffer to NN empty", .pme_long_desc = "Fabric cycles when the Next Node out hold-buffers are empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_DATA_FROM_LMEM ] = { .pme_name = "PM_DATA_FROM_LMEM", .pme_code = 0x2c3087, .pme_short_desc = "Data loaded from local memory", .pme_long_desc = "The processor's Data Cache was reloaded from memory attached to the same module this processor is located on.", }, [ POWER5_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x100005, .pme_short_desc = "Run cycles", .pme_long_desc = "Processor Cycles gated by the run latch. Operating systems use the run latch to indicate when they are doing useful work. The run latch is typically cleared in the OS idle loop. Gating by the run latch filters out the idle loop.", }, [ POWER5_PME_PM_PTEG_FROM_RMEM ] = { .pme_name = "PM_PTEG_FROM_RMEM", .pme_code = 0x1830a1, .pme_short_desc = "PTEG loaded from remote memory", .pme_long_desc = "A Page Table Entry was loaded into the TLB from memory attached to a different module than this processor is located on.", }, [ POWER5_PME_PM_L2SC_RCLD_DISP ] = { .pme_name = "PM_L2SC_RCLD_DISP", .pme_code = 0x701c2, .pme_short_desc = "L2 slice C RC load dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Load was attempted", }, [ POWER5_PME_PM_LSU0_LDF ] = { .pme_name = "PM_LSU0_LDF", .pme_code = 0xc50c0, .pme_short_desc = "LSU0 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed by LSU0", }, [ POWER5_PME_PM_LSU_LRQ_S0_VALID ] = { .pme_name = "PM_LSU_LRQ_S0_VALID", .pme_code = 0xc20e2, .pme_short_desc = "LRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Load Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin. In SMT mode the LRQ is split between the two threads (16 entries each).", }, [ POWER5_PME_PM_PMC3_OVERFLOW ] = { .pme_name = "PM_PMC3_OVERFLOW", .pme_code = 0x40000a, .pme_short_desc = "PMC3 Overflow", .pme_long_desc = "Overflows from PMC3 are counted. 
This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5_PME_PM_MRK_IMR_RELOAD ] = { .pme_name = "PM_MRK_IMR_RELOAD", .pme_code = 0x820e2, .pme_short_desc = "Marked IMR reloaded", .pme_long_desc = "A DL1 reload occurred due to marked load", }, [ POWER5_PME_PM_MRK_GRP_TIMEO ] = { .pme_name = "PM_MRK_GRP_TIMEO", .pme_code = 0x40000b, .pme_short_desc = "Marked group completion timeout", .pme_long_desc = "The sampling timeout expired indicating that the previously sampled instruction is no longer in the processor", }, [ POWER5_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0xc10c3, .pme_short_desc = "L1 D cache store misses", .pme_long_desc = "A store missed the dcache. Combined Unit 0 + 1.", }, [ POWER5_PME_PM_STOP_COMPLETION ] = { .pme_name = "PM_STOP_COMPLETION", .pme_code = 0x300018, .pme_short_desc = "Completion stopped", .pme_long_desc = "RAS Unit has signaled completion to stop", }, [ POWER5_PME_PM_LSU_BUSY_REJECT ] = { .pme_name = "PM_LSU_BUSY_REJECT", .pme_code = 0x1c2090, .pme_short_desc = "LSU busy due to reject", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions. Combined unit 0 + 1.", }, [ POWER5_PME_PM_ISLB_MISS ] = { .pme_name = "PM_ISLB_MISS", .pme_code = 0x800c1, .pme_short_desc = "Instruction SLB misses", .pme_long_desc = "A SLB miss for an instruction fetch has occurred", }, [ POWER5_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0xf, .pme_short_desc = "Processor cycles", .pme_long_desc = "Processor cycles", }, [ POWER5_PME_PM_THRD_ONE_RUN_CYC ] = { .pme_name = "PM_THRD_ONE_RUN_CYC", .pme_code = 0x10000b, .pme_short_desc = "One of the threads in run cycles", .pme_long_desc = "At least one thread has set its run latch. Operating systems use the run latch to indicate when they are doing useful work. The run latch is typically cleared in the OS idle loop. 
This event does not respect FCWAIT.", }, [ POWER5_PME_PM_GRP_BR_REDIR_NONSPEC ] = { .pme_name = "PM_GRP_BR_REDIR_NONSPEC", .pme_code = 0x112091, .pme_short_desc = "Group experienced non-speculative branch redirect", .pme_long_desc = "Number of groups, counted at completion, that have encountered a branch redirect.", }, [ POWER5_PME_PM_LSU1_SRQ_STFWD ] = { .pme_name = "PM_LSU1_SRQ_STFWD", .pme_code = 0xc20e4, .pme_short_desc = "LSU1 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 1. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. If a load that hits L1 but becomes a store forward, then it's not treated as a load miss.", }, [ POWER5_PME_PM_L3SC_MOD_INV ] = { .pme_name = "PM_L3SC_MOD_INV", .pme_code = 0x730e5, .pme_short_desc = "L3 slice C transition from modified to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is truly M in this L3 (i.e. 
L3 going M=>I) Mu|Me are not included since they are formed due to a previous read op Tx is not included since it is considered shared at this point.", }, [ POWER5_PME_PM_L2_PREF ] = { .pme_name = "PM_L2_PREF", .pme_code = 0xc50c3, .pme_short_desc = "L2 cache prefetches", .pme_long_desc = "A request to prefetch data into L2 was made", }, [ POWER5_PME_PM_GCT_NOSLOT_BR_MPRED ] = { .pme_name = "PM_GCT_NOSLOT_BR_MPRED", .pme_code = 0x41009c, .pme_short_desc = "No slot in GCT caused by branch mispredict", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because of a branch misprediction.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L25_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD", .pme_code = 0x2c7097, .pme_short_desc = "Marked data loaded from L2.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5_PME_PM_L2SB_MOD_INV ] = { .pme_name = "PM_L2SB_MOD_INV", .pme_code = 0x730e1, .pme_short_desc = "L2 slice B transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_L2SB_ST_REQ ] = { .pme_name = "PM_L2SB_ST_REQ", .pme_code = 0x723e1, .pme_short_desc = "L2 slice B store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_MRK_L1_RELOAD_VALID ] = { .pme_name = "PM_MRK_L1_RELOAD_VALID", .pme_code = 0xc70e4, .pme_short_desc = "Marked L1 reload data source valid", .pme_long_desc = "The source information is valid and is for a marked load", }, [ POWER5_PME_PM_L3SB_HIT ] = { .pme_name = "PM_L3SB_HIT", .pme_code = 0x711c4, .pme_short_desc = "L3 slice B hits", .pme_long_desc = "Number of attempts made by this chip cores that resulted in an L3 hit. Reported per L3 slice", }, [ POWER5_PME_PM_L2SB_SHR_MOD ] = { .pme_name = "PM_L2SB_SHR_MOD", .pme_code = 0x700c1, .pme_short_desc = "L2 slice B transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A, B, and C. ", }, [ POWER5_PME_PM_EE_OFF_EXT_INT ] = { .pme_name = "PM_EE_OFF_EXT_INT", .pme_code = 0x130e7, .pme_short_desc = "Cycles MSR(EE) bit off and external interrupt pending", .pme_long_desc = "Cycles when an interrupt due to an external exception is pending but external exceptions were masked.", }, [ POWER5_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x100013, .pme_short_desc = "One or more PPC instruction completed", .pme_long_desc = "A group containing at least one PPC instruction completed. For microcoded instructions that span multiple groups, this will only occur once.", }, [ POWER5_PME_PM_L2SC_SHR_MOD ] = { .pme_name = "PM_L2SC_SHR_MOD", .pme_code = 0x700c2, .pme_short_desc = "L2 slice C transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L , or Tagged) to the Modified state. 
This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A, B, and C. ", }, [ POWER5_PME_PM_PMC6_OVERFLOW ] = { .pme_name = "PM_PMC6_OVERFLOW", .pme_code = 0x30001a, .pme_short_desc = "PMC6 Overflow", .pme_long_desc = "Overflows from PMC6 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5_PME_PM_LSU_LRQ_FULL_CYC ] = { .pme_name = "PM_LSU_LRQ_FULL_CYC", .pme_code = 0x110c2, .pme_short_desc = "Cycles LRQ full", .pme_long_desc = "Cycles when the LRQ is full.", }, [ POWER5_PME_PM_IC_PREF_INSTALL ] = { .pme_name = "PM_IC_PREF_INSTALL", .pme_code = 0x210c7, .pme_short_desc = "Instruction prefetched installed in prefetch buffer", .pme_long_desc = "A prefetch buffer entry (line) is allocated but the request is not a demand fetch.", }, [ POWER5_PME_PM_TLB_MISS ] = { .pme_name = "PM_TLB_MISS", .pme_code = 0x180088, .pme_short_desc = "TLB misses", .pme_long_desc = "Total of Data TLB misses + Instruction TLB misses", }, [ POWER5_PME_PM_GCT_FULL_CYC ] = { .pme_name = "PM_GCT_FULL_CYC", .pme_code = 0x100c0, .pme_short_desc = "Cycles GCT full", .pme_long_desc = "The Global Completion Table is completely full.", }, [ POWER5_PME_PM_FXU_BUSY ] = { .pme_name = "PM_FXU_BUSY", .pme_code = 0x200012, .pme_short_desc = "FXU busy", .pme_long_desc = "Cycles when both FXU0 and FXU1 are busy.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L3_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L3_CYC", .pme_code = 0x2c70a4, .pme_short_desc = "Marked load latency from L3", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_LSU_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU_REJECT_LMQ_FULL", .pme_code = 0x2c6088, .pme_short_desc = "LSU reject due to LMQ full or missed data coming", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. If all the eight entries are full, subsequent load instructions are rejected. Combined unit 0 + 1.", }, [ POWER5_PME_PM_LSU_SRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_SRQ_S0_ALLOC", .pme_code = 0xc20e5, .pme_short_desc = "SRQ slot 0 allocated", .pme_long_desc = "SRQ Slot zero was allocated", }, [ POWER5_PME_PM_GRP_MRK ] = { .pme_name = "PM_GRP_MRK", .pme_code = 0x100014, .pme_short_desc = "Group marked in IDU", .pme_long_desc = "A group was sampled (marked). The group is called a marked group. One instruction within the group is tagged for detailed monitoring. The sampled instruction is called a marked instruction. Events associated with the marked instruction are annotated with the marked term.", }, [ POWER5_PME_PM_INST_FROM_L25_SHR ] = { .pme_name = "PM_INST_FROM_L25_SHR", .pme_code = 0x122096, .pme_short_desc = "Instruction fetched from L2.5 shared", .pme_long_desc = "An instruction fetch group was fetched with shared (T or SL) data from the L2 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions.", }, [ POWER5_PME_PM_FPU1_FIN ] = { .pme_name = "PM_FPU1_FIN", .pme_code = 0x10c7, .pme_short_desc = "FPU1 produced a result", .pme_long_desc = "FPU1 finished, produced a result. This only indicates finish, not completion. 
Floating Point Stores are included in this count but not Floating Point Loads.", }, [ POWER5_PME_PM_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_DC_PREF_STREAM_ALLOC", .pme_code = 0x830e7, .pme_short_desc = "D cache new prefetch stream allocated", .pme_long_desc = "A new Prefetch Stream was allocated.", }, [ POWER5_PME_PM_BR_MPRED_TA ] = { .pme_name = "PM_BR_MPRED_TA", .pme_code = 0x230e6, .pme_short_desc = "Branch mispredictions due to target address", .pme_long_desc = "A branch instruction target was incorrectly predicted. This will result in a branch mispredict flush unless a flush is detected from an older instruction.", }, [ POWER5_PME_PM_CRQ_FULL_CYC ] = { .pme_name = "PM_CRQ_FULL_CYC", .pme_code = 0x110c1, .pme_short_desc = "Cycles CR issue queue full", .pme_long_desc = "The issue queue that feeds the Conditional Register unit is full. This condition will prevent dispatch groups from being dispatched. This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_L2SA_RCLD_DISP ] = { .pme_name = "PM_L2SA_RCLD_DISP", .pme_code = 0x701c0, .pme_short_desc = "L2 slice A RC load dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Load was attempted", }, [ POWER5_PME_PM_SNOOP_WR_RETRY_QFULL ] = { .pme_name = "PM_SNOOP_WR_RETRY_QFULL", .pme_code = 0x710c6, .pme_short_desc = "Snoop read retry due to read queue full", .pme_long_desc = "A snoop request for a write to memory was retried because the write queues were full. When this happens the snoop request is retried and the writes in the write reorder queue are changed to high priority. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_DTLB_REF_4K ] = { .pme_name = "PM_MRK_DTLB_REF_4K", .pme_code = 0xc40c3, .pme_short_desc = "Marked Data TLB reference for 4K page", .pme_long_desc = "Data TLB references by a marked instruction for 4KB pages.", }, [ POWER5_PME_PM_LSU_SRQ_S0_VALID ] = { .pme_name = "PM_LSU_SRQ_S0_VALID", .pme_code = 0xc20e1, .pme_short_desc = "SRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Store Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin. In SMT mode the SRQ is split between the two threads (16 entries each).", }, [ POWER5_PME_PM_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_LSU0_FLUSH_LRQ", .pme_code = 0xc00c2, .pme_short_desc = "LSU0 LRQ flushes", .pme_long_desc = "A load was flushed by unit 0 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5_PME_PM_INST_FROM_L275_MOD ] = { .pme_name = "PM_INST_FROM_L275_MOD", .pme_code = 0x422096, .pme_short_desc = "Instruction fetched from L2.75 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L2 on a different module than this processor is located. 
Fetch groups can contain up to 8 instructions ", }, [ POWER5_PME_PM_GCT_EMPTY_CYC ] = { .pme_name = "PM_GCT_EMPTY_CYC", .pme_code = 0x200004, .pme_short_desc = "Cycles GCT empty", .pme_long_desc = "The Global Completion Table is completely empty", }, [ POWER5_PME_PM_LARX_LSU0 ] = { .pme_name = "PM_LARX_LSU0", .pme_code = 0x820e7, .pme_short_desc = "Larx executed on LSU0", .pme_long_desc = "A larx (lwarx or ldarx) was executed on side 0 (there is no corresponding unit 1 event since larx instructions can only execute on unit 0)", }, [ POWER5_PME_PM_THRD_PRIO_DIFF_5or6_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_5or6_CYC", .pme_code = 0x430e6, .pme_short_desc = "Cycles thread priority difference is 5 or 6", .pme_long_desc = "Cycles when this thread's priority is higher than the other thread's priority by 5 or 6.", }, [ POWER5_PME_PM_SNOOP_RETRY_1AHEAD ] = { .pme_name = "PM_SNOOP_RETRY_1AHEAD", .pme_code = 0x725e6, .pme_short_desc = "Snoop retry due to one ahead collision", .pme_long_desc = "Snoop retry due to one ahead collision", }, [ POWER5_PME_PM_FPU1_FSQRT ] = { .pme_name = "PM_FPU1_FSQRT", .pme_code = 0xc6, .pme_short_desc = "FPU1 executed FSQRT instruction", .pme_long_desc = "FPU1 has executed a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5_PME_PM_MRK_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU1", .pme_code = 0x820e4, .pme_short_desc = "LSU1 marked L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by LSU1.", }, [ POWER5_PME_PM_MRK_FPU_FIN ] = { .pme_name = "PM_MRK_FPU_FIN", .pme_code = 0x300014, .pme_short_desc = "Marked instruction FPU processing finished", .pme_long_desc = "One of the Floating Point Units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ POWER5_PME_PM_THRD_PRIO_5_CYC ] = { .pme_name = "PM_THRD_PRIO_5_CYC", .pme_code = 0x420e4, .pme_short_desc = "Cycles thread running at priority level 5", .pme_long_desc = "Cycles this thread was running at priority level 5.", }, [ POWER5_PME_PM_MRK_DATA_FROM_LMEM ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM", .pme_code = 0x2c7087, .pme_short_desc = "Marked data loaded from local memory", .pme_long_desc = "The processor's Data Cache was reloaded due to a marked load from memory attached to the same module this processor is located on.", }, [ POWER5_PME_PM_FPU1_FRSP_FCONV ] = { .pme_name = "PM_FPU1_FRSP_FCONV", .pme_code = 0x10c5, .pme_short_desc = "FPU1 executed FRSP or FCONV instructions", .pme_long_desc = "FPU1 has executed a frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5_PME_PM_SNOOP_TLBIE ] = { .pme_name = "PM_SNOOP_TLBIE", .pme_code = 0x800c3, .pme_short_desc = "Snoop TLBIE", .pme_long_desc = "A tlbie was snooped from another processor.", }, [ POWER5_PME_PM_L3SB_SNOOP_RETRY ] = { .pme_name = "PM_L3SB_SNOOP_RETRY", .pme_code = 0x731e4, .pme_short_desc = "L3 slice B snoop retries", .pme_long_desc = "Number of times an L3 retried a snoop because it got two in at the same time (one on snp_a, one on snp_b)", }, [ POWER5_PME_PM_FAB_VBYPASS_EMPTY ] = { .pme_name = "PM_FAB_VBYPASS_EMPTY", .pme_code = 0x731e7, .pme_short_desc = "Vertical bypass buffer empty", .pme_long_desc = "Fabric cycles when the Middle Bypass sidecar is empty. 
The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L275_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L275_MOD", .pme_code = 0x1c70a3, .pme_short_desc = "Marked data loaded from L2.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 on a different module than this processor is located due to a marked load.", }, [ POWER5_PME_PM_6INST_CLB_CYC ] = { .pme_name = "PM_6INST_CLB_CYC", .pme_code = 0x400c6, .pme_short_desc = "Cycles 6 instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5_PME_PM_L2SB_RCST_DISP ] = { .pme_name = "PM_L2SB_RCST_DISP", .pme_code = 0x702c1, .pme_short_desc = "L2 slice B RC store dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Store was attempted.", }, [ POWER5_PME_PM_FLUSH ] = { .pme_name = "PM_FLUSH", .pme_code = 0x110c7, .pme_short_desc = "Flushes", .pme_long_desc = "Flushes occurred including LSU and Branch flushes.", }, [ POWER5_PME_PM_L2SC_MOD_INV ] = { .pme_name = "PM_L2SC_MOD_INV", .pme_code = 0x730e2, .pme_short_desc = "L2 slice C transition from modified to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Invalid state. This transition was caused by any RWITM snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_FPU_DENORM ] = { .pme_name = "PM_FPU_DENORM", .pme_code = 0x102088, .pme_short_desc = "FPU received denormalized data", .pme_long_desc = "The floating point unit has encountered a denormalized operand. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_L3SC_HIT ] = { .pme_name = "PM_L3SC_HIT", .pme_code = 0x711c5, .pme_short_desc = "L3 slice C hits", .pme_long_desc = "Number of attempts made by this chip cores that resulted in an L3 hit. Reported per L3 Slice", }, [ POWER5_PME_PM_SNOOP_WR_RETRY_RQ ] = { .pme_name = "PM_SNOOP_WR_RETRY_RQ", .pme_code = 0x706c6, .pme_short_desc = "Snoop write/dclaim retry due to collision with active read queue", .pme_long_desc = "A snoop request for a write or dclaim to memory was retried because it matched the cacheline of an active read. This event is sent from the Memory Controller clock domain and must be scaled accordingly", }, [ POWER5_PME_PM_LSU1_REJECT_SRQ ] = { .pme_name = "PM_LSU1_REJECT_SRQ", .pme_code = 0xc60e4, .pme_short_desc = "LSU1 SRQ lhs rejects", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions because of Load Hit Store conditions. 
Loads are rejected when data is needed from a previous store instruction but store forwarding is not possible because the data is not fully contained in the Store Data Queue or is not yet available in the Store Data Queue.", }, [ POWER5_PME_PM_IC_PREF_REQ ] = { .pme_name = "PM_IC_PREF_REQ", .pme_code = 0x220e6, .pme_short_desc = "Instruction prefetch requests", .pme_long_desc = "An instruction prefetch request has been made.", }, [ POWER5_PME_PM_L3SC_ALL_BUSY ] = { .pme_name = "PM_L3SC_ALL_BUSY", .pme_code = 0x721e5, .pme_short_desc = "L3 slice C active for every cycle all CI/CO machines busy", .pme_long_desc = "Cycles All Castin/Castout machines are busy.", }, [ POWER5_PME_PM_MRK_GRP_IC_MISS ] = { .pme_name = "PM_MRK_GRP_IC_MISS", .pme_code = 0x412091, .pme_short_desc = "Group experienced marked I cache miss", .pme_long_desc = "A group containing a marked (sampled) instruction experienced an instruction cache miss.", }, [ POWER5_PME_PM_GCT_NOSLOT_IC_MISS ] = { .pme_name = "PM_GCT_NOSLOT_IC_MISS", .pme_code = 0x21009c, .pme_short_desc = "No slot in GCT caused by I cache miss", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because of an Instruction Cache miss.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L3 ] = { .pme_name = "PM_MRK_DATA_FROM_L3", .pme_code = 0x1c708e, .pme_short_desc = "Marked data loaded from L3", .pme_long_desc = "The processor's Data Cache was reloaded from the local L3 due to a marked load.", }, [ POWER5_PME_PM_GCT_NOSLOT_SRQ_FULL ] = { .pme_name = "PM_GCT_NOSLOT_SRQ_FULL", .pme_code = 0x310084, .pme_short_desc = "No slot in GCT caused by SRQ full", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread because the Store Request Queue (SRQ) is full. This happens when the storage subsystem can not process the stores in the SRQ. 
Groups can not be dispatched until a SRQ entry is available.", }, [ POWER5_PME_PM_THRD_SEL_OVER_ISU_HOLD ] = { .pme_name = "PM_THRD_SEL_OVER_ISU_HOLD", .pme_code = 0x410c5, .pme_short_desc = "Thread selection overrides caused by ISU holds", .pme_long_desc = "Thread selection was overridden because of an ISU hold.", }, [ POWER5_PME_PM_CMPLU_STALL_DCACHE_MISS ] = { .pme_name = "PM_CMPLU_STALL_DCACHE_MISS", .pme_code = 0x21109a, .pme_short_desc = "Completion stall caused by D cache miss", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered a Data Cache Miss. Data Cache Miss has higher priority than any other Load/Store delay, so if an instruction encounters multiple delays only the Data Cache Miss will be reported and the entire delay period will be charged to Data Cache Miss. This is a subset of PM_CMPLU_STALL_LSU.", }, [ POWER5_PME_PM_L3SA_MOD_INV ] = { .pme_name = "PM_L3SA_MOD_INV", .pme_code = 0x730e3, .pme_short_desc = "L3 slice A transition from modified to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is truly M in this L3 (i.e. L3 going M=>I) Mu|Me are not included since they are formed due to a prev read op. Tx is not included since it is considered shared at this point.", }, [ POWER5_PME_PM_LSU_FLUSH_LRQ ] = { .pme_name = "PM_LSU_FLUSH_LRQ", .pme_code = 0x2c0090, .pme_short_desc = "LRQ flushes", .pme_long_desc = "A load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte. 
Combined Units 0 and 1.", }, [ POWER5_PME_PM_THRD_PRIO_2_CYC ] = { .pme_name = "PM_THRD_PRIO_2_CYC", .pme_code = 0x420e1, .pme_short_desc = "Cycles thread running at priority level 2", .pme_long_desc = "Cycles this thread was running at priority level 2.", }, [ POWER5_PME_PM_LSU_FLUSH_SRQ ] = { .pme_name = "PM_LSU_FLUSH_SRQ", .pme_code = 0x1c0090, .pme_short_desc = "SRQ flushes", .pme_long_desc = "A store was flushed because a younger load hit an older store that is already in the SRQ or in the same group. Combined Unit 0 + 1.", }, [ POWER5_PME_PM_MRK_LSU_SRQ_INST_VALID ] = { .pme_name = "PM_MRK_LSU_SRQ_INST_VALID", .pme_code = 0xc70e6, .pme_short_desc = "Marked instruction valid in SRQ", .pme_long_desc = "This signal is asserted every cycle when a marked request is resident in the Store Request Queue", }, [ POWER5_PME_PM_L3SA_REF ] = { .pme_name = "PM_L3SA_REF", .pme_code = 0x701c3, .pme_short_desc = "L3 slice A references", .pme_long_desc = "Number of attempts made by this chip cores to find data in the L3. Reported per L3 slice ", }, [ POWER5_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL ] = { .pme_name = "PM_L2SC_RC_DISP_FAIL_CO_BUSY_ALL", .pme_code = 0x713c2, .pme_short_desc = "L2 slice C RC dispatch attempt failed due to all CO busy", .pme_long_desc = "A Read/Claim dispatch was rejected because all Castout machines were busy.", }, [ POWER5_PME_PM_FPU0_STALL3 ] = { .pme_name = "PM_FPU0_STALL3", .pme_code = 0x20e1, .pme_short_desc = "FPU0 stalled in pipe3", .pme_long_desc = "FPU0 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always).", }, [ POWER5_PME_PM_GPR_MAP_FULL_CYC ] = { .pme_name = "PM_GPR_MAP_FULL_CYC", .pme_code = 0x130e5, .pme_short_desc = "Cycles GPR mapper full", .pme_long_desc = "The General Purpose Register mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. 
This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x100018, .pme_short_desc = "Time Base bit transition", .pme_long_desc = "When the selected time base bit (as specified in MMCR0[TBSEL])transitions from 0 to 1 ", }, [ POWER5_PME_PM_MRK_LSU_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_LRQ", .pme_code = 0x381088, .pme_short_desc = "Marked LRQ flushes", .pme_long_desc = "A marked load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ POWER5_PME_PM_FPU0_STF ] = { .pme_name = "PM_FPU0_STF", .pme_code = 0x20e2, .pme_short_desc = "FPU0 executed store instruction", .pme_long_desc = "FPU0 has executed a Floating Point Store instruction.", }, [ POWER5_PME_PM_MRK_DTLB_MISS ] = { .pme_name = "PM_MRK_DTLB_MISS", .pme_code = 0xc50c6, .pme_short_desc = "Marked Data TLB misses", .pme_long_desc = "Data TLB references by a marked instruction that missed the TLB (all page sizes).", }, [ POWER5_PME_PM_FPU1_FMA ] = { .pme_name = "PM_FPU1_FMA", .pme_code = 0xc5, .pme_short_desc = "FPU1 executed multiply-add instruction", .pme_long_desc = "The floating point unit has executed a multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5_PME_PM_L2SA_MOD_TAG ] = { .pme_name = "PM_L2SA_MOD_TAG", .pme_code = 0x720e0, .pme_short_desc = "L2 slice A transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_LSU1_FLUSH_ULD ] = { .pme_name = "PM_LSU1_FLUSH_ULD", .pme_code = 0xc00c4, .pme_short_desc = "LSU1 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 1 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1).", }, [ POWER5_PME_PM_MRK_LSU0_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU0_FLUSH_UST", .pme_code = 0x810c1, .pme_short_desc = "LSU0 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 0 because it was unaligned", }, [ POWER5_PME_PM_MRK_INST_FIN ] = { .pme_name = "PM_MRK_INST_FIN", .pme_code = 0x300005, .pme_short_desc = "Marked instruction finished", .pme_long_desc = "One of the execution units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ POWER5_PME_PM_FPU0_FULL_CYC ] = { .pme_name = "PM_FPU0_FULL_CYC", .pme_code = 0x100c3, .pme_short_desc = "Cycles FPU0 issue queue full", .pme_long_desc = "The issue queue for FPU0 cannot accept any more instructions. Dispatch to this issue queue is stopped.", }, [ POWER5_PME_PM_LSU_LRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LRQ_S0_ALLOC", .pme_code = 0xc20e6, .pme_short_desc = "LRQ slot 0 allocated", .pme_long_desc = "LRQ slot zero was allocated", }, [ POWER5_PME_PM_MRK_LSU1_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU1_FLUSH_ULD", .pme_code = 0x810c4, .pme_short_desc = "LSU1 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 1 because it was unaligned (crossed a 64 byte boundary, or 32 byte if it missed the L1)", }, [ POWER5_PME_PM_MRK_DTLB_REF ] = { .pme_name = "PM_MRK_DTLB_REF", .pme_code = 0x1c4090, .pme_short_desc = "Marked Data TLB reference", .pme_long_desc = "Total number of Data TLB references by a marked instruction for all page sizes. 
Page size is determined at TLB reload time.", }, [ POWER5_PME_PM_BR_UNCOND ] = { .pme_name = "PM_BR_UNCOND", .pme_code = 0x123087, .pme_short_desc = "Unconditional branch", .pme_long_desc = "An unconditional branch was executed.", }, [ POWER5_PME_PM_THRD_SEL_OVER_L2MISS ] = { .pme_name = "PM_THRD_SEL_OVER_L2MISS", .pme_code = 0x410c3, .pme_short_desc = "Thread selection overrides caused by L2 misses", .pme_long_desc = "Thread selection was overridden because one thread had an L2 miss pending.", }, [ POWER5_PME_PM_L2SB_SHR_INV ] = { .pme_name = "PM_L2SB_SHR_INV", .pme_code = 0x710c1, .pme_short_desc = "L2 slice B transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A, B, and C. NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER5_PME_PM_MEM_LO_PRIO_WR_CMPL ] = { .pme_name = "PM_MEM_LO_PRIO_WR_CMPL", .pme_code = 0x736e6, .pme_short_desc = "Low priority write completed", .pme_long_desc = "A memory write, which was not upgraded to high priority, completed. This event is sent from the Memory Controller clock domain and must be scaled accordingly", }, [ POWER5_PME_PM_L3SC_MOD_TAG ] = { .pme_name = "PM_L3SC_MOD_TAG", .pme_code = 0x720e5, .pme_short_desc = "L3 slice C transition from modified to TAG", .pme_long_desc = "L3 snooper detects someone doing a read to a line that is truly M in this L3(i.e. L3 going M->T or M->I(go_Mu case); Mu|Me are not included since they are formed due to a prev read op). 
Tx is not included since it is considered shared at this point.", }, [ POWER5_PME_PM_MRK_ST_MISS_L1 ] = { .pme_name = "PM_MRK_ST_MISS_L1", .pme_code = 0x820e3, .pme_short_desc = "Marked L1 D cache store misses", .pme_long_desc = "A marked store missed the dcache", }, [ POWER5_PME_PM_GRP_DISP_SUCCESS ] = { .pme_name = "PM_GRP_DISP_SUCCESS", .pme_code = 0x300002, .pme_short_desc = "Group dispatch success", .pme_long_desc = "Number of groups successfully dispatched (not rejected)", }, [ POWER5_PME_PM_THRD_PRIO_DIFF_1or2_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_1or2_CYC", .pme_code = 0x430e4, .pme_short_desc = "Cycles thread priority difference is 1 or 2", .pme_long_desc = "Cycles when this thread's priority is higher than the other thread's priority by 1 or 2.", }, [ POWER5_PME_PM_IC_DEMAND_L2_BHT_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BHT_REDIRECT", .pme_code = 0x230e0, .pme_short_desc = "L2 I cache demand request due to BHT redirect", .pme_long_desc = "A demand (not prefetch) miss to the instruction cache was sent to the L2 as a result of a branch prediction redirect (CR mispredict).", }, [ POWER5_PME_PM_MEM_WQ_DISP_Q8to15 ] = { .pme_name = "PM_MEM_WQ_DISP_Q8to15", .pme_code = 0x733e6, .pme_short_desc = "Memory write queue dispatched to queues 8-15", .pme_long_desc = "A memory operation was dispatched to a write queue in the range between 8 and 15. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_FPU0_SINGLE ] = { .pme_name = "PM_FPU0_SINGLE", .pme_code = 0x20e3, .pme_short_desc = "FPU0 executed single precision instruction", .pme_long_desc = "FPU0 has executed a single precision instruction.", }, [ POWER5_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x280090, .pme_short_desc = "DERAT misses", .pme_long_desc = "Total D-ERAT Misses. Requests that miss the Derat are rejected and retried until the request hits in the Erat. 
This may result in multiple erat misses for the same instruction. Combined Unit 0 + 1.", }, [ POWER5_PME_PM_THRD_PRIO_1_CYC ] = { .pme_name = "PM_THRD_PRIO_1_CYC", .pme_code = 0x420e0, .pme_short_desc = "Cycles thread running at priority level 1", .pme_long_desc = "Cycles this thread was running at priority level 1. Priority level 1 is the lowest and indicates the thread is sleeping.", }, [ POWER5_PME_PM_L2SC_RCST_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SC_RCST_DISP_FAIL_OTHER", .pme_code = 0x732e2, .pme_short_desc = "L2 slice C RC store dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a store failed for some reason other than Full or Collision conditions. Rejected dispatches do not count because they have not yet been attempted.", }, [ POWER5_PME_PM_FPU1_FEST ] = { .pme_name = "PM_FPU1_FEST", .pme_code = 0x10c6, .pme_short_desc = "FPU1 executed FEST instruction", .pme_long_desc = "FPU1 has executed an estimate instruction. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ.", }, [ POWER5_PME_PM_FAB_HOLDtoVN_EMPTY ] = { .pme_name = "PM_FAB_HOLDtoVN_EMPTY", .pme_code = 0x721e7, .pme_short_desc = "Hold buffer to VN empty", .pme_long_desc = "Fabric cycles when the Vertical Node out hold-buffers are empty. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_SNOOP_RD_RETRY_RQ ] = { .pme_name = "PM_SNOOP_RD_RETRY_RQ", .pme_code = 0x705c6, .pme_short_desc = "Snoop read retry due to collision with active read queue", .pme_long_desc = "A snoop request for a read from memory was retried because it matched the cache line of an active read. The snoop request is retried because the L2 may be able to source data via intervention for the 2nd read faster than the MC. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_SNOOP_DCLAIM_RETRY_QFULL ] = { .pme_name = "PM_SNOOP_DCLAIM_RETRY_QFULL", .pme_code = 0x720e6, .pme_short_desc = "Snoop dclaim/flush retry due to write/dclaim queues full", .pme_long_desc = "The memory controller A memory write was dispatched to a write queue. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L25_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR_CYC", .pme_code = 0x2c70a2, .pme_short_desc = "Marked load latency from L2.5 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_MRK_ST_CMPL_INT ] = { .pme_name = "PM_MRK_ST_CMPL_INT", .pme_code = 0x300003, .pme_short_desc = "Marked store completed with intervention", .pme_long_desc = "A marked store previously sent to the memory subsystem completed (data home) after requiring intervention", }, [ POWER5_PME_PM_FLUSH_BR_MPRED ] = { .pme_name = "PM_FLUSH_BR_MPRED", .pme_code = 0x110c6, .pme_short_desc = "Flush caused by branch mispredict", .pme_long_desc = "A flush was caused by a branch mispredict.", }, [ POWER5_PME_PM_L2SB_RCLD_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SB_RCLD_DISP_FAIL_ADDR", .pme_code = 0x711c1, .pme_short_desc = "L2 slice B RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a load failed because of an address conflict. 
Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5_PME_PM_FPU_STF ] = { .pme_name = "PM_FPU_STF", .pme_code = 0x202090, .pme_short_desc = "FPU executed store instruction", .pme_long_desc = "FPU has executed a store instruction. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_CMPLU_STALL_FPU ] = { .pme_name = "PM_CMPLU_STALL_FPU", .pme_code = 0x411098, .pme_short_desc = "Completion stall caused by FPU instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a floating point instruction.", }, [ POWER5_PME_PM_THRD_PRIO_DIFF_minus1or2_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus1or2_CYC", .pme_code = 0x430e2, .pme_short_desc = "Cycles thread priority difference is -1 or -2", .pme_long_desc = "Cycles when this thread's priority is lower than the other thread's priority by 1 or 2.", }, [ POWER5_PME_PM_GCT_NOSLOT_CYC ] = { .pme_name = "PM_GCT_NOSLOT_CYC", .pme_code = 0x100004, .pme_short_desc = "Cycles no GCT slot allocated", .pme_long_desc = "Cycles when the Global Completion Table has no slots from this thread.", }, [ POWER5_PME_PM_FXU0_BUSY_FXU1_IDLE ] = { .pme_name = "PM_FXU0_BUSY_FXU1_IDLE", .pme_code = 0x300012, .pme_short_desc = "FXU0 busy FXU1 idle", .pme_long_desc = "FXU0 is busy while FXU1 was idle", }, [ POWER5_PME_PM_PTEG_FROM_L35_SHR ] = { .pme_name = "PM_PTEG_FROM_L35_SHR", .pme_code = 0x18309e, .pme_short_desc = "PTEG loaded from L3.5 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (S) data from the L3 of a chip on the same module as this processor is located, due to a demand load.", }, [ POWER5_PME_PM_MRK_LSU_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU_FLUSH_UST", .pme_code = 0x381090, .pme_short_desc = "Marked unaligned store flushes", .pme_long_desc = "A marked store was flushed because it was unaligned", }, [ POWER5_PME_PM_L3SA_HIT ] = { .pme_name = 
"PM_L3SA_HIT", .pme_code = 0x711c3, .pme_short_desc = "L3 slice A hits", .pme_long_desc = "Number of attempts made by this chip cores that resulted in an L3 hit. Reported per L3 slice", }, [ POWER5_PME_PM_MRK_DATA_FROM_L25_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR", .pme_code = 0x1c7097, .pme_short_desc = "Marked data loaded from L2.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from the L2 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5_PME_PM_L2SB_RCST_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SB_RCST_DISP_FAIL_ADDR", .pme_code = 0x712c1, .pme_short_desc = "L2 slice B RC store dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a store failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L35_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L35_SHR", .pme_code = 0x1c709e, .pme_short_desc = "Marked data loaded from L3.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on the same module as this processor is located due to a marked load.", }, [ POWER5_PME_PM_IERAT_XLATE_WR ] = { .pme_name = "PM_IERAT_XLATE_WR", .pme_code = 0x220e7, .pme_short_desc = "Translation written to ierat", .pme_long_desc = "An entry was written into the IERAT as a result of an IERAT miss. This event can be used to count IERAT misses. An ERAT miss that are later ignored will not be counted unless the ERAT is written before the instruction stream is changed.", }, [ POWER5_PME_PM_L2SA_ST_REQ ] = { .pme_name = "PM_L2SA_ST_REQ", .pme_code = 0x723e0, .pme_short_desc = "L2 slice A store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_THRD_SEL_T1 ] = { .pme_name = "PM_THRD_SEL_T1", .pme_code = 0x410c1, .pme_short_desc = "Decode selected thread 1", .pme_long_desc = "Thread selection picked thread 1 for decode.", }, [ POWER5_PME_PM_IC_DEMAND_L2_BR_REDIRECT ] = { .pme_name = "PM_IC_DEMAND_L2_BR_REDIRECT", .pme_code = 0x230e1, .pme_short_desc = "L2 I cache demand request due to branch redirect", .pme_long_desc = "A demand (not prefetch) miss to the instruction cache was sent to the L2 as a result of a branch prediction redirect (either ALL mispredicted or Target).", }, [ POWER5_PME_PM_INST_FROM_LMEM ] = { .pme_name = "PM_INST_FROM_LMEM", .pme_code = 0x222086, .pme_short_desc = "Instruction fetched from local memory", .pme_long_desc = "An instruction fetch group was fetched from memory attached to the same module this processor is located on. Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_FPU0_1FLOP ] = { .pme_name = "PM_FPU0_1FLOP", .pme_code = 0xc3, .pme_short_desc = "FPU0 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "The floating point unit has executed an add, mult, sub, compare, fsel, fneg, fabs, fnabs, fres, or frsqrte kind of instruction. These are single FLOP operations.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L35_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L35_SHR_CYC", .pme_code = 0x2c70a6, .pme_short_desc = "Marked load latency from L3.5 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_PTEG_FROM_L2 ] = { .pme_name = "PM_PTEG_FROM_L2", .pme_code = 0x183087, .pme_short_desc = "PTEG loaded from L2", .pme_long_desc = "A Page Table Entry was loaded into the TLB from the local L2 due to a demand load", }, [ POWER5_PME_PM_MEM_PW_CMPL ] = { .pme_name = "PM_MEM_PW_CMPL", .pme_code = 0x724e6, .pme_short_desc = "Memory partial-write completed", .pme_long_desc = "Number of Partial Writes completed. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_THRD_PRIO_DIFF_minus5or6_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_minus5or6_CYC", .pme_code = 0x430e0, .pme_short_desc = "Cycles thread priority difference is -5 or -6", .pme_long_desc = "Cycles when this thread's priority is lower than the other thread's priority by 5 or 6.", }, [ POWER5_PME_PM_L2SB_RCLD_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SB_RCLD_DISP_FAIL_OTHER", .pme_code = 0x731e1, .pme_short_desc = "L2 slice B RC load dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a load failed for some reason other than Full or Collision conditions.", }, [ POWER5_PME_PM_FPU0_FIN ] = { .pme_name = "PM_FPU0_FIN", .pme_code = 0x10c3, .pme_short_desc = "FPU0 produced a result", .pme_long_desc = "FPU0 finished, produced a result. This only indicates finish, not completion. Floating Point Stores are included in this count but not Floating Point Loads.", }, [ POWER5_PME_PM_MRK_DTLB_MISS_4K ] = { .pme_name = "PM_MRK_DTLB_MISS_4K", .pme_code = 0xc40c1, .pme_short_desc = "Marked Data TLB misses for 4K page", .pme_long_desc = "Data TLB references to 4KB pages by a marked instruction that missed the TLB. 
Page size is determined at TLB reload time.", }, [ POWER5_PME_PM_L3SC_SHR_INV ] = { .pme_name = "PM_L3SC_SHR_INV", .pme_code = 0x710c5, .pme_short_desc = "L3 slice C transition from shared to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is Sx in this L3(i.e. invalidate hit SX and dispatched).", }, [ POWER5_PME_PM_GRP_BR_REDIR ] = { .pme_name = "PM_GRP_BR_REDIR", .pme_code = 0x120e6, .pme_short_desc = "Group experienced branch redirect", .pme_long_desc = "Number of groups, counted at dispatch, that have encountered a branch redirect. Every group constructed from a fetch group that has been redirected will count.", }, [ POWER5_PME_PM_L2SC_RCLD_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SC_RCLD_DISP_FAIL_RC_FULL", .pme_code = 0x721e2, .pme_short_desc = "L2 slice C RC load dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a load failed because all RC machines are busy.", }, [ POWER5_PME_PM_MRK_LSU_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU_FLUSH_SRQ", .pme_code = 0x481088, .pme_short_desc = "Marked SRQ lhs flushes", .pme_long_desc = "A marked store was flushed because a younger load hit an older store that is already in the SRQ or in the same group.", }, [ POWER5_PME_PM_PTEG_FROM_L275_SHR ] = { .pme_name = "PM_PTEG_FROM_L275_SHR", .pme_code = 0x383097, .pme_short_desc = "PTEG loaded from L2.75 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (T) data from the L2 on a different module than this processor is located due to a demand load.", }, [ POWER5_PME_PM_L2SB_RCLD_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SB_RCLD_DISP_FAIL_RC_FULL", .pme_code = 0x721e1, .pme_short_desc = "L2 slice B RC load dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a load failed because all RC machines are busy.", }, [ POWER5_PME_PM_SNOOP_RD_RETRY_WQ ] = { .pme_name = "PM_SNOOP_RD_RETRY_WQ", .pme_code = 0x715c6, .pme_short_desc = "Snoop read retry 
due to collision with active write queue", .pme_long_desc = "A snoop request for a read from memory was retried because it matched the cache line of an active write. The snoop request is retried and the active write is changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_LSU0_NCLD ] = { .pme_name = "PM_LSU0_NCLD", .pme_code = 0xc50c1, .pme_short_desc = "LSU0 non-cacheable loads", .pme_long_desc = "A non-cacheable load was executed by unit 0.", }, [ POWER5_PME_PM_FAB_DCLAIM_RETRIED ] = { .pme_name = "PM_FAB_DCLAIM_RETRIED", .pme_code = 0x730e7, .pme_short_desc = "dclaim retried", .pme_long_desc = "A DCLAIM command was retried. Each chip reports its own counts. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_LSU1_BUSY_REJECT ] = { .pme_name = "PM_LSU1_BUSY_REJECT", .pme_code = 0xc20e7, .pme_short_desc = "LSU1 busy due to reject", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions.", }, [ POWER5_PME_PM_FXLS0_FULL_CYC ] = { .pme_name = "PM_FXLS0_FULL_CYC", .pme_code = 0x110c0, .pme_short_desc = "Cycles FXU0/LS0 queue full", .pme_long_desc = "The issue queue that feeds the Fixed Point unit 0 / Load Store Unit 0 is full. This condition will prevent dispatch groups from being dispatched. This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_FPU0_FEST ] = { .pme_name = "PM_FPU0_FEST", .pme_code = 0x10c2, .pme_short_desc = "FPU0 executed FEST instruction", .pme_long_desc = "FPU0 has executed an estimate instruction. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. ", }, [ POWER5_PME_PM_DTLB_REF_16M ] = { .pme_name = "PM_DTLB_REF_16M", .pme_code = 0xc40c6, .pme_short_desc = "Data TLB reference for 16M page", .pme_long_desc = "Data TLB references for 16MB pages. 
Includes hits + misses.", }, [ POWER5_PME_PM_L2SC_RCLD_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SC_RCLD_DISP_FAIL_ADDR", .pme_code = 0x711c2, .pme_short_desc = "L2 slice C RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a load failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5_PME_PM_LSU0_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU0_REJECT_ERAT_MISS", .pme_code = 0xc60e3, .pme_short_desc = "LSU0 reject due to ERAT miss", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions due to an ERAT miss. Requests that miss the Derat are rejected and retried until the request hits in the Erat.", }, [ POWER5_PME_PM_DATA_FROM_L25_MOD ] = { .pme_name = "PM_DATA_FROM_L25_MOD", .pme_code = 0x2c3097, .pme_short_desc = "Data loaded from L2.5 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5_PME_PM_GCT_USAGE_60to79_CYC ] = { .pme_name = "PM_GCT_USAGE_60to79_CYC", .pme_code = 0x20001f, .pme_short_desc = "Cycles GCT 60-79% full", .pme_long_desc = "Cycles when the Global Completion Table has between 60% and 79% of its slots used. 
The GCT has 20 entries shared between threads.", }, [ POWER5_PME_PM_DATA_FROM_L375_MOD ] = { .pme_name = "PM_DATA_FROM_L375_MOD", .pme_code = 0x1c30a7, .pme_short_desc = "Data loaded from L3.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L3 of a chip on a different module than this processor is located due to a demand load.", }, [ POWER5_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_CYC", .pme_code = 0x200015, .pme_short_desc = "Cycles LMQ and SRQ empty", .pme_long_desc = "Cycles when both the LMQ and SRQ are empty (LSU is idle)", }, [ POWER5_PME_PM_LSU0_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU0_REJECT_RELOAD_CDF", .pme_code = 0xc60e2, .pme_short_desc = "LSU0 reject due to reload CDF or tag update collision", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions because of Critical Data Forward. When critical data arrives from the storage system it is formatted and immediately forwarded, bypassing the data cache, to the destination register using the result bus. Any instruction that requires the result bus in the same cycle is rejected. Tag update rejects are caused when an instruction requires access to the Dcache directory or ERAT in the same system when they are being updated.", }, [ POWER5_PME_PM_0INST_FETCH ] = { .pme_name = "PM_0INST_FETCH", .pme_code = 0x42208d, .pme_short_desc = "No instructions fetched", .pme_long_desc = "No instructions were fetched this cycle (due to IFU hold, redirect, or icache miss)", }, [ POWER5_PME_PM_LSU1_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU1_REJECT_RELOAD_CDF", .pme_code = 0xc60e6, .pme_short_desc = "LSU1 reject due to reload CDF or tag update collision", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions because of Critical Data Forward. 
When critical data arrives from the storage system it is formatted and immediately forwarded, bypassing the data cache, to the destination register using the result bus. Any instruction that requires the result bus in the same cycle is rejected. Tag update rejects are caused when an instruction requires access to the Dcache directory or ERAT in the same system when they are being updated.", }, [ POWER5_PME_PM_L1_PREF ] = { .pme_name = "PM_L1_PREF", .pme_code = 0xc70e7, .pme_short_desc = "L1 cache data prefetches", .pme_long_desc = "A request to prefetch data into the L1 was made", }, [ POWER5_PME_PM_MEM_WQ_DISP_Q0to7 ] = { .pme_name = "PM_MEM_WQ_DISP_Q0to7", .pme_code = 0x723e6, .pme_short_desc = "Memory write queue dispatched to queues 0-7", .pme_long_desc = "A memory operation was dispatched to a write queue in the range between 0 and 7. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_DATA_FROM_LMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM_CYC", .pme_code = 0x4c70a0, .pme_short_desc = "Marked load latency from local memory", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_BRQ_FULL_CYC ] = { .pme_name = "PM_BRQ_FULL_CYC", .pme_code = 0x100c5, .pme_short_desc = "Cycles branch queue full", .pme_long_desc = "Cycles when the issue queue that feeds the branch unit is full. This condition will prevent dispatch groups from being dispatched. 
This event only indicates that the queue was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_GRP_IC_MISS_NONSPEC ] = { .pme_name = "PM_GRP_IC_MISS_NONSPEC", .pme_code = 0x112099, .pme_short_desc = "Group experienced non-speculative I cache miss", .pme_long_desc = "Number of groups, counted at completion, that have encountered an instruction cache miss.", }, [ POWER5_PME_PM_PTEG_FROM_L275_MOD ] = { .pme_name = "PM_PTEG_FROM_L275_MOD", .pme_code = 0x1830a3, .pme_short_desc = "PTEG loaded from L2.75 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L2 on a different module than this processor is located due to a demand load. ", }, [ POWER5_PME_PM_MRK_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU0", .pme_code = 0x820e0, .pme_short_desc = "LSU0 marked L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by LSU0.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L375_SHR_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L375_SHR_CYC", .pme_code = 0x2c70a7, .pme_short_desc = "Marked load latency from L3.75 shared", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. 
To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_LSU_FLUSH ] = { .pme_name = "PM_LSU_FLUSH", .pme_code = 0x110c5, .pme_short_desc = "Flush initiated by LSU", .pme_long_desc = "A flush was initiated by the Load Store Unit", }, [ POWER5_PME_PM_DATA_FROM_L3 ] = { .pme_name = "PM_DATA_FROM_L3", .pme_code = 0x1c308e, .pme_short_desc = "Data loaded from L3", .pme_long_desc = "The processor's Data Cache was reloaded from the local L3 due to a demand load.", }, [ POWER5_PME_PM_INST_FROM_L2 ] = { .pme_name = "PM_INST_FROM_L2", .pme_code = 0x122086, .pme_short_desc = "Instruction fetched from L2", .pme_long_desc = "An instruction fetch group was fetched from L2. Fetch Groups can contain up to 8 instructions", }, [ POWER5_PME_PM_PMC2_OVERFLOW ] = { .pme_name = "PM_PMC2_OVERFLOW", .pme_code = 0x30000a, .pme_short_desc = "PMC2 Overflow", .pme_long_desc = "Overflows from PMC2 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5_PME_PM_FPU0_DENORM ] = { .pme_name = "PM_FPU0_DENORM", .pme_code = 0x20e0, .pme_short_desc = "FPU0 received denormalized data", .pme_long_desc = "FPU0 has encountered a denormalized operand. ", }, [ POWER5_PME_PM_FPU1_FMOV_FEST ] = { .pme_name = "PM_FPU1_FMOV_FEST", .pme_code = 0x10c4, .pme_short_desc = "FPU1 executed FMOV or FEST instructions", .pme_long_desc = "FPU1 has executed a move kind of instruction or one of the estimate instructions. 
This could be fmr*, fneg*, fabs*, fnabs*, fres* or frsqrte* where XYZ* means XYZ or XYZ.", }, [ POWER5_PME_PM_INST_FETCH_CYC ] = { .pme_name = "PM_INST_FETCH_CYC", .pme_code = 0x220e4, .pme_short_desc = "Cycles at least 1 instruction fetched", .pme_long_desc = "Cycles when at least one instruction was sent from the fetch unit to the decode unit.", }, [ POWER5_PME_PM_LSU_LDF ] = { .pme_name = "PM_LSU_LDF", .pme_code = 0x4c5090, .pme_short_desc = "LSU executed Floating Point load instruction", .pme_long_desc = "LSU executed Floating Point load instruction. Combined Unit 0 + 1.", }, [ POWER5_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x300009, .pme_short_desc = "Instructions dispatched", .pme_long_desc = "Number of PowerPC instructions successfully dispatched.", }, [ POWER5_PME_PM_DATA_FROM_L25_SHR ] = { .pme_name = "PM_DATA_FROM_L25_SHR", .pme_code = 0x1c3097, .pme_short_desc = "Data loaded from L2.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (T or SL) data from the L2 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0xc30e4, .pme_short_desc = "L1 reload data source valid", .pme_long_desc = "The data source information is valid, the data cache has been reloaded. Prior to POWER5+ this included data cache reloads due to prefetch activity. With POWER5+ this now only includes reloads due to demand loads.", }, [ POWER5_PME_PM_MEM_WQ_DISP_DCLAIM ] = { .pme_name = "PM_MEM_WQ_DISP_DCLAIM", .pme_code = 0x713c6, .pme_short_desc = "Memory write queue dispatched due to dclaim/flush", .pme_long_desc = "A memory dclaim or flush operation was dispatched to a write queue. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_FPU_FULL_CYC ] = { .pme_name = "PM_FPU_FULL_CYC", .pme_code = 0x110090, .pme_short_desc = "Cycles FPU issue queue full", .pme_long_desc = "Cycles when one or both FPU issue queues are full. Combined Unit 0 + 1. Use with caution since this is the sum of cycles when Unit 0 was full plus Unit 1 full. It does not indicate when both units were full.", }, [ POWER5_PME_PM_MRK_GRP_ISSUED ] = { .pme_name = "PM_MRK_GRP_ISSUED", .pme_code = 0x100015, .pme_short_desc = "Marked group issued", .pme_long_desc = "A sampled instruction was issued.", }, [ POWER5_PME_PM_THRD_PRIO_3_CYC ] = { .pme_name = "PM_THRD_PRIO_3_CYC", .pme_code = 0x420e2, .pme_short_desc = "Cycles thread running at priority level 3", .pme_long_desc = "Cycles this thread was running at priority level 3.", }, [ POWER5_PME_PM_FPU_FMA ] = { .pme_name = "PM_FPU_FMA", .pme_code = 0x200088, .pme_short_desc = "FPU executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_INST_FROM_L35_MOD ] = { .pme_name = "PM_INST_FROM_L35_MOD", .pme_code = 0x22209d, .pme_short_desc = "Instruction fetched from L3.5 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L3 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_MRK_CRU_FIN ] = { .pme_name = "PM_MRK_CRU_FIN", .pme_code = 0x400005, .pme_short_desc = "Marked instruction CRU processing finished", .pme_long_desc = "The Condition Register Unit finished a marked instruction. 
Instructions that finish may not necessarily complete.", }, [ POWER5_PME_PM_SNOOP_WR_RETRY_WQ ] = { .pme_name = "PM_SNOOP_WR_RETRY_WQ", .pme_code = 0x716c6, .pme_short_desc = "Snoop write/dclaim retry due to collision with active write queue", .pme_long_desc = "A snoop request for a write or dclaim to memory was retried because it matched the cache line of an active write. The snoop request is retried and the active write is changed to high priority. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_CMPLU_STALL_REJECT ] = { .pme_name = "PM_CMPLU_STALL_REJECT", .pme_code = 0x41109a, .pme_short_desc = "Completion stall caused by reject", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered a load/store reject. This is a subset of PM_CMPLU_STALL_LSU.", }, [ POWER5_PME_PM_LSU1_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU1_REJECT_ERAT_MISS", .pme_code = 0xc60e7, .pme_short_desc = "LSU1 reject due to ERAT miss", .pme_long_desc = "Total cycles the Load Store Unit 1 is busy rejecting instructions due to an ERAT miss. Requests that miss the Derat are rejected and retried until the request hits in the Erat.", }, [ POWER5_PME_PM_MRK_FXU_FIN ] = { .pme_name = "PM_MRK_FXU_FIN", .pme_code = 0x200014, .pme_short_desc = "Marked instruction FXU processing finished", .pme_long_desc = "One of the Fixed Point Units finished a marked instruction. Instructions that finish may not necessarily complete.", }, [ POWER5_PME_PM_L2SB_RCST_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SB_RCST_DISP_FAIL_OTHER", .pme_code = 0x732e1, .pme_short_desc = "L2 slice B RC store dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a store failed for some reason other than Full or Collision conditions. 
Rejected dispatches do not count because they have not yet been attempted.", }, [ POWER5_PME_PM_L2SC_RC_DISP_FAIL_CO_BUSY ] = { .pme_name = "PM_L2SC_RC_DISP_FAIL_CO_BUSY", .pme_code = 0x703c2, .pme_short_desc = "L2 slice C RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy", .pme_long_desc = "A Read/Claim Dispatch was rejected at dispatch because the Castout Machine was busy. In the case of an RC starting up on a miss and the victim is valid, the CO machine must be available for the RC to process the access. If the CO is still busy working on an old castout, then the RC must not-ack the access if it is a miss(re-issued by the CIU). If it is a miss and the CO is available to process the castout, the RC will accept the access. Once the RC has finished, it can restart and process new accesses that result in a hit (or miss that doesn't need a CO) even though the CO is still processing a castout from a previous access.", }, [ POWER5_PME_PM_PMC4_OVERFLOW ] = { .pme_name = "PM_PMC4_OVERFLOW", .pme_code = 0x10000a, .pme_short_desc = "PMC4 Overflow", .pme_long_desc = "Overflows from PMC4 are counted. This effectively widens the PMC. 
The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5_PME_PM_L3SA_SNOOP_RETRY ] = { .pme_name = "PM_L3SA_SNOOP_RETRY", .pme_code = 0x731e3, .pme_short_desc = "L3 slice A snoop retries", .pme_long_desc = "Number of times an L3 retried a snoop because it got two in at the same time (one on snp_a, one on snp_b)", }, [ POWER5_PME_PM_PTEG_FROM_L35_MOD ] = { .pme_name = "PM_PTEG_FROM_L35_MOD", .pme_code = 0x28309e, .pme_short_desc = "PTEG loaded from L3.5 modified", .pme_long_desc = "A Page Table Entry was loaded into the TLB with modified (M) data from the L3 of a chip on the same module as this processor is located, due to a demand load.", }, [ POWER5_PME_PM_INST_FROM_L25_MOD ] = { .pme_name = "PM_INST_FROM_L25_MOD", .pme_code = 0x222096, .pme_short_desc = "Instruction fetched from L2.5 modified", .pme_long_desc = "An instruction fetch group was fetched with modified (M) data from the L2 of a chip on the same module as this processor is located. Fetch groups can contain up to 8 instructions.", }, [ POWER5_PME_PM_THRD_SMT_HANG ] = { .pme_name = "PM_THRD_SMT_HANG", .pme_code = 0x330e7, .pme_short_desc = "SMT hang detected", .pme_long_desc = "A hung thread was detected", }, [ POWER5_PME_PM_CMPLU_STALL_ERAT_MISS ] = { .pme_name = "PM_CMPLU_STALL_ERAT_MISS", .pme_code = 0x41109b, .pme_short_desc = "Completion stall caused by ERAT miss", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes suffered an ERAT miss. This is a subset of PM_CMPLU_STALL_REJECT.", }, [ POWER5_PME_PM_L3SA_MOD_TAG ] = { .pme_name = "PM_L3SA_MOD_TAG", .pme_code = 0x720e3, .pme_short_desc = "L3 slice A transition from modified to TAG", .pme_long_desc = "L3 snooper detects someone doing a read to a line that is truly M in this L3(i.e. 
L3 going M->T or M->I(go_Mu case) Mu|Me are not included since they are formed due to a prev read op). Tx is not included since it is considered shared at this point.", }, [ POWER5_PME_PM_FLUSH_SYNC ] = { .pme_name = "PM_FLUSH_SYNC", .pme_code = 0x330e1, .pme_short_desc = "Flush caused by sync", .pme_long_desc = "This thread has been flushed at dispatch due to a sync, lwsync, ptesync, or tlbsync instruction. This allows the other thread to have more machine resources for it to make progress until the sync finishes.", }, [ POWER5_PME_PM_INST_FROM_L2MISS ] = { .pme_name = "PM_INST_FROM_L2MISS", .pme_code = 0x12209b, .pme_short_desc = "Instruction fetched missed L2", .pme_long_desc = "An instruction fetch group was fetched from beyond the local L2.", }, [ POWER5_PME_PM_L2SC_ST_HIT ] = { .pme_name = "PM_L2SC_ST_HIT", .pme_code = 0x733e2, .pme_short_desc = "L2 slice C store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_MEM_RQ_DISP_Q8to11 ] = { .pme_name = "PM_MEM_RQ_DISP_Q8to11", .pme_code = 0x722e6, .pme_short_desc = "Memory read queue dispatched to queues 8-11", .pme_long_desc = "A memory operation was dispatched to read queue 8,9,10 or 11. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_MRK_GRP_DISP ] = { .pme_name = "PM_MRK_GRP_DISP", .pme_code = 0x100002, .pme_short_desc = "Marked group dispatched", .pme_long_desc = "A group containing a sampled instruction was dispatched", }, [ POWER5_PME_PM_L2SB_MOD_TAG ] = { .pme_name = "PM_L2SB_MOD_TAG", .pme_code = 0x720e1, .pme_short_desc = "L2 slice B transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. This transition was caused by a read snoop request that hit against a modified entry in the local L2. 
The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_CLB_EMPTY_CYC ] = { .pme_name = "PM_CLB_EMPTY_CYC", .pme_code = 0x410c6, .pme_short_desc = "Cycles CLB empty", .pme_long_desc = "Cycles when both threads' CLBs are completely empty.", }, [ POWER5_PME_PM_L2SB_ST_HIT ] = { .pme_name = "PM_L2SB_ST_HIT", .pme_code = 0x733e1, .pme_short_desc = "L2 slice B store hits", .pme_long_desc = "A store request made from the core hit in the L2 directory. This event is provided on each of the three L2 slices A, B and C.", }, [ POWER5_PME_PM_MEM_NONSPEC_RD_CANCEL ] = { .pme_name = "PM_MEM_NONSPEC_RD_CANCEL", .pme_code = 0x711c6, .pme_short_desc = "Non speculative memory read cancelled", .pme_long_desc = "A non-speculative read was cancelled because the combined response indicated it was sourced from another L2 or L3. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_BR_PRED_CR_TA ] = { .pme_name = "PM_BR_PRED_CR_TA", .pme_code = 0x423087, .pme_short_desc = "A conditional branch was predicted, CR and target prediction", .pme_long_desc = "Both the condition (taken or not taken) and the target address of a branch instruction were predicted.", }, [ POWER5_PME_PM_MRK_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_SRQ", .pme_code = 0x810c3, .pme_short_desc = "LSU0 marked SRQ lhs flushes", .pme_long_desc = "A marked store was flushed because a younger load hit an older store that is already in the SRQ or in the same group.", }, [ POWER5_PME_PM_MRK_LSU_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU_FLUSH_ULD", .pme_code = 0x481090, .pme_short_desc = "Marked unaligned load flushes", .pme_long_desc = "A marked load was flushed because it was unaligned (crossed a 64-byte boundary, or 32-byte if it missed the L1)", }, [ POWER5_PME_PM_INST_DISP_ATTEMPT ] = { .pme_name = "PM_INST_DISP_ATTEMPT", .pme_code = 0x120e1, .pme_short_desc = "Instructions dispatch attempted", .pme_long_desc = "Number of PowerPC 
instructions dispatched (attempted, not filtered by success).", }, [ POWER5_PME_PM_INST_FROM_RMEM ] = { .pme_name = "PM_INST_FROM_RMEM", .pme_code = 0x422086, .pme_short_desc = "Instruction fetched from remote memory", .pme_long_desc = "An instruction fetch group was fetched from memory attached to a different module than this processor is located on. Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_ST_REF_L1_LSU0 ] = { .pme_name = "PM_ST_REF_L1_LSU0", .pme_code = 0xc10c1, .pme_short_desc = "LSU0 L1 D cache store references", .pme_long_desc = "Store references to the Data Cache by LSU0.", }, [ POWER5_PME_PM_LSU0_DERAT_MISS ] = { .pme_name = "PM_LSU0_DERAT_MISS", .pme_code = 0x800c2, .pme_short_desc = "LSU0 DERAT misses", .pme_long_desc = "Total D-ERAT Misses by LSU0. Requests that miss the Derat are rejected and retried until the request hits in the Erat. This may result in multiple erat misses for the same instruction.", }, [ POWER5_PME_PM_L2SB_RCLD_DISP ] = { .pme_name = "PM_L2SB_RCLD_DISP", .pme_code = 0x701c1, .pme_short_desc = "L2 slice B RC load dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Load was attempted", }, [ POWER5_PME_PM_FPU_STALL3 ] = { .pme_name = "PM_FPU_STALL3", .pme_code = 0x202088, .pme_short_desc = "FPU stalled in pipe3", .pme_long_desc = "FPU has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_BR_PRED_CR ] = { .pme_name = "PM_BR_PRED_CR", .pme_code = 0x230e2, .pme_short_desc = "A conditional branch was predicted, CR prediction", .pme_long_desc = "A conditional branch instruction was predicted as taken or not taken.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x1c7087, .pme_short_desc = "Marked data loaded from L2", .pme_long_desc = "The processor's Data Cache was reloaded from the local L2 due to a marked load.", }, [ POWER5_PME_PM_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_LSU0_FLUSH_SRQ", .pme_code = 0xc00c3, .pme_short_desc = "LSU0 SRQ lhs flushes", .pme_long_desc = "A store was flushed by unit 0 because a younger load hit an older store that is already in the SRQ or in the same group.", }, [ POWER5_PME_PM_FAB_PNtoNN_DIRECT ] = { .pme_name = "PM_FAB_PNtoNN_DIRECT", .pme_code = 0x703c7, .pme_short_desc = "PN to NN beat went straight to its destination", .pme_long_desc = "Fabric Data beats that the base chip takes the inbound PN data and passes it through to the outbound NN bus without going into a sidecar. The signal is delivered at FBC speed and the count must be scaled.", }, [ POWER5_PME_PM_IOPS_CMPL ] = { .pme_name = "PM_IOPS_CMPL", .pme_code = 0x1, .pme_short_desc = "Internal operations completed", .pme_long_desc = "Number of internal operations that completed.", }, [ POWER5_PME_PM_L2SC_SHR_INV ] = { .pme_name = "PM_L2SC_SHR_INV", .pme_code = 0x710c2, .pme_short_desc = "L2 slice C transition from shared to invalid", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Invalid state. This transition was caused by any external snoop request. The event is provided on each of the three slices A, B, and C. 
NOTE: For this event to be useful the tablewalk duration event should also be counted.", }, [ POWER5_PME_PM_L2SA_RCST_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SA_RCST_DISP_FAIL_OTHER", .pme_code = 0x732e0, .pme_short_desc = "L2 slice A RC store dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a store failed for some reason other than Full or Collision conditions. Rejected dispatches do not count because they have not yet been attempted.", }, [ POWER5_PME_PM_L2SA_RCST_DISP ] = { .pme_name = "PM_L2SA_RCST_DISP", .pme_code = 0x702c0, .pme_short_desc = "L2 slice A RC store dispatch attempt", .pme_long_desc = "A Read/Claim dispatch for a Store was attempted.", }, [ POWER5_PME_PM_SNOOP_RETRY_AB_COLLISION ] = { .pme_name = "PM_SNOOP_RETRY_AB_COLLISION", .pme_code = 0x735e6, .pme_short_desc = "Snoop retry due to a b collision", .pme_long_desc = "Snoop retry due to a b collision", }, [ POWER5_PME_PM_FAB_PNtoVN_SIDECAR ] = { .pme_name = "PM_FAB_PNtoVN_SIDECAR", .pme_code = 0x733e7, .pme_short_desc = "PN to VN beat went to sidecar first", .pme_long_desc = "Fabric data beats that the base chip takes the inbound PN data and forwards it on to the outbound VN data bus after going into a sidecar first. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_LSU_LMQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LMQ_S0_ALLOC", .pme_code = 0xc30e6, .pme_short_desc = "LMQ slot 0 allocated", .pme_long_desc = "The first entry in the LMQ was allocated.", }, [ POWER5_PME_PM_LSU0_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU0_REJECT_LMQ_FULL", .pme_code = 0xc60e1, .pme_short_desc = "LSU0 reject due to LMQ full or missed data coming", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions because the Load Miss Queue was full. The LMQ has eight entries. 
If all eight entries are full, subsequent load instructions are rejected.", }, [ POWER5_PME_PM_SNOOP_PW_RETRY_RQ ] = { .pme_name = "PM_SNOOP_PW_RETRY_RQ", .pme_code = 0x707c6, .pme_short_desc = "Snoop partial-write retry due to collision with active read queue", .pme_long_desc = "A snoop request for a partial write to memory was retried because it matched the cache line of an active read. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_DTLB_REF ] = { .pme_name = "PM_DTLB_REF", .pme_code = 0x2c4090, .pme_short_desc = "Data TLB references", .pme_long_desc = "Total number of Data TLB references for all page sizes. Page size is determined at TLB reload time.", }, [ POWER5_PME_PM_PTEG_FROM_L3 ] = { .pme_name = "PM_PTEG_FROM_L3", .pme_code = 0x18308e, .pme_short_desc = "PTEG loaded from L3", .pme_long_desc = "A Page Table Entry was loaded into the TLB from the local L3 due to a demand load.", }, [ POWER5_PME_PM_FAB_M1toVNorNN_SIDECAR_EMPTY ] = { .pme_name = "PM_FAB_M1toVNorNN_SIDECAR_EMPTY", .pme_code = 0x712c7, .pme_short_desc = "M1 to VN/NN sidecar empty", .pme_long_desc = "Fabric cycles when the Minus-1 jump sidecar (sidecars for mcm to mcm data transfer) is empty. The signal is delivered at FBC speed and the count must be scaled accordingly.", }, [ POWER5_PME_PM_LSU_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_SRQ_EMPTY_CYC", .pme_code = 0x400015, .pme_short_desc = "Cycles SRQ empty", .pme_long_desc = "Cycles the Store Request Queue is empty", }, [ POWER5_PME_PM_FPU1_STF ] = { .pme_name = "PM_FPU1_STF", .pme_code = 0x20e6, .pme_short_desc = "FPU1 executed store instruction", .pme_long_desc = "FPU1 has executed a Floating Point Store instruction.", }, [ POWER5_PME_PM_LSU_LMQ_S0_VALID ] = { .pme_name = "PM_LSU_LMQ_S0_VALID", .pme_code = 0xc30e5, .pme_short_desc = "LMQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle when the first entry in the LMQ is valid. 
The LMQ has eight entries that are allocated FIFO", }, [ POWER5_PME_PM_GCT_USAGE_00to59_CYC ] = { .pme_name = "PM_GCT_USAGE_00to59_CYC", .pme_code = 0x10001f, .pme_short_desc = "Cycles GCT less than 60% full", .pme_long_desc = "Cycles when the Global Completion Table has fewer than 60% of its slots used. The GCT has 20 entries shared between threads.", }, [ POWER5_PME_PM_DATA_FROM_L2MISS ] = { .pme_name = "PM_DATA_FROM_L2MISS", .pme_code = 0x3c309b, .pme_short_desc = "Data loaded missed L2", .pme_long_desc = "The processor's Data Cache was reloaded but not from the local L2.", }, [ POWER5_PME_PM_GRP_DISP_BLK_SB_CYC ] = { .pme_name = "PM_GRP_DISP_BLK_SB_CYC", .pme_code = 0x130e1, .pme_short_desc = "Cycles group dispatch blocked by scoreboard", .pme_long_desc = "A scoreboard operation on a non-renamed resource has blocked dispatch.", }, [ POWER5_PME_PM_FPU_FMOV_FEST ] = { .pme_name = "PM_FPU_FMOV_FEST", .pme_code = 0x301088, .pme_short_desc = "FPU executed FMOV or FEST instructions", .pme_long_desc = "The floating point unit has executed a move kind of instruction or one of the estimate instructions. This could be fmr*, fneg*, fabs*, fnabs*, fres* or frsqrte* where XYZ* means XYZ or XYZ.. Combined Unit 0 + Unit 1.", }, [ POWER5_PME_PM_XER_MAP_FULL_CYC ] = { .pme_name = "PM_XER_MAP_FULL_CYC", .pme_code = 0x100c2, .pme_short_desc = "Cycles XER mapper full", .pme_long_desc = "The XER mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_FLUSH_SB ] = { .pme_name = "PM_FLUSH_SB", .pme_code = 0x330e2, .pme_short_desc = "Flush caused by scoreboard operation", .pme_long_desc = "This thread has been flushed at dispatch because its scoreboard bit is set indicating that a non-renamed resource is being updated. 
This allows the other thread to have more machine resources for it to make progress while this thread is stalled.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L375_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L375_SHR", .pme_code = 0x3c709e, .pme_short_desc = "Marked data loaded from L3.75 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on a different module than this processor is located due to a marked load.", }, [ POWER5_PME_PM_MRK_GRP_CMPL ] = { .pme_name = "PM_MRK_GRP_CMPL", .pme_code = 0x400013, .pme_short_desc = "Marked group completed", .pme_long_desc = "A group containing a sampled instruction completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ POWER5_PME_PM_SUSPENDED ] = { .pme_name = "PM_SUSPENDED", .pme_code = 0x0, .pme_short_desc = "Suspended", .pme_long_desc = "The counter is suspended (does not count).", }, [ POWER5_PME_PM_GRP_IC_MISS_BR_REDIR_NONSPEC ] = { .pme_name = "PM_GRP_IC_MISS_BR_REDIR_NONSPEC", .pme_code = 0x120e5, .pme_short_desc = "Group experienced non-speculative I cache miss or branch redirect", .pme_long_desc = "Group experienced non-speculative I cache miss or branch redirect", }, [ POWER5_PME_PM_SNOOP_RD_RETRY_QFULL ] = { .pme_name = "PM_SNOOP_RD_RETRY_QFULL", .pme_code = 0x700c6, .pme_short_desc = "Snoop read retry due to read queue full", .pme_long_desc = "A snoop request for a read from memory was retried because the read queues were full. This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_L3SB_MOD_INV ] = { .pme_name = "PM_L3SB_MOD_INV", .pme_code = 0x730e4, .pme_short_desc = "L3 slice B transition from modified to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is truly M in this L3 (i.e. L3 going M=>I). Mu|Me are not included since they are formed due to a prev read op. 
Tx is not included since it is considered shared at this point.", }, [ POWER5_PME_PM_DATA_FROM_L35_SHR ] = { .pme_name = "PM_DATA_FROM_L35_SHR", .pme_code = 0x1c309e, .pme_short_desc = "Data loaded from L3.5 shared", .pme_long_desc = "The processor's Data Cache was reloaded with shared (S) data from the L3 of a chip on the same module as this processor is located due to a demand load.", }, [ POWER5_PME_PM_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_LD_MISS_L1_LSU1", .pme_code = 0xc10c6, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache, by unit 1.", }, [ POWER5_PME_PM_STCX_FAIL ] = { .pme_name = "PM_STCX_FAIL", .pme_code = 0x820e1, .pme_short_desc = "STCX failed", .pme_long_desc = "A stcx (stwcx or stdcx) failed", }, [ POWER5_PME_PM_DC_PREF_DST ] = { .pme_name = "PM_DC_PREF_DST", .pme_code = 0x830e6, .pme_short_desc = "DST (Data Stream Touch) stream start", .pme_long_desc = "A prefetch stream was started using the DST instruction.", }, [ POWER5_PME_PM_GRP_DISP ] = { .pme_name = "PM_GRP_DISP", .pme_code = 0x200002, .pme_short_desc = "Group dispatches", .pme_long_desc = "A group was dispatched", }, [ POWER5_PME_PM_L2SA_RCLD_DISP_FAIL_ADDR ] = { .pme_name = "PM_L2SA_RCLD_DISP_FAIL_ADDR", .pme_code = 0x711c0, .pme_short_desc = "L2 slice A RC load dispatch attempt failed due to address collision with RC/CO/SN/SQ", .pme_long_desc = "A Read/Claim dispatch for a load failed because of an address conflict. Two RC machines will never both work on the same line or line in the same congruence class at the same time.", }, [ POWER5_PME_PM_FPU0_FPSCR ] = { .pme_name = "PM_FPU0_FPSCR", .pme_code = 0x30e0, .pme_short_desc = "FPU0 executed FPSCR instruction", .pme_long_desc = "FPU0 has executed FPSCR move related instruction. 
This could be mtfsfi*, mtfsb0*, mtfsb1*, mffs*, mtfsf*, mcrsf* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x1c3087, .pme_short_desc = "Data loaded from L2", .pme_long_desc = "The processor's Data Cache was reloaded from the local L2 due to a demand load.", }, [ POWER5_PME_PM_FPU1_DENORM ] = { .pme_name = "PM_FPU1_DENORM", .pme_code = 0x20e4, .pme_short_desc = "FPU1 received denormalized data", .pme_long_desc = "FPU1 has encountered a denormalized operand.", }, [ POWER5_PME_PM_FPU_1FLOP ] = { .pme_name = "PM_FPU_1FLOP", .pme_code = 0x100090, .pme_short_desc = "FPU executed one flop instruction", .pme_long_desc = "The floating point unit has executed an add, mult, sub, compare, fsel, fneg, fabs, fnabs, fres, or frsqrte kind of instruction. These are single FLOP operations.", }, [ POWER5_PME_PM_L2SC_RCLD_DISP_FAIL_OTHER ] = { .pme_name = "PM_L2SC_RCLD_DISP_FAIL_OTHER", .pme_code = 0x731e2, .pme_short_desc = "L2 slice C RC load dispatch attempt failed due to other reasons", .pme_long_desc = "A Read/Claim dispatch for a load failed for some reason other than Full or Collision conditions.", }, [ POWER5_PME_PM_L2SC_RCST_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SC_RCST_DISP_FAIL_RC_FULL", .pme_code = 0x722e2, .pme_short_desc = "L2 slice C RC store dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a store failed because all RC machines are busy.", }, [ POWER5_PME_PM_FPU0_FSQRT ] = { .pme_name = "PM_FPU0_FSQRT", .pme_code = 0xc2, .pme_short_desc = "FPU0 executed FSQRT instruction", .pme_long_desc = "FPU0 has executed a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ POWER5_PME_PM_LD_REF_L1 ] = { .pme_name = "PM_LD_REF_L1", .pme_code = 0x4c1090, .pme_short_desc = "L1 D cache load references", .pme_long_desc = "Load references to the Level 1 Data Cache. 
Combined unit 0 + 1.", }, [ POWER5_PME_PM_INST_FROM_L1 ] = { .pme_name = "PM_INST_FROM_L1", .pme_code = 0x22208d, .pme_short_desc = "Instruction fetched from L1", .pme_long_desc = "An instruction fetch group was fetched from L1. Fetch Groups can contain up to 8 instructions", }, [ POWER5_PME_PM_TLBIE_HELD ] = { .pme_name = "PM_TLBIE_HELD", .pme_code = 0x130e4, .pme_short_desc = "TLBIE held at dispatch", .pme_long_desc = "Cycles a TLBIE instruction was held at dispatch.", }, [ POWER5_PME_PM_DC_PREF_OUT_OF_STREAMS ] = { .pme_name = "PM_DC_PREF_OUT_OF_STREAMS", .pme_code = 0xc50c2, .pme_short_desc = "D cache out of prefetch streams", .pme_long_desc = "A new prefetch stream was detected but no more stream entries were available.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L25_MOD_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD_CYC", .pme_code = 0x4c70a2, .pme_short_desc = "Marked load latency from L2.5 modified", .pme_long_desc = "Cycles a marked load waited for data from this level of the storage system. Counting begins when a marked load misses the data cache and ends when the data is reloaded into the data cache. To calculate average latency divide this count by the number of marked misses to the same level.", }, [ POWER5_PME_PM_MRK_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_SRQ", .pme_code = 0x810c7, .pme_short_desc = "LSU1 marked SRQ lhs flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ POWER5_PME_PM_MEM_RQ_DISP_Q0to3 ] = { .pme_name = "PM_MEM_RQ_DISP_Q0to3", .pme_code = 0x702c6, .pme_short_desc = "Memory read queue dispatched to queues 0-3", .pme_long_desc = "A memory operation was dispatched to read queue 0,1,2, or 3. 
This event is sent from the Memory Controller clock domain and must be scaled accordingly.", }, [ POWER5_PME_PM_ST_REF_L1_LSU1 ] = { .pme_name = "PM_ST_REF_L1_LSU1", .pme_code = 0xc10c5, .pme_short_desc = "LSU1 L1 D cache store references", .pme_long_desc = "Store references to the Data Cache by LSU1.", }, [ POWER5_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x182088, .pme_short_desc = "Marked L1 D cache load misses", .pme_long_desc = "Marked L1 D cache load misses", }, [ POWER5_PME_PM_L1_WRITE_CYC ] = { .pme_name = "PM_L1_WRITE_CYC", .pme_code = 0x230e7, .pme_short_desc = "Cycles writing to instruction L1", .pme_long_desc = "Cycles that a cache line was written to the instruction cache.", }, [ POWER5_PME_PM_L2SC_ST_REQ ] = { .pme_name = "PM_L2SC_ST_REQ", .pme_code = 0x723e2, .pme_short_desc = "L2 slice C store requests", .pme_long_desc = "A store request as seen at the L2 directory has been made from the core. Stores are counted after gathering in the L2 store queues. The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_CMPLU_STALL_FDIV ] = { .pme_name = "PM_CMPLU_STALL_FDIV", .pme_code = 0x21109b, .pme_short_desc = "Completion stall caused by FDIV or FQRT instruction", .pme_long_desc = "Following a completion stall (any period when no groups completed) the last instruction to finish before completion resumes was a floating point divide or square root instruction. 
This is a subset of PM_CMPLU_STALL_FPU.", }, [ POWER5_PME_PM_THRD_SEL_OVER_CLB_EMPTY ] = { .pme_name = "PM_THRD_SEL_OVER_CLB_EMPTY", .pme_code = 0x410c2, .pme_short_desc = "Thread selection overrides caused by CLB empty", .pme_long_desc = "Thread selection was overridden because one thread's CLB was empty.", }, [ POWER5_PME_PM_BR_MPRED_CR ] = { .pme_name = "PM_BR_MPRED_CR", .pme_code = 0x230e5, .pme_short_desc = "Branch mispredictions due to CR bit setting", .pme_long_desc = "A conditional branch instruction was incorrectly predicted as taken or not taken. The branch execution unit detects a branch mispredict because the CR value is opposite of the predicted value. This will result in a branch redirect flush if not overridden by a flush of an older instruction.", }, [ POWER5_PME_PM_L3SB_MOD_TAG ] = { .pme_name = "PM_L3SB_MOD_TAG", .pme_code = 0x720e4, .pme_short_desc = "L3 slice B transition from modified to TAG", .pme_long_desc = "L3 snooper detects someone doing a read to a line that is truly M in this L3(i.e. L3 going M->T or M->I(go_Mu case); Mu|Me are not included since they are formed due to a prev read op). Tx is not included since it is considered shared at this point.", }, [ POWER5_PME_PM_MRK_DATA_FROM_L2MISS ] = { .pme_name = "PM_MRK_DATA_FROM_L2MISS", .pme_code = 0x3c709b, .pme_short_desc = "Marked data loaded missed L2", .pme_long_desc = "DL1 was reloaded from beyond L2 due to a marked demand load.", }, [ POWER5_PME_PM_LSU_REJECT_SRQ ] = { .pme_name = "PM_LSU_REJECT_SRQ", .pme_code = 0x1c6088, .pme_short_desc = "LSU SRQ lhs rejects", .pme_long_desc = "Total cycles the Load Store Unit is busy rejecting instructions because of Load Hit Store conditions. Loads are rejected when data is needed from a previous store instruction but store forwarding is not possible because the data is not fully contained in the Store Data Queue or is not yet available in the Store Data Queue. 
Combined Unit 0 + 1.", }, [ POWER5_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x3c1088, .pme_short_desc = "L1 D cache load misses", .pme_long_desc = "Load references that miss the Level 1 Data cache. Combined unit 0 + 1.", }, [ POWER5_PME_PM_INST_FROM_PREF ] = { .pme_name = "PM_INST_FROM_PREF", .pme_code = 0x32208d, .pme_short_desc = "Instruction fetched from prefetch", .pme_long_desc = "An instruction fetch group was fetched from the prefetch buffer. Fetch groups can contain up to 8 instructions", }, [ POWER5_PME_PM_DC_INV_L2 ] = { .pme_name = "PM_DC_INV_L2", .pme_code = 0xc10c7, .pme_short_desc = "L1 D cache entries invalidated from L2", .pme_long_desc = "A dcache invalidate was received from the L2 because a line in L2 was castout.", }, [ POWER5_PME_PM_STCX_PASS ] = { .pme_name = "PM_STCX_PASS", .pme_code = 0x820e5, .pme_short_desc = "Stcx passes", .pme_long_desc = "A stcx (stwcx or stdcx) instruction was successful", }, [ POWER5_PME_PM_LSU_SRQ_FULL_CYC ] = { .pme_name = "PM_LSU_SRQ_FULL_CYC", .pme_code = 0x110c3, .pme_short_desc = "Cycles SRQ full", .pme_long_desc = "Cycles the Store Request Queue is full.", }, [ POWER5_PME_PM_FPU_FIN ] = { .pme_name = "PM_FPU_FIN", .pme_code = 0x401088, .pme_short_desc = "FPU produced a result", .pme_long_desc = "FPU finished, produced a result. This only indicates finish, not completion. Combined Unit 0 + Unit 1. Floating Point Stores are included in this count but not Floating Point Loads.", }, [ POWER5_PME_PM_L2SA_SHR_MOD ] = { .pme_name = "PM_L2SA_SHR_MOD", .pme_code = 0x700c0, .pme_short_desc = "L2 slice A transition from shared to modified", .pme_long_desc = "A cache line in the local L2 directory made a state transition from Shared (Shared, Shared L, or Tagged) to the Modified state. This transition was caused by a store from either of the two local CPUs to a cache line in any of the Shared states. The event is provided on each of the three slices A, B, and C. 
", }, [ POWER5_PME_PM_LSU_SRQ_STFWD ] = { .pme_name = "PM_LSU_SRQ_STFWD", .pme_code = 0x1c2088, .pme_short_desc = "SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load. A load that misses L1 but becomes a store forward is treated as a load miss and it causes the DL1 load miss event to be counted. It does not go into the LMQ. If a load that hits L1 but becomes a store forward, then it's not treated as a load miss. Combined Unit 0 + 1.", }, [ POWER5_PME_PM_0INST_CLB_CYC ] = { .pme_name = "PM_0INST_CLB_CYC", .pme_code = 0x400c0, .pme_short_desc = "Cycles no instructions in CLB", .pme_long_desc = "The cache line buffer (CLB) is a 6-deep, 4-wide instruction buffer. Fullness is reported on a cycle basis with each event representing the number of cycles the CLB had the corresponding number of entries occupied. These events give a real time history of the number of instruction buffers used, but not the number of PowerPC instructions within those buffers. Each thread has its own set of CLB; these events are thread specific.", }, [ POWER5_PME_PM_FXU0_FIN ] = { .pme_name = "PM_FXU0_FIN", .pme_code = 0x130e2, .pme_short_desc = "FXU0 produced a result", .pme_long_desc = "The Fixed Point unit 0 finished an instruction and produced a result. 
Instructions that finish may not necessarily complete.", }, [ POWER5_PME_PM_L2SB_RCST_DISP_FAIL_RC_FULL ] = { .pme_name = "PM_L2SB_RCST_DISP_FAIL_RC_FULL", .pme_code = 0x722e1, .pme_short_desc = "L2 slice B RC store dispatch attempt failed due to all RC full", .pme_long_desc = "A Read/Claim dispatch for a store failed because all RC machines are busy.", }, [ POWER5_PME_PM_THRD_GRP_CMPL_BOTH_CYC ] = { .pme_name = "PM_THRD_GRP_CMPL_BOTH_CYC", .pme_code = 0x200013, .pme_short_desc = "Cycles group completed by both threads", .pme_long_desc = "Cycles that both threads completed.", }, [ POWER5_PME_PM_PMC5_OVERFLOW ] = { .pme_name = "PM_PMC5_OVERFLOW", .pme_code = 0x10001a, .pme_short_desc = "PMC5 Overflow", .pme_long_desc = "Overflows from PMC5 are counted. This effectively widens the PMC. The Overflow from the original PMC will not trigger an exception even if the PMU is configured to generate exceptions on overflow.", }, [ POWER5_PME_PM_FPU0_FDIV ] = { .pme_name = "PM_FPU0_FDIV", .pme_code = 0xc0, .pme_short_desc = "FPU0 executed FDIV instruction", .pme_long_desc = "FPU0 has executed a divide instruction. This could be fdiv, fdivs, fdiv. 
fdivs.", }, [ POWER5_PME_PM_PTEG_FROM_L375_SHR ] = { .pme_name = "PM_PTEG_FROM_L375_SHR", .pme_code = 0x38309e, .pme_short_desc = "PTEG loaded from L3.75 shared", .pme_long_desc = "A Page Table Entry was loaded into the TLB with shared (S) data from the L3 of a chip on a different module than this processor is located, due to a demand load.", }, [ POWER5_PME_PM_LD_REF_L1_LSU1 ] = { .pme_name = "PM_LD_REF_L1_LSU1", .pme_code = 0xc10c4, .pme_short_desc = "LSU1 L1 D cache load references", .pme_long_desc = "Load references to Level 1 Data Cache, by unit 1.", }, [ POWER5_PME_PM_L2SA_RC_DISP_FAIL_CO_BUSY ] = { .pme_name = "PM_L2SA_RC_DISP_FAIL_CO_BUSY", .pme_code = 0x703c0, .pme_short_desc = "L2 slice A RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy", .pme_long_desc = "A Read/Claim Dispatch was rejected at dispatch because the Castout Machine was busy. In the case of an RC starting up on a miss and the victim is valid, the CO machine must be available for the RC to process the access. If the CO is still busy working on an old castout, then the RC must not-ack the access if it is a miss(re-issued by the CIU). If it is a miss and the CO is available to process the castout, the RC will accept the access. 
Once the RC has finished, it can restart and process new accesses that result in a hit (or miss that doesn't need a CO) even though the CO is still processing a castout from a previous access.", }, [ POWER5_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x20000b, .pme_short_desc = "Hypervisor Cycles", .pme_long_desc = "Cycles when the processor is executing in Hypervisor (MSR[HV] = 1 and MSR[PR]=0)", }, [ POWER5_PME_PM_THRD_PRIO_DIFF_0_CYC ] = { .pme_name = "PM_THRD_PRIO_DIFF_0_CYC", .pme_code = 0x430e3, .pme_short_desc = "Cycles no thread priority difference", .pme_long_desc = "Cycles when this thread's priority is equal to the other thread's priority.", }, [ POWER5_PME_PM_LR_CTR_MAP_FULL_CYC ] = { .pme_name = "PM_LR_CTR_MAP_FULL_CYC", .pme_code = 0x100c6, .pme_short_desc = "Cycles LR/CTR mapper full", .pme_long_desc = "The LR/CTR mapper cannot accept any more groups. This condition will prevent dispatch groups from being dispatched. This event only indicates that the mapper was full, not that dispatch was prevented.", }, [ POWER5_PME_PM_L3SB_SHR_INV ] = { .pme_name = "PM_L3SB_SHR_INV", .pme_code = 0x710c4, .pme_short_desc = "L3 slice B transition from shared to invalid", .pme_long_desc = "L3 snooper detects someone doing a store to a line that is Sx in this L3(i.e. invalidate hit SX and dispatched).", }, [ POWER5_PME_PM_DATA_FROM_RMEM ] = { .pme_name = "PM_DATA_FROM_RMEM", .pme_code = 0x1c30a1, .pme_short_desc = "Data loaded from remote memory", .pme_long_desc = "The processor's Data Cache was reloaded from memory attached to a different module than this processor is located on.", }, [ POWER5_PME_PM_DATA_FROM_L275_MOD ] = { .pme_name = "PM_DATA_FROM_L275_MOD", .pme_code = 0x1c30a3, .pme_short_desc = "Data loaded from L2.75 modified", .pme_long_desc = "The processor's Data Cache was reloaded with modified (M) data from the L2 on a different module than this processor is located due to a demand load. 
", }, [ POWER5_PME_PM_LSU0_REJECT_SRQ ] = { .pme_name = "PM_LSU0_REJECT_SRQ", .pme_code = 0xc60e0, .pme_short_desc = "LSU0 SRQ lhs rejects", .pme_long_desc = "Total cycles the Load Store Unit 0 is busy rejecting instructions because of Load Hit Store conditions. Loads are rejected when data is needed from a previous store instruction but store forwarding is not possible because the data is not fully contained in the Store Data Queue or is not yet available in the Store Data Queue.", }, [ POWER5_PME_PM_LSU1_DERAT_MISS ] = { .pme_name = "PM_LSU1_DERAT_MISS", .pme_code = 0x800c6, .pme_short_desc = "LSU1 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 1 missed the ERAT and resulted in an ERAT reload. Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ POWER5_PME_PM_MRK_LSU_FIN ] = { .pme_name = "PM_MRK_LSU_FIN", .pme_code = 0x400014, .pme_short_desc = "Marked instruction LSU processing finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. Instructions that finish may not necessary complete", }, [ POWER5_PME_PM_DTLB_MISS_16M ] = { .pme_name = "PM_DTLB_MISS_16M", .pme_code = 0xc40c4, .pme_short_desc = "Data TLB miss for 16M page", .pme_long_desc = "Data TLB references to 16MB pages that missed the TLB. Page size is determined at TLB reload time.", }, [ POWER5_PME_PM_LSU0_FLUSH_UST ] = { .pme_name = "PM_LSU0_FLUSH_UST", .pme_code = 0xc00c1, .pme_short_desc = "LSU0 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 0 because it was unaligned (crossed a 4K boundary).", }, [ POWER5_PME_PM_L2SC_MOD_TAG ] = { .pme_name = "PM_L2SC_MOD_TAG", .pme_code = 0x720e2, .pme_short_desc = "L2 slice C transition from modified to tagged", .pme_long_desc = "A cache line in the local L2 directory made a state transition from the Modified state to the Tagged state. 
This transition was caused by a read snoop request that hit against a modified entry in the local L2. The event is provided on each of the three slices A, B, and C.", }, [ POWER5_PME_PM_L2SB_RC_DISP_FAIL_CO_BUSY ] = { .pme_name = "PM_L2SB_RC_DISP_FAIL_CO_BUSY", .pme_code = 0x703c1, .pme_short_desc = "L2 slice B RC dispatch attempt failed due to RC/CO pair chosen was miss and CO already busy", .pme_long_desc = "A Read/Claim Dispatch was rejected at dispatch because the Castout Machine was busy. In the case of an RC starting up on a miss and the victim is valid, the CO machine must be available for the RC to process the access. If the CO is still busy working on an old castout, then the RC must not-ack the access if it is a miss(re-issued by the CIU). If it is a miss and the CO is available to process the castout, the RC will accept the access. Once the RC has finished, it can restart and process new accesses that result in a hit (or miss that doesn't need a CO) even though the CO is still processing a castout from a previous access.", } }; #endif papi-5.3.0/src/libpfm4/lib/events/sparc_ultra3_events.h0000600003276200002170000002456412247131124022507 0ustar ralphundrgradstatic const sparc_entry_t ultra3_pe[] = { /* These two must always be first. 
*/ { .name = "Cycle_cnt", .desc = "Accumulated cycles", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x0, }, { .name = "Instr_cnt", .desc = "Number of instructions completed", .ctrl = PME_CTRL_S0 | PME_CTRL_S1, .code = 0x1, }, /* PIC0 events common to all UltraSPARC processors */ { .name = "Dispatch0_IC_miss", .desc = "I-buffer is empty from I-Cache miss", .ctrl = PME_CTRL_S0, .code = 0x2, }, { .name = "IC_ref", .desc = "I-cache references", .ctrl = PME_CTRL_S0, .code = 0x8, }, { .name = "DC_rd", .desc = "D-cache read references (including accesses that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0x9, }, { .name = "DC_wr", .desc = "D-cache store accesses (including cacheable stores that subsequently trap)", .ctrl = PME_CTRL_S0, .code = 0xa, }, { .name = "EC_ref", .desc = "E-cache references", .ctrl = PME_CTRL_S0, .code = 0xc, }, { .name = "EC_snoop_inv", .desc = "L2-cache invalidates generated from a snoop by a remote processor", .ctrl = PME_CTRL_S0, .code = 0xe, }, /* PIC1 events common to all UltraSPARC processors */ { .name = "Dispatch0_mispred", .desc = "I-buffer is empty from Branch misprediction", .ctrl = PME_CTRL_S1, .code = 0x2, }, { .name = "EC_wb", .desc = "Dirty sub-blocks that produce writebacks due to L2-cache miss events", .ctrl = PME_CTRL_S1, .code = 0xd, }, { .name = "EC_snoop_cb", .desc = "L2-cache copybacks generated from a snoop by a remote processor", .ctrl = PME_CTRL_S1, .code = 0xe, }, /* PIC0 events common to all UltraSPARC-III/III+/IIIi processors */ { .name = "Dispatch0_br_target", .desc = "I-buffer is empty due to a branch target address calculation", .ctrl = PME_CTRL_S0, .code = 0x3, }, { .name = "Dispatch0_2nd_br", .desc = "Stall cycles due to having two branch instructions line-up in one 4-instruction group causing the second branch in the group to be re-fetched, delaying its entrance into the I-buffer", .ctrl = PME_CTRL_S0, .code = 0x4, }, { .name = "Rstall_storeQ", .desc = "R-stage stall for a store instruction which is the next 
instruction to be executed, but it stalled due to the store queue being full", .ctrl = PME_CTRL_S0, .code = 0x5, }, { .name = "Rstall_IU_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding integer instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S0, .code = 0x6, }, { .name = "EC_write_hit_RTO", .desc = "W-cache exclusive requests that hit L2-cache in S, O, or Os state and thus, do a read-to-own bus transaction", .ctrl = PME_CTRL_S0, .code = 0xd, }, { .name = "EC_rd_miss", .desc = "L2-cache miss events (including atomics) from D-cache events", .ctrl = PME_CTRL_S0, .code = 0xf, }, { .name = "PC_port0_rd", .desc = "P-cache cacheable FP loads to the first port (general purpose load path to D-cache and P-cache via MS pipeline)", .ctrl = PME_CTRL_S0, .code = 0x10, }, { .name = "SI_snoop", .desc = "Counts snoops from remote processor(s) including RTS, RTSR, RTO, RTOR, RS, RSR, RTSM, and WS", .ctrl = PME_CTRL_S0, .code = 0x11, }, { .name = "SI_ciq_flow", .desc = "Counts system clock cycles when the flow control (PauseOut) signal is asserted", .ctrl = PME_CTRL_S0, .code = 0x12, }, { .name = "SI_owned", .desc = "Counts events where owned_in is asserted on bus requests from the local processor", .ctrl = PME_CTRL_S0, .code = 0x13, }, { .name = "SW_count0", .desc = "Counts software-generated occurrences of 'sethi %hi(0xfc000), %g0' instruction", .ctrl = PME_CTRL_S0, .code = 0x14, }, { .name = "IU_Stat_Br_miss_taken", .desc = "Retired branches that were predicted to be taken, but in fact were not taken", .ctrl = PME_CTRL_S0, .code = 0x15, }, { .name = "IU_Stat_Br_Count_taken", .desc = "Retired taken branches", .ctrl = PME_CTRL_S0, .code = 0x16, }, { .name = "Dispatch0_rs_mispred", .desc = "I-buffer is empty due to a Return Address Stack misprediction", .ctrl = PME_CTRL_S0, .code = 0x17, }, { .name = "FA_pipe_completion", .desc = "Instructions that complete execution on the FPG ALU 
pipelines", .ctrl = PME_CTRL_S0, .code = 0x18, }, /* PIC1 events common to all UltraSPARC-III/III+/IIIi processors */ { .name = "IC_miss_cancelled", .desc = "I-cache misses cancelled due to mis-speculation, recycle, or other events", .ctrl = PME_CTRL_S1, .code = 0x3, }, { .name = "Re_FPU_bypass", .desc = "Stall due to recirculation when an FPU bypass condition that does not have a direct bypass path occurs", .ctrl = PME_CTRL_S1, .code = 0x5, }, { .name = "Re_DC_miss", .desc = "Stall due to loads that miss D-cache and get recirculated", .ctrl = PME_CTRL_S1, .code = 0x6, }, { .name = "Re_EC_miss", .desc = "Stall due to loads that miss L2-cache and get recirculated", .ctrl = PME_CTRL_S1, .code = 0x7, }, { .name = "IC_miss", .desc = "I-cache misses, including fetches from mis-speculated execution paths which are later cancelled", .ctrl = PME_CTRL_S1, .code = 0x8, }, { .name = "DC_rd_miss", .desc = "Recirculated loads that miss the D-cache", .ctrl = PME_CTRL_S1, .code = 0x9, }, { .name = "DC_wr_miss", .desc = "D-cache store accesses that miss D-cache", .ctrl = PME_CTRL_S1, .code = 0xa, }, { .name = "Rstall_FP_use", .desc = "R-stage stall for an event that the next instruction to be executed depends on the result of a preceding floating-point instruction in the pipeline that is not yet available", .ctrl = PME_CTRL_S1, .code = 0xb, }, { .name = "EC_misses", .desc = "E-cache misses", .ctrl = PME_CTRL_S1, .code = 0xc, }, { .name = "EC_ic_miss", .desc = "L2-cache read misses from I-cache requests", .ctrl = PME_CTRL_S1, .code = 0xf, }, { .name = "Re_PC_miss", .desc = "Stall due to recirculation when a prefetch cache miss occurs on a prefetch predicted second load", .ctrl = PME_CTRL_S1, .code = 0x10, }, { .name = "ITLB_miss", .desc = "I-TLB miss traps taken", .ctrl = PME_CTRL_S1, .code = 0x11, }, { .name = "DTLB_miss", .desc = "Memory reference instructions which trap due to D-TLB miss", .ctrl = PME_CTRL_S1, .code = 0x12, }, { .name = "WC_miss", .desc = "W-cache misses", 
.ctrl = PME_CTRL_S1, .code = 0x13, }, { .name = "WC_snoop_cb", .desc = "W-cache copybacks generated by a snoop from a remote processor", .ctrl = PME_CTRL_S1, .code = 0x14, }, { .name = "WC_scrubbed", .desc = "W-cache hits to clean lines", .ctrl = PME_CTRL_S1, .code = 0x15, }, { .name = "WC_wb_wo_read", .desc = "W-cache writebacks not requiring a read", .ctrl = PME_CTRL_S1, .code = 0x16, }, { .name = "PC_soft_hit", .desc = "FP loads that hit a P-cache line that was prefetched by a software-prefetch instruction", .ctrl = PME_CTRL_S1, .code = 0x18, }, { .name = "PC_snoop_inv", .desc = "P-cache invalidates that were generated by a snoop from a remote processor and stores by a local processor", .ctrl = PME_CTRL_S1, .code = 0x19, }, { .name = "PC_hard_hit", .desc = "FP loads that hit a P-cache line that was prefetched by a hardware prefetch", .ctrl = PME_CTRL_S1, .code = 0x1a, }, { .name = "PC_port1_rd", .desc = "P-cache cacheable FP loads to the second port (memory and out-of-pipeline instruction execution loads via the A0 and A1 pipelines)", .ctrl = PME_CTRL_S1, .code = 0x1b, }, { .name = "SW_count1", .desc = "Counts software-generated occurrences of 'sethi %hi(0xfc000), %g0' instruction", .ctrl = PME_CTRL_S1, .code = 0x1c, }, { .name = "IU_Stat_Br_miss_untaken", .desc = "Retired branches that were predicted to be untaken, but in fact were taken", .ctrl = PME_CTRL_S1, .code = 0x1d, }, { .name = "IU_Stat_Br_Count_untaken", .desc = "Retired untaken branches", .ctrl = PME_CTRL_S1, .code = 0x1e, }, { .name = "PC_MS_miss", .desc = "FP loads through the MS pipeline that miss P-cache", .ctrl = PME_CTRL_S1, .code = 0x1f, }, { .name = "Re_RAW_miss", .desc = "Stall due to recirculation when there is a load in the E-stage which has a non-bypassable read-after-write hazard with an earlier store instruction", .ctrl = PME_CTRL_S1, .code = 0x26, }, { .name = "FM_pipe_completion", .desc = "Instructions that complete execution on the FPG Multiply pipelines", .ctrl = PME_CTRL_S1, .code 
= 0x27, }, /* PIC0 memory controller events common to UltraSPARC-III/III+ processors */ { .name = "MC_reads_0", .desc = "Read requests completed to memory bank 0", .ctrl = PME_CTRL_S0, .code = 0x20, }, { .name = "MC_reads_1", .desc = "Read requests completed to memory bank 1", .ctrl = PME_CTRL_S0, .code = 0x21, }, { .name = "MC_reads_2", .desc = "Read requests completed to memory bank 2", .ctrl = PME_CTRL_S0, .code = 0x22, }, { .name = "MC_reads_3", .desc = "Read requests completed to memory bank 3", .ctrl = PME_CTRL_S0, .code = 0x23, }, { .name = "MC_stalls_0", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 0 was busy with a previous request", .ctrl = PME_CTRL_S0, .code = 0x24, }, { .name = "MC_stalls_2", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 2 was busy with a previous request", .ctrl = PME_CTRL_S0, .code = 0x25, }, /* PIC1 memory controller events common to all UltraSPARC-III/III+ processors */ { .name = "MC_writes_0", .desc = "Write requests completed to memory bank 0", .ctrl = PME_CTRL_S1, .code = 0x20, }, { .name = "MC_writes_1", .desc = "Write requests completed to memory bank 1", .ctrl = PME_CTRL_S1, .code = 0x21, }, { .name = "MC_writes_2", .desc = "Write requests completed to memory bank 2", .ctrl = PME_CTRL_S1, .code = 0x22, }, { .name = "MC_writes_3", .desc = "Write requests completed to memory bank 3", .ctrl = PME_CTRL_S1, .code = 0x23, }, { .name = "MC_stalls_1", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 1 was busy with a previous request", .ctrl = PME_CTRL_S1, .code = 0x24, }, { .name = "MC_stalls_3", .desc = "Clock cycles that requests were stalled in the MCU queues because bank 3 was busy with a previous request", .ctrl = PME_CTRL_S1, .code = 0x25, }, }; #define PME_SPARC_ULTRA3_EVENT_COUNT (sizeof(ultra3_pe)/sizeof(sparc_entry_t)) papi-5.3.0/src/libpfm4/lib/events/amd64_events_fam11h.h0000600003276200002170000011517712247131123022155 
0ustar ralphundrgrad/* * Copyright (c) 2012 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
* * PMU: amd64_fam11h (AMD64 Fam11h) */ static const amd64_umask_t amd64_fam11h_dispatched_fpu[]={ { .uname = "OPS_ADD", .udesc = "Add pipe ops excluding load ops and SSE move ops", .ucode = 0x1, }, { .uname = "OPS_MULTIPLY", .udesc = "Multiply pipe ops excluding load ops and SSE move ops", .ucode = 0x2, }, { .uname = "OPS_STORE", .udesc = "Store pipe ops excluding load ops and SSE move ops", .ucode = 0x4, }, { .uname = "OPS_ADD_PIPE_LOAD_OPS", .udesc = "Add pipe load ops and SSE move ops", .ucode = 0x8, }, { .uname = "OPS_MULTIPLY_PIPE_LOAD_OPS", .udesc = "Multiply pipe load ops and SSE move ops", .ucode = 0x10, }, { .uname = "OPS_STORE_PIPE_LOAD_OPS", .udesc = "Store pipe load ops and SSE move ops", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_segment_register_loads[]={ { .uname = "ES", .udesc = "ES", .ucode = 0x1, }, { .uname = "CS", .udesc = "CS", .ucode = 0x2, }, { .uname = "SS", .udesc = "SS", .ucode = 0x4, }, { .uname = "DS", .udesc = "DS", .ucode = 0x8, }, { .uname = "FS", .udesc = "FS", .ucode = 0x10, }, { .uname = "GS", .udesc = "GS", .ucode = 0x20, }, { .uname = "HS", .udesc = "HS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_locked_ops[]={ { .uname = "EXECUTED", .udesc = "The number of locked instructions executed", .ucode = 0x1, }, { .uname = "CYCLES_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in speculative phase", .ucode = 0x2, }, { .uname = "CYCLES_NON_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in non-speculative phase (including cache miss penalty)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_data_cache_refills[]={ { .uname = "SYSTEM", 
.udesc = "Refill from the Northbridge", .ucode = 0x1, }, { .uname = "L2_SHARED", .udesc = "Shared-state line from L2", .ucode = 0x2, }, { .uname = "L2_EXCLUSIVE", .udesc = "Exclusive-state line from L2", .ucode = 0x4, }, { .uname = "L2_OWNED", .udesc = "Owned-state line from L2", .ucode = 0x8, }, { .uname = "L2_MODIFIED", .udesc = "Modified-state line from L2", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_data_cache_refills_from_system[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_data_cache_lines_evicted[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_scrubber_single_bit_ecc_errors[]={ { .uname = "SCRUBBER_ERROR", .udesc = "Scrubber error", .ucode = 0x1, }, { .uname = "PIGGYBACK_ERROR", .udesc = "Piggyback scrubber errors", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_prefetch_instructions_dispatched[]={ { .uname = "LOAD", .udesc = "Load (Prefetch, PrefetchT0/T1/T2)", .ucode = 0x1, }, { .uname = "STORE", .udesc 
= "Store (PrefetchW)", .ucode = 0x2, }, { .uname = "NTA", .udesc = "NTA (PrefetchNTA)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_dcache_misses_by_locked_instructions[]={ { .uname = "DATA_CACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .udesc = "Data cache misses by locked instructions", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x2, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_memory_requests[]={ { .uname = "NON_CACHEABLE", .udesc = "Requests to non-cacheable (UC) memory", .ucode = 0x1, }, { .uname = "WRITE_COMBINING", .udesc = "Requests to write-combining (WC) memory or WC buffer flushes to WB memory", .ucode = 0x2, }, { .uname = "STREAMING_STORE", .udesc = "Streaming store (SS) requests", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x83, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_data_prefetches[]={ { .uname = "CANCELLED", .udesc = "Cancelled prefetches", .ucode = 0x1, }, { .uname = "ATTEMPTED", .udesc = "Prefetch attempts", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_system_read_responses[]={ { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x1, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x2, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x4, }, { .uname = "DATA_ERROR", .udesc = "Data Error", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x17, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_quadwords_written_to_system[]={ { .uname = "QUADWORD_WRITE_TRANSFER", .udesc = "Quadword write transfer", .ucode = 0x1, }, { .uname = "ALL", .udesc = "All sub-events 
selected", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_requests_to_l2[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB fill (page table walks)", .ucode = 0x4, }, { .uname = "SNOOP", .udesc = "Tag snoop request", .ucode = 0x8, }, { .uname = "CANCELLED", .udesc = "Cancelled request", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_l2_cache_miss[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill (includes possible replays, whereas EventSelect 041h does not)", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB page table walk", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_l2_fill_writeback[]={ { .uname = "L2_FILLS", .udesc = "L2 fills (victims from L1 caches, TLB page table walks and data prefetches)", .ucode = 0x1, }, { .uname = "L2_WRITEBACKS", .udesc = "L2 Writebacks to system.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_retired_mmx_and_fp_instructions[]={ { .uname = "X87", .udesc = "X87 instructions", .ucode = 0x1, }, { .uname = "MMX_AND_3DNOW", .udesc = "MMX and 3DNow! 
instructions", .ucode = 0x2, }, { .uname = "PACKED_SSE_AND_SSE2", .udesc = "Packed SSE and SSE2 instructions", .ucode = 0x4, }, { .uname = "SCALAR_SSE_AND_SSE2", .udesc = "Scalar SSE and SSE2 instructions", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_retired_fastpath_double_op_instructions[]={ { .uname = "POSITION_0", .udesc = "With low op in position 0", .ucode = 0x1, }, { .uname = "POSITION_1", .udesc = "With low op in position 1", .ucode = 0x2, }, { .uname = "POSITION_2", .udesc = "With low op in position 2", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_interrupt_events[]={ { .uname = "FIXED_AND_LPA", .udesc = "Fixed and LPA", .ucode = 0x1, }, { .uname = "LPA", .udesc = "LPA", .ucode = 0x2, }, { .uname = "SMI", .udesc = "SMI", .ucode = 0x4, }, { .uname = "NMI", .udesc = "NMI", .ucode = 0x8, }, { .uname = "INIT", .udesc = "INIT", .ucode = 0x10, }, { .uname = "STARTUP", .udesc = "STARTUP", .ucode = 0x20, }, { .uname = "INT", .udesc = "INT", .ucode = 0x40, }, { .uname = "EOI", .udesc = "EOI", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_sideband_signals[]={ { .uname = "HALT", .udesc = "HALT", .ucode = 0x1, }, { .uname = "STOPGRANT", .udesc = "STOPGRANT", .ucode = 0x2, }, { .uname = "SHUTDOWN", .udesc = "SHUTDOWN", .ucode = 0x4, }, { .uname = "WBINVD", .udesc = "WBINVD", .ucode = 0x8, }, { .uname = "INVD", .udesc = "INVD", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_fpu_exceptions[]={ { .uname = "X87_RECLASS_MICROFAULTS", .udesc = "X87 reclass microfaults", 
.ucode = 0x1, }, { .uname = "SSE_RETYPE_MICROFAULTS", .udesc = "SSE retype microfaults", .ucode = 0x2, }, { .uname = "SSE_RECLASS_MICROFAULTS", .udesc = "SSE reclass microfaults", .ucode = 0x4, }, { .uname = "SSE_AND_X87_MICROTRAPS", .udesc = "SSE and x87 microtraps", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_dram_accesses[]={ { .uname = "DCT0_PAGE_HIT", .udesc = "DCT0 Page hit", .ucode = 0x1, }, { .uname = "DCT0_PAGE_MISS", .udesc = "DCT0 Page Miss", .ucode = 0x2, }, { .uname = "DCT0_PAGE_CONFLICT", .udesc = "DCT0 Page Conflict", .ucode = 0x4, }, { .uname = "DCT1_PAGE_HIT", .udesc = "DCT1 Page hit", .ucode = 0x8, }, { .uname = "DCT1_PAGE_MISS", .udesc = "DCT1 Page Miss", .ucode = 0x10, }, { .uname = "DCT1_PAGE_CONFLICT", .udesc = "DCT1 Page Conflict", .ucode = 0x20, }, { .uname = "WRITE_REQUEST", .udesc = "Write request.", .ucode = 0x40, }, { .uname = "READ_REQUEST", .udesc = "Read request.", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_dram_controller_page_table_events[]={ { .uname = "DCT_PAGE_TABLE_OVERFLOW", .udesc = "DCT Page Table Overflow", .ucode = 0x1, }, { .uname = "STALE_TABLE_ENTRY_HITS", .udesc = "Number of stale table entry hits. 
(hit on a page closed too soon).", .ucode = 0x2, }, { .uname = "PAGE_TABLE_IDLE_CYCLE_LIMIT_INCREMENTED", .udesc = "Page table idle cycle limit incremented.", .ucode = 0x4, }, { .uname = "PAGE_TABLE_IDLE_CYCLE_LIMIT_DECREMENTED", .udesc = "Page table idle cycle limit decremented.", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_memory_controller_turnarounds[]={ { .uname = "DCT0_READ_TO_WRITE", .udesc = "DCT0 read-to-write turnaround.", .ucode = 0x1, }, { .uname = "DCT0_WRITE_TO_READ", .udesc = "DCT0 write-to-read turnaround", .ucode = 0x2, }, { .uname = "DCT0_DIMM", .udesc = "DCT0 DIMM (chip select) turnaround", .ucode = 0x4, }, { .uname = "DCT1_READ_TO_WRITE", .udesc = "DCT1 read-to-write turnaround.", .ucode = 0x8, }, { .uname = "DCT1_WRITE_TO_READ", .udesc = "DCT1 write-to-read turnaround", .ucode = 0x10, }, { .uname = "DCT1_DIMM", .udesc = "DCT1 DIMM (chip select) turnaround", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_memory_rbd_queue[]={ { .uname = "COUNTER_REACHED", .udesc = "F2x[1,0]94[DcqBypassMax] counter reached.", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x4, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_thermal_status[]={ { .uname = "MEMHOT_L_ASSERTIONS", .udesc = "Number of clocks MEMHOT_L is asserted.", .ucode = 0x1, }, { .uname = "HTC_TRANSITIONS", .udesc = "Number of times the HTC transitions from inactive to active.", .ucode = 0x4, }, { .uname = "CLOCKS_HTC_P_STATE_INACTIVE", .udesc = "Number of clocks HTC P-state is inactive.", .ucode = 0x20, }, { .uname = "CLOCKS_HTC_P_STATE_ACTIVE", .udesc = "Number of clocks HTC P-state is active", .ucode = 0x40, }, { .uname = "PROCHOT_L_ASSERTIONS", .udesc = "PROCHOT_L asserted 
by an external source and the assertion causes a P-state change.", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xe5, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_cpu_io_requests_to_memory_io[]={ { .uname = "I_O_TO_I_O", .udesc = "IO to IO", .ucode = 0xa1, .uflags= AMD64_FL_NCOMBO, }, { .uname = "I_O_TO_MEM", .udesc = "IO to Mem", .ucode = 0xa2, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CPU_TO_I_O", .udesc = "CPU to IO", .ucode = 0xa4, .uflags= AMD64_FL_NCOMBO, }, { .uname = "CPU_TO_MEM", .udesc = "CPU to Mem", .ucode = 0xa8, .uflags= AMD64_FL_NCOMBO, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xaf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_cache_block[]={ { .uname = "VICTIM_WRITEBACK", .udesc = "Victim Block (Writeback)", .ucode = 0x1, }, { .uname = "DCACHE_LOAD_MISS", .udesc = "Read Block (Dcache load miss refill)", .ucode = 0x4, }, { .uname = "SHARED_ICACHE_REFILL", .udesc = "Read Block Shared (Icache refill)", .ucode = 0x8, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read Block Modified (Dcache store miss refill)", .ucode = 0x10, }, { .uname = "READ_TO_DIRTY", .udesc = "Change-to-Dirty (first store to clean block already in cache)", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3d, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_sized_commands[]={ { .uname = "NON_POSTED_WRITE_BYTE", .udesc = "Non-Posted SzWr Byte (1-32 bytes) Legacy or mapped IO, typically 1-4 bytes", .ucode = 0x1, }, { .uname = "NON_POSTED_WRITE_DWORD", .udesc = "Non-Posted SzWr DW (1-16 dwords) Legacy or mapped IO, typically 1 DWORD", .ucode = 0x2, }, { .uname = "POSTED_WRITE_BYTE", .udesc = "Posted SzWr Byte (1-32 bytes) Subcache-line DMA writes, size varies; also flushes of partially-filled Write Combining buffer", .ucode = 0x4, }, { .uname = "POSTED_WRITE_DWORD", .udesc 
= "Posted SzWr DW (1-16 dwords) Block-oriented DMA writes, often cache-line sized; also processor Write Combining buffer flushes", .ucode = 0x8, }, { .uname = "READ_BYTE_4_BYTES", .udesc = "SzRd Byte (4 bytes) Legacy or mapped IO", .ucode = 0x10, }, { .uname = "READ_DWORD_1_16_DWORDS", .udesc = "SzRd DW (1-16 dwords) Block-oriented DMA reads, typically cache-line size", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_probe[]={ { .uname = "MISS", .udesc = "Probe miss", .ucode = 0x1, }, { .uname = "HIT_CLEAN", .udesc = "Probe hit clean", .ucode = 0x2, }, { .uname = "HIT_DIRTY_NO_MEMORY_CANCEL", .udesc = "Probe hit dirty without memory cancel (probed by Sized Write or Change2Dirty)", .ucode = 0x4, }, { .uname = "HIT_DIRTY_WITH_MEMORY_CANCEL", .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)", .ucode = 0x8, }, { .uname = "UPSTREAM_DISPLAY_REFRESH_READS", .udesc = "Upstream display refresh/ISOC reads.", .ucode = 0x10, }, { .uname = "UPSTREAM_NON_DISPLAY_REFRESH_READS", .udesc = "Upstream non-display refresh reads.", .ucode = 0x20, }, { .uname = "UPSTREAM_ISOC_WRITES", .udesc = "Upstream ISOC writes.", .ucode = 0x40, }, { .uname = "UPSTREAM_NON_ISOC_WRITES", .udesc = "Upstream non-ISOC writes.", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_dev[]={ { .uname = "DEV_HIT", .udesc = "DEV hit", .ucode = 0x10, }, { .uname = "DEV_MISS", .udesc = "DEV miss", .ucode = 0x20, }, { .uname = "DEV_ERROR", .udesc = "DEV error", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x70, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_memory_controller_requests[]={ { .uname = "32_BYTES_WRITES", .udesc = "32 Bytes Sized Writes", 
.ucode = 0x8, }, { .uname = "64_BYTES_WRITES", .udesc = "64 Bytes Sized Writes", .ucode = 0x10, }, { .uname = "32_BYTES_READS", .udesc = "32 Bytes Sized Reads", .ucode = 0x20, }, { .uname = "64_BYTES_READS", .udesc = "64 Byte Sized Reads", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x78, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam11h_hypertransport_link0[]={ { .uname = "COMMAND_DWORD_SENT", .udesc = "Command DWORD sent", .ucode = 0x1, .grpid = 0, }, { .uname = "ADDRESS_DWORD_SENT", .udesc = "Address DWORD sent", .ucode = 0x2, .grpid = 0, }, { .uname = "DATA_DWORD_SENT", .udesc = "Data DWORD sent", .ucode = 0x4, .grpid = 0, }, { .uname = "BUFFER_RELEASE_DWORD_SENT", .udesc = "Buffer release DWORD sent", .ucode = 0x8, .grpid = 0, }, { .uname = "NOP_DWORD_SENT", .udesc = "Nop DW sent (idle)", .ucode = 0x10, .grpid = 0, }, { .uname = "PER_PACKET_CRC_SENT", .udesc = "Per packet CRC sent", .ucode = 0x20, .grpid = 0, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, }; static const amd64_entry_t amd64_fam11h_pe[]={ { .name = "DISPATCHED_FPU", .desc = "Dispatched FPU Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_dispatched_fpu), .ngrp = 1, .umasks = amd64_fam11h_dispatched_fpu, }, { .name = "CYCLES_NO_FPU_OPS_RETIRED", .desc = "Cycles in which the FPU is Empty", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1, }, { .name = "DISPATCHED_FPU_OPS_FAST_FLAG", .desc = "Dispatched Fast Flag FPU Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2, }, { .name = "SEGMENT_REGISTER_LOADS", .desc = "Segment Register Loads", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x20, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_segment_register_loads), .ngrp = 1, .umasks = amd64_fam11h_segment_register_loads, }, { .name = "PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE", .desc = "Pipeline Restart Due to 
Self-Modifying Code", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x21, }, { .name = "PIPELINE_RESTART_DUE_TO_PROBE_HIT", .desc = "Pipeline Restart Due to Probe Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x22, }, { .name = "LS_BUFFER_2_FULL_CYCLES", .desc = "LS Buffer 2 Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x23, }, { .name = "LOCKED_OPS", .desc = "Locked Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_locked_ops), .ngrp = 1, .umasks = amd64_fam11h_locked_ops, }, { .name = "RETIRED_CLFLUSH_INSTRUCTIONS", .desc = "Retired CLFLUSH Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x26, }, { .name = "RETIRED_CPUID_INSTRUCTIONS", .desc = "Retired CPUID Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x27, }, { .name = "DATA_CACHE_ACCESSES", .desc = "Data Cache Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x40, }, { .name = "DATA_CACHE_MISSES", .desc = "Data Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x41, }, { .name = "DATA_CACHE_REFILLS", .desc = "Data Cache Refills from L2 or System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_data_cache_refills), .ngrp = 1, .umasks = amd64_fam11h_data_cache_refills, }, { .name = "DATA_CACHE_REFILLS_FROM_SYSTEM", .desc = "Data Cache Refills from the System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_data_cache_refills_from_system), .ngrp = 1, .umasks = amd64_fam11h_data_cache_refills_from_system, }, { .name = "DATA_CACHE_LINES_EVICTED", .desc = "Data Cache Lines Evicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x44, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_data_cache_lines_evicted), .ngrp = 1, .umasks = amd64_fam11h_data_cache_lines_evicted, }, { .name = "L1_DTLB_MISS_AND_L2_DTLB_HIT", .desc = "Number of data cache accesses that miss in L1 DTLB and hit in L2 DTLB", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x45, }, { .name = "L1_DTLB_AND_L2_DTLB_MISS", .desc = "Number of data 
cache accesses that miss both the L1 and L2 DTLBs", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x46, }, { .name = "MISALIGNED_ACCESSES", .desc = "Misaligned Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x47, }, { .name = "MICROARCHITECTURAL_LATE_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Late Cancel of an Access", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x48, }, { .name = "MICROARCHITECTURAL_EARLY_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Early Cancel of an Access", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x49, }, { .name = "SCRUBBER_SINGLE_BIT_ECC_ERRORS", .desc = "Single-bit ECC Errors Recorded by Scrubber", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4a, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_scrubber_single_bit_ecc_errors), .ngrp = 1, .umasks = amd64_fam11h_scrubber_single_bit_ecc_errors, }, { .name = "PREFETCH_INSTRUCTIONS_DISPATCHED", .desc = "Prefetch Instructions Dispatched", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_prefetch_instructions_dispatched), .ngrp = 1, .umasks = amd64_fam11h_prefetch_instructions_dispatched, }, { .name = "DCACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .desc = "DCACHE Misses by Locked Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_dcache_misses_by_locked_instructions), .ngrp = 1, .umasks = amd64_fam11h_dcache_misses_by_locked_instructions, }, { .name = "MEMORY_REQUESTS", .desc = "Memory Requests by Type", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_memory_requests), .ngrp = 1, .umasks = amd64_fam11h_memory_requests, }, { .name = "DATA_PREFETCHES", .desc = "Data Prefetcher", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_data_prefetches), .ngrp = 1, .umasks = amd64_fam11h_data_prefetches, }, { .name = "SYSTEM_READ_RESPONSES", .desc = "System Read Responses by Coherency State", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6c, .numasks = 
LIBPFM_ARRAY_SIZE(amd64_fam11h_system_read_responses), .ngrp = 1, .umasks = amd64_fam11h_system_read_responses, }, { .name = "QUADWORDS_WRITTEN_TO_SYSTEM", .desc = "Quadwords Written to System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_quadwords_written_to_system), .ngrp = 1, .umasks = amd64_fam11h_quadwords_written_to_system, }, { .name = "CPU_CLK_UNHALTED", .desc = "CPU Clocks not Halted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x76, }, { .name = "REQUESTS_TO_L2", .desc = "Requests to L2 Cache", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_requests_to_l2), .ngrp = 1, .umasks = amd64_fam11h_requests_to_l2, }, { .name = "L2_CACHE_MISS", .desc = "L2 Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7e, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_l2_cache_miss), .ngrp = 1, .umasks = amd64_fam11h_l2_cache_miss, }, { .name = "L2_FILL_WRITEBACK", .desc = "L2 Fill/Writeback", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_l2_fill_writeback), .ngrp = 1, .umasks = amd64_fam11h_l2_fill_writeback, }, { .name = "INSTRUCTION_CACHE_FETCHES", .desc = "Instruction Cache Fetches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x80, }, { .name = "INSTRUCTION_CACHE_MISSES", .desc = "Instruction Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x81, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_L2", .desc = "Instruction Cache Refills from L2", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x82, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM", .desc = "Instruction Cache Refills from System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x83, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_HIT", .desc = "L1 ITLB Miss and L2 ITLB Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x84, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_MISS", .desc = "L1 ITLB Miss and L2 ITLB Miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x85, }, { .name = "PIPELINE_RESTART_DUE_TO_INSTRUCTION_STREAM_PROBE", .desc = 
"Pipeline Restart Due to Instruction Stream Probe", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x86, }, { .name = "INSTRUCTION_FETCH_STALL", .desc = "Instruction Fetch Stall", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x87, }, { .name = "RETURN_STACK_HITS", .desc = "Return Stack Hits", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x88, }, { .name = "RETURN_STACK_OVERFLOWS", .desc = "Return Stack Overflows", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x89, }, { .name = "RETIRED_INSTRUCTIONS", .desc = "Retired Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc0, }, { .name = "RETIRED_UOPS", .desc = "Retired uops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc1, }, { .name = "RETIRED_BRANCH_INSTRUCTIONS", .desc = "Retired Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc2, }, { .name = "RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", .desc = "Retired Mispredicted Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc3, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS", .desc = "Retired Taken Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc4, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED", .desc = "Retired Taken Branch Instructions Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc5, }, { .name = "RETIRED_FAR_CONTROL_TRANSFERS", .desc = "Retired Far Control Transfers", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc6, }, { .name = "RETIRED_BRANCH_RESYNCS", .desc = "Retired Branch Resyncs", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc7, }, { .name = "RETIRED_NEAR_RETURNS", .desc = "Retired Near Returns", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc8, }, { .name = "RETIRED_NEAR_RETURNS_MISPREDICTED", .desc = "Retired Near Returns Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc9, }, { .name = "RETIRED_INDIRECT_BRANCHES_MISPREDICTED", .desc = "Retired Indirect Branches Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xca, }, { .name = "RETIRED_MMX_AND_FP_INSTRUCTIONS", .desc = "Retired MMX/FP Instructions", .modmsk = AMD64_FAM10H_ATTRS, 
.code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_retired_mmx_and_fp_instructions), .ngrp = 1, .umasks = amd64_fam11h_retired_mmx_and_fp_instructions, }, { .name = "RETIRED_FASTPATH_DOUBLE_OP_INSTRUCTIONS", .desc = "Retired Fastpath Double Op Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_retired_fastpath_double_op_instructions), .ngrp = 1, .umasks = amd64_fam11h_retired_fastpath_double_op_instructions, }, { .name = "INTERRUPTS_MASKED_CYCLES", .desc = "Interrupts-Masked Cycles", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcd, }, { .name = "INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING", .desc = "Interrupts-Masked Cycles with Interrupt Pending", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xce, }, { .name = "INTERRUPTS_TAKEN", .desc = "Interrupts Taken", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcf, }, { .name = "DECODER_EMPTY", .desc = "Decoder Empty", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd0, }, { .name = "DISPATCH_STALLS", .desc = "Dispatch Stalls", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd1, }, { .name = "DISPATCH_STALL_FOR_BRANCH_ABORT", .desc = "Dispatch Stall for Branch Abort to Retire", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd2, }, { .name = "DISPATCH_STALL_FOR_SERIALIZATION", .desc = "Dispatch Stall for Serialization", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd3, }, { .name = "DISPATCH_STALL_FOR_SEGMENT_LOAD", .desc = "Dispatch Stall for Segment Load", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd4, }, { .name = "DISPATCH_STALL_FOR_REORDER_BUFFER_FULL", .desc = "Dispatch Stall for Reorder Buffer Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd5, }, { .name = "DISPATCH_STALL_FOR_RESERVATION_STATION_FULL", .desc = "Dispatch Stall for Reservation Station Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd6, }, { .name = "DISPATCH_STALL_FOR_FPU_FULL", .desc = "Dispatch Stall for FPU Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd7, }, { .name = "DISPATCH_STALL_FOR_LS_FULL", .desc = "Dispatch Stall for LS Full", 
.modmsk = AMD64_FAM10H_ATTRS, .code = 0xd8, }, { .name = "DISPATCH_STALL_WAITING_FOR_ALL_QUIET", .desc = "Dispatch Stall Waiting for All Quiet", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd9, }, { .name = "DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RSYNC", .desc = "Dispatch Stall for Far Transfer or Resync to Retire", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xda, }, { .name = "FPU_EXCEPTIONS", .desc = "FPU Exceptions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_fpu_exceptions), .ngrp = 1, .umasks = amd64_fam11h_fpu_exceptions, }, { .name = "DR0_BREAKPOINT_MATCHES", .desc = "DR0 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdc, }, { .name = "DR1_BREAKPOINT_MATCHES", .desc = "DR1 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdd, }, { .name = "DR2_BREAKPOINT_MATCHES", .desc = "DR2 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xde, }, { .name = "DR3_BREAKPOINT_MATCHES", .desc = "DR3 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdf, }, { .name = "DRAM_ACCESSES", .desc = "DRAM Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_dram_accesses), .ngrp = 1, .umasks = amd64_fam11h_dram_accesses, }, { .name = "DRAM_CONTROLLER_PAGE_TABLE_EVENTS", .desc = "DRAM Controller Page Table Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_dram_controller_page_table_events), .ngrp = 1, .umasks = amd64_fam11h_dram_controller_page_table_events, }, { .name = "MEMORY_CONTROLLER_TURNAROUNDS", .desc = "Memory Controller Turnarounds", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_memory_controller_turnarounds), .ngrp = 1, .umasks = amd64_fam11h_memory_controller_turnarounds, }, { .name = "MEMORY_CONTROLLER_RBD_QUEUE", .desc = "Memory Controller RBD Queue Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_memory_rbd_queue), .ngrp = 1, 
.umasks = amd64_fam11h_memory_rbd_queue, }, { .name = "THERMAL_STATUS", .desc = "Thermal Status", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe8, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_thermal_status), .ngrp = 1, .umasks = amd64_fam11h_thermal_status, }, { .name = "CPU_IO_REQUESTS_TO_MEMORY_IO", .desc = "CPU/IO Requests to Memory/IO", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_cpu_io_requests_to_memory_io), .ngrp = 1, .umasks = amd64_fam11h_cpu_io_requests_to_memory_io, }, { .name = "CACHE_BLOCK", .desc = "Cache Block Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_cache_block), .ngrp = 1, .umasks = amd64_fam11h_cache_block, }, { .name = "SIZED_COMMANDS", .desc = "Sized Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xeb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_sized_commands), .ngrp = 1, .umasks = amd64_fam11h_sized_commands, }, { .name = "PROBE", .desc = "Probe Responses and Upstream Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xec, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_probe), .ngrp = 1, .umasks = amd64_fam11h_probe, }, { .name = "DEV", .desc = "DEV Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xee, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_dev), .ngrp = 1, .umasks = amd64_fam11h_dev, }, { .name = "HYPERTRANSPORT_LINK0", .desc = "HyperTransport Link 0 Transmit Bandwidth", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xf6, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_hypertransport_link0), .ngrp = 1, .umasks = amd64_fam11h_hypertransport_link0, }, { .name = "MEMORY_CONTROLLER_REQUESTS", .desc = "Memory Controller Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1f0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_memory_controller_requests), .ngrp = 1, .umasks = amd64_fam11h_memory_controller_requests, }, { .name = "SIDEBAND_SIGNALS", .desc = "Sideband Signals and Special Cycles", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e9, .numasks = 
LIBPFM_ARRAY_SIZE(amd64_fam11h_sideband_signals), .ngrp = 1, .umasks = amd64_fam11h_sideband_signals, }, { .name = "Interrupt Events", .desc = "Interrupt Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1ea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam11h_interrupt_events), .ngrp = 1, .umasks = amd64_fam11h_interrupt_events, }, };
papi-5.3.0/src/libpfm4/lib/events/intel_nhm_events.h
/*
 * Copyright (c) 2011 Google, Inc
 * Contributed by Stephane Eranian
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * This file is part of libpfm, a performance monitoring support library for
 * applications on Linux.
 *
 * This file has been automatically generated.
 *
 * PMU: nhm (Intel Nehalem)
 */
static const intel_x86_umask_t nhm_arith[]={ { .uname = "CYCLES_DIV_BUSY", .udesc = "Counts the number of cycles the divider is busy executing divide or square root operations. 
The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE.", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIV", .udesc = "Counts the number of divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE.", .uequiv = "CYCLES_DIV_BUSY:c=1:i=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MUL", .udesc = "Counts the number of multiply operations executed. This includes integer as well as floating point multiply operations but excludes DPPS mul and MPSAD.", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_baclear[]={ { .uname = "BAD_TARGET", .udesc = "BACLEAR asserted with bad target address", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CLEAR", .udesc = "BACLEAR asserted, regardless of cause", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_bpu_clears[]={ { .uname = "EARLY", .udesc = "Early Branch Prediction Unit clears", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LATE", .udesc = "Late Branch Prediction Unit clears", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY", .udesc = "Count any Branch Prediction Unit clears", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_br_inst_exec[]={ { .uname = "ANY", .udesc = "Branch instructions executed", .ucode = 0x7f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "COND", .udesc = "Conditional branch instructions executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT", .udesc = "Unconditional branches executed", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT_NEAR_CALL", .udesc = "Unconditional call branches executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { 
.uname = "INDIRECT_NEAR_CALL", .udesc = "Indirect call branches executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INDIRECT_NON_CALL", .udesc = "Indirect non call branches executed", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NEAR_CALLS", .udesc = "Call branches executed", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NON_CALLS", .udesc = "All non call branches executed", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RETURN_NEAR", .udesc = "Indirect return branches executed", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN", .udesc = "Taken branches executed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_br_inst_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "Retired branch instructions (Precise Event)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "CONDITIONAL", .udesc = "Retired conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "Retired near call instructions (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t nhm_br_misp_exec[]={ { .uname = "ANY", .udesc = "Mispredicted branches executed", .ucode = 0x7f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "COND", .udesc = "Mispredicted conditional branches executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT", .udesc = "Mispredicted unconditional branches executed", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT_NEAR_CALL", .udesc = "Mispredicted unconditional call branches executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INDIRECT_NEAR_CALL", .udesc = "Mispredicted indirect call branches executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INDIRECT_NON_CALL", .udesc = "Mispredicted indirect non call branches executed", .ucode 
= 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NEAR_CALLS", .udesc = "Mispredicted call branches executed", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NON_CALLS", .udesc = "Mispredicted non call branches executed", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RETURN_NEAR", .udesc = "Mispredicted return branches executed", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN", .udesc = "Mispredicted taken branches executed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_br_misp_retired[]={ { .uname = "NEAR_CALL", .udesc = "Counts mispredicted direct and indirect near unconditional retired calls", .ucode = 0x200, .uflags= INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_cache_lock_cycles[]={ { .uname = "L1D", .udesc = "Cycles L1D locked", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_L2", .udesc = "Cycles L1D and L2 locked", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_cpu_clk_unhalted[]={ { .uname = "THREAD_P", .udesc = "Cycles when thread is not halted (programmable counter)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "REF_P", .udesc = "Reference base clock (133 MHz) cycles when thread is not halted", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TOTAL_CYCLES", .udesc = "Total number of elapsed cycles. 
Does not work when C-state enabled", .uequiv = "THREAD_P:c=2:i=1", .ucode = 0x0 | INTEL_X86_MOD_INV | (0x2 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_dtlb_load_misses[]={ { .uname = "ANY", .udesc = "DTLB load misses", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "PDE_MISS", .udesc = "DTLB load miss caused by low part of address", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "DTLB load miss page walks complete", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "DTLB second level hit", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PDP_MISS", .udesc = "Number of DTLB cache load misses where the high part of the linear to physical address translation was missed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LARGE_WALK_COMPLETED", .udesc = "Counts number of completed large page walks due to load miss in the STLB", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_dtlb_misses[]={ { .uname = "ANY", .udesc = "DTLB misses", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "STLB_HIT", .udesc = "DTLB first level misses but second level hit", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "DTLB miss page walks", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PDE_MISS", .udesc = "Number of DTLB cache misses where the low part of the linear to physical address translation was missed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PDP_MISS", .udesc = "Number of DTLB misses where the high part of the linear to physical address translation was missed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LARGE_WALK_COMPLETED", .udesc = "Counts number of completed large page walks due to misses in the STLB", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t 
nhm_ept[]={ { .uname = "EPDE_MISS", .udesc = "Extended Page Directory Entry miss", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "EPDPE_MISS", .udesc = "Extended Page Directory Pointer miss", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "EPDPE_HIT", .udesc = "Extended Page Directory Pointer hit", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_fp_assist[]={ { .uname = "ALL", .udesc = "Floating point assists (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "INPUT", .udesc = "Floating point assists for invalid input value (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "OUTPUT", .udesc = "Floating point assists for invalid output value (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t nhm_fp_comp_ops_exe[]={ { .uname = "MMX", .udesc = "MMX Uops", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_DOUBLE_PRECISION", .udesc = "SSE* FP double precision Uops", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP", .udesc = "SSE and SSE2 FP Uops", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_PACKED", .udesc = "SSE FP packed Uops", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_SCALAR", .udesc = "SSE FP scalar Uops", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_SINGLE_PRECISION", .udesc = "SSE* FP single precision Uops", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE2_INTEGER", .udesc = "SSE2 integer Uops", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87", .udesc = "Computational floating-point operations executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_fp_mmx_trans[]={ { .uname = "ANY", .udesc = "All Floating Point to and from MMX transitions", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | 
INTEL_X86_DFL, }, { .uname = "TO_FP", .udesc = "Transitions from MMX to Floating Point instructions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TO_MMX", .udesc = "Transitions from Floating Point to MMX instructions", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_ifu_ivc[]={ { .uname = "FULL", .udesc = "Instruction Fetch Unit victim cache full", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1I_EVICTION", .udesc = "L1 Instruction cache evictions", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_ild_stall[]={ { .uname = "ANY", .udesc = "Any Instruction Length Decoder stall cycles", .uequiv = "IQ_FULL:LCP:MRU:REGEN", .ucode = 0xf00, .uflags= INTEL_X86_DFL, }, { .uname = "IQ_FULL", .udesc = "Instruction Queue full stall cycles", .ucode = 0x400, }, { .uname = "LCP", .udesc = "Length Change Prefix stall cycles", .ucode = 0x100, }, { .uname = "MRU", .udesc = "Stall cycles due to BPU MRU bypass", .ucode = 0x200, }, { .uname = "REGEN", .udesc = "Regen stall cycles", .ucode = 0x800, }, }; static const intel_x86_umask_t nhm_inst_decoded[]={ { .uname = "DEC0", .udesc = "Instructions that must be decoded by decoder 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_inst_retired[]={ { .uname = "ANY_P", .udesc = "Instructions Retired (Precise Event)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "X87", .udesc = "Retired floating-point operations (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t nhm_l1d[]={ { .uname = "M_EVICT", .udesc = "L1D cache lines replaced in M state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_REPL", .udesc = "L1D cache lines allocated in the M state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_SNOOP_EVICT", .udesc = "L1D snoop eviction of cache lines in M state", 
.ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REPL", .udesc = "L1 data cache lines allocated", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l1d_all_ref[]={ { .uname = "ANY", .udesc = "All references to the L1 data cache", .ucode = 0x100, .uflags= INTEL_X86_DFL, }, { .uname = "CACHEABLE", .udesc = "L1 data cacheable reads and writes", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l1d_cache_ld[]={ { .uname = "E_STATE", .udesc = "L1 data cache read in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "I_STATE", .udesc = "L1 data cache read in I state (misses)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_STATE", .udesc = "L1 data cache read in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MESI", .udesc = "L1 data cache reads", .ucode = 0xf00, .uflags= INTEL_X86_DFL, }, { .uname = "S_STATE", .udesc = "L1 data cache read in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l1d_cache_lock[]={ { .uname = "E_STATE", .udesc = "L1 data cache load locks in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HIT", .udesc = "L1 data cache load lock hits", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_STATE", .udesc = "L1 data cache load locks in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S_STATE", .udesc = "L1 data cache load locks in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l1d_cache_st[]={ { .uname = "E_STATE", .udesc = "L1 data cache stores in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "I_STATE", .udesc = "L1 data cache store in the I state", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_STATE", .udesc = "L1 data cache stores in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S_STATE", .udesc = "L1 data cache stores in S 
state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MESI", .udesc = "L1 data cache store in all states", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_l1d_prefetch[]={ { .uname = "MISS", .udesc = "L1D hardware prefetch misses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REQUESTS", .udesc = "L1D hardware prefetch requests", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TRIGGERS", .udesc = "L1D hardware prefetch requests triggered", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l1d_wb_l2[]={ { .uname = "E_STATE", .udesc = "L1 writebacks to L2 in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "I_STATE", .udesc = "L1 writebacks to L2 in I state (misses)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_STATE", .udesc = "L1 writebacks to L2 in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S_STATE", .udesc = "L1 writebacks to L2 in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MESI", .udesc = "All L1 writebacks to L2", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_l1i[]={ { .uname = "CYCLES_STALLED", .udesc = "L1I instruction fetch stall cycles", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HITS", .udesc = "L1I instruction fetch hits", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISSES", .udesc = "L1I instruction fetch misses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "READS", .udesc = "L1I Instruction fetches", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l2_data_rqsts[]={ { .uname = "ANY", .udesc = "All L2 data requests", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "DEMAND_E_STATE", .udesc = "L2 data demand loads in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_I_STATE", 
.udesc = "L2 data demand loads in I state (misses)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_M_STATE", .udesc = "L2 data demand loads in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_MESI", .udesc = "L2 data demand requests", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_S_STATE", .udesc = "L2 data demand loads in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_E_STATE", .udesc = "L2 data prefetches in E state", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_I_STATE", .udesc = "L2 data prefetches in the I state (misses)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_M_STATE", .udesc = "L2 data prefetches in M state", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_MESI", .udesc = "All L2 data prefetches", .ucode = 0xf000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_S_STATE", .udesc = "L2 data prefetches in the S state", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l2_hw_prefetch[]={ { .uname = "HIT", .udesc = "Count L2 HW prefetcher detector hits", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALLOC", .udesc = "Count L2 HW prefetcher allocations", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DATA_TRIGGER", .udesc = "Count L2 HW data prefetcher triggered", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_TRIGGER", .udesc = "Count L2 HW code prefetcher triggered", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DCA_TRIGGER", .udesc = "Count L2 HW DCA prefetcher triggered", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "KICK_START", .udesc = "Count L2 HW prefetcher kick started", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l2_lines_in[]={ { .uname = "ANY", .udesc = "L2 lines allocated", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = 
"E_STATE", .udesc = "L2 lines allocated in the E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S_STATE", .udesc = "L2 lines allocated in the S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l2_lines_out[]={ { .uname = "ANY", .udesc = "L2 lines evicted", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "DEMAND_CLEAN", .udesc = "L2 lines evicted by a demand request", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DIRTY", .udesc = "L2 modified lines evicted by a demand request", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_CLEAN", .udesc = "L2 lines evicted by a prefetch request", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_DIRTY", .udesc = "L2 modified lines evicted by a prefetch request", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l2_rqsts[]={ { .uname = "MISS", .udesc = "All L2 misses", .ucode = 0xaa00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REFERENCES", .udesc = "All L2 requests", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCH_HIT", .udesc = "L2 instruction fetch hits", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCH_MISS", .udesc = "L2 instruction fetch misses", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCHES", .udesc = "L2 instruction fetches", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LD_HIT", .udesc = "L2 load hits", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LD_MISS", .udesc = "L2 load misses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOADS", .udesc = "L2 requests", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_HIT", .udesc = "L2 prefetch hits", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_MISS", .udesc = "L2 prefetch misses", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCHES", .udesc = "All L2 
prefetches", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_HIT", .udesc = "L2 RFO hits", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_MISS", .udesc = "L2 RFO misses", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFOS", .udesc = "L2 RFO requests", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l2_transactions[]={ { .uname = "ANY", .udesc = "All L2 transactions", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FILL", .udesc = "L2 fill transactions", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCH", .udesc = "L2 instruction fetch transactions", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_WB", .udesc = "L1D writeback to L2 transactions", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOAD", .udesc = "L2 Load transactions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH", .udesc = "L2 prefetch transactions", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO", .udesc = "L2 RFO transactions", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WB", .udesc = "L2 writeback to LLC transactions", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_l2_write[]={ { .uname = "LOCK_E_STATE", .udesc = "L2 demand lock RFOs in E state", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_I_STATE", .udesc = "L2 demand lock RFOs in I state (misses)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_S_STATE", .udesc = "L2 demand lock RFOs in S state", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_HIT", .udesc = "All demand L2 lock RFOs that hit the cache", .ucode = 0xe000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_M_STATE", .udesc = "L2 demand lock RFOs in M state", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_MESI", .udesc = "All demand L2 lock RFOs", .ucode = 0xf000, .uflags= 
INTEL_X86_NCOMBO, }, { .uname = "RFO_HIT", .udesc = "All L2 demand store RFOs that hit the cache", .ucode = 0xe00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_I_STATE", .udesc = "L2 demand store RFOs in I state (misses)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_E_STATE", .udesc = "L2 demand store RFOs in the E state (exclusive)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_M_STATE", .udesc = "L2 demand store RFOs in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_MESI", .udesc = "All L2 demand store RFOs", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_S_STATE", .udesc = "L2 demand store RFOs in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_large_itlb[]={ { .uname = "HIT", .udesc = "Large ITLB hit", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_load_dispatch[]={ { .uname = "ANY", .udesc = "All loads dispatched", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "MOB", .udesc = "Loads dispatched from the MOB", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RS", .udesc = "Loads dispatched that bypass the MOB", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RS_DELAYED", .udesc = "Loads dispatched from stage 305", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_longest_lat_cache[]={ { .uname = "REFERENCE", .udesc = "Longest latency cache reference", .ucode = 0x4f00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISS", .udesc = "Longest latency cache miss", .ucode = 0x4100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_lsd[]={ { .uname = "ACTIVE", .udesc = "Cycles when uops were delivered by the LSD", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "INACTIVE", .udesc = "Cycles no uops were delivered by the 
LSD", .uequiv = "ACTIVE:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), }, }; static const intel_x86_umask_t nhm_machine_clears[]={ { .uname = "SMC", .udesc = "Self-Modifying Code detected", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CYCLES", .udesc = "Cycles machine clear asserted", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MEM_ORDER", .udesc = "Execution pipeline restart due to Memory ordering conflicts", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "FUSION_ASSIST", .udesc = "Counts the number of macro-fusion assists", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_macro_insts[]={ { .uname = "DECODED", .udesc = "Instructions decoded", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "FUSIONS_DECODED", .udesc = "Macro-fused instructions decoded", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_memory_disambiguation[]={ { .uname = "RESET", .udesc = "Counts memory disambiguation reset cycles", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WATCHDOG", .udesc = "Counts the number of times the memory disambiguation watchdog kicked in", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WATCH_CYCLES", .udesc = "Counts the cycles that the memory disambiguation watchdog is active", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_mem_inst_retired[]={ { .uname = "LATENCY_ABOVE_THRESHOLD", .udesc = "Memory instructions retired above programmed clocks, minimum threshold value is 3, (Precise Event and ldlat required)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_LDLAT, }, { .uname = "LOADS", .udesc = "Instructions retired which contains a load (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STORES", .udesc = "Instructions retired which contains a store (Precise Event)", .ucode = 0x200, .uflags= 
INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t nhm_mem_load_retired[]={ { .uname = "DTLB_MISS", .udesc = "Retired loads that miss the DTLB (Precise Event)", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "HIT_LFB", .udesc = "Retired loads that miss L1D and hit a previously allocated LFB (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L1D_HIT", .udesc = "Retired loads that hit the L1 data cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_HIT", .udesc = "Retired loads that hit the L2 cache (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_MISS", .udesc = "Retired loads that miss the L3 cache (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LLC_MISS", .udesc = "This is an alias for L3_MISS", .uequiv = "L3_MISS", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_UNSHARED_HIT", .udesc = "Retired loads that hit valid versions in the L3 cache (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LLC_UNSHARED_HIT", .udesc = "This is an alias for L3_UNSHARED_HIT", .uequiv = "L3_UNSHARED_HIT", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "OTHER_CORE_L2_HIT_HITM", .udesc = "Retired loads that hit sibling core's L2 in modified or unmodified states (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t nhm_mem_store_retired[]={ { .uname = "DTLB_MISS", .udesc = "Retired stores that miss the DTLB (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_mem_uncore_retired[]={ { .uname = "OTHER_CORE_L2_HITM", .udesc = "Load instructions retired that HIT modified data in sibling core (Precise Event)", .ucode = 
0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "REMOTE_CACHE_LOCAL_HOME_HIT", .udesc = "Load instructions retired remote cache HIT data source (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "REMOTE_DRAM", .udesc = "Load instructions retired remote DRAM and remote home-remote cache HITM (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCAL_DRAM", .udesc = "Load instructions retired with a data source of local DRAM or locally homed remote hitm (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_DATA_MISS_UNKNOWN", .udesc = "Load instructions retired where the memory reference missed L3 and data source is unknown (Model 46 only, Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_NHM_EX, }, { .uname = "UNCACHEABLE", .udesc = "Load instructions retired where the memory reference missed L1, L2, L3 caches and to perform I/O (Model 46 only, Precise Event)", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_NHM_EX, }, }; static const intel_x86_umask_t nhm_offcore_requests[]={ { .uname = "ANY", .udesc = "All offcore requests", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ANY_READ", .udesc = "Offcore read requests", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_RFO", .udesc = "Offcore RFO requests", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_READ_CODE", .udesc = "Counts number of offcore demand code read requests. 
Does not count L2 prefetch requests.", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_READ_DATA", .udesc = "Offcore demand data read requests", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Offcore demand RFO requests", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_WRITEBACK", .udesc = "Offcore L1 data cache writebacks", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "UNCACHED_MEM", .udesc = "Counts number of offcore uncached memory requests", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_pic_accesses[]={ { .uname = "TPR_READS", .udesc = "Counts number of TPR reads", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TPR_WRITES", .udesc = "Counts number of TPR writes", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_rat_stalls[]={ { .uname = "FLAGS", .udesc = "Flag stall cycles", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REGISTERS", .udesc = "Partial register stall cycles", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ROB_READ_PORT", .udesc = "ROB read port stall cycles", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SCOREBOARD", .udesc = "Scoreboard stall cycles", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY", .udesc = "All RAT stall cycles", .ucode = 0xf00, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_resource_stalls[]={ { .uname = "FPCW", .udesc = "FPU control word write stall cycles", .ucode = 0x2000, }, { .uname = "LOAD", .udesc = "Load buffer stall cycles", .ucode = 0x200, }, { .uname = "MXCSR", .udesc = "MXCSR rename stall cycles", .ucode = 0x4000, }, { .uname = "RS_FULL", .udesc = "Reservation Station full stall cycles", .ucode = 0x400, }, { .uname = "STORE", .udesc = "Store buffer stall cycles", .ucode = 0x800, }, { .uname = "OTHER", .udesc = "Other Resource related stall cycles", .ucode = 0x8000, }, { .uname = 
"ROB_FULL", .udesc = "ROB full stall cycles", .ucode = 0x1000, }, { .uname = "ANY", .udesc = "Resource related stall cycles", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_simd_int_128[]={ { .uname = "PACK", .udesc = "128 bit SIMD integer pack operations", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_ARITH", .udesc = "128 bit SIMD integer arithmetic operations", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_LOGICAL", .udesc = "128 bit SIMD integer logical operations", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_MPY", .udesc = "128 bit SIMD integer multiply operations", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_SHIFT", .udesc = "128 bit SIMD integer shift operations", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SHUFFLE_MOVE", .udesc = "128 bit SIMD integer shuffle/move operations", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "UNPACK", .udesc = "128 bit SIMD integer unpack operations", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_simd_int_64[]={ { .uname = "PACK", .udesc = "SIMD integer 64 bit pack operations", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_ARITH", .udesc = "SIMD integer 64 bit arithmetic operations", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_LOGICAL", .udesc = "SIMD integer 64 bit logical operations", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_MPY", .udesc = "SIMD integer 64 bit packed multiply operations", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_SHIFT", .udesc = "SIMD integer 64 bit shift operations", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SHUFFLE_MOVE", .udesc = "SIMD integer 64 bit shuffle/move operations", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "UNPACK", .udesc = "SIMD integer 64 bit unpack operations", .ucode = 
0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_snoop_response[]={ { .uname = "HIT", .udesc = "Thread responded HIT to snoop", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HITE", .udesc = "Thread responded HITE to snoop", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HITM", .udesc = "Thread responded HITM to snoop", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_sq_misc[]={ { .uname = "PROMOTION", .udesc = "Counts the number of L2 secondary misses that hit the Super Queue", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PROMOTION_POST_GO", .udesc = "Counts the number of L2 secondary misses during the Super Queue filling L2", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LRU_HINTS", .udesc = "Counts number of Super Queue LRU hints sent to L3", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "FILL_DROPPED", .udesc = "Counts the number of SQ L2 fills dropped due to L2 busy", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SPLIT_LOCK", .udesc = "Super Queue lock splits across a cache line", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_sse_mem_exec[]={ { .uname = "NTA", .udesc = "Streaming SIMD L1D NTA prefetch miss", .ucode = 0x100, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t nhm_ssex_uops_retired[]={ { .uname = "PACKED_DOUBLE", .udesc = "SIMD Packed-Double Uops retired (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "PACKED_SINGLE", .udesc = "SIMD Packed-Single Uops retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SCALAR_DOUBLE", .udesc = "SIMD Scalar-Double Uops retired (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SCALAR_SINGLE", .udesc = "SIMD Scalar-Single Uops retired (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | 
INTEL_X86_PEBS, }, { .uname = "VECTOR_INTEGER", .udesc = "SIMD Vector Integer Uops retired (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t nhm_store_blocks[]={ { .uname = "AT_RET", .udesc = "Loads delayed with at-Retirement block code", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_BLOCK", .udesc = "Cacheable loads delayed with L1D block code", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NOT_STA", .udesc = "Loads delayed due to a store blocked for unknown data", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STA", .udesc = "Loads delayed due to a store blocked for an unknown address", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_uops_decoded[]={ { .uname = "ESP_FOLDING", .udesc = "Stack pointer instructions decoded", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ESP_SYNC", .udesc = "Stack pointer sync operations", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS", .udesc = "Uops decoded by Microcode Sequencer", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_CYCLES_ACTIVE", .udesc = "Cycles in which at least one uop is decoded by Microcode Sequencer", .uequiv = "MS:c=1", .ucode = 0x200 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_uops_executed[]={ { .uname = "PORT0", .udesc = "Uops executed on port 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT1", .udesc = "Uops executed on port 1", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT2_CORE", .udesc = "Uops executed on port 2 on any thread (core count only)", .ucode = 0x400 | INTEL_X86_MOD_ANY, .modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT3_CORE", .udesc = "Uops executed on port 3 on any thread (core count only)", .ucode = 0x800 | INTEL_X86_MOD_ANY, .modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = 
"PORT4_CORE", .udesc = "Uops executed on port 4 on any thread (core count only)", .ucode = 0x1000 | INTEL_X86_MOD_ANY, .modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT5", .udesc = "Uops executed on port 5", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT015", .udesc = "Uops issued on ports 0, 1 or 5", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT234_CORE", .udesc = "Uops issued on ports 2, 3 or 4 on any thread (core count only)", .ucode = 0x8000 | INTEL_X86_MOD_ANY, .modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT015_STALL_CYCLES", .udesc = "Cycles no Uops issued on ports 0, 1 or 5", .uequiv = "PORT015:c=1:i=1", .ucode = 0x4000 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_uops_issued[]={ { .uname = "ANY", .udesc = "Uops issued", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "STALLED_CYCLES", .udesc = "Cycles stalled no issued uops", .uequiv = "ANY:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "FUSED", .udesc = "Fused Uops issued", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t nhm_uops_retired[]={ { .uname = "ANY", .udesc = "Uops retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "RETIRE_SLOTS", .udesc = "Retirement slots used (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ACTIVE_CYCLES", .udesc = "Cycles Uops are being retired (Precise Event)", .uequiv = "ANY:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STALL_CYCLES", .udesc = "Cycles No Uops retired (Precise Event)", .uequiv = "ANY:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, 
}, { .uname = "MACRO_FUSED", .udesc = "Macro-fused Uops retired (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t nhm_offcore_response_0[]={ { .uname = "DMND_DATA_RD", .udesc = "Request: counts the number of demand and DCU prefetch data reads of full and partial cachelines as well as demand data page table entry cacheline reads. Does not count L2 data read prefetches or instruction fetches", .ucode = 0x100, .grpid = 0, }, { .uname = "DMND_RFO", .udesc = "Request: counts the number of demand and DCU prefetch reads for ownership (RFO) requests generated by a write to data cacheline. Does not count L2 RFO", .ucode = 0x200, .grpid = 0, }, { .uname = "DMND_IFETCH", .udesc = "Request: counts the number of demand and DCU prefetch instruction cacheline reads. Does not count L2 code read prefetches", .ucode = 0x400, .grpid = 0, }, { .uname = "WB", .udesc = "Request: counts the number of writeback (modified to exclusive) transactions", .ucode = 0x800, .grpid = 0, }, { .uname = "PF_DATA_RD", .udesc = "Request: counts the number of data cacheline reads generated by L2 prefetchers", .ucode = 0x1000, .grpid = 0, }, { .uname = "PF_RFO", .udesc = "Request: counts the number of RFO requests generated by L2 prefetchers", .ucode = 0x2000, .grpid = 0, }, { .uname = "PF_IFETCH", .udesc = "Request: counts the number of code reads generated by L2 prefetchers", .ucode = 0x4000, .grpid = 0, }, { .uname = "OTHER", .udesc = "Request: counts one of the following transaction types, including L3 invalidate, I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences, lock, unlock, split lock", .ucode = 0x8000, .grpid = 0, }, { .uname = "ANY_IFETCH", .udesc = "Request: combination of PF_IFETCH | DMND_IFETCH", .uequiv = "PF_IFETCH:DMND_IFETCH", .ucode = 0x4400, .grpid = 0, }, { .uname = "ANY_REQUEST", .udesc = "Request: combination of all requests umasks", .uequiv = 
"DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:OTHER", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "ANY_DATA", .udesc = "Request: any data read/write request", .uequiv = "DMND_DATA_RD:PF_DATA_RD:DMND_RFO:PF_RFO", .ucode = 0x3300, .grpid = 0, }, { .uname = "ANY_DATA_RD", .udesc = "Request: any data read in request", .uequiv = "DMND_DATA_RD:PF_DATA_RD", .ucode = 0x1100, .grpid = 0, }, { .uname = "ANY_RFO", .udesc = "Request: combination of DMND_RFO | PF_RFO", .uequiv = "DMND_RFO:PF_RFO", .ucode = 0x2200, .grpid = 0, }, { .uname = "UNCORE_HIT", .udesc = "Response: counts L3 Hit: local or remote home requests that hit L3 cache in the uncore with no coherency actions required (snooping)", .ucode = 0x10000, .grpid = 1, }, { .uname = "OTHER_CORE_HIT_SNP", .udesc = "Response: counts L3 Hit: local or remote home requests that hit L3 cache in the uncore and was serviced by another core with a cross core snoop where no modified copies were found (clean)", .ucode = 0x20000, .grpid = 1, }, { .uname = "OTHER_CORE_HITM", .udesc = "Response: counts L3 Hit: local or remote home requests that hit L3 cache in the uncore and was serviced by another core with a cross core snoop where modified copies were found (HITM)", .ucode = 0x40000, .grpid = 1, }, { .uname = "REMOTE_CACHE_HITM", .udesc = "Response: counts L3 Hit: local or remote home requests that hit a remote L3 cacheline in modified (HITM) state", .ucode = 0x80000, .grpid = 1, }, { .uname = "REMOTE_CACHE_FWD", .udesc = "Response: counts L3 Miss: local homed requests that missed the L3 cache and was serviced by forwarded data following a cross package snoop where no modified copies found. 
(Remote home requests are not counted)", .ucode = 0x100000, .grpid = 1, }, { .uname = "REMOTE_DRAM", .udesc = "Response: counts L3 Miss: remote home requests that missed the L3 cache and were serviced by remote DRAM", .ucode = 0x200000, .grpid = 1, }, { .uname = "LOCAL_DRAM", .udesc = "Response: counts L3 Miss: local home requests that missed the L3 cache and were serviced by local DRAM", .ucode = 0x400000, .grpid = 1, }, { .uname = "NON_DRAM", .udesc = "Response: Non-DRAM requests that were serviced by IOH", .ucode = 0x800000, .grpid = 1, }, { .uname = "ANY_CACHE_DRAM", .udesc = "Response: requests serviced by any source but IOH", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:REMOTE_CACHE_FWD:REMOTE_CACHE_HITM:REMOTE_DRAM:LOCAL_DRAM", .ucode = 0x7f0000, .grpid = 1, }, { .uname = "ANY_DRAM", .udesc = "Response: requests serviced by local or remote DRAM", .uequiv = "REMOTE_DRAM:LOCAL_DRAM", .ucode = 0x600000, .grpid = 1, }, { .uname = "ANY_LLC_MISS", .udesc = "Response: requests that missed in L3", .uequiv = "REMOTE_CACHE_HITM:REMOTE_CACHE_FWD:REMOTE_DRAM:LOCAL_DRAM:NON_DRAM", .ucode = 0xf80000, .grpid = 1, }, { .uname = "LOCAL_CACHE_DRAM", .udesc = "Response: requests hit local core or uncore caches or local DRAM", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:LOCAL_DRAM", .ucode = 0x470000, .grpid = 1, }, { .uname = "REMOTE_CACHE_DRAM", .udesc = "Response: requests that miss L3 and hit remote caches or DRAM", .uequiv = "REMOTE_CACHE_HITM:REMOTE_CACHE_FWD:REMOTE_DRAM", .ucode = 0x380000, .grpid = 1, }, { .uname = "ANY_RESPONSE", .udesc = "Response: combination of all response umasks", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:REMOTE_CACHE_FWD:REMOTE_CACHE_HITM:REMOTE_DRAM:LOCAL_DRAM:NON_DRAM", .ucode = 0xff0000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, }; static const intel_x86_entry_t intel_nhm_pe[]={ { .name = "UNHALTED_CORE_CYCLES", .desc = "Count core clock cycles whenever the clock signal on the specific 
core is running (not halted)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x20000000full, .code = 0x3c, }, { .name = "INSTRUCTION_RETIRED", .desc = "Count the number of instructions at retirement", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_V3_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED3_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "LLC_REFERENCES", .desc = "Count each request originating from the core to reference a cache line in the last level cache. The count may include speculation, but excludes cache line fills due to hardware prefetch. Alias to L2_RQSTS:SELF_DEMAND_MESI", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x4f2e, }, { .name = "LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for LLC_REFERENCES", .modmsk = INTEL_V3_ATTRS, .equiv = "LLC_REFERENCES", .cntmsk = 0xf, .code = 0x4f2e, }, { .name = "LLC_MISSES", .desc = "Count each cache miss condition for references to the last level cache. The event count may include speculation, but excludes cache line fills due to hardware prefetch. Alias to event L2_RQSTS:SELF_DEMAND_I_STATE", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x412e, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for LLC_MISSES", .modmsk = INTEL_V3_ATTRS, .equiv = "LLC_MISSES", .cntmsk = 0xf, .code = 0x412e, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Count branch instructions at retirement. 
Specifically, this event counts the retirement of the last micro-op of a branch instruction.", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_INST_RETIRED:ALL_BRANCHES", .cntmsk = 0xf, .code = 0xc4, }, { .name = "ARITH", .desc = "Counts arithmetic multiply and divide operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x14, .numasks = LIBPFM_ARRAY_SIZE(nhm_arith), .ngrp = 1, .umasks = nhm_arith, }, { .name = "BACLEAR", .desc = "Branch address calculator", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xe6, .numasks = LIBPFM_ARRAY_SIZE(nhm_baclear), .ngrp = 1, .umasks = nhm_baclear, }, { .name = "BACLEAR_FORCE_IQ", .desc = "Instruction queue forced BACLEAR", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1a7, }, { .name = "BOGUS_BR", .desc = "Counts the number of bogus branches.", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1e4, }, { .name = "BPU_CLEARS", .desc = "Branch prediction Unit clears", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xe8, .numasks = LIBPFM_ARRAY_SIZE(nhm_bpu_clears), .ngrp = 1, .umasks = nhm_bpu_clears, }, { .name = "BPU_MISSED_CALL_RET", .desc = "Branch prediction unit missed call or return", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1e5, }, { .name = "BR_INST_DECODED", .desc = "Branch instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1e0, }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x88, .numasks = LIBPFM_ARRAY_SIZE(nhm_br_inst_exec), .ngrp = 1, .umasks = nhm_br_inst_exec, }, { .name = "BR_INST_RETIRED", .desc = "Retired branch instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc4, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_br_inst_retired), .ngrp = 1, .umasks = nhm_br_inst_retired, }, { .name = "BR_MISP_EXEC", .desc = "Mispredicted branches executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x89, .numasks = LIBPFM_ARRAY_SIZE(nhm_br_misp_exec), .ngrp = 1, .umasks = 
nhm_br_misp_exec, }, { .name = "BR_MISP_RETIRED", .desc = "Count Mispredicted Branch Activity", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc5, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_br_misp_retired), .ngrp = 1, .umasks = nhm_br_misp_retired, }, { .name = "CACHE_LOCK_CYCLES", .desc = "Cache lock cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(nhm_cache_lock_cycles), .ngrp = 1, .umasks = nhm_cache_lock_cycles, }, { .name = "CPU_CLK_UNHALTED", .desc = "Cycles when processor is not in halted state", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(nhm_cpu_clk_unhalted), .ngrp = 1, .umasks = nhm_cpu_clk_unhalted, }, { .name = "DTLB_LOAD_MISSES", .desc = "Data TLB load misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(nhm_dtlb_load_misses), .ngrp = 1, .umasks = nhm_dtlb_load_misses, }, { .name = "DTLB_MISSES", .desc = "Data TLB misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x49, .numasks = LIBPFM_ARRAY_SIZE(nhm_dtlb_misses), .ngrp = 1, .umasks = nhm_dtlb_misses, }, { .name = "EPT", .desc = "Extended Page Directory", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x4f, .numasks = LIBPFM_ARRAY_SIZE(nhm_ept), .ngrp = 1, .umasks = nhm_ept, }, { .name = "ES_REG_RENAMES", .desc = "ES segment renames", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1d5, }, { .name = "FP_ASSIST", .desc = "Floating point assists", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf7, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_fp_assist), .ngrp = 1, .umasks = nhm_fp_assist, }, { .name = "FP_COMP_OPS_EXE", .desc = "Floating point computational micro-ops", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x10, .numasks = LIBPFM_ARRAY_SIZE(nhm_fp_comp_ops_exe), .ngrp = 1, .umasks = nhm_fp_comp_ops_exe, }, { .name = "FP_MMX_TRANS", .desc = "Floating Point to and from MMX transitions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 
0xf, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(nhm_fp_mmx_trans), .ngrp = 1, .umasks = nhm_fp_mmx_trans, }, { .name = "IFU_IVC", .desc = "Instruction Fetch unit victim cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x81, .numasks = LIBPFM_ARRAY_SIZE(nhm_ifu_ivc), .ngrp = 1, .umasks = nhm_ifu_ivc, }, { .name = "ILD_STALL", .desc = "Instruction Length Decoder stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x87, .numasks = LIBPFM_ARRAY_SIZE(nhm_ild_stall), .ngrp = 1, .umasks = nhm_ild_stall, }, { .name = "INST_DECODED", .desc = "Instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x18, .numasks = LIBPFM_ARRAY_SIZE(nhm_inst_decoded), .ngrp = 1, .umasks = nhm_inst_decoded, }, { .name = "INST_QUEUE_WRITES", .desc = "Instructions written to instruction queue.", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x117, }, { .name = "INST_QUEUE_WRITE_CYCLES", .desc = "Cycles instructions are written to the instruction queue", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x11e, }, { .name = "INST_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_inst_retired), .ngrp = 1, .umasks = nhm_inst_retired, }, { .name = "IO_TRANSACTIONS", .desc = "I/O transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x16c, }, { .name = "ITLB_FLUSH", .desc = "Counts the number of ITLB flushes", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1ae, }, { .name = "ITLB_MISSES", .desc = "Instruction TLB misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(nhm_dtlb_misses), .ngrp = 1, .umasks = nhm_dtlb_misses, /* identical to actual umasks list for this event */ }, { .name = "ITLB_MISS_RETIRED", .desc = "Retired instructions that missed the ITLB (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x20c8, .flags= INTEL_X86_PEBS, }, { .name = "L1D", .desc = "L1D cache", .modmsk = 
INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x51, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1d), .ngrp = 1, .umasks = nhm_l1d, }, { .name = "L1D_ALL_REF", .desc = "L1D references", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1d_all_ref), .ngrp = 1, .umasks = nhm_l1d_all_ref, }, { .name = "L1D_CACHE_LD", .desc = "L1D cacheable loads. WARNING: event may overcount loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x40, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1d_cache_ld), .ngrp = 1, .umasks = nhm_l1d_cache_ld, }, { .name = "L1D_CACHE_LOCK", .desc = "L1 data cache load lock", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1d_cache_lock), .ngrp = 1, .umasks = nhm_l1d_cache_lock, }, { .name = "L1D_CACHE_LOCK_FB_HIT", .desc = "L1D load lock accepted in fill buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x153, }, { .name = "L1D_CACHE_PREFETCH_LOCK_FB_HIT", .desc = "L1D prefetch load lock accepted in fill buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x152, }, { .name = "L1D_CACHE_ST", .desc = "L1 data cache stores", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x41, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1d_cache_st), .ngrp = 1, .umasks = nhm_l1d_cache_st, }, { .name = "L1D_PREFETCH", .desc = "L1D hardware prefetch", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x4e, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1d_prefetch), .ngrp = 1, .umasks = nhm_l1d_prefetch, }, { .name = "L1D_WB_L2", .desc = "L1 writebacks to L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1d_wb_l2), .ngrp = 1, .umasks = nhm_l1d_wb_l2, }, { .name = "L1I", .desc = "L1I instruction fetches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(nhm_l1i), .ngrp = 1, .umasks = nhm_l1i, }, { .name = "L1I_OPPORTUNISTIC_HITS", .desc = "Opportunistic hits in streaming", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x183, }, { .name = 
"L2_DATA_RQSTS", .desc = "L2 data requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x26, .numasks = LIBPFM_ARRAY_SIZE(nhm_l2_data_rqsts), .ngrp = 1, .umasks = nhm_l2_data_rqsts, }, { .name = "L2_HW_PREFETCH", .desc = "L2 HW prefetches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf3, .numasks = LIBPFM_ARRAY_SIZE(nhm_l2_hw_prefetch), .ngrp = 1, .umasks = nhm_l2_hw_prefetch, }, { .name = "L2_LINES_IN", .desc = "L2 lines allocated", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf1, .numasks = LIBPFM_ARRAY_SIZE(nhm_l2_lines_in), .ngrp = 1, .umasks = nhm_l2_lines_in, }, { .name = "L2_LINES_OUT", .desc = "L2 lines evicted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf2, .numasks = LIBPFM_ARRAY_SIZE(nhm_l2_lines_out), .ngrp = 1, .umasks = nhm_l2_lines_out, }, { .name = "L2_RQSTS", .desc = "L2 requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(nhm_l2_rqsts), .ngrp = 1, .umasks = nhm_l2_rqsts, }, { .name = "L2_TRANSACTIONS", .desc = "L2 transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf0, .numasks = LIBPFM_ARRAY_SIZE(nhm_l2_transactions), .ngrp = 1, .umasks = nhm_l2_transactions, }, { .name = "L2_WRITE", .desc = "L2 demand lock/store RFO", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(nhm_l2_write), .ngrp = 1, .umasks = nhm_l2_write, }, { .name = "LARGE_ITLB", .desc = "Large instruction TLB", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x82, .numasks = LIBPFM_ARRAY_SIZE(nhm_large_itlb), .ngrp = 1, .umasks = nhm_large_itlb, }, { .name = "LOAD_DISPATCH", .desc = "Loads dispatched", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x13, .numasks = LIBPFM_ARRAY_SIZE(nhm_load_dispatch), .ngrp = 1, .umasks = nhm_load_dispatch, }, { .name = "LOAD_HIT_PRE", .desc = "Load operations conflicting with software prefetches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x14c, }, { .name = "LONGEST_LAT_CACHE", .desc = "Longest latency cache 
reference", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(nhm_longest_lat_cache), .ngrp = 1, .umasks = nhm_longest_lat_cache, }, { .name = "LSD", .desc = "Loop stream detector", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xa8, .numasks = LIBPFM_ARRAY_SIZE(nhm_lsd), .ngrp = 1, .umasks = nhm_lsd, }, { .name = "MACHINE_CLEARS", .desc = "Machine Clear", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc3, .numasks = LIBPFM_ARRAY_SIZE(nhm_machine_clears), .ngrp = 1, .umasks = nhm_machine_clears, }, { .name = "MACRO_INSTS", .desc = "Macro-fused instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xd0, .numasks = LIBPFM_ARRAY_SIZE(nhm_macro_insts), .ngrp = 1, .umasks = nhm_macro_insts, }, { .name = "MEMORY_DISAMBIGUATION", .desc = "Memory Disambiguation Activity", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x9, .numasks = LIBPFM_ARRAY_SIZE(nhm_memory_disambiguation), .ngrp = 1, .umasks = nhm_memory_disambiguation, }, { .name = "MEM_INST_RETIRED", .desc = "Memory instructions retired", .modmsk = INTEL_V3_ATTRS | _INTEL_X86_ATTR_LDLAT, .cntmsk = 0xf, .code = 0xb, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_mem_inst_retired), .ngrp = 1, .umasks = nhm_mem_inst_retired, }, { .name = "MEM_LOAD_RETIRED", .desc = "Retired loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xcb, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_mem_load_retired), .ngrp = 1, .umasks = nhm_mem_load_retired, }, { .name = "MEM_STORE_RETIRED", .desc = "Retired stores", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_mem_store_retired), .ngrp = 1, .umasks = nhm_mem_store_retired, }, { .name = "MEM_UNCORE_RETIRED", .desc = "Load instructions retired which hit offcore", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_mem_uncore_retired), .ngrp = 1, .umasks = nhm_mem_uncore_retired, }, { .name 
= "OFFCORE_REQUESTS", .desc = "Offcore memory requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xb0, .numasks = LIBPFM_ARRAY_SIZE(nhm_offcore_requests), .ngrp = 1, .umasks = nhm_offcore_requests, }, { .name = "OFFCORE_REQUESTS_SQ_FULL", .desc = "Counts cycles the Offcore Request buffer or Super Queue is full.", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1b2, }, { .name = "PARTIAL_ADDRESS_ALIAS", .desc = "False dependencies due to partial address forming", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x107, }, { .name = "PIC_ACCESSES", .desc = "Programmable interrupt controller", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xba, .numasks = LIBPFM_ARRAY_SIZE(nhm_pic_accesses), .ngrp = 1, .umasks = nhm_pic_accesses, }, { .name = "RAT_STALLS", .desc = "Register allocation table stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xd2, .numasks = LIBPFM_ARRAY_SIZE(nhm_rat_stalls), .ngrp = 1, .umasks = nhm_rat_stalls, }, { .name = "RESOURCE_STALLS", .desc = "Processor stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xa2, .numasks = LIBPFM_ARRAY_SIZE(nhm_resource_stalls), .ngrp = 1, .umasks = nhm_resource_stalls, }, { .name = "SEG_RENAME_STALLS", .desc = "Segment rename stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1d4, }, { .name = "SEGMENT_REG_LOADS", .desc = "Counts number of segment register loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1f8, }, { .name = "SIMD_INT_128", .desc = "128 bit SIMD integer operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x12, .numasks = LIBPFM_ARRAY_SIZE(nhm_simd_int_128), .ngrp = 1, .umasks = nhm_simd_int_128, }, { .name = "SIMD_INT_64", .desc = "64 bit SIMD integer operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xfd, .numasks = LIBPFM_ARRAY_SIZE(nhm_simd_int_64), .ngrp = 1, .umasks = nhm_simd_int_64, }, { .name = "SNOOP_RESPONSE", .desc = "Snoop", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xb8, .numasks = 
LIBPFM_ARRAY_SIZE(nhm_snoop_response), .ngrp = 1, .umasks = nhm_snoop_response, }, { .name = "SQ_FULL_STALL_CYCLES", .desc = "Counts cycles the Offcore Request buffer or Super Queue is full and request(s) are outstanding.", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1f6, }, { .name = "SQ_MISC", .desc = "Super Queue Activity Related to L2 Cache Access", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf4, .numasks = LIBPFM_ARRAY_SIZE(nhm_sq_misc), .ngrp = 1, .umasks = nhm_sq_misc, }, { .name = "SSE_MEM_EXEC", .desc = "Streaming SIMD executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(nhm_sse_mem_exec), .ngrp = 1, .umasks = nhm_sse_mem_exec, }, { .name = "SSEX_UOPS_RETIRED", .desc = "SIMD micro-ops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc7, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_ssex_uops_retired), .ngrp = 1, .umasks = nhm_ssex_uops_retired, }, { .name = "STORE_BLOCKS", .desc = "Delayed loads", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x6, .numasks = LIBPFM_ARRAY_SIZE(nhm_store_blocks), .ngrp = 1, .umasks = nhm_store_blocks, }, { .name = "TWO_UOP_INSTS_DECODED", .desc = "Two micro-ops instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x119, }, { .name = "UOPS_DECODED_DEC0", .desc = "Micro-ops decoded by decoder 0", .modmsk =0x0, .cntmsk = 0xf, .code = 0x13d, }, { .name = "UOPS_DECODED", .desc = "Micro-ops decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xd1, .numasks = LIBPFM_ARRAY_SIZE(nhm_uops_decoded), .ngrp = 1, .umasks = nhm_uops_decoded, }, { .name = "UOPS_EXECUTED", .desc = "Micro-ops executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xb1, .numasks = LIBPFM_ARRAY_SIZE(nhm_uops_executed), .ngrp = 1, .umasks = nhm_uops_executed, }, { .name = "UOPS_ISSUED", .desc = "Micro-ops issued", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xe, .numasks = LIBPFM_ARRAY_SIZE(nhm_uops_issued), .ngrp = 1, .umasks = nhm_uops_issued, 
}, { .name = "UOPS_RETIRED", .desc = "Micro-ops retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(nhm_uops_retired), .ngrp = 1, .umasks = nhm_uops_retired, }, { .name = "UOP_UNFUSION", .desc = "Micro-ops unfusions due to FP exceptions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1db, }, { .name = "OFFCORE_RESPONSE_0", .desc = "Offcore response 0 (must provide at least one request and one response umasks)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1b7, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(nhm_offcore_response_0), .ngrp = 2, .umasks = nhm_offcore_response_0, }, }; papi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_ha_events.h /* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: snbep_unc_ha (Intel SandyBridge-EP HA uncore PMU) */ static const intel_x86_umask_t snbep_unc_h_conflict_cycles[]={ { .uname = "CONFLICT", .udesc = "Number of cycles that we are handling conflicts", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NO_CONFLICT", .udesc = "Number of cycles that we are not handling conflicts", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_directory_lookup[]={ { .uname = "NO_SNP", .udesc = "Snoop not needed", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SNP", .udesc = "Snoop needed", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_directory_update[]={ { .uname = "ANY", .udesc = "Counts any directory update", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "CLEAR", .udesc = "Directory clears", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SET", .udesc = "Directory set", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_igr_no_credit_cycles[]={ { .uname = "AD_QPI0", .udesc = "AD to QPI link 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "AD_QPI1", .udesc = "AD to QPI link 1", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "BL_QPI0", .udesc = "BL to QPI link 0", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "BL_QPI1", .udesc = "BL to QPI link 1", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_imc_writes[]={ { .uname = "ALL", .udesc = "Counts all writes", .ucode = 0xf00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FULL", .udesc = "Counts full line non ISOCH", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "FULL_ISOCH", .udesc = "Counts ISOCH full line", .ucode = 0x400, .uflags = 
INTEL_X86_NCOMBO, }, { .uname = "PARTIAL", .udesc = "Counts partial non-ISOCH", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PARTIAL_ISOCH", .udesc = "Counts ISOCH partial", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_requests[]={ { .uname = "READS", .udesc = "Counts incoming read requests. Good proxy for LLC read misses, incl. RFOs", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WRITES", .udesc = "Counts incoming writes", .ucode = 0xc00, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_rpq_cycles_no_reg_credits[]={ { .uname = "CHN0", .udesc = "Channel 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CHN1", .udesc = "Channel 1", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CHN2", .udesc = "Channel 2", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CHN3", .udesc = "Channel 3", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_tad_requests_g0[]={ { .uname = "REGION0", .udesc = "Counts for TAD Region 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION1", .udesc = "Counts for TAD Region 1", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION2", .udesc = "Counts for TAD Region 2", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION3", .udesc = "Counts for TAD Region 3", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION4", .udesc = "Counts for TAD Region 4", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION5", .udesc = "Counts for TAD Region 5", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION6", .udesc = "Counts for TAD Region 6", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION7", .udesc = "Counts for TAD Region 7", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_tad_requests_g1[]={ { .uname = "REGION8", .udesc = "Counts for 
TAD Region 8", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION9", .udesc = "Counts for TAD Region 9", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION10", .udesc = "Counts for TAD Region 10", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REGION11", .udesc = "Counts for TAD Region 11", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_tracker_inserts[]={ { .uname = "ALL", .udesc = "Counts all requests", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_unc_h_txr_ad[]={ { .uname = "NDR", .udesc = "Counts non-data responses", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SNP", .udesc = "Counts outbound snoops sent on the ring", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_txr_ad_cycles_full[]={ { .uname = "ALL", .udesc = "Counts cycles full from both schedulers", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "SCHED0", .udesc = "Counts cycles full from scheduler bank 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SCHED1", .udesc = "Counts cycles full from scheduler bank 1", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_txr_ak_cycles_full[]={ { .uname = "ALL", .udesc = "Counts cycles from both schedulers", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "SCHED0", .udesc = "Counts cycles from scheduler bank 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SCHED1", .udesc = "Counts cycles from scheduler bank 1", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_txr_bl[]={ { .uname = "DRS_CACHE", .udesc = "Counts data being sent to the cache", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DRS_CORE", .udesc = "Counts data being sent directly to the requesting core", .ucode = 0x200, .uflags 
= INTEL_X86_NCOMBO, }, { .uname = "DRS_QPI", .udesc = "Counts data being sent to a remote socket over QPI", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_wpq_cycles_no_reg_credits[]={ { .uname = "CHN0", .udesc = "HA iMC CHN0 WPQ Credits Empty - Regular", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CHN1", .udesc = "HA iMC CHN1 WPQ Credits Empty - Regular", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CHN2", .udesc = "HA iMC CHN2 WPQ Credits Empty - Regular", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CHN3", .udesc = "HA iMC CHN3 WPQ Credits Empty - Regular", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_h_txr_bl_cycles_full[]={ { .uname = "ALL", .udesc = "BL Egress Full", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SCHED0", .udesc = "BL Egress Full", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SCHED1", .udesc = "BL Egress Full", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; #if 0 static const intel_x86_umask_t snbep_unc_h_addr_opc_match[]={ { .uname = "FILT", .udesc = "Number of addr and opcode matches (opc via opc= or address via addr= modifiers)", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL | INTEL_X86_ADDR, }, }; #endif static const intel_x86_entry_t intel_snbep_unc_h_pe[]={ { .name = "UNC_H_CLOCKTICKS", .desc = "HA Uncore clockticks", .modmsk = SNBEP_UNC_HA_ATTRS, .cntmsk = 0xf, .code = 0x00, }, { .name = "UNC_H_CONFLICT_CYCLES", .desc = "Conflict Checks", .code = 0xb, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_conflict_cycles), .umasks = snbep_unc_h_conflict_cycles, }, { .name = "UNC_H_DIRECT2CORE_COUNT", .desc = "Direct2Core Messages Sent", .code = 0x11, .cntmsk = 0xf, .modmsk = SNBEP_UNC_HA_ATTRS, }, { .name = "UNC_H_DIRECT2CORE_CYCLES_DISABLED", .desc = "Cycles when Direct2Core was Disabled", .code = 0x12, .cntmsk = 0xf, 
.modmsk = SNBEP_UNC_HA_ATTRS, }, { .name = "UNC_H_DIRECT2CORE_TXN_OVERRIDE", .desc = "Number of Reads that had Direct2Core Overridden", .code = 0x13, .cntmsk = 0xf, .modmsk = SNBEP_UNC_HA_ATTRS, }, { .name = "UNC_H_DIRECTORY_LOOKUP", .desc = "Directory Lookups", .code = 0xc, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_directory_lookup), .umasks = snbep_unc_h_directory_lookup }, { .name = "UNC_H_DIRECTORY_UPDATE", .desc = "Directory Updates", .code = 0xd, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_directory_update), .umasks = snbep_unc_h_directory_update }, { .name = "UNC_H_IGR_NO_CREDIT_CYCLES", .desc = "Cycles without QPI Ingress Credits", .code = 0x22, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_igr_no_credit_cycles), .umasks = snbep_unc_h_igr_no_credit_cycles }, { .name = "UNC_H_IMC_RETRY", .desc = "Retry Events", .code = 0x1e, .cntmsk = 0xf, .modmsk = SNBEP_UNC_HA_ATTRS, }, { .name = "UNC_H_IMC_WRITES", .desc = "HA to iMC Full Line Writes Issued", .code = 0x1a, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_imc_writes), .umasks = snbep_unc_h_imc_writes }, { .name = "UNC_H_REQUESTS", .desc = "Read and Write Requests", .code = 0x1, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_requests), .umasks = snbep_unc_h_requests }, { .name = "UNC_H_RPQ_CYCLES_NO_REG_CREDITS", .desc = "iMC RPQ Credits Empty - Regular", .code = 0x15, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_rpq_cycles_no_reg_credits), .umasks = snbep_unc_h_rpq_cycles_no_reg_credits }, { .name = "UNC_H_TAD_REQUESTS_G0", .desc = "HA Requests to a TAD Region - Group 0", .code = 0x1b, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_tad_requests_g0), 
.umasks = snbep_unc_h_tad_requests_g0 }, { .name = "UNC_H_TAD_REQUESTS_G1", .desc = "HA Requests to a TAD Region - Group 1", .code = 0x1c, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_tad_requests_g1), .umasks = snbep_unc_h_tad_requests_g1 }, { .name = "UNC_H_TRACKER_INSERTS", .desc = "Tracker Allocations", .code = 0x6, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_tracker_inserts), .umasks = snbep_unc_h_tracker_inserts }, { .name = "UNC_H_TXR_AD", .desc = "Outbound NDR Ring Transactions", .code = 0xf, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_txr_ad), .umasks = snbep_unc_h_txr_ad }, { .name = "UNC_H_TXR_AD_CYCLES_FULL", .desc = "AD Egress Full", .code = 0x2a, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_txr_ad_cycles_full), .umasks = snbep_unc_h_txr_ad_cycles_full }, { .name = "UNC_H_TXR_AK_CYCLES_FULL", .desc = "AK Egress Full", .code = 0x32, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_txr_ak_cycles_full), .umasks = snbep_unc_h_txr_ak_cycles_full }, { .name = "UNC_H_TXR_AK_NDR", .desc = "Outbound NDR Ring Transactions", .code = 0xe, .cntmsk = 0xf, .modmsk = SNBEP_UNC_HA_ATTRS, }, { .name = "UNC_H_TXR_BL", .desc = "Outbound DRS Ring Transactions to Cache", .code = 0x10, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_txr_bl), .umasks = snbep_unc_h_txr_bl }, { .name = "UNC_H_TXR_BL_CYCLES_FULL", .desc = "BL Egress Full", .code = 0x36, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_txr_ak_cycles_full), .umasks = snbep_unc_h_txr_ak_cycles_full, /* identical to snbep_unc_h_txr_ak_cycles_full */ }, { .name = "UNC_H_WPQ_CYCLES_NO_REG_CREDITS", .desc = "HA iMC CHN0 WPQ Credits Empty - Regular", .code = 0x18, .cntmsk 
= 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_rpq_cycles_no_reg_credits), .umasks = snbep_unc_h_rpq_cycles_no_reg_credits , /* identical to snbep_unc_h_rpq_cycles_no_reg_credits */ }, #if 0 { .name = "UNC_H_ADDR_OPC_MATCH", .desc = "QPI address/opcode match", .code = 0x20, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_HA_OPC_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_h_addr_opc_match), .umasks = snbep_unc_h_addr_opc_match, }, #endif }; papi-5.3.0/src/libpfm4/lib/events/intel_wsm_events.h0000600003276200002170000022717512247131123022110 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. 
* * PMU: wsm (Intel Westmere (single-socket)) */ static const intel_x86_umask_t wsm_uops_decoded[]={ { .uname = "ESP_FOLDING", .udesc = "Stack pointer instructions decoded", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ESP_SYNC", .udesc = "Stack pointer sync operations", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS", .udesc = "Counts the number of uops decoded by the Microcode Sequencer (MS). The MS delivers uops when the instruction is more than 4 uops long or a microcode assist is occurring.", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MS_CYCLES_ACTIVE", .udesc = "Uops decoded by Microcode Sequencer", .uequiv = "MS:c=1", .ucode = 0x200 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no Uops are decoded", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_bpu_clears[]={ { .uname = "EARLY", .udesc = "Early Branch Prediction Unit clears", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LATE", .udesc = "Late Branch Prediction Unit clears", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_uops_retired[]={ { .uname = "ANY", .udesc = "Uops retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "MACRO_FUSED", .udesc = "Macro-fused Uops retired (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "RETIRE_SLOTS", .udesc = "Retirement slots used (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STALL_CYCLES", .udesc = "Cycles Uops are not retiring (Precise Event)", .uequiv = "ANY:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "TOTAL_CYCLES", .udesc = "Total cycles using precise uop retired event (Precise Event)", .uequiv 
= "ANY:c=16:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x10 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ACTIVE_CYCLES", .udesc = "Alias for TOTAL_CYCLES (Precise Event)", .uequiv = "ANY:c=16:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x10 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_br_misp_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "Mispredicted retired branch instructions (Precise Event)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "Mispredicted near retired calls (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "CONDITIONAL", .udesc = "Mispredicted conditional branches retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_ept[]={ { .uname = "WALK_CYCLES", .udesc = "Extended Page Table walk cycles", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_uops_executed[]={ { .uname = "PORT0", .udesc = "Uops executed on port 0 (integer arithmetic, SIMD and FP add uops)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT1", .udesc = "Uops executed on port 1 (integer arithmetic, SIMD, integer shift, FP multiply, FP divide uops)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT2_CORE", .udesc = "Uops executed on port 2 on any thread (load uops) (core count only)", .ucode = 0x400 | INTEL_X86_MOD_ANY, .modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT3_CORE", .udesc = "Uops executed on port 3 on any thread (store uops) (core count only)", .ucode = 0x800 | INTEL_X86_MOD_ANY, .modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT4_CORE", .udesc = "Uops executed on port 4 on any thread (handle store values for stores on port 3) (core count only)", .ucode = 0x1000 | INTEL_X86_MOD_ANY, 
.modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT5", .udesc = "Uops executed on port 5", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT015", .udesc = "Uops issued on ports 0, 1 or 5", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT234_CORE", .udesc = "Uops issued on ports 2, 3 or 4 on any thread (core count only)", .ucode = 0x8000 | INTEL_X86_MOD_ANY, .modhw = _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PORT015_STALL_CYCLES", .udesc = "Cycles no Uops issued on ports 0, 1 or 5", .uequiv = "PORT015:c=1:i=1", .ucode = 0x4000 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "CORE_ACTIVE_CYCLES_NO_PORT5", .udesc = "Cycles in which uops are executed only on port0-4 on any thread (core count only)", .ucode = 0x1f00 | INTEL_X86_MOD_ANY | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C | _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CORE_ACTIVE_CYCLES", .udesc = "Cycles in which uops are executed on any port any thread (core count only)", .ucode = 0x3f00 | INTEL_X86_MOD_ANY | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_C | _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CORE_STALL_CYCLES", .udesc = "Cycles in which no uops are executed on any port any thread (core count only)", .ucode = 0x3f00 | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C | _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CORE_STALL_CYCLES_NO_PORT5", .udesc = "Cycles in which no uops are executed on any port0-4 on any thread (core count only)", .ucode = 0x1f00 | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C | _INTEL_X86_ATTR_T, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CORE_STALL_COUNT", .udesc = "Number of transitions from stalled to uops to execute on any port any thread (core count 
only)", .uequiv = "CORE_STALL_CYCLES:e=1", .ucode = 0x3f00 | INTEL_X86_MOD_EDGE | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "CORE_STALL_COUNT_NO_PORT5", .udesc = "Number of transitions from stalled to uops to execute on port0-4 on any thread (core count only)", .uequiv = "CORE_STALL_CYCLES_NO_PORT5:e=1", .ucode = 0x1f00 | INTEL_X86_MOD_EDGE | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_inst_retired[]={ { .uname = "ANY_P", .udesc = "Instructions Retired (Precise Event)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "X87", .udesc = "Retired floating-point operations (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "MMX", .udesc = "Retired MMX instructions (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "TOTAL_CYCLES", .udesc = "Total cycles (Precise Event)", .uequiv = "ANY_P:c=16:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x10 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_ild_stall[]={ { .uname = "ANY", .udesc = "Any Instruction Length Decoder stall cycles", .uequiv = "IQ_FULL:LCP:MRU:REGEN", .ucode = 0xf00, .uflags= INTEL_X86_DFL, }, { .uname = "IQ_FULL", .udesc = "Instruction Queue full stall cycles", .ucode = 0x400, }, { .uname = "LCP", .udesc = "Length Change Prefix stall cycles", .ucode = 0x100, }, { .uname = "MRU", .udesc = "Stall cycles due to BPU MRU bypass", .ucode = 0x200, }, { .uname = "REGEN", .udesc = "Regen stall cycles", .ucode = 0x800, }, }; static const intel_x86_umask_t wsm_dtlb_load_misses[]={ { .uname = "ANY", .udesc = "DTLB load misses", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "PDE_MISS", .udesc = "DTLB load miss caused by low part of address", .ucode = 0x2000, 
.uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "DTLB second level hit", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "DTLB load miss page walks complete", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_CYCLES", .udesc = "DTLB load miss page walk cycles", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l2_lines_in[]={ { .uname = "ANY", .udesc = "L2 lines allocated", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "E_STATE", .udesc = "L2 lines allocated in the E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S_STATE", .udesc = "L2 lines allocated in the S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_ssex_uops_retired[]={ { .uname = "PACKED_DOUBLE", .udesc = "SIMD Packed-Double Uops retired (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "PACKED_SINGLE", .udesc = "SIMD Packed-Single Uops retired (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SCALAR_DOUBLE", .udesc = "SIMD Scalar-Double Uops retired (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SCALAR_SINGLE", .udesc = "SIMD Scalar-Single Uops retired (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "VECTOR_INTEGER", .udesc = "SIMD Vector Integer Uops retired (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_store_blocks[]={ { .uname = "AT_RET", .udesc = "Loads delayed with at-Retirement block code", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_BLOCK", .udesc = "Cacheable loads delayed with L1D block code", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_fp_mmx_trans[]={ { .uname = "ANY", .udesc = "All Floating Point to and 
from MMX transitions", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "TO_FP", .udesc = "Transitions from MMX to Floating Point instructions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TO_MMX", .udesc = "Transitions from Floating Point to MMX instructions", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_cache_lock_cycles[]={ { .uname = "L1D", .udesc = "Cycles L1D locked", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_L2", .udesc = "Cycles L1D and L2 locked", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l3_lat_cache[]={ { .uname = "MISS", .udesc = "Last level cache miss", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REFERENCE", .udesc = "Last level cache reference", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_simd_int_64[]={ { .uname = "PACK", .udesc = "SIMD integer 64 bit pack operations", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_ARITH", .udesc = "SIMD integer 64 bit arithmetic operations", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_LOGICAL", .udesc = "SIMD integer 64 bit logical operations", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_MPY", .udesc = "SIMD integer 64 bit packed multiply operations", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_SHIFT", .udesc = "SIMD integer 64 bit shift operations", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SHUFFLE_MOVE", .udesc = "SIMD integer 64 bit shuffle/move operations", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "UNPACK", .udesc = "SIMD integer 64 bit unpack operations", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_br_misp_exec[]={ { .uname = "ANY", .udesc = "Mispredicted branches executed", .ucode = 0x7f00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "COND", .udesc = "Mispredicted 
conditional branches executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT", .udesc = "Mispredicted unconditional branches executed", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT_NEAR_CALL", .udesc = "Mispredicted direct near call branches executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INDIRECT_NEAR_CALL", .udesc = "Mispredicted indirect call branches executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INDIRECT_NON_CALL", .udesc = "Mispredicted indirect non call branches executed", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NEAR_CALLS", .udesc = "Mispredicted call branches executed", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NON_CALLS", .udesc = "Mispredicted non call branches executed", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RETURN_NEAR", .udesc = "Mispredicted return branches executed", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN", .udesc = "Mispredicted taken branches executed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_baclear[]={ { .uname = "BAD_TARGET", .udesc = "BACLEAR asserted with bad target address", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CLEAR", .udesc = "BACLEAR asserted, regardless of cause", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_dtlb_misses[]={ { .uname = "ANY", .udesc = "DTLB misses", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "LARGE_WALK_COMPLETED", .udesc = "DTLB miss large page walks", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "DTLB first level misses but second level hit", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "DTLB miss page walks", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_CYCLES", .udesc = "DTLB miss page walk cycles", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, 
}, }; static const intel_x86_umask_t wsm_mem_inst_retired[]={ { .uname = "LATENCY_ABOVE_THRESHOLD", .udesc = "Memory instructions retired above programmed clocks, minimum threshold value is 3, (Precise Event and ldlat required)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_LDLAT, }, { .uname = "LOADS", .udesc = "Instructions retired which contain a load (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STORES", .udesc = "Instructions retired which contain a store (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_uops_issued[]={ { .uname = "ANY", .udesc = "Uops issued", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "STALL_CYCLES", .udesc = "Cycles stalled no issued uops", .uequiv = "ANY:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "FUSED", .udesc = "Fused Uops issued", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CYCLES_ALL_THREADS", .udesc = "Cycles uops issued on either thread (core count)", .uequiv = "ANY:c=1:t=1", .ucode = 0x100 | INTEL_X86_MOD_ANY | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "CORE_STALL_CYCLES", .udesc = "Cycles no uops issued on any thread (core count)", .uequiv = "ANY:c=1:i=1:t=1", .ucode = 0x100 | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l2_rqsts[]={ { .uname = "IFETCH_HIT", .udesc = "L2 instruction fetch hits", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCH_MISS", .udesc = "L2 instruction fetch misses", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCHES", .udesc = "L2 instruction fetches", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LD_HIT", .udesc = "L2 load hits", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { 
.uname = "LD_MISS", .udesc = "L2 load misses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOADS", .udesc = "L2 requests", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISS", .udesc = "All L2 misses", .ucode = 0xaa00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_HIT", .udesc = "L2 prefetch hits", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_MISS", .udesc = "L2 prefetch misses", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCHES", .udesc = "All L2 prefetches", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REFERENCES", .udesc = "All L2 requests", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_HIT", .udesc = "L2 RFO hits", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_MISS", .udesc = "L2 RFO misses", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFOS", .udesc = "L2 RFO requests", .ucode = 0xc00, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_load_dispatch[]={ { .uname = "ANY", .udesc = "All loads dispatched", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "RS", .udesc = "Number of loads dispatched from the Reservation Station (RS) that bypass the Memory Order Buffer", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RS_DELAYED", .udesc = "Number of delayed RS dispatches at the stage latch", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MOB", .udesc = "Number of loads dispatched from Reservation Station (RS)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_snoopq_requests[]={ { .uname = "CODE", .udesc = "Snoop code requests", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DATA", .udesc = "Snoop data requests", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INVALIDATE", .udesc = "Snoop invalidate requests", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_offcore_requests[]={ { 
.uname = "ANY", .udesc = "All offcore requests", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ANY_READ", .udesc = "Offcore read requests", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ANY_RFO", .udesc = "Offcore RFO requests", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_READ_CODE", .udesc = "Offcore demand code read requests", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_READ_DATA", .udesc = "Offcore demand data read requests", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Offcore demand RFO requests", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_WRITEBACK", .udesc = "Offcore L1 data cache writebacks", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_load_block[]={ { .uname = "OVERLAP_STORE", .udesc = "Loads that partially overlap an earlier store", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_misalign_memory[]={ { .uname = "STORE", .udesc = "Store referenced with misaligned address", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_machine_clears[]={ { .uname = "MEM_ORDER", .udesc = "Execution pipeline restart due to Memory ordering conflicts ", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CYCLES", .udesc = "Cycles machine clear is asserted", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SMC", .udesc = "Self-modifying code detected", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_fp_comp_ops_exe[]={ { .uname = "MMX", .udesc = "MMX Uops", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_DOUBLE_PRECISION", .udesc = "SSE FP double precision Uops", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP", .udesc = "SSE and SSE2 FP Uops", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_PACKED", 
.udesc = "SSE FP packed Uops", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_FP_SCALAR", .udesc = "SSE FP scalar Uops", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE_SINGLE_PRECISION", .udesc = "SSE FP single precision Uops", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SSE2_INTEGER", .udesc = "SSE2 integer Uops", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "X87", .udesc = "Computational floating-point operations executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_br_inst_retired[]={ { .uname = "ALL_BRANCHES", .udesc = "Retired branch instructions (Precise Event)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "CONDITIONAL", .udesc = "Retired conditional branch instructions (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "Retired near call instructions (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_large_itlb[]={ { .uname = "HIT", .udesc = "Large ITLB hit", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_lsd[]={ { .uname = "UOPS", .udesc = "Counts the number of micro-ops delivered by LSD", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ACTIVE", .udesc = "Cycles in which at least one micro-op is delivered by LSD", .uequiv = "UOPS:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "INACTIVE", .udesc = "Cycles in which no micro-op is delivered by LSD", .uequiv = "UOPS:c=1:i=1", .ucode = 0x100 | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l2_lines_out[]={ { .uname = "ANY", .udesc = "L2 lines evicted", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "DEMAND_CLEAN", .udesc = "L2 lines evicted 
by a demand request", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DIRTY", .udesc = "L2 modified lines evicted by a demand request", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_CLEAN", .udesc = "L2 lines evicted by a prefetch request", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_DIRTY", .udesc = "L2 modified lines evicted by a prefetch request", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_itlb_misses[]={ { .uname = "ANY", .udesc = "ITLB miss", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "WALK_COMPLETED", .udesc = "ITLB miss page walks", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WALK_CYCLES", .udesc = "ITLB miss page walk cycles", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LARGE_WALK_COMPLETED", .udesc = "Number of completed large page walks due to misses in the STLB", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "ITLB misses hitting second level TLB", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l1d_prefetch[]={ { .uname = "MISS", .udesc = "L1D hardware prefetch misses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REQUESTS", .udesc = "L1D hardware prefetch requests", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TRIGGERS", .udesc = "L1D hardware prefetch requests triggered", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_sq_misc[]={ { .uname = "LRU_HINTS", .udesc = "Super Queue LRU hints sent to LLC", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SPLIT_LOCK", .udesc = "Super Queue lock splits across a cache line", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_fp_assist[]={ { .uname = "ALL", .udesc = "All X87 Floating point assists (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS | 
INTEL_X86_DFL, }, { .uname = "INPUT", .udesc = "X87 Floating point assists for invalid input value (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "OUTPUT", .udesc = "X87 Floating point assists for invalid output value (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_simd_int_128[]={ { .uname = "PACK", .udesc = "128 bit SIMD integer pack operations", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_ARITH", .udesc = "128 bit SIMD integer arithmetic operations", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_LOGICAL", .udesc = "128 bit SIMD integer logical operations", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_MPY", .udesc = "128 bit SIMD integer multiply operations", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_SHIFT", .udesc = "128 bit SIMD integer shift operations", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SHUFFLE_MOVE", .udesc = "128 bit SIMD integer shuffle/move operations", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "UNPACK", .udesc = "128 bit SIMD integer unpack operations", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_offcore_requests_outstanding[]={ { .uname = "ANY_READ", .udesc = "Outstanding offcore reads", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_READ_CODE", .udesc = "Outstanding offcore demand code reads", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_READ_DATA", .udesc = "Outstanding offcore demand data reads", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Outstanding offcore demand RFOs", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_mem_store_retired[]={ { .uname = "DTLB_MISS", .udesc = "Retired stores that miss the DTLB (Precise Event)", .ucode = 0x100, .uflags= 
INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_inst_decoded[]={ { .uname = "DEC0", .udesc = "Instructions that must be decoded by decoder 0", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_macro_insts[]={ { .uname = "DECODED", .udesc = "Instructions decoded", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_arith[]={ { .uname = "CYCLES_DIV_BUSY", .udesc = "Counts the number of cycles the divider is busy executing divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE. Count may be incorrect when HT is on", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIV", .udesc = "Counts the number of divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE. Count may be incorrect when HT is on", .uequiv = "CYCLES_DIV_BUSY:c=1:i=1:e=1", .ucode = 0x100 | INTEL_X86_MOD_EDGE | INTEL_X86_MOD_INV | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "MUL", .udesc = "Counts the number of multiply operations executed. This includes integer as well as floating point multiply operations but excludes DPPS mul and MPSAD. 
Count may be incorrect when HT is on", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l2_transactions[]={ { .uname = "ANY", .udesc = "All L2 transactions", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FILL", .udesc = "L2 fill transactions", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IFETCH", .udesc = "L2 instruction fetch transactions", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "L1D_WB", .udesc = "L1D writeback to L2 transactions", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOAD", .udesc = "L2 Load transactions", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH", .udesc = "L2 prefetch transactions", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO", .udesc = "L2 RFO transactions", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "WB", .udesc = "L2 writeback to LLC transactions", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_sb_drain[]={ { .uname = "ANY", .udesc = "All Store buffer stall cycles", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_mem_uncore_retired[]={ { .uname = "LOCAL_HITM", .udesc = "Load instructions retired that HIT modified data in sibling core (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "LOCAL_DRAM_AND_REMOTE_CACHE_HIT", .udesc = "Load instructions retired local dram and remote cache HIT data sources (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "REMOTE_DRAM", .udesc = "Load instructions retired remote DRAM and remote home-remote cache HITM (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "UNCACHEABLE", .udesc = "Load instructions retired IO (Precise Event)", .ucode = 0x8000, 
.uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "REMOTE_HITM", .udesc = "Retired loads that hit remote socket in modified state (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "OTHER_LLC_MISS", .udesc = "Load instructions retired other LLC miss (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "UNKNOWN_SOURCE", .udesc = "Load instructions retired unknown LLC miss (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "LOCAL_DRAM", .udesc = "Retired loads with a data source of local DRAM or locally homed remote cache HITM (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "OTHER_CORE_L2_HITM", .udesc = "Retired load instructions that hit modified data in sibling core (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "REMOTE_CACHE_LOCAL_HOME_HIT", .udesc = "Retired load instructions with a remote cache HIT data source (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "REMOTE_DRAM", .udesc = "Retired load instructions with a data source of remote DRAM and remote home-remote cache HITM (Precise Event)", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, .umodel = PFM_PMU_INTEL_WSM, }, }; static const intel_x86_umask_t wsm_l2_data_rqsts[]={ { .uname = "ANY", .udesc = "All L2 data requests", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "DEMAND_E_STATE", .udesc = "L2 data demand loads in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_I_STATE", .udesc = "L2 data demand loads in I state (misses)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_M_STATE", .udesc = "L2 data 
demand loads in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_MESI", .udesc = "L2 data demand requests", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_S_STATE", .udesc = "L2 data demand loads in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_E_STATE", .udesc = "L2 data prefetches in E state", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_I_STATE", .udesc = "L2 data prefetches in the I state (misses)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_M_STATE", .udesc = "L2 data prefetches in M state", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_MESI", .udesc = "All L2 data prefetches", .ucode = 0xf000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PREFETCH_S_STATE", .udesc = "L2 data prefetches in the S state", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_br_inst_exec[]={ { .uname = "ANY", .udesc = "Branch instructions executed", .ucode = 0x7f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "COND", .udesc = "Conditional branch instructions executed", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT", .udesc = "Unconditional branches executed", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DIRECT_NEAR_CALL", .udesc = "Unconditional call branches executed", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INDIRECT_NEAR_CALL", .udesc = "Indirect call branches executed", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INDIRECT_NON_CALL", .udesc = "Indirect non call branches executed", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NEAR_CALLS", .udesc = "Call branches executed", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "NON_CALLS", .udesc = "All non call branches executed", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RETURN_NEAR", .udesc = "Indirect return branches executed", .ucode = 0x800, 
.uflags= INTEL_X86_NCOMBO, }, { .uname = "TAKEN", .udesc = "Taken branches executed", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_snoopq_requests_outstanding[]={ { .uname = "CODE", .udesc = "Outstanding snoop code requests", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "CODE_NOT_EMPTY", .udesc = "Cycles snoop code requests queue not empty", .uequiv = "CODE:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DATA", .udesc = "Outstanding snoop data requests", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DATA_NOT_EMPTY", .udesc = "Cycles snoop data requests queue not empty", .uequiv = "DATA:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "INVALIDATE", .udesc = "Outstanding snoop invalidate requests", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INVALIDATE_NOT_EMPTY", .udesc = "Cycles snoop invalidate requests queue not empty", .uequiv = "INVALIDATE:c=1", .ucode = 0x200 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_mem_load_retired[]={ { .uname = "DTLB_MISS", .udesc = "Retired loads that miss the DTLB (Precise Event)", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "HIT_LFB", .udesc = "Retired loads that miss L1D and hit a previously allocated LFB (Precise Event)", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L1D_HIT", .udesc = "Retired loads that hit the L1 data cache (Precise Event)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_HIT", .udesc = "Retired loads that hit the L2 cache (Precise Event)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_MISS", .udesc = "Retired loads that miss the LLC cache (Precise Event)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LLC_MISS", .udesc = "This is an alias 
for L3_MISS", .uequiv = "L3_MISS", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_UNSHARED_HIT", .udesc = "Retired loads that hit valid versions in the LLC cache (Precise Event)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LLC_UNSHARED_HIT", .udesc = "This is an alias for L3_UNSHARED_HIT", .uequiv = "L3_UNSHARED_HIT", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "OTHER_CORE_L2_HIT_HITM", .udesc = "Retired loads that hit sibling core's L2 in modified or unmodified states (Precise Event)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t wsm_l1i[]={ { .uname = "CYCLES_STALLED", .udesc = "L1I instruction fetch stall cycles", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HITS", .udesc = "L1I instruction fetch hits", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MISSES", .udesc = "L1I instruction fetch misses", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "READS", .udesc = "L1I Instruction fetches", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l2_write[]={ { .uname = "LOCK_E_STATE", .udesc = "L2 demand lock RFOs in E state", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_HIT", .udesc = "All demand L2 lock RFOs that hit the cache", .ucode = 0xe000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_I_STATE", .udesc = "L2 demand lock RFOs in I state (misses)", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_M_STATE", .udesc = "L2 demand lock RFOs in M state", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_MESI", .udesc = "All demand L2 lock RFOs", .ucode = 0xf000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LOCK_S_STATE", .udesc = "L2 demand lock RFOs in S state", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_HIT", .udesc = "All L2 demand store RFOs that hit the cache", .ucode = 0xe00, .uflags= 
INTEL_X86_NCOMBO, }, { .uname = "RFO_I_STATE", .udesc = "L2 demand store RFOs in I state (misses)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_M_STATE", .udesc = "L2 demand store RFOs in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_MESI", .udesc = "All L2 demand store RFOs", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "RFO_S_STATE", .udesc = "L2 demand store RFOs in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_snoop_response[]={ { .uname = "HIT", .udesc = "Thread responded HIT to snoop", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HITE", .udesc = "Thread responded HITE to snoop", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "HITM", .udesc = "Thread responded HITM to snoop", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l1d[]={ { .uname = "M_EVICT", .udesc = "L1D cache lines replaced in M state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_REPL", .udesc = "L1D cache lines allocated in the M state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_SNOOP_EVICT", .udesc = "L1D snoop eviction of cache lines in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REPL", .udesc = "L1 data cache lines allocated", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_resource_stalls[]={ { .uname = "ANY", .udesc = "Resource related stall cycles", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FPCW", .udesc = "FPU control word write stall cycles", .ucode = 0x2000, }, { .uname = "LOAD", .udesc = "Load buffer stall cycles", .ucode = 0x200, }, { .uname = "MXCSR", .udesc = "MXCSR rename stall cycles", .ucode = 0x4000, }, { .uname = "OTHER", .udesc = "Other Resource related stall cycles", .ucode = 0x8000, }, { .uname = "ROB_FULL", .udesc = "ROB full stall cycles", .ucode = 0x1000, }, { .uname = "RS_FULL", .udesc = 
"Reservation Station full stall cycles", .ucode = 0x400, }, { .uname = "STORE", .udesc = "Store buffer stall cycles", .ucode = 0x800, }, }; static const intel_x86_umask_t wsm_rat_stalls[]={ { .uname = "ANY", .udesc = "All RAT stall cycles", .uequiv = "FLAGS:REGISTERS:ROB_READ_PORT:SCOREBOARD", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "FLAGS", .udesc = "Flag stall cycles", .ucode = 0x100, }, { .uname = "REGISTERS", .udesc = "Partial register stall cycles", .ucode = 0x200, }, { .uname = "ROB_READ_PORT", .udesc = "ROB read port stall cycles", .ucode = 0x400, }, { .uname = "SCOREBOARD", .udesc = "Scoreboard stall cycles", .ucode = 0x800, }, }; static const intel_x86_umask_t wsm_cpu_clk_unhalted[]={ { .uname = "THREAD_P", .udesc = "Cycles when thread is not halted (programmable counter)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "REF_P", .udesc = "Reference base clock (133 MHz) cycles when thread is not halted", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TOTAL_CYCLES", .udesc = "Total number of elapsed cycles. 
Does not work when C-state enabled", .uequiv = "THREAD_P:c=2:i=1", .ucode = 0x0 | INTEL_X86_MOD_INV | (0x2 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_l1d_wb_l2[]={ { .uname = "E_STATE", .udesc = "L1 writebacks to L2 in E state", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "I_STATE", .udesc = "L1 writebacks to L2 in I state (misses)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "M_STATE", .udesc = "L1 writebacks to L2 in M state", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "MESI", .udesc = "All L1 writebacks to L2", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO, }, { .uname = "S_STATE", .udesc = "L1 writebacks to L2 in S state", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_offcore_response_0[]={ { .uname = "DMND_DATA_RD", .udesc = "Request: counts the number of demand and DCU prefetch data reads of full and partial cachelines as well as demand data page table entry cacheline reads. Does not count L2 data read prefetches or instruction fetches", .ucode = 0x100, .grpid = 0, }, { .uname = "DMND_RFO", .udesc = "Request: counts the number of demand and DCU prefetch reads for ownership (RFO) requests generated by a write to data cacheline. Does not count L2 RFO", .ucode = 0x200, .grpid = 0, }, { .uname = "DMND_IFETCH", .udesc = "Request: counts the number of demand and DCU prefetch instruction cacheline reads. 
Does not count L2 code read prefetches", .ucode = 0x400, .grpid = 0, }, { .uname = "WB", .udesc = "Request: counts the number of writeback (modified to exclusive) transactions", .ucode = 0x800, .grpid = 0, }, { .uname = "PF_DATA_RD", .udesc = "Request: counts the number of data cacheline reads generated by L2 prefetchers", .ucode = 0x1000, .grpid = 0, }, { .uname = "PF_RFO", .udesc = "Request: counts the number of RFO requests generated by L2 prefetchers", .ucode = 0x2000, .grpid = 0, }, { .uname = "PF_IFETCH", .udesc = "Request: counts the number of code reads generated by L2 prefetchers", .ucode = 0x4000, .grpid = 0, }, { .uname = "OTHER", .udesc = "Request: counts one of the following transaction types, including L3 invalidate, I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences, lock, unlock, split lock", .ucode = 0x8000, .grpid = 0, }, { .uname = "ANY_IFETCH", .udesc = "Request: combination of PF_IFETCH | DMND_IFETCH", .uequiv = "PF_IFETCH:DMND_IFETCH", .ucode = 0x4400, .grpid = 0, }, { .uname = "ANY_REQUEST", .udesc = "Request: combination of all requests umasks", .uequiv = "DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:OTHER", .ucode = 0xff00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "ANY_DATA", .udesc = "Request: any data read/write request", .uequiv = "DMND_DATA_RD:PF_DATA_RD:DMND_RFO:PF_RFO", .ucode = 0x3300, .grpid = 0, }, { .uname = "ANY_DATA_RD", .udesc = "Request: any data read in request", .uequiv = "DMND_DATA_RD:PF_DATA_RD", .ucode = 0x1100, .grpid = 0, }, { .uname = "ANY_RFO", .udesc = "Request: combination of DMND_RFO | PF_RFO", .uequiv = "DMND_RFO:PF_RFO", .ucode = 0x2200, .grpid = 0, }, { .uname = "UNCORE_HIT", .udesc = "Response: counts L3 Hit: local or remote home requests that hit L3 cache in the uncore with no coherency actions required (snooping)", .ucode = 0x10000, .grpid = 1, }, { .uname = "OTHER_CORE_HIT_SNP", .udesc = "Response: counts L3 Hit: local or remote home 
requests that hit L3 cache in the uncore and were serviced by another core with a cross core snoop where no modified copies were found (clean)", .ucode = 0x20000, .grpid = 1, }, { .uname = "OTHER_CORE_HITM", .udesc = "Response: counts L3 Hit: local or remote home requests that hit L3 cache in the uncore and were serviced by another core with a cross core snoop where modified copies were found (HITM)", .ucode = 0x40000, .grpid = 1, }, { .uname = "REMOTE_CACHE_HITM", .udesc = "Response: counts L3 Hit: local or remote home requests that hit a remote L3 cacheline in modified (HITM) state", .ucode = 0x80000, .grpid = 1, }, { .uname = "REMOTE_CACHE_FWD", .udesc = "Response: counts L3 Miss: local homed requests that missed the L3 cache and were serviced by forwarded data following a cross package snoop where no modified copies were found. (Remote home requests are not counted)", .ucode = 0x100000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "LOCAL_DRAM_AND_REMOTE_CACHE_HIT", .udesc = "Response: counts L3 Miss: local home requests that missed the L3 cache and were serviced by local DRAM or a remote cache", .ucode = 0x100000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "REMOTE_DRAM", .udesc = "Response: counts L3 Miss: remote home requests that missed the L3 cache and were serviced by remote DRAM", .ucode = 0x200000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "LOCAL_DRAM", .udesc = "Response: counts L3 Miss: local home requests that missed the L3 cache and were serviced by local DRAM", .ucode = 0x200000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "REMOTE_DRAM", .udesc = "Response: counts L3 Miss: remote home requests that missed the L3 cache and were serviced by remote DRAM", .ucode = 0x400000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "OTHER_LLC_MISS", .udesc = "Response: counts L3 Miss: remote home requests that missed the L3 cache", .ucode = 0x400000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = 
"NON_DRAM", .udesc = "Response: Non-DRAM requests that were serviced by IOH", .ucode = 0x800000, .grpid = 1, }, { .uname = "ANY_CACHE_DRAM", .udesc = "Response: requests serviced by any source but IOH", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:REMOTE_CACHE_FWD:REMOTE_CACHE_HITM:REMOTE_DRAM:LOCAL_DRAM", .ucode = 0x7f0000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "ANY_CACHE_DRAM", .udesc = "Response: requests serviced by any source but IOH", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:REMOTE_CACHE_HITM:OTHER_LLC_MISS:REMOTE_DRAM:LOCAL_DRAM_AND_REMOTE_CACHE_HIT", .ucode = 0x7f0000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "ANY_DRAM", .udesc = "Response: requests serviced by local or remote DRAM", .uequiv = "REMOTE_DRAM:LOCAL_DRAM", .ucode = 0x600000, .umodel = PFM_PMU_INTEL_WSM, .grpid = 1, }, { .uname = "ANY_LLC_MISS", .udesc = "Response: requests that missed in L3", .uequiv = "REMOTE_CACHE_HITM:REMOTE_CACHE_FWD:REMOTE_DRAM:LOCAL_DRAM:NON_DRAM", .ucode = 0xf80000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "ANY_LLC_MISS", .udesc = "Response: requests that missed in L3", .uequiv = "REMOTE_CACHE_HITM:REMOTE_DRAM:OTHER_LLC_MISS:LOCAL_DRAM_AND_REMOTE_CACHE_HIT:NON_DRAM", .ucode = 0xf80000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM_DP, }, { .uname = "LOCAL_CACHE_DRAM", .udesc = "Response: requests hit local core or uncore caches or local DRAM", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:LOCAL_DRAM", .ucode = 0x270000, .umodel = PFM_PMU_INTEL_WSM, .grpid = 1, }, { .uname = "REMOTE_CACHE_DRAM", .udesc = "Response: requests that miss L3 and hit remote caches or DRAM", .uequiv = "REMOTE_CACHE_HITM:REMOTE_CACHE_FWD:REMOTE_DRAM", .ucode = 0x580000, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "LOCAL_CACHE", .udesc = "Response: any local (core and socket) caches", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM", .ucode = 0x70000, .grpid = 1, }, { .uname = "ANY_RESPONSE", 
.udesc = "Response: combination of all response umasks", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:REMOTE_CACHE_HITM:REMOTE_CACHE_FWD:REMOTE_DRAM:LOCAL_DRAM:NON_DRAM", .ucode = 0xff0000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM, }, { .uname = "ANY_RESPONSE", .udesc = "Response: combination of all response umasks", .uequiv = "UNCORE_HIT:OTHER_CORE_HIT_SNP:OTHER_CORE_HITM:REMOTE_CACHE_HITM:REMOTE_DRAM:OTHER_LLC_MISS:LOCAL_DRAM_AND_REMOTE_CACHE_HIT:NON_DRAM", .ucode = 0xff0000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, .umodel = PFM_PMU_INTEL_WSM_DP, }, }; static const intel_x86_entry_t intel_wsm_pe[]={ { .name = "UNHALTED_CORE_CYCLES", .desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted).", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x20000000full, .code = 0x3c, }, { .name = "INSTRUCTION_RETIRED", .desc = "Count the number of instructions at retirement.", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_V3_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED3_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "LLC_REFERENCES", .desc = "Count each request originating from the core to reference a cache line in the last level cache. 
The count may include speculation, but excludes cache line fills due to hardware prefetch (Alias for L3_LAT_CACHE:REFERENCE).", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:REFERENCE", .cntmsk = 0xf, .code = 0x4f2e, }, { .name = "LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for L3_LAT_CACHE:REFERENCE", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:REFERENCE", .cntmsk = 0xf, .code = 0x4f2e, }, { .name = "LLC_MISSES", .desc = "Count each cache miss condition for references to the last level cache. The event count may include speculation, but excludes cache line fills due to hardware prefetch (Alias for L3_LAT_CACHE:MISS)", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:MISS", .cntmsk = 0xf, .code = 0x412e, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for L3_LAT_CACHE:MISS", .modmsk = INTEL_V3_ATTRS, .equiv = "L3_LAT_CACHE:MISS", .cntmsk = 0xf, .code = 0x412e, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Count branch instructions at retirement. 
Specifically, this event counts the retirement of the last micro-op of a branch instruction.", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_INST_RETIRED:ALL_BRANCHES", .cntmsk = 0xf, .code = 0x4c4, }, { .name = "UOPS_DECODED", .desc = "Micro-ops decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xd1, .numasks = LIBPFM_ARRAY_SIZE(wsm_uops_decoded), .ngrp = 1, .umasks = wsm_uops_decoded, }, { .name = "L1D_CACHE_LOCK_FB_HIT", .desc = "L1D cacheable load lock speculated or retired accepted into the fill buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x152, }, { .name = "BPU_CLEARS", .desc = "Branch Prediction Unit clears", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xe8, .numasks = LIBPFM_ARRAY_SIZE(wsm_bpu_clears), .ngrp = 1, .umasks = wsm_bpu_clears, }, { .name = "UOPS_RETIRED", .desc = "Cycles Uops are being retired (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc2, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_uops_retired), .ngrp = 1, .umasks = wsm_uops_retired, }, { .name = "BR_MISP_RETIRED", .desc = "Mispredicted retired branches (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc5, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_br_misp_retired), .ngrp = 1, .umasks = wsm_br_misp_retired, }, { .name = "EPT", .desc = "Extended Page Table", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x4f, .numasks = LIBPFM_ARRAY_SIZE(wsm_ept), .ngrp = 1, .umasks = wsm_ept, }, { .name = "UOPS_EXECUTED", .desc = "Micro-ops executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xb1, .numasks = LIBPFM_ARRAY_SIZE(wsm_uops_executed), .ngrp = 1, .umasks = wsm_uops_executed, }, { .name = "IO_TRANSACTIONS", .desc = "I/O transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x16c, }, { .name = "ES_REG_RENAMES", .desc = "ES segment renames", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1d5, }, { .name = "INST_RETIRED", .desc = "Instructions retired (Precise Event)", .modmsk = 
INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc0, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_inst_retired), .ngrp = 1, .umasks = wsm_inst_retired, }, { .name = "ILD_STALL", .desc = "Instruction Length Decoder stalls", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x87, .numasks = LIBPFM_ARRAY_SIZE(wsm_ild_stall), .ngrp = 1, .umasks = wsm_ild_stall, }, { .name = "DTLB_LOAD_MISSES", .desc = "DTLB load misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(wsm_dtlb_load_misses), .ngrp = 1, .umasks = wsm_dtlb_load_misses, }, { .name = "L2_LINES_IN", .desc = "L2 lines allocated", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf1, .numasks = LIBPFM_ARRAY_SIZE(wsm_l2_lines_in), .ngrp = 1, .umasks = wsm_l2_lines_in, }, { .name = "SSEX_UOPS_RETIRED", .desc = "SIMD micro-ops retired (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc7, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_ssex_uops_retired), .ngrp = 1, .umasks = wsm_ssex_uops_retired, }, { .name = "STORE_BLOCKS", .desc = "Load delayed by block code", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x6, .numasks = LIBPFM_ARRAY_SIZE(wsm_store_blocks), .ngrp = 1, .umasks = wsm_store_blocks, }, { .name = "FP_MMX_TRANS", .desc = "Floating Point to and from MMX transitions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(wsm_fp_mmx_trans), .ngrp = 1, .umasks = wsm_fp_mmx_trans, }, { .name = "CACHE_LOCK_CYCLES", .desc = "Cache locked", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(wsm_cache_lock_cycles), .ngrp = 1, .umasks = wsm_cache_lock_cycles, }, { .name = "OFFCORE_REQUESTS_SQ_FULL", .desc = "Offcore requests blocked due to Super Queue full", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1b2, }, { .name = "L3_LAT_CACHE", .desc = "Last level cache accesses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x2e, .numasks = 
LIBPFM_ARRAY_SIZE(wsm_l3_lat_cache), .ngrp = 1, .umasks = wsm_l3_lat_cache, }, { .name = "SIMD_INT_64", .desc = "SIMD 64-bit integer operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xfd, .numasks = LIBPFM_ARRAY_SIZE(wsm_simd_int_64), .ngrp = 1, .umasks = wsm_simd_int_64, }, { .name = "BR_INST_DECODED", .desc = "Branch instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1e0, }, { .name = "BR_MISP_EXEC", .desc = "Mispredicted branches executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x89, .numasks = LIBPFM_ARRAY_SIZE(wsm_br_misp_exec), .ngrp = 1, .umasks = wsm_br_misp_exec, }, { .name = "SQ_FULL_STALL_CYCLES", .desc = "Super Queue full stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1f6, }, { .name = "BACLEAR", .desc = "Branch address calculator clears", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xe6, .numasks = LIBPFM_ARRAY_SIZE(wsm_baclear), .ngrp = 1, .umasks = wsm_baclear, }, { .name = "DTLB_MISSES", .desc = "Data TLB misses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x49, .numasks = LIBPFM_ARRAY_SIZE(wsm_dtlb_misses), .ngrp = 1, .umasks = wsm_dtlb_misses, }, { .name = "MEM_INST_RETIRED", .desc = "Memory instructions retired (Precise Event)", .modmsk = INTEL_V3_ATTRS | _INTEL_X86_ATTR_LDLAT, .cntmsk = 0xf, .code = 0xb, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_mem_inst_retired), .ngrp = 1, .umasks = wsm_mem_inst_retired, }, { .name = "UOPS_ISSUED", .desc = "Uops issued", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xe, .numasks = LIBPFM_ARRAY_SIZE(wsm_uops_issued), .ngrp = 1, .umasks = wsm_uops_issued, }, { .name = "L2_RQSTS", .desc = "L2 requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(wsm_l2_rqsts), .ngrp = 1, .umasks = wsm_l2_rqsts, }, { .name = "TWO_UOP_INSTS_DECODED", .desc = "Two Uop instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x119, }, { .name = "LOAD_DISPATCH", .desc = "Loads 
dispatched", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x13, .numasks = LIBPFM_ARRAY_SIZE(wsm_load_dispatch), .ngrp = 1, .umasks = wsm_load_dispatch, }, { .name = "BACLEAR_FORCE_IQ", .desc = "BACLEAR forced by Instruction queue", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1a7, }, { .name = "SNOOPQ_REQUESTS", .desc = "Snoopq requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xb4, .numasks = LIBPFM_ARRAY_SIZE(wsm_snoopq_requests), .ngrp = 1, .umasks = wsm_snoopq_requests, }, { .name = "OFFCORE_REQUESTS", .desc = "Offcore requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xb0, .numasks = LIBPFM_ARRAY_SIZE(wsm_offcore_requests), .ngrp = 1, .umasks = wsm_offcore_requests, }, { .name = "LOAD_BLOCK", .desc = "Loads blocked", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(wsm_load_block), .ngrp = 1, .umasks = wsm_load_block, }, { .name = "MISALIGN_MEMORY", .desc = "Misaligned accesses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(wsm_misalign_memory), .ngrp = 1, .umasks = wsm_misalign_memory, }, { .name = "INST_QUEUE_WRITE_CYCLES", .desc = "Cycles instructions are written to the instruction queue", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x11e, }, { .name = "LSD_OVERFLOW", .desc = "Number of loops that cannot stream from the instruction queue.", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x120, }, { .name = "MACHINE_CLEARS", .desc = "Machine clear asserted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc3, .numasks = LIBPFM_ARRAY_SIZE(wsm_machine_clears), .ngrp = 1, .umasks = wsm_machine_clears, }, { .name = "FP_COMP_OPS_EXE", .desc = "SSE/MMX micro-ops", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x10, .numasks = LIBPFM_ARRAY_SIZE(wsm_fp_comp_ops_exe), .ngrp = 1, .umasks = wsm_fp_comp_ops_exe, }, { .name = "ITLB_FLUSH", .desc = "ITLB flushes", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1ae, }, { .name = "BR_INST_RETIRED", .desc = 
"Retired branch instructions (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc4, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_br_inst_retired), .ngrp = 1, .umasks = wsm_br_inst_retired, }, { .name = "L1D_CACHE_PREFETCH_LOCK_FB_HIT", .desc = "L1D prefetch load lock accepted in fill buffer", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x152, }, { .name = "LARGE_ITLB", .desc = "Large ITLB accesses", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x82, .numasks = LIBPFM_ARRAY_SIZE(wsm_large_itlb), .ngrp = 1, .umasks = wsm_large_itlb, }, { .name = "LSD", .desc = "Loop stream detector", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xa8, .numasks = LIBPFM_ARRAY_SIZE(wsm_lsd), .ngrp = 1, .umasks = wsm_lsd, }, { .name = "L2_LINES_OUT", .desc = "L2 lines evicted", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf2, .numasks = LIBPFM_ARRAY_SIZE(wsm_l2_lines_out), .ngrp = 1, .umasks = wsm_l2_lines_out, }, { .name = "ITLB_MISSES", .desc = "ITLB miss", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(wsm_itlb_misses), .ngrp = 1, .umasks = wsm_itlb_misses, }, { .name = "L1D_PREFETCH", .desc = "L1D hardware prefetch", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x4e, .numasks = LIBPFM_ARRAY_SIZE(wsm_l1d_prefetch), .ngrp = 1, .umasks = wsm_l1d_prefetch, }, { .name = "SQ_MISC", .desc = "Super Queue miscellaneous", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf4, .numasks = LIBPFM_ARRAY_SIZE(wsm_sq_misc), .ngrp = 1, .umasks = wsm_sq_misc, }, { .name = "SEG_RENAME_STALLS", .desc = "Segment rename stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1d4, }, { .name = "FP_ASSIST", .desc = "X87 Floating point assists (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf7, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_fp_assist), .ngrp = 1, .umasks = wsm_fp_assist, }, { .name = "SIMD_INT_128", .desc = "128 bit SIMD operations", .modmsk = INTEL_V3_ATTRS, 
.cntmsk = 0xf, .code = 0x12, .numasks = LIBPFM_ARRAY_SIZE(wsm_simd_int_128), .ngrp = 1, .umasks = wsm_simd_int_128, }, { .name = "OFFCORE_REQUESTS_OUTSTANDING", .desc = "Outstanding offcore requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x1, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(wsm_offcore_requests_outstanding), .ngrp = 1, .umasks = wsm_offcore_requests_outstanding, }, { .name = "MEM_STORE_RETIRED", .desc = "Retired stores", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xc, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_mem_store_retired), .ngrp = 1, .umasks = wsm_mem_store_retired, }, { .name = "INST_DECODED", .desc = "Instructions decoded", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x18, .numasks = LIBPFM_ARRAY_SIZE(wsm_inst_decoded), .ngrp = 1, .umasks = wsm_inst_decoded, }, { .name = "MACRO_INSTS_FUSIONS_DECODED", .desc = "Count the number of instructions decoded that are macro-fused but not necessarily executed or retired", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1a6, }, { .name = "MACRO_INSTS", .desc = "Macro-instructions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xd0, .numasks = LIBPFM_ARRAY_SIZE(wsm_macro_insts), .ngrp = 1, .umasks = wsm_macro_insts, }, { .name = "PARTIAL_ADDRESS_ALIAS", .desc = "False dependencies due to partial address aliasing", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x107, }, { .name = "ARITH", .desc = "Counts arithmetic multiply and divide operations", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x14, .numasks = LIBPFM_ARRAY_SIZE(wsm_arith), .ngrp = 1, .umasks = wsm_arith, }, { .name = "L2_TRANSACTIONS", .desc = "L2 transactions", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf0, .numasks = LIBPFM_ARRAY_SIZE(wsm_l2_transactions), .ngrp = 1, .umasks = wsm_l2_transactions, }, { .name = "INST_QUEUE_WRITES", .desc = "Instructions written to instruction queue.", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x117, }, { .name = "SB_DRAIN", .desc = "Store buffer", .modmsk 
= INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(wsm_sb_drain), .ngrp = 1, .umasks = wsm_sb_drain, }, { .name = "LOAD_HIT_PRE", .desc = "Load operations conflicting with software prefetches", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x14c, }, { .name = "MEM_UNCORE_RETIRED", .desc = "Load instructions retired (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xf, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_mem_uncore_retired), .ngrp = 1, .umasks = wsm_mem_uncore_retired, }, { .name = "L2_DATA_RQSTS", .desc = "All L2 data requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x26, .numasks = LIBPFM_ARRAY_SIZE(wsm_l2_data_rqsts), .ngrp = 1, .umasks = wsm_l2_data_rqsts, }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x88, .numasks = LIBPFM_ARRAY_SIZE(wsm_br_inst_exec), .ngrp = 1, .umasks = wsm_br_inst_exec, }, { .name = "ITLB_MISS_RETIRED", .desc = "Retired instructions that missed the ITLB (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x20c8, .flags= INTEL_X86_PEBS, }, { .name = "BPU_MISSED_CALL_RET", .desc = "Branch prediction unit missed call or return", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1e5, }, { .name = "SNOOPQ_REQUESTS_OUTSTANDING", .desc = "Outstanding snoop requests", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x1, .code = 0xb3, .numasks = LIBPFM_ARRAY_SIZE(wsm_snoopq_requests_outstanding), .ngrp = 1, .umasks = wsm_snoopq_requests_outstanding, }, { .name = "MEM_LOAD_RETIRED", .desc = "Memory loads retired (Precise Event)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xcb, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(wsm_mem_load_retired), .ngrp = 1, .umasks = wsm_mem_load_retired, }, { .name = "L1I", .desc = "L1I instruction fetch", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(wsm_l1i), .ngrp = 1, .umasks = wsm_l1i, }, { .name = "L2_WRITE", 
.desc = "L2 demand lock/store RFO", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(wsm_l2_write), .ngrp = 1, .umasks = wsm_l2_write, }, { .name = "SNOOP_RESPONSE", .desc = "Snoop", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xb8, .numasks = LIBPFM_ARRAY_SIZE(wsm_snoop_response), .ngrp = 1, .umasks = wsm_snoop_response, }, { .name = "L1D", .desc = "L1D cache", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0x3, .code = 0x51, .numasks = LIBPFM_ARRAY_SIZE(wsm_l1d), .ngrp = 1, .umasks = wsm_l1d, }, { .name = "RESOURCE_STALLS", .desc = "Resource related stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xa2, .numasks = LIBPFM_ARRAY_SIZE(wsm_resource_stalls), .ngrp = 1, .umasks = wsm_resource_stalls, }, { .name = "RAT_STALLS", .desc = "All RAT stall cycles", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0xd2, .numasks = LIBPFM_ARRAY_SIZE(wsm_rat_stalls), .ngrp = 1, .umasks = wsm_rat_stalls, }, { .name = "CPU_CLK_UNHALTED", .desc = "Cycles when processor is not in halted state", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(wsm_cpu_clk_unhalted), .ngrp = 1, .umasks = wsm_cpu_clk_unhalted, }, { .name = "L1D_WB_L2", .desc = "L1D writebacks to L2", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(wsm_l1d_wb_l2), .ngrp = 1, .umasks = wsm_l1d_wb_l2, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Count mispredicted branch instructions at retirement. 
Specifically, this event counts at retirement of the last micro-op of a branch instruction in the architectural path of the execution and experienced misprediction in the branch prediction hardware", .modmsk = INTEL_V3_ATTRS, .equiv = "BR_MISP_RETIRED:ALL_BRANCHES", .cntmsk = 0xf, .code = 0xc5, }, { .name = "THREAD_ACTIVE", .desc = "Cycles thread is active", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1ec, }, { .name = "UOP_UNFUSION", .desc = "Counts unfusion events due to floating point exception to a fused uop", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1db, }, { .name = "OFFCORE_RESPONSE_0", .desc = "Offcore response 0 (must provide at least one request and one response umasks)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1b7, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(wsm_offcore_response_0), .ngrp = 2, .umasks = wsm_offcore_response_0, }, { .name = "OFFCORE_RESPONSE_1", .desc = "Offcore response 1 (must provide at least one request and one response umasks)", .modmsk = INTEL_V3_ATTRS, .cntmsk = 0xf, .code = 0x1bb, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(wsm_offcore_response_0), .ngrp = 2, .umasks = wsm_offcore_response_0, /* identical to actual umasks list for this event */ }, }; papi-5.3.0/src/libpfm4/lib/events/intel_knc_events.h0000600003276200002170000003546012247131123022047 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or 
substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * PMU: knc (Intel Knights Corner) */ static const intel_x86_entry_t intel_knc_pe[]={ { .name = "BANK_CONFLICTS", .desc = "Number of actual bank conflicts", .code = 0xa, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "BRANCHES", .desc = "Number of taken and not taken branches, including: conditional branches, jumps, calls, returns, software interrupts, and interrupt returns", .code = 0x12, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "BRANCHES_MISPREDICTED", .desc = "Number of branch mispredictions that occurred on BTB hits. 
BTB misses are not considered branch mispredicts because no prediction exists for them yet.", .code = 0x2b, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "CODE_CACHE_MISS", .desc = "Number of instruction reads that miss the internal code cache; whether the read is cacheable or noncacheable", .code = 0xe, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "CODE_PAGE_WALK", .desc = "Number of code page walks", .code = 0xd, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "CODE_READ", .desc = "Number of instruction reads; whether the read is cacheable or noncacheable", .code = 0xc, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "CPU_CLK_UNHALTED", .desc = "Number of cycles during which the processor is not halted.", .code = 0x2a, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_CACHE_LINES_WRITTEN_BACK", .desc = "Number of dirty lines (all) that are written back, regardless of the cause", .code = 0x6, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_PAGE_WALK", .desc = "Number of data page walks", .code = 0x2, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_READ", .desc = "Number of successful memory data reads committed by the K-unit (L1). Cache accesses resulting from prefetch instructions are included for A0 stepping.", .code = 0x0, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_READ_MISS", .desc = "Number of memory read accesses that miss the internal data cache whether or not the access is cacheable or noncacheable. 
Cache accesses resulting from prefetch instructions are not included.", .code = 0x3, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_READ_MISS_OR_WRITE_MISS", .desc = "Number of memory read and/or write accesses that miss the internal data cache, whether or not the access is cacheable or noncacheable", .code = 0x29, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_READ_OR_WRITE", .desc = "Number of memory data reads and/or writes (internal data cache hit and miss combined). Read cache accesses resulting from prefetch instructions are included for A0 stepping.", .code = 0x28, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_WRITE", .desc = "Number of successful memory data writes committed by the K-unit (L1). Streaming stores (hit/miss L1), cacheable write partials, and UC promotions are all included.", .code = 0x1, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "DATA_WRITE_MISS", .desc = "Number of memory write accesses that miss the internal data cache whether or not the access is cacheable. Non-cacheable misses are not included.", .code = 0x4, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "EXEC_STAGE_CYCLES", .desc = "Number of E-stage cycles that were successfully completed. Includes cycles generated by multi-cycle E-stage instructions. For instructions destined for the FPU or VPU pipelines, this event only counts occupancy in the integer E-stage.", .code = 0x2e, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "FE_STALLED", .desc = "Number of cycles where the front-end could not advance. Any multi-cycle instructions which delay pipeline advance and apply backpressure to the front-end will be included, e.g. read-modify-write instructions. 
Includes cycles when the front-end did not have any instructions to issue.", .code = 0x2d, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "INSTRUCTIONS_EXECUTED", .desc = "Number of instructions executed (up to two per clock)", .code = 0x16, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "INSTRUCTIONS_EXECUTED_V_PIPE", .desc = "Number of instructions executed in the V_pipe. The event indicates the number of instructions that were paired.", .code = 0x17, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L1_DATA_HIT_INFLIGHT_PF1", .desc = "Number of data requests which hit an in-flight vprefetch0. The in-flight vprefetch0 was not necessarily issued from the same thread as the data request.", .code = 0x20, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L1_DATA_PF1", .desc = "Number of data vprefetch0 requests seen by the L1.", .code = 0x11, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L1_DATA_PF1_DROP", .desc = "Number of data vprefetch0 requests seen by the L1 which were dropped for any reason. A vprefetch0 can be dropped if the requested address matches another in-flight request or if it has a UC memtype.", .code = 0x1e, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L1_DATA_PF1_MISS", .desc = "Number of data vprefetch0 requests seen by the L1 which missed L1. Does not include vprefetch1 requests which are counted in L1_DATA_PF1_DROP.", .code = 0x1c, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L1_DATA_PF2", .desc = "Number of data vprefetch1 requests seen by the L1. This is not necessarily the same number as seen by the L2 because this count includes requests that are dropped by the core. 
A vprefetch1 can be dropped by the core if the requested address matches another in-flight request or if it has a UC memtype.", .code = 0x37, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_CODE_READ_MISS_CACHE_FILL", .desc = "Number of code read accesses that missed the L2 cache and were satisfied by another L2 cache. Can include promoted read misses that started as DATA accesses.", .code = 0x10f0, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_CODE_READ_MISS_MEM_FILL", .desc = "Number of code read accesses that missed the L2 cache and were satisfied by main memory. Can include promoted read misses that started as DATA accesses.", .code = 0x10f5, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_HIT_INFLIGHT_PF2", .desc = "Number of data requests which hit an in-flight vprefetch1. The in-flight vprefetch1 was not necessarily issued from the same thread as the data request.", .code = 0x10ff, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_PF1_MISS", .desc = "Number of data vprefetch0 requests seen by the L2 which missed L2.", .code = 0x38, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_PF2", .desc = "Number of data vprefetch1 requests seen by the L2. Only counts vprefetch1 hits on A0 stepping.", .code = 0x10fc, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_PF2_DROP", .desc = "Number of data vprefetch1 requests seen by the L2 which were dropped for any reason.", .code = 0x10fd, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_PF2_MISS", .desc = "Number of data vprefetch1 requests seen by the L2 which missed L2. Does not include vprefetch2 requests which are counted in L2_DATA_PF2_DROP.", .code = 0x10fe, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_READ_MISS_CACHE_FILL", .desc = "Number of data read accesses that missed the L2 cache and were satisfied by another L2 cache. 
Can include promoted read misses that started as CODE accesses.", .code = 0x10f1, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_READ_MISS_MEM_FILL", .desc = "Number of data read accesses that missed the L2 cache and were satisfied by main memory. Can include promoted read misses that started as CODE accesses.", .code = 0x10f6, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_WRITE_MISS_CACHE_FILL", .desc = "Number of data write (RFO) accesses that missed the L2 cache and were satisfied by another L2 cache.", .code = 0x10f2, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_DATA_WRITE_MISS_MEM_FILL", .desc = "Number of data write (RFO) accesses that missed the L2 cache and were satisfied by main memory.", .code = 0x10f7, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_READ_HIT_E", .desc = "L2 Read Hit E State, may include prefetches on A0 stepping.", .code = 0x10c8, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_READ_HIT_M", .desc = "L2 Read Hit M State", .code = 0x10c9, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_READ_HIT_S", .desc = "L2 Read Hit S State", .code = 0x10ca, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_READ_MISS", .desc = "L2 Read Misses. Prefetch and demand requests to the same address will produce double counting.", .code = 0x10cb, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_VICTIM_REQ_WITH_DATA", .desc = "L2 received a victim request and responded with data", .code = 0x10d7, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "L2_WRITE_HIT", .desc = "L2 Write HIT, may undercount on A0 stepping.", .code = 0x10cc, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "LONG_CODE_PAGE_WALK", .desc = "Number of long code page walks, i.e. page walks that also missed the L2 uTLB. Subset of CODE_PAGE_WALK event", .code = 0x3b, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "LONG_DATA_PAGE_WALK", .desc = "Number of long data page walks, i.e. 
page walks that also missed the L2 uTLB. Subset of DATA_PAGE_WALK event", .code = 0x3a, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "MEMORY_ACCESSES_IN_BOTH_PIPES", .desc = "Number of data memory reads or writes that are paired in both pipes of the pipeline", .code = 0x9, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "MICROCODE_CYCLES", .desc = "The number of cycles microcode is executing. While microcode is executing, all other threads are stalled.", .code = 0x2c, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "PIPELINE_AGI_STALLS", .desc = "Number of address generation interlock (AGI) stalls. An AGI occurring in both the U- and V- pipelines in the same clock signals this event twice.", .code = 0x1f, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "PIPELINE_FLUSHES", .desc = "Number of pipeline flushes that occur. Pipeline flushes are caused by BTB misses on taken branches, mispredictions, exceptions, interrupts, and some segment descriptor loads.", .code = 0x15, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "PIPELINE_SG_AGI_STALLS", .desc = "Number of address generation interlock (AGI) stalls due to vscatter* and vgather* instructions.", .code = 0x21, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "SNP_HITM_BUNIT", .desc = "Snoop HITM in BUNIT", .code = 0x10e3, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "SNP_HITM_L2", .desc = "Snoop HITM in L2", .code = 0x10e7, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "SNP_HIT_L2", .desc = "Snoop HIT in L2", .code = 0x10e6, .cntmsk = 0x1, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_DATA_READ", .desc = "Number of read transactions that were issued. In general each read transaction will read 1 64B cacheline. If there are alignment issues, then reads against multiple cache lines will each be counted individually.", .code = 0x2000, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_DATA_READ_MISS", .desc = "VPU L1 data cache readmiss. 
Counts the number of occurrences.", .code = 0x2003, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_DATA_WRITE", .desc = "Number of write transactions that were issued. In general each write transaction will write 1 64B cacheline. If there are alignment issues, then writes against multiple cache lines will each be counted individually.", .code = 0x2001, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_DATA_WRITE_MISS", .desc = "VPU L1 data cache write miss. Counts the number of occurrences.", .code = 0x2004, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_ELEMENTS_ACTIVE", .desc = "Counts the cumulative number of elements active (via mask) for VPU instructions issued.", .code = 0x2018, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_INSTRUCTIONS_EXECUTED", .desc = "Counts the number of VPU instructions executed in both u- and v-pipes.", .code = 0x2016, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_INSTRUCTIONS_EXECUTED_V_PIPE", .desc = "Counts the number of VPU instructions that paired and executed in the v-pipe.", .code = 0x2017, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, { .name = "VPU_STALL_REG", .desc = "VPU stall on Register Dependency. Counts the number of occurrences. 
Dependencies will include RAW, WAW, WAR.", .code = 0x2005, .cntmsk = 0x3, .modmsk = INTEL_V3_ATTRS, }, }; papi-5.3.0/src/libpfm4/lib/events/arm_cortex_a15_events.h0000600003276200002170000002325112247131123022705 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * Contributed by Will Deacon * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * Cortex A15 r2p0 * based on Table 11-6 from the "Cortex A15 Technical Reference Manual" */ static const arm_entry_t arm_cortex_a15_pe[]={ {.name = "SW_INCR", .modmsk = ARMV7_A15_ATTRS, .code = 0x00, .desc = "Instruction architecturally executed (condition check pass) Software increment" }, {.name = "L1I_CACHE_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x01, .desc = "Level 1 instruction cache refill" }, {.name = "L1I_TLB_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x02, .desc = "Level 1 instruction TLB refill" }, {.name = "L1D_CACHE_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x03, .desc = "Level 1 data cache refill" }, {.name = "L1D_CACHE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x04, .desc = "Level 1 data cache access" }, {.name = "L1D_TLB_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x05, .desc = "Level 1 data TLB refill" }, {.name = "INST_RETIRED", .modmsk = ARMV7_A15_ATTRS, .code = 0x08, .desc = "Instruction architecturally executed" }, {.name = "EXCEPTION_TAKEN", .modmsk = ARMV7_A15_ATTRS, .code = 0x09, .desc = "Exception taken" }, {.name = "EXCEPTION_RETURN", .modmsk = ARMV7_A15_ATTRS, .code = 0x0a, .desc = "Instruction architecturally executed (condition check pass) Exception return" }, {.name = "CID_WRITE_RETIRED", .modmsk = ARMV7_A15_ATTRS, .code = 0x0b, .desc = "Instruction architecturally executed (condition check pass) Write to CONTEXTIDR" }, {.name = "BRANCH_MISPRED", .modmsk = ARMV7_A15_ATTRS, .code = 0x10, .desc = "Mispredicted or not predicted branch speculatively executed" }, {.name = "CPU_CYCLES", .modmsk = ARMV7_A15_ATTRS, .code = 0x11, .desc = "Cycles" }, {.name = "BRANCH_PRED", .modmsk = ARMV7_A15_ATTRS, .code = 0x12, .desc = "Predictable branch speculatively executed" }, {.name = "DATA_MEM_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x13, .desc = "Data memory access" }, {.name = "L1I_CACHE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x14, .desc = "Level 1 instruction cache access" }, {.name = "L1D_CACHE_WB", .modmsk = ARMV7_A15_ATTRS, 
.code = 0x15, .desc = "Level 1 data cache WriteBack" }, {.name = "L2D_CACHE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x16, .desc = "Level 2 data cache access" }, {.name = "L2D_CACHE_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x17, .desc = "Level 2 data cache refill" }, {.name = "L2D_CACHE_WB", .modmsk = ARMV7_A15_ATTRS, .code = 0x18, .desc = "Level 2 data cache WriteBack" }, {.name = "BUS_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x19, .desc = "Bus access" }, {.name = "LOCAL_MEMORY_ERROR", .modmsk = ARMV7_A15_ATTRS, .code = 0x1a, .desc = "Local memory error" }, {.name = "INST_SPEC_EXEC", .modmsk = ARMV7_A15_ATTRS, .code = 0x1b, .desc = "Instruction speculatively executed" }, {.name = "TTBR_WRITE_RETIRED", .modmsk = ARMV7_A15_ATTRS, .code = 0x1c, .desc = "Instruction architecturally executed (condition check pass) Write to translation table base" }, {.name = "BUS_CYCLES", .modmsk = ARMV7_A15_ATTRS, .code = 0x1d, .desc = "Bus cycle" }, {.name = "L1D_READ_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x40, .desc = "Level 1 data cache read access" }, {.name = "L1D_WRITE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x41, .desc = "Level 1 data cache write access" }, {.name = "L1D_READ_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x42, .desc = "Level 1 data cache read refill" }, {.name = "L1D_WRITE_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x43, .desc = "Level 1 data cache write refill" }, {.name = "L1D_WB_VICTIM", .modmsk = ARMV7_A15_ATTRS, .code = 0x46, .desc = "Level 1 data cache writeback victim" }, {.name = "L1D_WB_CLEAN_COHERENCY", .modmsk = ARMV7_A15_ATTRS, .code = 0x47, .desc = "Level 1 data cache writeback cleaning and coherency" }, {.name = "L1D_INVALIDATE", .modmsk = ARMV7_A15_ATTRS, .code = 0x48, .desc = "Level 1 data cache invalidate" }, {.name = "L1D_TLB_READ_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x4c, .desc = "Level 1 data TLB read refill" }, {.name = "L1D_TLB_WRITE_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x4d, .desc = "Level 1 data TLB write 
refill" }, {.name = "L2D_READ_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x50, .desc = "Level 2 data cache read access" }, {.name = "L2D_WRITE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x51, .desc = "Level 2 data cache write access" }, {.name = "L2D_READ_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x52, .desc = "Level 2 data cache read refill" }, {.name = "L2D_WRITE_REFILL", .modmsk = ARMV7_A15_ATTRS, .code = 0x53, .desc = "Level 2 data cache write refill" }, {.name = "L2D_WB_VICTIM", .modmsk = ARMV7_A15_ATTRS, .code = 0x56, .desc = "Level 2 data cache writeback victim" }, {.name = "L2D_WB_CLEAN_COHERENCY", .modmsk = ARMV7_A15_ATTRS, .code = 0x57, .desc = "Level 2 data cache writeback cleaning and coherency" }, {.name = "L2D_INVALIDATE", .modmsk = ARMV7_A15_ATTRS, .code = 0x58, .desc = "Level 2 data cache invalidate" }, {.name = "BUS_READ_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x60, .desc = "Bus read access" }, {.name = "BUS_WRITE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x61, .desc = "Bus write access" }, {.name = "BUS_NORMAL_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x62, .desc = "Bus normal access" }, {.name = "BUS_NOT_NORMAL_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x63, .desc = "Bus not normal access" }, {.name = "BUS_NORMAL_ACCESS_2", .modmsk = ARMV7_A15_ATTRS, .code = 0x64, .desc = "Bus normal access" }, {.name = "BUS_PERIPH_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x65, .desc = "Bus peripheral access" }, {.name = "DATA_MEM_READ_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x66, .desc = "Data memory read access" }, {.name = "DATA_MEM_WRITE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x67, .desc = "Data memory write access" }, {.name = "UNALIGNED_READ_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x68, .desc = "Unaligned read access" }, {.name = "UNALIGNED_WRITE_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x69, .desc = "Unaligned write access" }, {.name = "UNALIGNED_ACCESS", .modmsk = ARMV7_A15_ATTRS, .code = 0x6a, .desc = "Unaligned access" }, 
{.name = "INST_SPEC_EXEC_LDREX", .modmsk = ARMV7_A15_ATTRS, .code = 0x6c, .desc = "LDREX exclusive instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_STREX_PASS", .modmsk = ARMV7_A15_ATTRS, .code = 0x6d, .desc = "STREX pass exclusive instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_STREX_FAIL", .modmsk = ARMV7_A15_ATTRS, .code = 0x6e, .desc = "STREX fail exclusive instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_LOAD", .modmsk = ARMV7_A15_ATTRS, .code = 0x70, .desc = "Load instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_STORE", .modmsk = ARMV7_A15_ATTRS, .code = 0x71, .desc = "Store instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_LOAD_STORE", .modmsk = ARMV7_A15_ATTRS, .code = 0x72, .desc = "Load or store instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_INTEGER_INST", .modmsk = ARMV7_A15_ATTRS, .code = 0x73, .desc = "Integer data processing instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_SIMD", .modmsk = ARMV7_A15_ATTRS, .code = 0x74, .desc = "Advanced SIMD instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_VFP", .modmsk = ARMV7_A15_ATTRS, .code = 0x75, .desc = "VFP instruction speculatively executed" }, {.name = "INST_SPEC_EXEC_SOFT_PC", .modmsk = ARMV7_A15_ATTRS, .code = 0x76, .desc = "Software change of the PC instruction speculatively executed" }, {.name = "BRANCH_SPEC_EXEC_IMM_BRANCH", .modmsk = ARMV7_A15_ATTRS, .code = 0x78, .desc = "Immediate branch speculatively executed" }, {.name = "BRANCH_SPEC_EXEC_RET", .modmsk = ARMV7_A15_ATTRS, .code = 0x79, .desc = "Return branch speculatively executed" }, {.name = "BRANCH_SPEC_EXEC_IND", .modmsk = ARMV7_A15_ATTRS, .code = 0x7a, .desc = "Indirect branch speculatively executed" }, {.name = "BARRIER_SPEC_EXEC_ISB", .modmsk = ARMV7_A15_ATTRS, .code = 0x7c, .desc = "ISB barrier speculatively executed" }, {.name = "BARRIER_SPEC_EXEC_DSB", .modmsk = ARMV7_A15_ATTRS, .code = 0x7d, .desc = "DSB barrier 
speculatively executed" }, {.name = "BARRIER_SPEC_EXEC_DMB", .modmsk = ARMV7_A15_ATTRS, .code = 0x7e, .desc = "DMB barrier speculatively executed" }, }; papi-5.3.0/src/libpfm4/lib/events/amd64_events_k8.h /* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Regenerated from previous version by: * * Copyright (c) 2006, 2007 Advanced Micro Devices, Inc. * Contributed by Ray Bryant * Contributed by Robert Richter * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. 
* * PMU: amd64_k8 (AMD64 K8) */ /* History * * Feb 10 2006 -- Ray Bryant, raybry@mpdtxmail.amd.com * * Brought event table up-to-date with the 3.85 (October 2005) version of the * "BIOS and Kernel Developer's Guide for the AMD Athlon[tm] 64 and * AMD Opteron[tm] Processors," AMD Publication # 26094. * * Dec 12 2007 -- Robert Richter, robert.richter@amd.com * * Updated to: BIOS and Kernel Developer's Guide for AMD NPT Family * 0Fh Processors, Publication # 32559, Revision: 3.08, Issue Date: * July 2007 * * Feb 26 2009 -- Robert Richter, robert.richter@amd.com * * Updates and fixes of some revision flags and descriptions according * to these documents: * BIOS and Kernel Developer's Guide, #26094, Revision: 3.30 * BIOS and Kernel Developer's Guide, #32559, Revision: 3.12 */ static const amd64_umask_t amd64_k8_dispatched_fpu[]={ { .uname = "OPS_ADD", .udesc = "Add pipe ops", .ucode = 0x1, }, { .uname = "OPS_MULTIPLY", .udesc = "Multiply pipe ops", .ucode = 0x2, }, { .uname = "OPS_STORE", .udesc = "Store pipe ops", .ucode = 0x4, }, { .uname = "OPS_ADD_PIPE_LOAD_OPS", .udesc = "Add pipe load ops", .ucode = 0x8, }, { .uname = "OPS_MULTIPLY_PIPE_LOAD_OPS", .udesc = "Multiply pipe load ops", .ucode = 0x10, }, { .uname = "OPS_STORE_PIPE_LOAD_OPS", .udesc = "Store pipe load ops", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_segment_register_loads[]={ { .uname = "ES", .udesc = "ES", .ucode = 0x1, }, { .uname = "CS", .udesc = "CS", .ucode = 0x2, }, { .uname = "SS", .udesc = "SS", .ucode = 0x4, }, { .uname = "DS", .udesc = "DS", .ucode = 0x8, }, { .uname = "FS", .udesc = "FS", .ucode = 0x10, }, { .uname = "GS", .udesc = "GS", .ucode = 0x20, }, { .uname = "HS", .udesc = "HS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All segments", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_locked_ops[]={ { .uname = 
"EXECUTED", .udesc = "The number of locked instructions executed", .ucode = 0x1, }, { .uname = "CYCLES_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in speculative phase", .ucode = 0x2, }, { .uname = "CYCLES_NON_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in non-speculative phase (including cache miss penalty)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_memory_requests[]={ { .uname = "NON_CACHEABLE", .udesc = "Requests to non-cacheable (UC) memory", .ucode = 0x1, }, { .uname = "WRITE_COMBINING", .udesc = "Requests to write-combining (WC) memory or WC buffer flushes to WB memory", .ucode = 0x2, }, { .uname = "STREAMING_STORE", .udesc = "Streaming store (SS) requests", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x83, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_data_cache_refills[]={ { .uname = "SYSTEM", .udesc = "Refill from System", .ucode = 0x1, }, { .uname = "L2_SHARED", .udesc = "Shared-state line from L2", .ucode = 0x2, }, { .uname = "L2_EXCLUSIVE", .udesc = "Exclusive-state line from L2", .ucode = 0x4, }, { .uname = "L2_OWNED", .udesc = "Owned-state line from L2", .ucode = 0x8, }, { .uname = "L2_MODIFIED", .udesc = "Modified-state line from L2", .ucode = 0x10, }, { .uname = "ALL", .udesc = "Shared, Exclusive, Owned, Modified State Refills", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_data_cache_refills_from_system[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "ALL", .udesc = "Invalid, Shared, Exclusive, Owned, Modified", 
.ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_scrubber_single_bit_ecc_errors[]={ { .uname = "SCRUBBER_ERROR", .udesc = "Scrubber error", .ucode = 0x1, }, { .uname = "PIGGYBACK_ERROR", .udesc = "Piggyback scrubber errors", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_prefetch_instructions_dispatched[]={ { .uname = "LOAD", .udesc = "Load (Prefetch, PrefetchT0/T1/T2)", .ucode = 0x1, }, { .uname = "STORE", .udesc = "Store (PrefetchW)", .ucode = 0x2, }, { .uname = "NTA", .udesc = "NTA (PrefetchNTA)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_dcache_misses_by_locked_instructions[]={ { .uname = "DATA_CACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .udesc = "Data cache misses by locked instructions", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x2, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_data_prefetches[]={ { .uname = "CANCELLED", .udesc = "Cancelled prefetches", .ucode = 0x1, }, { .uname = "ATTEMPTED", .udesc = "Prefetch attempts", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_system_read_responses[]={ { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x1, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x2, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x4, }, { .uname = "ALL", .udesc = "Exclusive, Modified, Shared", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_quadwords_written_to_system[]={ { .uname = "QUADWORD_WRITE_TRANSFER", .udesc = "Quadword write transfer", .ucode = 0x1, }, { .uname = "ALL", .udesc = "All 
sub-events selected", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_requests_to_l2[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB fill (page table walks)", .ucode = 0x4, }, { .uname = "SNOOP", .udesc = "Tag snoop request", .ucode = 0x8, }, { .uname = "CANCELLED", .udesc = "Cancelled request", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All non-cancelled requests", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_l2_cache_miss[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill (includes possible replays, whereas event 41h does not)", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB page table walk", .ucode = 0x4, }, { .uname = "ALL", .udesc = "Instructions, Data, TLB walk", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_l2_fill_writeback[]={ { .uname = "L2_FILLS", .udesc = "L2 fills (victims from L1 caches, TLB page table walks and data prefetches)", .ucode = 0x1, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_TILL_K8_REV_E, }, { .uname = "L2_WRITEBACKS", .udesc = "L2 Writebacks to system.", .ucode = 0x2, .uflags= AMD64_FL_K8_REV_F, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_K8_REV_F, }, }; static const amd64_umask_t amd64_k8_retired_mmx_and_fp_instructions[]={ { .uname = "X87", .udesc = "X87 instructions", .ucode = 0x1, }, { .uname = "MMX_AND_3DNOW", .udesc = "MMX and 3DNow! 
instructions", .ucode = 0x2, }, { .uname = "PACKED_SSE_AND_SSE2", .udesc = "Packed SSE and SSE2 instructions", .ucode = 0x4, }, { .uname = "SCALAR_SSE_AND_SSE2", .udesc = "Scalar SSE and SSE2 instructions", .ucode = 0x8, }, { .uname = "ALL", .udesc = "X87, MMX(TM), 3DNow!(TM), Scalar and Packed SSE and SSE2 instructions", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_retired_fastpath_double_op_instructions[]={ { .uname = "POSITION_0", .udesc = "With low op in position 0", .ucode = 0x1, }, { .uname = "POSITION_1", .udesc = "With low op in position 1", .ucode = 0x2, }, { .uname = "POSITION_2", .udesc = "With low op in position 2", .ucode = 0x4, }, { .uname = "ALL", .udesc = "With low op in position 0, 1, or 2", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_fpu_exceptions[]={ { .uname = "X87_RECLASS_MICROFAULTS", .udesc = "X87 reclass microfaults", .ucode = 0x1, }, { .uname = "SSE_RETYPE_MICROFAULTS", .udesc = "SSE retype microfaults", .ucode = 0x2, }, { .uname = "SSE_RECLASS_MICROFAULTS", .udesc = "SSE reclass microfaults", .ucode = 0x4, }, { .uname = "SSE_AND_X87_MICROTRAPS", .udesc = "SSE and x87 microtraps", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_dram_accesses_page[]={ { .uname = "HIT", .udesc = "Page hit", .ucode = 0x1, }, { .uname = "MISS", .udesc = "Page Miss", .ucode = 0x2, }, { .uname = "CONFLICT", .udesc = "Page Conflict", .ucode = 0x4, }, { .uname = "ALL", .udesc = "Page Hit, Miss, or Conflict", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_memory_controller_turnarounds[]={ { .uname = "CHIP_SELECT", .udesc = "DIMM (chip select) turnaround", .ucode = 0x1, }, { .uname = "READ_TO_WRITE", .udesc = "Read to write turnaround", .ucode = 0x2, }, { .uname = "WRITE_TO_READ", .udesc = "Write 
to read turnaround", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All Memory Controller Turnarounds", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_memory_controller_bypass[]={ { .uname = "HIGH_PRIORITY", .udesc = "Memory controller high priority bypass", .ucode = 0x1, }, { .uname = "LOW_PRIORITY", .udesc = "Memory controller low priority bypass", .ucode = 0x2, }, { .uname = "DRAM_INTERFACE", .udesc = "DRAM controller interface bypass", .ucode = 0x4, }, { .uname = "DRAM_QUEUE", .udesc = "DRAM controller queue bypass", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_sized_blocks[]={ { .uname = "32_BYTE_WRITES", .udesc = "32-byte Sized Writes", .ucode = 0x4, }, { .uname = "64_BYTE_WRITES", .udesc = "64-byte Sized Writes", .ucode = 0x8, }, { .uname = "32_BYTE_READS", .udesc = "32-byte Sized Reads", .ucode = 0x10, }, { .uname = "64_BYTE_READS", .udesc = "64-byte Sized Reads", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3c, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_thermal_status_and_ecc_errors[]={ { .uname = "CLKS_CPU_ACTIVE", .udesc = "Number of clocks CPU is active when HTC is active", .ucode = 0x1, .uflags= AMD64_FL_K8_REV_F, }, { .uname = "CLKS_CPU_INACTIVE", .udesc = "Number of clocks CPU clock is inactive when HTC is active", .ucode = 0x2, .uflags= AMD64_FL_K8_REV_F, }, { .uname = "CLKS_DIE_TEMP_TOO_HIGH", .udesc = "Number of clocks when die temperature is higher than the software high temperature threshold", .ucode = 0x4, .uflags= AMD64_FL_K8_REV_F, }, { .uname = "CLKS_TEMP_THRESHOLD_EXCEEDED", .udesc = "Number of clocks when high temperature threshold was exceeded", .ucode = 0x8, .uflags= AMD64_FL_K8_REV_F, }, { .uname = "DRAM_ECC_ERRORS", .udesc = "Number of correctable and uncorrectable DRAM ECC errors", .ucode = 
0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x80, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_TILL_K8_REV_E, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x8f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_K8_REV_F, }, }; static const amd64_umask_t amd64_k8_cpu_io_requests_to_memory_io[]={ { .uname = "I_O_TO_I_O", .udesc = "I/O to I/O", .ucode = 0x1, }, { .uname = "I_O_TO_MEM", .udesc = "I/O to Mem", .ucode = 0x2, }, { .uname = "CPU_TO_I_O", .udesc = "CPU to I/O", .ucode = 0x4, }, { .uname = "CPU_TO_MEM", .udesc = "CPU to Mem", .ucode = 0x8, }, { .uname = "TO_REMOTE_NODE", .udesc = "To remote node", .ucode = 0x10, }, { .uname = "TO_LOCAL_NODE", .udesc = "To local node", .ucode = 0x20, }, { .uname = "FROM_REMOTE_NODE", .udesc = "From remote node", .ucode = 0x40, }, { .uname = "FROM_LOCAL_NODE", .udesc = "From local node", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_cache_block[]={ { .uname = "VICTIM_WRITEBACK", .udesc = "Victim Block (Writeback)", .ucode = 0x1, }, { .uname = "DCACHE_LOAD_MISS", .udesc = "Read Block (Dcache load miss refill)", .ucode = 0x4, }, { .uname = "SHARED_ICACHE_REFILL", .udesc = "Read Block Shared (Icache refill)", .ucode = 0x8, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read Block Modified (Dcache store miss refill)", .ucode = 0x10, }, { .uname = "READ_TO_DIRTY", .udesc = "Change to Dirty (first store to clean block already in cache)", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3d, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_sized_commands[]={ { .uname = "NON_POSTED_WRITE_BYTE", .udesc = "NonPosted SzWr Byte (1-32 bytes) Legacy or mapped I/O, typically 1-4 bytes", .ucode = 0x1, }, { .uname = "NON_POSTED_WRITE_DWORD", .udesc = "NonPosted SzWr Dword (1-16 dwords) Legacy or 
mapped I/O, typically 1 dword", .ucode = 0x2, }, { .uname = "POSTED_WRITE_BYTE", .udesc = "Posted SzWr Byte (1-32 bytes) Sub-cache-line DMA writes, size varies; also flushes of partially-filled Write Combining buffer", .ucode = 0x4, }, { .uname = "POSTED_WRITE_DWORD", .udesc = "Posted SzWr Dword (1-16 dwords) Block-oriented DMA writes, often cache-line sized; also processor Write Combining buffer flushes", .ucode = 0x8, }, { .uname = "READ_BYTE_4_BYTES", .udesc = "SzRd Byte (4 bytes) Legacy or mapped I/O", .ucode = 0x10, }, { .uname = "READ_DWORD_1_16_DWORDS", .udesc = "SzRd Dword (1-16 dwords) Block-oriented DMA reads, typically cache-line size", .ucode = 0x20, }, { .uname = "READ_MODIFY_WRITE", .udesc = "RdModWr", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_probe[]={ { .uname = "MISS", .udesc = "Probe miss", .ucode = 0x1, }, { .uname = "HIT_CLEAN", .udesc = "Probe hit clean", .ucode = 0x2, }, { .uname = "HIT_DIRTY_NO_MEMORY_CANCEL", .udesc = "Probe hit dirty without memory cancel (probed by Sized Write or Change2Dirty)", .ucode = 0x4, }, { .uname = "HIT_DIRTY_WITH_MEMORY_CANCEL", .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)", .ucode = 0x8, }, { .uname = "UPSTREAM_DISPLAY_REFRESH_READS", .udesc = "Upstream display refresh reads", .ucode = 0x10, }, { .uname = "UPSTREAM_NON_DISPLAY_REFRESH_READS", .udesc = "Upstream non-display refresh reads", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_TILL_K8_REV_C, }, { .uname = "UPSTREAM_WRITES", .udesc = "Upstream writes", .ucode = 0x40, .uflags= AMD64_FL_K8_REV_D, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_K8_REV_D, }, }; static const amd64_umask_t amd64_k8_gart[]={ { .uname = 
"APERTURE_HIT_FROM_CPU", .udesc = "GART aperture hit on access from CPU", .ucode = 0x1, }, { .uname = "APERTURE_HIT_FROM_IO", .udesc = "GART aperture hit on access from I/O", .ucode = 0x2, }, { .uname = "MISS", .udesc = "GART miss", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_k8_hypertransport_link0[]={ { .uname = "COMMAND_DWORD_SENT", .udesc = "Command dword sent", .ucode = 0x1, }, { .uname = "DATA_DWORD_SENT", .udesc = "Data dword sent", .ucode = 0x2, }, { .uname = "BUFFER_RELEASE_DWORD_SENT", .udesc = "Buffer release dword sent", .ucode = 0x4, }, { .uname = "NOP_DWORD_SENT", .udesc = "Nop dword sent (idle)", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_entry_t amd64_k8_pe[]={ { .name = "DISPATCHED_FPU", .desc = "Dispatched FPU Operations", .modmsk = AMD64_BASIC_ATTRS, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_dispatched_fpu), .ngrp = 1, .umasks = amd64_k8_dispatched_fpu, }, { .name = "CYCLES_NO_FPU_OPS_RETIRED", .desc = "Cycles with no FPU Ops Retired", .modmsk = AMD64_BASIC_ATTRS, .code = 0x1, }, { .name = "DISPATCHED_FPU_OPS_FAST_FLAG", .desc = "Dispatched Fast Flag FPU Operations", .modmsk = AMD64_BASIC_ATTRS, .code = 0x2, }, { .name = "SEGMENT_REGISTER_LOADS", .desc = "Segment Register Loads", .modmsk = AMD64_BASIC_ATTRS, .code = 0x20, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_segment_register_loads), .ngrp = 1, .umasks = amd64_k8_segment_register_loads, }, { .name = "PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE", .desc = "Pipeline restart due to self-modifying code", .modmsk = AMD64_BASIC_ATTRS, .code = 0x21, }, { .name = "PIPELINE_RESTART_DUE_TO_PROBE_HIT", .desc = "Pipeline restart due to probe hit", .modmsk = AMD64_BASIC_ATTRS, .code = 0x22, }, { .name = "LS_BUFFER_2_FULL_CYCLES", .desc = "LS Buffer 2 Full", .modmsk = 
AMD64_BASIC_ATTRS, .code = 0x23, }, { .name = "LOCKED_OPS", .desc = "Locked Operations", .modmsk = AMD64_BASIC_ATTRS, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_locked_ops), .ngrp = 1, .umasks = amd64_k8_locked_ops, }, { .name = "MEMORY_REQUESTS", .desc = "Memory Requests by Type", .modmsk = AMD64_BASIC_ATTRS, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_memory_requests), .ngrp = 1, .umasks = amd64_k8_memory_requests, }, { .name = "DATA_CACHE_ACCESSES", .desc = "Data Cache Accesses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x40, }, { .name = "DATA_CACHE_MISSES", .desc = "Data Cache Misses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x41, }, { .name = "DATA_CACHE_REFILLS", .desc = "Data Cache Refills from L2 or System", .modmsk = AMD64_BASIC_ATTRS, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_data_cache_refills), .ngrp = 1, .umasks = amd64_k8_data_cache_refills, }, { .name = "DATA_CACHE_REFILLS_FROM_SYSTEM", .desc = "Data Cache Refills from System", .modmsk = AMD64_BASIC_ATTRS, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_data_cache_refills_from_system), .ngrp = 1, .umasks = amd64_k8_data_cache_refills_from_system, }, { .name = "DATA_CACHE_LINES_EVICTED", .desc = "Data Cache Lines Evicted", .modmsk = AMD64_BASIC_ATTRS, .code = 0x44, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_data_cache_refills_from_system), .ngrp = 1, .umasks = amd64_k8_data_cache_refills_from_system, /* identical to actual umasks list for this event */ }, { .name = "L1_DTLB_MISS_AND_L2_DTLB_HIT", .desc = "L1 DTLB Miss and L2 DTLB Hit", .modmsk = AMD64_BASIC_ATTRS, .code = 0x45, }, { .name = "L1_DTLB_AND_L2_DTLB_MISS", .desc = "L1 DTLB and L2 DTLB Miss", .modmsk = AMD64_BASIC_ATTRS, .code = 0x46, }, { .name = "MISALIGNED_ACCESSES", .desc = "Misaligned Accesses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x47, }, { .name = "MICROARCHITECTURAL_LATE_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Late Cancel of an Access", .modmsk = AMD64_BASIC_ATTRS, .code = 0x48, }, { .name = 
"MICROARCHITECTURAL_EARLY_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Early Cancel of an Access", .modmsk = AMD64_BASIC_ATTRS, .code = 0x49, }, { .name = "SCRUBBER_SINGLE_BIT_ECC_ERRORS", .desc = "Single-bit ECC Errors Recorded by Scrubber", .modmsk = AMD64_BASIC_ATTRS, .code = 0x4a, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_scrubber_single_bit_ecc_errors), .ngrp = 1, .umasks = amd64_k8_scrubber_single_bit_ecc_errors, }, { .name = "PREFETCH_INSTRUCTIONS_DISPATCHED", .desc = "Prefetch Instructions Dispatched", .modmsk = AMD64_BASIC_ATTRS, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_prefetch_instructions_dispatched), .ngrp = 1, .umasks = amd64_k8_prefetch_instructions_dispatched, }, { .name = "DCACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .desc = "DCACHE Misses by Locked Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0x4c, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_dcache_misses_by_locked_instructions), .ngrp = 1, .umasks = amd64_k8_dcache_misses_by_locked_instructions, }, { .name = "DATA_PREFETCHES", .desc = "Data Prefetcher", .modmsk = AMD64_BASIC_ATTRS, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_data_prefetches), .ngrp = 1, .umasks = amd64_k8_data_prefetches, }, { .name = "SYSTEM_READ_RESPONSES", .desc = "System Read Responses by Coherency State", .modmsk = AMD64_BASIC_ATTRS, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_system_read_responses), .ngrp = 1, .umasks = amd64_k8_system_read_responses, }, { .name = "QUADWORDS_WRITTEN_TO_SYSTEM", .desc = "Quadwords Written to System", .modmsk = AMD64_BASIC_ATTRS, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_quadwords_written_to_system), .ngrp = 1, .umasks = amd64_k8_quadwords_written_to_system, }, { .name = "REQUESTS_TO_L2", .desc = "Requests to L2 Cache", .modmsk = AMD64_BASIC_ATTRS, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_requests_to_l2), .ngrp = 1, .umasks = amd64_k8_requests_to_l2, }, { .name = "L2_CACHE_MISS", .desc = "L2 Cache Misses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x7e, 
.numasks = LIBPFM_ARRAY_SIZE(amd64_k8_l2_cache_miss), .ngrp = 1, .umasks = amd64_k8_l2_cache_miss, }, { .name = "L2_FILL_WRITEBACK", .desc = "L2 Fill/Writeback", .modmsk = AMD64_BASIC_ATTRS, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_l2_fill_writeback), .ngrp = 1, .umasks = amd64_k8_l2_fill_writeback, }, { .name = "INSTRUCTION_CACHE_FETCHES", .desc = "Instruction Cache Fetches", .modmsk = AMD64_BASIC_ATTRS, .code = 0x80, }, { .name = "INSTRUCTION_CACHE_MISSES", .desc = "Instruction Cache Misses", .modmsk = AMD64_BASIC_ATTRS, .code = 0x81, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_L2", .desc = "Instruction Cache Refills from L2", .modmsk = AMD64_BASIC_ATTRS, .code = 0x82, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM", .desc = "Instruction Cache Refills from System", .modmsk = AMD64_BASIC_ATTRS, .code = 0x83, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_HIT", .desc = "L1 ITLB Miss and L2 ITLB Hit", .modmsk = AMD64_BASIC_ATTRS, .code = 0x84, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_MISS", .desc = "L1 ITLB Miss and L2 ITLB Miss", .modmsk = AMD64_BASIC_ATTRS, .code = 0x85, }, { .name = "PIPELINE_RESTART_DUE_TO_INSTRUCTION_STREAM_PROBE", .desc = "Pipeline Restart Due to Instruction Stream Probe", .modmsk = AMD64_BASIC_ATTRS, .code = 0x86, }, { .name = "INSTRUCTION_FETCH_STALL", .desc = "Instruction Fetch Stall", .modmsk = AMD64_BASIC_ATTRS, .code = 0x87, }, { .name = "RETURN_STACK_HITS", .desc = "Return Stack Hits", .modmsk = AMD64_BASIC_ATTRS, .code = 0x88, }, { .name = "RETURN_STACK_OVERFLOWS", .desc = "Return Stack Overflows", .modmsk = AMD64_BASIC_ATTRS, .code = 0x89, }, { .name = "RETIRED_CLFLUSH_INSTRUCTIONS", .desc = "Retired CLFLUSH Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0x26, }, { .name = "RETIRED_CPUID_INSTRUCTIONS", .desc = "Retired CPUID Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0x27, }, { .name = "CPU_CLK_UNHALTED", .desc = "CPU Clocks not Halted", .modmsk = AMD64_BASIC_ATTRS, .code = 0x76, }, { .name = 
"RETIRED_INSTRUCTIONS", .desc = "Retired Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc0, }, { .name = "RETIRED_UOPS", .desc = "Retired uops", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc1, }, { .name = "RETIRED_BRANCH_INSTRUCTIONS", .desc = "Retired Branch Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc2, }, { .name = "RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", .desc = "Retired Mispredicted Branch Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc3, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS", .desc = "Retired Taken Branch Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc4, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED", .desc = "Retired Taken Branch Instructions Mispredicted", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc5, }, { .name = "RETIRED_FAR_CONTROL_TRANSFERS", .desc = "Retired Far Control Transfers", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc6, }, { .name = "RETIRED_BRANCH_RESYNCS", .desc = "Retired Branch Resyncs", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc7, }, { .name = "RETIRED_NEAR_RETURNS", .desc = "Retired Near Returns", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc8, }, { .name = "RETIRED_NEAR_RETURNS_MISPREDICTED", .desc = "Retired Near Returns Mispredicted", .modmsk = AMD64_BASIC_ATTRS, .code = 0xc9, }, { .name = "RETIRED_INDIRECT_BRANCHES_MISPREDICTED", .desc = "Retired Indirect Branches Mispredicted", .modmsk = AMD64_BASIC_ATTRS, .code = 0xca, }, { .name = "RETIRED_MMX_AND_FP_INSTRUCTIONS", .desc = "Retired MMX/FP Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_retired_mmx_and_fp_instructions), .ngrp = 1, .umasks = amd64_k8_retired_mmx_and_fp_instructions, }, { .name = "RETIRED_FASTPATH_DOUBLE_OP_INSTRUCTIONS", .desc = "Retired Fastpath Double Op Instructions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_retired_fastpath_double_op_instructions), .ngrp = 1, .umasks = amd64_k8_retired_fastpath_double_op_instructions, }, { .name = 
"INTERRUPTS_MASKED_CYCLES", .desc = "Interrupts-Masked Cycles", .modmsk = AMD64_BASIC_ATTRS, .code = 0xcd, }, { .name = "INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING", .desc = "Interrupts-Masked Cycles with Interrupt Pending", .modmsk = AMD64_BASIC_ATTRS, .code = 0xce, }, { .name = "INTERRUPTS_TAKEN", .desc = "Interrupts Taken", .modmsk = AMD64_BASIC_ATTRS, .code = 0xcf, }, { .name = "DECODER_EMPTY", .desc = "Decoder Empty", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd0, }, { .name = "DISPATCH_STALLS", .desc = "Dispatch Stalls", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd1, }, { .name = "DISPATCH_STALL_FOR_BRANCH_ABORT", .desc = "Dispatch Stall for Branch Abort to Retire", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd2, }, { .name = "DISPATCH_STALL_FOR_SERIALIZATION", .desc = "Dispatch Stall for Serialization", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd3, }, { .name = "DISPATCH_STALL_FOR_SEGMENT_LOAD", .desc = "Dispatch Stall for Segment Load", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd4, }, { .name = "DISPATCH_STALL_FOR_REORDER_BUFFER_FULL", .desc = "Dispatch Stall for Reorder Buffer Full", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd5, }, { .name = "DISPATCH_STALL_FOR_RESERVATION_STATION_FULL", .desc = "Dispatch Stall for Reservation Station Full", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd6, }, { .name = "DISPATCH_STALL_FOR_FPU_FULL", .desc = "Dispatch Stall for FPU Full", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd7, }, { .name = "DISPATCH_STALL_FOR_LS_FULL", .desc = "Dispatch Stall for LS Full", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd8, }, { .name = "DISPATCH_STALL_WAITING_FOR_ALL_QUIET", .desc = "Dispatch Stall Waiting for All Quiet", .modmsk = AMD64_BASIC_ATTRS, .code = 0xd9, }, { .name = "DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RSYNC", .desc = "Dispatch Stall for Far Transfer or Resync to Retire", .modmsk = AMD64_BASIC_ATTRS, .code = 0xda, }, { .name = "FPU_EXCEPTIONS", .desc = "FPU Exceptions", .modmsk = AMD64_BASIC_ATTRS, .code = 0xdb, .numasks = 
LIBPFM_ARRAY_SIZE(amd64_k8_fpu_exceptions), .ngrp = 1, .umasks = amd64_k8_fpu_exceptions, }, { .name = "DR0_BREAKPOINT_MATCHES", .desc = "DR0 Breakpoint Matches", .modmsk = AMD64_BASIC_ATTRS, .code = 0xdc, }, { .name = "DR1_BREAKPOINT_MATCHES", .desc = "DR1 Breakpoint Matches", .modmsk = AMD64_BASIC_ATTRS, .code = 0xdd, }, { .name = "DR2_BREAKPOINT_MATCHES", .desc = "DR2 Breakpoint Matches", .modmsk = AMD64_BASIC_ATTRS, .code = 0xde, }, { .name = "DR3_BREAKPOINT_MATCHES", .desc = "DR3 Breakpoint Matches", .modmsk = AMD64_BASIC_ATTRS, .code = 0xdf, }, { .name = "DRAM_ACCESSES_PAGE", .desc = "DRAM Accesses", .modmsk = AMD64_BASIC_ATTRS, .code = 0xe0, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_dram_accesses_page), .ngrp = 1, .umasks = amd64_k8_dram_accesses_page, }, { .name = "MEMORY_CONTROLLER_PAGE_TABLE_OVERFLOWS", .desc = "Memory Controller Page Table Overflows", .modmsk = AMD64_BASIC_ATTRS, .code = 0xe1, }, { .name = "MEMORY_CONTROLLER_TURNAROUNDS", .desc = "Memory Controller Turnarounds", .modmsk = AMD64_BASIC_ATTRS, .code = 0xe3, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_memory_controller_turnarounds), .ngrp = 1, .umasks = amd64_k8_memory_controller_turnarounds, }, { .name = "MEMORY_CONTROLLER_BYPASS", .desc = "Memory Controller Bypass Counter Saturation", .modmsk = AMD64_BASIC_ATTRS, .code = 0xe4, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_memory_controller_bypass), .ngrp = 1, .umasks = amd64_k8_memory_controller_bypass, }, { .name = "SIZED_BLOCKS", .desc = "Sized Blocks", .modmsk = AMD64_BASIC_ATTRS, .code = 0xe5, .flags = AMD64_FL_K8_REV_D, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_sized_blocks), .ngrp = 1, .umasks = amd64_k8_sized_blocks, }, { .name = "THERMAL_STATUS_AND_ECC_ERRORS", .desc = "Thermal Status and ECC Errors", .modmsk = AMD64_BASIC_ATTRS, .code = 0xe8, .flags = AMD64_FL_K8_REV_E, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_thermal_status_and_ecc_errors), .ngrp = 1, .umasks = amd64_k8_thermal_status_and_ecc_errors, }, { .name = "CPU_IO_REQUESTS_TO_MEMORY_IO", .desc = 
"CPU/IO Requests to Memory/IO", .modmsk = AMD64_BASIC_ATTRS, .code = 0xe9, .flags = AMD64_FL_K8_REV_E, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_cpu_io_requests_to_memory_io), .ngrp = 1, .umasks = amd64_k8_cpu_io_requests_to_memory_io, }, { .name = "CACHE_BLOCK", .desc = "Cache Block Commands", .modmsk = AMD64_BASIC_ATTRS, .code = 0xea, .flags = AMD64_FL_K8_REV_E, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_cache_block), .ngrp = 1, .umasks = amd64_k8_cache_block, }, { .name = "SIZED_COMMANDS", .desc = "Sized Commands", .modmsk = AMD64_BASIC_ATTRS, .code = 0xeb, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_sized_commands), .ngrp = 1, .umasks = amd64_k8_sized_commands, }, { .name = "PROBE", .desc = "Probe Responses and Upstream Requests", .modmsk = AMD64_BASIC_ATTRS, .code = 0xec, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_probe), .ngrp = 1, .umasks = amd64_k8_probe, }, { .name = "GART", .desc = "GART Events", .modmsk = AMD64_BASIC_ATTRS, .code = 0xee, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_gart), .ngrp = 1, .umasks = amd64_k8_gart, }, { .name = "HYPERTRANSPORT_LINK0", .desc = "HyperTransport Link 0 Transmit Bandwidth", .modmsk = AMD64_BASIC_ATTRS, .code = 0xf6, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_hypertransport_link0), .ngrp = 1, .umasks = amd64_k8_hypertransport_link0, }, { .name = "HYPERTRANSPORT_LINK1", .desc = "HyperTransport Link 1 Transmit Bandwidth", .modmsk = AMD64_BASIC_ATTRS, .code = 0xf7, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_hypertransport_link0), .ngrp = 1, .umasks = amd64_k8_hypertransport_link0, /* identical to actual umasks list for this event */ }, { .name = "HYPERTRANSPORT_LINK2", .desc = "HyperTransport Link 2 Transmit Bandwidth", .modmsk = AMD64_BASIC_ATTRS, .code = 0xf8, .numasks = LIBPFM_ARRAY_SIZE(amd64_k8_hypertransport_link0), .ngrp = 1, .umasks = amd64_k8_hypertransport_link0, /* identical to actual umasks list for this event */ }, }; papi-5.3.0/src/libpfm4/lib/events/powerpc_events.h0000600003276200002170000000225512247131124021555 0ustar ralphundrgrad/* * 
Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * * powerpc_events.h */ #ifndef _POWERPC_EVENTS_H_ #define _POWERPC_EVENTS_H_ #define PME_INSTR_COMPLETED 1 #endif papi-5.3.0/src/libpfm4/lib/events/intel_coreduo_events.h0000600003276200002170000010631112247131123022726 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: coreduo (Intel Core Duo/Core Solo) */ static const intel_x86_umask_t coreduo_sse_prefetch[]={ { .uname = "NTA", .udesc = "Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed", .ucode = 0x0, }, { .uname = "T1", .udesc = "SSE software prefetch instruction PREFETCHT1 retired", .ucode = 0x100, }, { .uname = "T2", .udesc = "SSE software prefetch instruction PREFETCHT2 retired", .ucode = 0x200, }, }; static const intel_x86_umask_t coreduo_l2_ads[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t coreduo_l2_lines_in[]={ { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 0, }, { .uname = "ANY", .udesc = "All inclusive", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "PREFETCH", .udesc = "Hardware prefetch only", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, }; static const intel_x86_umask_t coreduo_l2_ifetch[]={ { .uname = "MESI", .udesc = "Any cacheline access", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { 
.uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, .grpid = 0, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, .grpid = 0, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, .grpid = 0, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, .grpid = 0, }, { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, }; static const intel_x86_umask_t coreduo_l2_rqsts[]={ { .uname = "MESI", .udesc = "Any cacheline access", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, .grpid = 0, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, .grpid = 0, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, .grpid = 0, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, .grpid = 0, }, { .uname = "SELF", .udesc = "This core", .ucode = 0x4000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, { .uname = "BOTH_CORES", .udesc = "Both cores", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, .grpid = 1, }, { .uname = "ANY", .udesc = "All inclusive", .ucode = 0x3000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 2, }, { .uname = "PREFETCH", .udesc = "Hardware prefetch only", .ucode = 0x1000, .uflags= INTEL_X86_NCOMBO, .grpid = 2, }, }; static const intel_x86_umask_t coreduo_thermal_trip[]={ { .uname = "CYCLES", .udesc = "Duration in a thermal trip based on the current core clock", .ucode = 0xc000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "TRIPS", .udesc = "Number of thermal trips", .ucode = 0xc000 | INTEL_X86_MOD_EDGE, .modhw = _INTEL_X86_ATTR_E, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t coreduo_cpu_clk_unhalted[]={ { .uname = "CORE_P", .udesc = "Unhalted core cycles", 
.ucode = 0x0, }, { .uname = "NONHLT_REF_CYCLES", .udesc = "Non-halted bus cycles", .ucode = 0x100, }, { .uname = "SERIAL_EXECUTION_CYCLES", .udesc = "Non-halted bus cycles of this core executing code while the other core is halted", .ucode = 0x200, }, }; static const intel_x86_umask_t coreduo_dcache_cache_ld[]={ { .uname = "MESI", .udesc = "Any cacheline access", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "I_STATE", .udesc = "Invalid cacheline", .ucode = 0x100, }, { .uname = "S_STATE", .udesc = "Shared cacheline", .ucode = 0x200, }, { .uname = "E_STATE", .udesc = "Exclusive cacheline", .ucode = 0x400, }, { .uname = "M_STATE", .udesc = "Modified cacheline", .ucode = 0x800, }, }; static const intel_x86_umask_t coreduo_sse_pre_miss[]={ { .uname = "NTA_MISS", .udesc = "PREFETCHNTA missed all caches", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "T1_MISS", .udesc = "PREFETCHT1 missed all caches", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "T2_MISS", .udesc = "PREFETCHT2 missed all caches", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "STORES_MISS", .udesc = "SSE streaming store instruction missed all caches", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t coreduo_bus_drdy_clocks[]={ { .uname = "THIS_AGENT", .udesc = "This agent", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL_AGENTS", .udesc = "Any agent on the bus", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t coreduo_simd_int_instructions[]={ { .uname = "MUL", .udesc = "Number of SIMD Integer packed multiply instructions executed", .ucode = 0x100, }, { .uname = "SHIFT", .udesc = "Number of SIMD Integer packed shift instructions executed", .ucode = 0x200, }, { .uname = "PACK", .udesc = "Number of SIMD Integer pack operations instruction executed", .ucode = 0x400, }, { .uname = "UNPACK", .udesc = "Number of SIMD Integer unpack instructions executed", .ucode = 
0x800, }, { .uname = "LOGICAL", .udesc = "Number of SIMD Integer packed logical instructions executed", .ucode = 0x1000, }, { .uname = "ARITHMETIC", .udesc = "Number of SIMD Integer packed arithmetic instructions executed", .ucode = 0x2000, }, }; static const intel_x86_umask_t coreduo_mmx_fp_trans[]={ { .uname = "TO_FP", .udesc = "Number of transitions from MMX to X87", .ucode = 0x0, }, { .uname = "TO_MMX", .udesc = "Number of transitions from X87 to MMX", .ucode = 0x100, }, }; static const intel_x86_umask_t coreduo_sse_instructions_retired[]={ { .uname = "SINGLE", .udesc = "Number of SSE/SSE2 single precision instructions retired (packed and scalar)", .ucode = 0x0, .uflags= INTEL_X86_NCOMBO, }, { .uname = "SCALAR_SINGLE", .udesc = "Number of SSE/SSE2 scalar single precision instructions retired", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "PACKED_DOUBLE", .udesc = "Number of SSE/SSE2 packed double precision instructions retired", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DOUBLE", .udesc = "Number of SSE/SSE2 scalar double precision instructions retired", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO, }, { .uname = "INT_128", .udesc = "Number of SSE2 128 bit integer instructions retired", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t coreduo_sse_comp_instructions_retired[]={ { .uname = "PACKED_SINGLE", .udesc = "Number of SSE/SSE2 packed single precision compute instructions retired (does not include AND, OR, XOR)", .ucode = 0x0, }, { .uname = "SCALAR_SINGLE", .udesc = "Number of SSE/SSE2 scalar single precision compute instructions retired (does not include AND, OR, XOR)", .ucode = 0x100, }, { .uname = "PACKED_DOUBLE", .udesc = "Number of SSE/SSE2 packed double precision compute instructions retired (does not include AND, OR, XOR)", .ucode = 0x200, }, { .uname = "SCALAR_DOUBLE", .udesc = "Number of SSE/SSE2 scalar double precision compute instructions retired (does not include AND, OR, XOR)", .ucode = 
0x300, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t coreduo_fused_uops[]={ { .uname = "ALL", .udesc = "All fused uops retired", .ucode = 0x0, }, { .uname = "LOADS", .udesc = "Fused load uops retired", .ucode = 0x100, }, { .uname = "STORES", .udesc = "Fused store uops retired", .ucode = 0x200, }, }; static const intel_x86_umask_t coreduo_est_trans[]={ { .uname = "ANY", .udesc = "Any Intel Enhanced SpeedStep(R) Technology transitions", .ucode = 0x0, }, { .uname = "FREQ", .udesc = "Intel Enhanced SpeedStep Technology frequency transitions", .ucode = 0x1000, }, }; static const intel_x86_entry_t intel_coreduo_pe[]={ { .name = "UNHALTED_CORE_CYCLES", .desc = "Unhalted core cycles", .modmsk = INTEL_X86_ATTRS, .equiv = "CPU_CLK_UNHALTED:CORE_P", .cntmsk = 0x3, .code = 0x3c, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles. Measures bus cycles", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x13c, .flags = INTEL_X86_FIXED, }, { .name = "INSTRUCTION_RETIRED", .desc = "Instructions retired", .modmsk = INTEL_X86_ATTRS, .equiv = "INSTR_RET", .cntmsk = 0x3, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_X86_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x3, .code = 0xc0, }, { .name = "LLC_REFERENCES", .desc = "Last level of cache references", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4f2e, }, { .name = "LAST_LEVEL_CACHE_REFERENCES", .desc = "This is an alias for LLC_REFERENCES", .modmsk = INTEL_X86_ATTRS, .equiv = "LLC_REFERENCES", .cntmsk = 0x3, .code = 0x4f2e, }, { .name = "LLC_MISSES", .desc = "Last level of cache misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x412e, }, { .name = "LAST_LEVEL_CACHE_MISSES", .desc = "This is an alias for LLC_MISSES", .modmsk = INTEL_X86_ATTRS, .equiv = "LLC_MISSES", .cntmsk = 0x3, .code = 0x412e, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Branch instructions retired", .modmsk = 
INTEL_X86_ATTRS, .equiv = "BR_INSTR_RET", .cntmsk = 0x3, .code = 0xc4, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Mispredicted branch instruction retired", .modmsk = INTEL_X86_ATTRS, .equiv = "BR_MISPRED_RET", .cntmsk = 0x3, .code = 0xc5, }, { .name = "LD_BLOCKS", .desc = "Load operations delayed due to store buffer blocks. The preceding store may be blocked due to unknown address, unknown data, or conflict due to partial overlap between the load and store.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3, }, { .name = "SD_DRAINS", .desc = "Cycles while draining store buffers", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4, }, { .name = "MISALIGN_MEM_REF", .desc = "Misaligned data memory references (MOB splits of loads and stores).", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x5, }, { .name = "SEG_REG_LOADS", .desc = "Segment register loads", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6, }, { .name = "SSE_PREFETCH", .desc = "Streaming SIMD Extensions (SSE) Prefetch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(coreduo_sse_prefetch), .ngrp = 1, .umasks = coreduo_sse_prefetch, }, { .name = "SSE_NTSTORES_RET", .desc = "SSE streaming store instruction retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x307, }, { .name = "FP_COMPS_OP_EXE", .desc = "FP computational Instruction executed. 
FADD, FSUB, FCOM, FMULs, MUL, IMUL, FDIVs, DIV, IDIV, FPREMs, FSQRT are included; but exclude FADD or FMUL used in the middle of a transcendental instruction.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x10, }, { .name = "FP_ASSIST", .desc = "FP exceptions experienced microcode assists", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x11, }, { .name = "MUL", .desc = "Multiply operations (a speculative count, including FP and integer multiplies).", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x12, }, { .name = "DIV", .desc = "Divide operations (a speculative count, including FP and integer divides). ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x2, .code = 0x13, }, { .name = "CYCLES_DIV_BUSY", .desc = "Cycles the divider is busy ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0x14, }, { .name = "L2_ADS", .desc = "L2 Address strobes ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x21, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, }, { .name = "DBUS_BUSY", .desc = "Core cycle during which data bus was busy (increments by 4)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x22, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "DBUS_BUSY_RD", .desc = "Cycles data bus is busy transferring data to a core (increments by 4) ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x23, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_IN", .desc = "L2 cache lines allocated", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, }, { .name = "L2_M_LINES_IN", .desc = "L2 Modified-state cache lines allocated", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x25, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp 
= 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "L2_LINES_OUT", .desc = "L2 cache lines evicted ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x26, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "L2_M_LINES_OUT", .desc = "L2 Modified-state cache lines evicted ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x27, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "L2_IFETCH", .desc = "L2 instruction fetches from instruction fetch unit (includes speculative fetches) ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ifetch), .ngrp = 2, .umasks = coreduo_l2_ifetch, }, { .name = "L2_LD", .desc = "L2 cache reads (includes speculation) ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ifetch), .ngrp = 2, .umasks = coreduo_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_ST", .desc = "L2 cache writes (includes speculation)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ifetch), .ngrp = 2, .umasks = coreduo_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "L2_RQSTS", .desc = "L2 cache reference requests ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_rqsts), .ngrp = 3, .umasks = coreduo_l2_rqsts, }, { .name = "L2_REJECT_CYCLES", .desc = "Cycles L2 is busy and rejecting new requests.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x30, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_rqsts), .ngrp = 3, .umasks = coreduo_l2_rqsts, /* identical to actual umasks list for this event */ }, { .name = "L2_NO_REQUEST_CYCLES", .desc = "Cycles there is no 
request to access L2.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x32, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_rqsts), .ngrp = 3, .umasks = coreduo_l2_rqsts, /* identical to actual umasks list for this event */ }, { .name = "EST_TRANS", .desc = "Intel Enhanced SpeedStep(R) Technology transitions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3a, .numasks= LIBPFM_ARRAY_SIZE(coreduo_est_trans), .ngrp = 1, .umasks = coreduo_est_trans, }, { .name = "THERMAL_TRIP", .desc = "Duration in a thermal trip based on the current core clock ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3b, .numasks = LIBPFM_ARRAY_SIZE(coreduo_thermal_trip), .ngrp = 1, .umasks = coreduo_thermal_trip, }, { .name = "CPU_CLK_UNHALTED", .desc = "Core cycles when core is not halted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x3c, .numasks = LIBPFM_ARRAY_SIZE(coreduo_cpu_clk_unhalted), .ngrp = 1, .umasks = coreduo_cpu_clk_unhalted, }, { .name = "DCACHE_CACHE_LD", .desc = "L1 cacheable data read operations", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x40, .numasks = LIBPFM_ARRAY_SIZE(coreduo_dcache_cache_ld), .ngrp = 1, .umasks = coreduo_dcache_cache_ld, }, { .name = "DCACHE_CACHE_ST", .desc = "L1 cacheable data write operations", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x41, .numasks = LIBPFM_ARRAY_SIZE(coreduo_dcache_cache_ld), .ngrp = 1, .umasks = coreduo_dcache_cache_ld, /* identical to actual umasks list for this event */ }, { .name = "DCACHE_CACHE_LOCK", .desc = "L1 cacheable lock read operations to invalid state", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(coreduo_dcache_cache_ld), .ngrp = 1, .umasks = coreduo_dcache_cache_ld, /* identical to actual umasks list for this event */ }, { .name = "DATA_MEM_REF", .desc = "L1 data read and writes of cacheable and non-cacheable types", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x143, }, { .name = "DATA_MEM_CACHE_REF", .desc = "L1 data cacheable read and 
write operations.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x244, }, { .name = "DCACHE_REPL", .desc = "L1 data cache line replacements", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf45, }, { .name = "DCACHE_M_REPL", .desc = "L1 data M-state cache line allocated", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x46, }, { .name = "DCACHE_M_EVICT", .desc = "L1 data M-state cache line evicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x47, }, { .name = "DCACHE_PEND_MISS", .desc = "Weighted cycles of L1 miss outstanding", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x48, }, { .name = "DTLB_MISS", .desc = "Data references that missed TLB", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x49, }, { .name = "SSE_PRE_MISS", .desc = "Streaming SIMD Extensions (SSE) instructions missing all cache levels", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(coreduo_sse_pre_miss), .ngrp = 1, .umasks = coreduo_sse_pre_miss, }, { .name = "L1_PREF_REQ", .desc = "L1 prefetch requests due to DCU cache misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4f, }, { .name = "BUS_REQ_OUTSTANDING", .desc = "Weighted cycles of cacheable bus data read requests. 
This event counts full-line read request from DCU or HW prefetcher, but not RFO, write, instruction fetches, or others.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "BUS_BNR_CLOCKS", .desc = "External bus cycles while BNR asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x61, }, { .name = "BUS_DRDY_CLOCKS", .desc = "External bus cycles while DRDY asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(coreduo_bus_drdy_clocks), .ngrp = 1, .umasks = coreduo_bus_drdy_clocks, }, { .name = "BUS_LOCKS_CLOCKS", .desc = "External bus cycles while bus lock signal asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "BUS_DATA_RCV", .desc = "External bus cycles while bus lock signal asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x4064, }, { .name = "BUS_TRANS_BRD", .desc = "Burst read bus transactions (data or code)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_RFO", .desc = "Completed read for ownership ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IFETCH", .desc = "Completed instruction fetch transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = 
"BUS_TRANS_INVAL", .desc = "Completed invalidate transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_PWR", .desc = "Completed partial write transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6a, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_P", .desc = "Completed partial transactions (include partial read + partial write + line write)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6b, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_IO", .desc = "Completed I/O transactions (read and write)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_lines_in), .ngrp = 2, .umasks = coreduo_l2_lines_in, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_DEF", .desc = "Completed defer transactions ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x206d, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_WB", .desc = "Completed writeback transactions from DCU (does not include L2 writebacks)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc067, .numasks = LIBPFM_ARRAY_SIZE(coreduo_bus_drdy_clocks), .ngrp = 1, .umasks = coreduo_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_BURST", .desc = "Completed burst transactions (full line transactions include reads, write, RFO, and writebacks) ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc06e, .numasks = 
LIBPFM_ARRAY_SIZE(coreduo_bus_drdy_clocks), .ngrp = 1, .umasks = coreduo_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_MEM", .desc = "Completed memory transactions. This includes Bus_Trans_Burst + Bus_Trans_P + Bus_Trans_Inval.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc06f, .numasks = LIBPFM_ARRAY_SIZE(coreduo_bus_drdy_clocks), .ngrp = 1, .umasks = coreduo_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_TRANS_ANY", .desc = "Any completed bus transactions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc070, .numasks = LIBPFM_ARRAY_SIZE(coreduo_bus_drdy_clocks), .ngrp = 1, .umasks = coreduo_bus_drdy_clocks, /* identical to actual umasks list for this event */ }, { .name = "BUS_SNOOPS", .desc = "External bus cycles while bus lock signal asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x77, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ifetch), .ngrp = 2, .umasks = coreduo_l2_ifetch, /* identical to actual umasks list for this event */ }, { .name = "DCU_SNOOP_TO_SHARE", .desc = "DCU snoops to share-state L1 cache line due to L1 misses ", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x178, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "BUS_NOT_IN_USE", .desc = "Number of cycles there is no transaction from the core", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(coreduo_l2_ads), .ngrp = 1, .umasks = coreduo_l2_ads, /* identical to actual umasks list for this event */ }, { .name = "BUS_SNOOP_STALL", .desc = "Number of bus cycles while bus snoop is stalled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x7e, }, { .name = "ICACHE_READS", .desc = "Number of instruction fetches from ICache, streaming buffers (both cacheable and uncacheable fetches)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x80, }, { 
.name = "ICACHE_MISSES", .desc = "Number of instruction fetch misses from ICache, streaming buffers.", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x81, }, { .name = "ITLB_MISSES", .desc = "Number of iITLB misses", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x85, }, { .name = "IFU_MEM_STALL", .desc = "Cycles IFU is stalled while waiting for data from memory", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x86, }, { .name = "ILD_STALL", .desc = "Number of instruction length decoder stalls (Counts number of LCP stalls)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x87, }, { .name = "BR_INST_EXEC", .desc = "Branch instruction executed (includes speculation).", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x88, }, { .name = "BR_MISSP_EXEC", .desc = "Branch instructions executed and mispredicted at execution (includes branches that do not have prediction or mispredicted)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x89, }, { .name = "BR_BAC_MISSP_EXEC", .desc = "Branch instructions executed that were mispredicted at front end", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8a, }, { .name = "BR_CND_EXEC", .desc = "Conditional branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8b, }, { .name = "BR_CND_MISSP_EXEC", .desc = "Conditional branch instructions executed that were mispredicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8c, }, { .name = "BR_IND_EXEC", .desc = "Indirect branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8d, }, { .name = "BR_IND_MISSP_EXEC", .desc = "Indirect branch instructions executed that were mispredicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8e, }, { .name = "BR_RET_EXEC", .desc = "Return branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x8f, }, { .name = "BR_RET_MISSP_EXEC", .desc = "Return branch instructions executed that were mispredicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 
0x3, .code = 0x90, }, { .name = "BR_RET_BAC_MISSP_EXEC", .desc = "Return branch instructions executed that were mispredicted at the front end", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x91, }, { .name = "BR_CALL_EXEC", .desc = "Return call instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x92, }, { .name = "BR_CALL_MISSP_EXEC", .desc = "Return call instructions executed that were mispredicted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x93, }, { .name = "BR_IND_CALL_EXEC", .desc = "Indirect call branch instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0x94, }, { .name = "RESOURCE_STALL", .desc = "Cycles while there is a resource related stall (renaming, buffer entries) as seen by allocator", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xa2, }, { .name = "MMX_INSTR_EXEC", .desc = "Number of MMX instructions executed (does not include MOVQ and MOVD stores)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb0, }, { .name = "SIMD_INT_SAT_EXEC", .desc = "Number of SIMD Integer saturating instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb1, }, { .name = "SIMD_INT_INSTRUCTIONS", .desc = "Number of SIMD Integer instructions executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xb3, .numasks = LIBPFM_ARRAY_SIZE(coreduo_simd_int_instructions), .ngrp = 1, .umasks = coreduo_simd_int_instructions, }, { .name = "INSTR_RET", .desc = "Number of instruction retired (Macro fused instruction count as 2)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc0, }, { .name = "FP_COMP_INSTR_RET", .desc = "Number of FP compute instructions retired (X87 instruction or instruction that contain X87 operations)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x1, .code = 0xc1, }, { .name = "UOPS_RET", .desc = "Number of micro-ops retired (include fused uops)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc2, }, { .name = "SMC_DETECTED", .desc = "Number of times self-modifying code 
condition detected", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc3, }, { .name = "BR_INSTR_RET", .desc = "Number of branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc4, }, { .name = "BR_MISPRED_RET", .desc = "Number of mispredicted branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc5, }, { .name = "CYCLES_INT_MASKED", .desc = "Cycles while interrupt is disabled", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc6, }, { .name = "CYCLES_INT_PEDNING_MASKED", .desc = "Cycles while interrupt is disabled and interrupts are pending", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc7, }, { .name = "HW_INT_RX", .desc = "Number of hardware interrupts received", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc8, }, { .name = "BR_TAKEN_RET", .desc = "Number of taken branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xc9, }, { .name = "BR_MISPRED_TAKEN_RET", .desc = "Number of taken and mispredicted branch instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xca, }, { .name = "MMX_FP_TRANS", .desc = "Transitions from MMX (TM) Instructions to Floating Point Instructions", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(coreduo_mmx_fp_trans), .ngrp = 1, .umasks = coreduo_mmx_fp_trans, }, { .name = "MMX_ASSIST", .desc = "Number of EMMS executed", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xcd, }, { .name = "MMX_INSTR_RET", .desc = "Number of MMX instructions retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xce, }, { .name = "INSTR_DECODED", .desc = "Number of instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd0, }, { .name = "ESP_UOPS", .desc = "Number of ESP folding instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd7, }, { .name = "SSE_INSTRUCTIONS_RETIRED", .desc = "Number of SSE/SSE2 instructions retired (packed and scalar)", .modmsk = 
INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd8, .numasks = LIBPFM_ARRAY_SIZE(coreduo_sse_instructions_retired), .ngrp = 1, .umasks = coreduo_sse_instructions_retired, }, { .name = "SSE_COMP_INSTRUCTIONS_RETIRED", .desc = "Number of computational SSE/SSE2 instructions retired (does not include AND, OR, XOR)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xd9, .numasks = LIBPFM_ARRAY_SIZE(coreduo_sse_comp_instructions_retired), .ngrp = 1, .umasks = coreduo_sse_comp_instructions_retired, }, { .name = "FUSED_UOPS", .desc = "Fused uops retired", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xda, .numasks = LIBPFM_ARRAY_SIZE(coreduo_fused_uops), .ngrp = 1, .umasks = coreduo_fused_uops, }, { .name = "UNFUSION", .desc = "Number of unfusion events in the ROB (due to exception)", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xdb, }, { .name = "BR_INSTR_DECODED", .desc = "Branch instructions decoded", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe0, }, { .name = "BTB_MISSES", .desc = "Number of branches the BTB did not produce a prediction", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe2, }, { .name = "BR_BOGUS", .desc = "Number of bogus branches", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe4, }, { .name = "BACLEARS", .desc = "Number of BAClears asserted", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xe6, }, { .name = "PREF_RQSTS_UP", .desc = "Number of hardware prefetch requests issued in forward streams", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf0, }, { .name = "PREF_RQSTS_DN", .desc = "Number of hardware prefetch requests issued in backward streams", .modmsk = INTEL_X86_ATTRS, .cntmsk = 0x3, .code = 0xf8, }, }; papi-5.3.0/src/libpfm4/lib/events/amd64_events_fam10h.h0000600003276200002170000021055212247131123022145 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Regenerated from previous version by: * Copyright (c) 2007 Advanced Micro Devices, Inc. 
* Contributed by Robert Richter * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: amd64_fam10h (AMD64 Fam10h) */ /* History * * May 28 2010 -- Robert Richter, robert.richter@amd.com: * * Update from: BIOS and Kernel Developer's Guide (BKDG) For AMD * Family 10h Processors, 31116 Rev 3.48 - April 22, 2010 * * Feb 06 2009 -- Robert Richter, robert.richter@amd.com: * * Update for Family 10h RevD (Istanbul) from: BIOS and Kernel * Developer's Guide (BKDG) For AMD Family 10h Processors, 31116 Rev * 3.20 - February 04, 2009 
* * Update for Family 10h RevC (Shanghai) from: BIOS and Kernel * Developer's Guide (BKDG) For AMD Family 10h Processors, 31116 Rev * 3.20 - February 04, 2009 * * * Dec 12 2007 -- Robert Richter, robert.richter@amd.com: * * Created from: BIOS and Kernel Developer's Guide (BKDG) For AMD * Family 10h Processors, 31116 Rev 3.00 - September 07, 2007 */ static const amd64_umask_t amd64_fam10h_dispatched_fpu[]={ { .uname = "OPS_ADD", .udesc = "Add pipe ops excluding load ops and SSE move ops", .ucode = 0x1, }, { .uname = "OPS_MULTIPLY", .udesc = "Multiply pipe ops excluding load ops and SSE move ops", .ucode = 0x2, }, { .uname = "OPS_STORE", .udesc = "Store pipe ops excluding load ops and SSE move ops", .ucode = 0x4, }, { .uname = "OPS_ADD_PIPE_LOAD_OPS", .udesc = "Add pipe load ops and SSE move ops", .ucode = 0x8, }, { .uname = "OPS_MULTIPLY_PIPE_LOAD_OPS", .udesc = "Multiply pipe load ops and SSE move ops", .ucode = 0x10, }, { .uname = "OPS_STORE_PIPE_LOAD_OPS", .udesc = "Store pipe load ops and SSE move ops", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_retired_sse_operations[]={ { .uname = "SINGLE_ADD_SUB_OPS", .udesc = "Single precision add/subtract ops", .ucode = 0x1, }, { .uname = "SINGLE_MUL_OPS", .udesc = "Single precision multiply ops", .ucode = 0x2, }, { .uname = "SINGLE_DIV_OPS", .udesc = "Single precision divide/square root ops", .ucode = 0x4, }, { .uname = "DOUBLE_ADD_SUB_OPS", .udesc = "Double precision add/subtract ops", .ucode = 0x8, }, { .uname = "DOUBLE_MUL_OPS", .udesc = "Double precision multiply ops", .ucode = 0x10, }, { .uname = "DOUBLE_DIV_OPS", .udesc = "Double precision divide/square root ops", .ucode = 0x20, }, { .uname = "OP_TYPE", .udesc = "Op type: 0=uops. 
1=FLOPS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_retired_move_ops[]={ { .uname = "LOW_QW_MOVE_UOPS", .udesc = "Merging low quadword move uops", .ucode = 0x1, }, { .uname = "HIGH_QW_MOVE_UOPS", .udesc = "Merging high quadword move uops", .ucode = 0x2, }, { .uname = "ALL_OTHER_MERGING_MOVE_UOPS", .udesc = "All other merging move uops", .ucode = 0x4, }, { .uname = "ALL_OTHER_MOVE_UOPS", .udesc = "All other move uops", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_retired_serializing_ops[]={ { .uname = "SSE_BOTTOM_EXECUTING_UOPS", .udesc = "SSE bottom-executing uops retired", .ucode = 0x1, }, { .uname = "SSE_BOTTOM_SERIALIZING_UOPS", .udesc = "SSE bottom-serializing uops retired", .ucode = 0x2, }, { .uname = "X87_BOTTOM_EXECUTING_UOPS", .udesc = "X87 bottom-executing uops retired", .ucode = 0x4, }, { .uname = "X87_BOTTOM_SERIALIZING_UOPS", .udesc = "X87 bottom-serializing uops retired", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_fp_scheduler_cycles[]={ { .uname = "BOTTOM_EXECUTE_CYCLES", .udesc = "Number of cycles a bottom-execute uop is in the FP scheduler", .ucode = 0x1, }, { .uname = "BOTTOM_SERIALIZING_CYCLES", .udesc = "Number of cycles a bottom-serializing uop is in the FP scheduler", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_segment_register_loads[]={ { .uname = "ES", .udesc = "ES", .ucode = 0x1, }, { .uname = "CS", .udesc = "CS", .ucode = 0x2, }, { .uname = "SS", .udesc = "SS", .ucode = 0x4, }, { .uname = "DS", .udesc = "DS", .ucode = 0x8, }, { .uname 
= "FS", .udesc = "FS", .ucode = 0x10, }, { .uname = "GS", .udesc = "GS", .ucode = 0x20, }, { .uname = "HS", .udesc = "HS", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_locked_ops[]={ { .uname = "EXECUTED", .udesc = "The number of locked instructions executed", .ucode = 0x1, }, { .uname = "CYCLES_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in speculative phase", .ucode = 0x2, }, { .uname = "CYCLES_NON_SPECULATIVE_PHASE", .udesc = "The number of cycles spent in non-speculative phase (including cache miss penalty)", .ucode = 0x4, }, { .uname = "CYCLES_WAITING", .udesc = "The number of cycles waiting for a cache hit (cache miss penalty).", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_cancelled_store_to_load_forward_operations[]={ { .uname = "ADDRESS_MISMATCHES", .udesc = "Address mismatches (starting byte not the same).", .ucode = 0x1, }, { .uname = "STORE_IS_SMALLER_THAN_LOAD", .udesc = "Store is smaller than load.", .ucode = 0x2, }, { .uname = "MISALIGNED", .udesc = "Misaligned.", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_data_cache_refills[]={ { .uname = "SYSTEM", .udesc = "Refill from the Northbridge", .ucode = 0x1, }, { .uname = "L2_SHARED", .udesc = "Shared-state line from L2", .ucode = 0x2, }, { .uname = "L2_EXCLUSIVE", .udesc = "Exclusive-state line from L2", .ucode = 0x4, }, { .uname = "L2_OWNED", .udesc = "Owned-state line from L2", .ucode = 0x8, }, { .uname = "L2_MODIFIED", .udesc = "Modified-state line from L2", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const 
amd64_umask_t amd64_fam10h_data_cache_refills_from_system[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_data_cache_lines_evicted[]={ { .uname = "INVALID", .udesc = "Invalid", .ucode = 0x1, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x2, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x10, }, { .uname = "BY_PREFETCHNTA", .udesc = "Cache line evicted was brought into the cache by a PrefetchNTA instruction.", .ucode = 0x20, }, { .uname = "NOT_BY_PREFETCHNTA", .udesc = "Cache line evicted was not brought into the cache by a PrefetchNTA instruction.", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_l1_dtlb_miss_and_l2_dtlb_hit[]={ { .uname = "L2_4K_TLB_HIT", .udesc = "L2 4K TLB hit", .ucode = 0x1, }, { .uname = "L2_2M_TLB_HIT", .udesc = "L2 2M TLB hit", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_TILL_FAM10H_REV_B, }, { .uname = "L2_1G_TLB_HIT", .udesc = "L2 1G TLB hit", .ucode = 0x4, .uflags= AMD64_FL_FAM10H_REV_C, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL | AMD64_FL_FAM10H_REV_C, }, }; static const amd64_umask_t amd64_fam10h_l1_dtlb_and_l2_dtlb_miss[]={ { .uname = "4K_TLB_RELOAD", .udesc = "4K TLB reload", .ucode = 0x1, }, { .uname = 
"2M_TLB_RELOAD", .udesc = "2M TLB reload", .ucode = 0x2, }, { .uname = "1G_TLB_RELOAD", .udesc = "1G TLB reload", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_scrubber_single_bit_ecc_errors[]={ { .uname = "SCRUBBER_ERROR", .udesc = "Scrubber error", .ucode = 0x1, }, { .uname = "PIGGYBACK_ERROR", .udesc = "Piggyback scrubber errors", .ucode = 0x2, }, { .uname = "LOAD_PIPE_ERROR", .udesc = "Load pipe error", .ucode = 0x4, }, { .uname = "STORE_WRITE_PIPE_ERROR", .udesc = "Store write pipe error", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_prefetch_instructions_dispatched[]={ { .uname = "LOAD", .udesc = "Load (Prefetch, PrefetchT0/T1/T2)", .ucode = 0x1, }, { .uname = "STORE", .udesc = "Store (PrefetchW)", .ucode = 0x2, }, { .uname = "NTA", .udesc = "NTA (PrefetchNTA)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_dcache_misses_by_locked_instructions[]={ { .uname = "DATA_CACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .udesc = "Data cache misses by locked instructions", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x2, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_l1_dtlb_hit[]={ { .uname = "L1_4K_TLB_HIT", .udesc = "L1 4K TLB hit", .ucode = 0x1, }, { .uname = "L1_2M_TLB_HIT", .udesc = "L1 2M TLB hit", .ucode = 0x2, }, { .uname = "L1_1G_TLB_HIT", .udesc = "L1 1G TLB hit", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_ineffective_sw_prefetches[]={ { .uname = "SW_PREFETCH_HIT_IN_L1", .udesc = "Software 
prefetch hit in the L1.", .ucode = 0x1, }, { .uname = "SW_PREFETCH_HIT_IN_L2", .udesc = "Software prefetch hit in L2.", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x9, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_memory_requests[]={ { .uname = "NON_CACHEABLE", .udesc = "Requests to non-cacheable (UC) memory", .ucode = 0x1, }, { .uname = "WRITE_COMBINING", .udesc = "Requests to write-combining (WC) memory or WC buffer flushes to WB memory", .ucode = 0x2, }, { .uname = "STREAMING_STORE", .udesc = "Streaming store (SS) requests", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x83, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_data_prefetches[]={ { .uname = "CANCELLED", .udesc = "Cancelled prefetches", .ucode = 0x1, }, { .uname = "ATTEMPTED", .udesc = "Prefetch attempts", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_mab_requests[]={ { .uname = "BUFFER_0", .udesc = "Buffer 0", .ucode = 0x0, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_1", .udesc = "Buffer 1", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_2", .udesc = "Buffer 2", .ucode = 0x2, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_3", .udesc = "Buffer 3", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_4", .udesc = "Buffer 4", .ucode = 0x4, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_5", .udesc = "Buffer 5", .ucode = 0x5, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_6", .udesc = "Buffer 6", .ucode = 0x6, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_7", .udesc = "Buffer 7", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_8", .udesc = "Buffer 8", .ucode = 0x8, .uflags= AMD64_FL_NCOMBO, }, { .uname = "BUFFER_9", .udesc = "Buffer 9", .ucode = 0x9, .uflags= AMD64_FL_NCOMBO, }, }; 
static const amd64_umask_t amd64_fam10h_system_read_responses[]={ { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x1, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x2, }, { .uname = "SHARED", .udesc = "Shared", .ucode = 0x4, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x8, }, { .uname = "DATA_ERROR", .udesc = "Data Error", .ucode = 0x10, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_quadwords_written_to_system[]={ { .uname = "QUADWORD_WRITE_TRANSFER", .udesc = "Octword write transfer", .ucode = 0x1, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x1, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_requests_to_l2[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB fill (page table walks)", .ucode = 0x4, }, { .uname = "SNOOP", .udesc = "Tag snoop request", .ucode = 0x8, }, { .uname = "CANCELLED", .udesc = "Cancelled request", .ucode = 0x10, }, { .uname = "HW_PREFETCH_FROM_DC", .udesc = "Hardware prefetch from DC", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_l2_cache_miss[]={ { .uname = "INSTRUCTIONS", .udesc = "IC fill", .ucode = 0x1, }, { .uname = "DATA", .udesc = "DC fill (includes possible replays, whereas EventSelect 041h does not)", .ucode = 0x2, }, { .uname = "TLB_WALK", .udesc = "TLB page table walk", .ucode = 0x4, }, { .uname = "HW_PREFETCH_FROM_DC", .udesc = "Hardware prefetch from DC", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_l2_fill_writeback[]={ { .uname = "L2_FILLS", .udesc = "L2 fills (victims from 
L1 caches, TLB page table walks and data prefetches)", .ucode = 0x1, }, { .uname = "L2_WRITEBACKS", .udesc = "L2 Writebacks to system.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_l1_itlb_miss_and_l2_itlb_miss[]={ { .uname = "4K_PAGE_FETCHES", .udesc = "Instruction fetches to a 4K page.", .ucode = 0x1, }, { .uname = "2M_PAGE_FETCHES", .udesc = "Instruction fetches to a 2M page.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_instruction_cache_lines_invalidated[]={ { .uname = "INVALIDATING_PROBE_NO_IN_FLIGHT", .udesc = "Invalidating probe that did not hit any in-flight instructions.", .ucode = 0x1, }, { .uname = "INVALIDATING_PROBE_ONE_OR_MORE_IN_FLIGHT", .udesc = "Invalidating probe that hit one or more in-flight instructions.", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_retired_mmx_and_fp_instructions[]={ { .uname = "X87", .udesc = "X87 instructions", .ucode = 0x1, }, { .uname = "MMX_AND_3DNOW", .udesc = "MMX and 3DNow! 
instructions", .ucode = 0x2, }, { .uname = "PACKED_SSE_AND_SSE2", .udesc = "SSE instructions (SSE, SSE2, SSE3, and SSE4A)", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_retired_fastpath_double_op_instructions[]={ { .uname = "POSITION_0", .udesc = "With low op in position 0", .ucode = 0x1, }, { .uname = "POSITION_1", .udesc = "With low op in position 1", .ucode = 0x2, }, { .uname = "POSITION_2", .udesc = "With low op in position 2", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_fpu_exceptions[]={ { .uname = "X87_RECLASS_MICROFAULTS", .udesc = "X87 reclass microfaults", .ucode = 0x1, }, { .uname = "SSE_RETYPE_MICROFAULTS", .udesc = "SSE retype microfaults", .ucode = 0x2, }, { .uname = "SSE_RECLASS_MICROFAULTS", .udesc = "SSE reclass microfaults", .ucode = 0x4, }, { .uname = "SSE_AND_X87_MICROTRAPS", .udesc = "SSE and x87 microtraps", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_dram_accesses_page[]={ { .uname = "HIT", .udesc = "DCT0 Page hit", .ucode = 0x1, }, { .uname = "MISS", .udesc = "DCT0 Page Miss", .ucode = 0x2, }, { .uname = "CONFLICT", .udesc = "DCT0 Page Conflict", .ucode = 0x4, }, { .uname = "DCT1_PAGE_HIT", .udesc = "DCT1 Page hit", .ucode = 0x8, }, { .uname = "DCT1_PAGE_MISS", .udesc = "DCT1 Page Miss", .ucode = 0x10, }, { .uname = "DCT1_PAGE_CONFLICT", .udesc = "DCT1 Page Conflict", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_memory_controller_page_table_overflows[]={ { .uname = "DCT0_PAGE_TABLE_OVERFLOW", .udesc = "DCT0 Page Table Overflow", .ucode = 
0x1, }, { .uname = "DCT1_PAGE_TABLE_OVERFLOW", .udesc = "DCT1 Page Table Overflow", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_memory_controller_slot_misses[]={ { .uname = "DCT0_COMMAND_SLOTS_MISSED", .udesc = "DCT0 Command Slots Missed", .ucode = 0x1, }, { .uname = "DCT1_COMMAND_SLOTS_MISSED", .udesc = "DCT1 Command Slots Missed", .ucode = 0x2, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_memory_controller_turnarounds[]={ { .uname = "CHIP_SELECT", .udesc = "DCT0 DIMM (chip select) turnaround", .ucode = 0x1, }, { .uname = "READ_TO_WRITE", .udesc = "DCT0 Read to write turnaround", .ucode = 0x2, }, { .uname = "WRITE_TO_READ", .udesc = "DCT0 Write to read turnaround", .ucode = 0x4, }, { .uname = "DCT1_DIMM", .udesc = "DCT1 DIMM (chip select) turnaround", .ucode = 0x8, }, { .uname = "DCT1_READ_TO_WRITE_TURNAROUND", .udesc = "DCT1 Read to write turnaround", .ucode = 0x10, }, { .uname = "DCT1_WRITE_TO_READ_TURNAROUND", .udesc = "DCT1 Write to read turnaround", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_memory_controller_bypass[]={ { .uname = "HIGH_PRIORITY", .udesc = "Memory controller high priority bypass", .ucode = 0x1, }, { .uname = "LOW_PRIORITY", .udesc = "Memory controller medium priority bypass", .ucode = 0x2, }, { .uname = "DRAM_INTERFACE", .udesc = "DCT0 DCQ bypass", .ucode = 0x4, }, { .uname = "DRAM_QUEUE", .udesc = "DCT1 DCQ bypass", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_thermal_status_and_ecc_errors[]={ { .uname = "CLKS_DIE_TEMP_TOO_HIGH", .udesc = "Number of 
times the HTC trip point is crossed", .ucode = 0x4, }, { .uname = "CLKS_TEMP_THRESHOLD_EXCEEDED", .udesc = "Number of clocks when STC trip point active", .ucode = 0x8, }, { .uname = "STC_TRIP_POINTS_CROSSED", .udesc = "Number of times the STC trip point is crossed", .ucode = 0x10, }, { .uname = "CLOCKS_HTC_P_STATE_INACTIVE", .udesc = "Number of clocks HTC P-state is inactive.", .ucode = 0x20, }, { .uname = "CLOCKS_HTC_P_STATE_ACTIVE", .udesc = "Number of clocks HTC P-state is active", .ucode = 0x40, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7c, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_cpu_io_requests_to_memory_io[]={ { .uname = "I_O_TO_I_O", .udesc = "IO to IO", .ucode = 0x1, }, { .uname = "I_O_TO_MEM", .udesc = "IO to Mem", .ucode = 0x2, }, { .uname = "CPU_TO_I_O", .udesc = "CPU to IO", .ucode = 0x4, }, { .uname = "CPU_TO_MEM", .udesc = "CPU to Mem", .ucode = 0x8, }, { .uname = "TO_REMOTE_NODE", .udesc = "To remote node", .ucode = 0x10, }, { .uname = "TO_LOCAL_NODE", .udesc = "To local node", .ucode = 0x20, }, { .uname = "FROM_REMOTE_NODE", .udesc = "From remote node", .ucode = 0x40, }, { .uname = "FROM_LOCAL_NODE", .udesc = "From local node", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_cache_block[]={ { .uname = "VICTIM_WRITEBACK", .udesc = "Victim Block (Writeback)", .ucode = 0x1, }, { .uname = "DCACHE_LOAD_MISS", .udesc = "Read Block (Dcache load miss refill)", .ucode = 0x4, }, { .uname = "SHARED_ICACHE_REFILL", .udesc = "Read Block Shared (Icache refill)", .ucode = 0x8, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read Block Modified (Dcache store miss refill)", .ucode = 0x10, }, { .uname = "READ_TO_DIRTY", .udesc = "Change-to-Dirty (first store to clean block already in cache)", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 
0x3d, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_sized_commands[]={ { .uname = "NON_POSTED_WRITE_BYTE", .udesc = "Non-Posted SzWr Byte (1-32 bytes) Legacy or mapped IO, typically 1-4 bytes", .ucode = 0x1, }, { .uname = "NON_POSTED_WRITE_DWORD", .udesc = "Non-Posted SzWr DW (1-16 dwords) Legacy or mapped IO, typically 1 DWORD", .ucode = 0x2, }, { .uname = "POSTED_WRITE_BYTE", .udesc = "Posted SzWr Byte (1-32 bytes) Subcache-line DMA writes, size varies; also flushes of partially-filled Write Combining buffer", .ucode = 0x4, }, { .uname = "POSTED_WRITE_DWORD", .udesc = "Posted SzWr DW (1-16 dwords) Block-oriented DMA writes, often cache-line sized; also processor Write Combining buffer flushes", .ucode = 0x8, }, { .uname = "READ_BYTE_4_BYTES", .udesc = "SzRd Byte (4 bytes) Legacy or mapped IO", .ucode = 0x10, }, { .uname = "READ_DWORD_1_16_DWORDS", .udesc = "SzRd DW (1-16 dwords) Block-oriented DMA reads, typically cache-line size", .ucode = 0x20, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_probe[]={ { .uname = "MISS", .udesc = "Probe miss", .ucode = 0x1, }, { .uname = "HIT_CLEAN", .udesc = "Probe hit clean", .ucode = 0x2, }, { .uname = "HIT_DIRTY_NO_MEMORY_CANCEL", .udesc = "Probe hit dirty without memory cancel (probed by Sized Write or Change2Dirty)", .ucode = 0x4, }, { .uname = "HIT_DIRTY_WITH_MEMORY_CANCEL", .udesc = "Probe hit dirty with memory cancel (probed by DMA read or cache refill request)", .ucode = 0x8, }, { .uname = "UPSTREAM_DISPLAY_REFRESH_READS", .udesc = "Upstream display refresh/ISOC reads", .ucode = 0x10, }, { .uname = "UPSTREAM_NON_DISPLAY_REFRESH_READS", .udesc = "Upstream non-display refresh reads", .ucode = 0x20, }, { .uname = "UPSTREAM_WRITES", .udesc = "Upstream ISOC writes", .ucode = 0x40, }, { .uname = "UPSTREAM_NON_ISOC_WRITES", .udesc = "Upstream non-ISOC writes", .ucode = 0x80, 
}, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_gart[]={ { .uname = "APERTURE_HIT_FROM_CPU", .udesc = "GART aperture hit on access from CPU", .ucode = 0x1, }, { .uname = "APERTURE_HIT_FROM_IO", .udesc = "GART aperture hit on access from IO", .ucode = 0x2, }, { .uname = "MISS", .udesc = "GART miss", .ucode = 0x4, }, { .uname = "REQUEST_HIT_TABLE_WALK", .udesc = "GART/DEV Request hit table walk in progress", .ucode = 0x8, }, { .uname = "DEV_HIT", .udesc = "DEV hit", .ucode = 0x10, }, { .uname = "DEV_MISS", .udesc = "DEV miss", .ucode = 0x20, }, { .uname = "DEV_ERROR", .udesc = "DEV error", .ucode = 0x40, }, { .uname = "MULTIPLE_TABLE_WALK", .udesc = "GART/DEV multiple table walk in progress", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_memory_controller_requests[]={ { .uname = "WRITE_REQUESTS", .udesc = "Write requests sent to the DCT", .ucode = 0x1, }, { .uname = "READ_REQUESTS", .udesc = "Read requests (including prefetch requests) sent to the DCT", .ucode = 0x2, }, { .uname = "PREFETCH_REQUESTS", .udesc = "Prefetch requests sent to the DCT", .ucode = 0x4, }, { .uname = "32_BYTES_WRITES", .udesc = "32 Bytes Sized Writes", .ucode = 0x8, }, { .uname = "64_BYTES_WRITES", .udesc = "64 Bytes Sized Writes", .ucode = 0x10, }, { .uname = "32_BYTES_READS", .udesc = "32 Bytes Sized Reads", .ucode = 0x20, }, { .uname = "64_BYTES_READS", .udesc = "64 Byte Sized Reads", .ucode = 0x40, }, { .uname = "READ_REQUESTS_WHILE_WRITES_REQUESTS", .udesc = "Read requests sent to the DCT while writes requests are pending in the DCT", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_cpu_to_dram_requests_to_target_node[]={ { .uname 
= "LOCAL_TO_0", .udesc = "From Local node to Node 0", .ucode = 0x1, }, { .uname = "LOCAL_TO_1", .udesc = "From Local node to Node 1", .ucode = 0x2, }, { .uname = "LOCAL_TO_2", .udesc = "From Local node to Node 2", .ucode = 0x4, }, { .uname = "LOCAL_TO_3", .udesc = "From Local node to Node 3", .ucode = 0x8, }, { .uname = "LOCAL_TO_4", .udesc = "From Local node to Node 4", .ucode = 0x10, }, { .uname = "LOCAL_TO_5", .udesc = "From Local node to Node 5", .ucode = 0x20, }, { .uname = "LOCAL_TO_6", .udesc = "From Local node to Node 6", .ucode = 0x40, }, { .uname = "LOCAL_TO_7", .udesc = "From Local node to Node 7", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_cpu_read_command_latency_to_target_node_0_3[]={ { .uname = "READ_BLOCK", .udesc = "Read block", .ucode = 0x1, }, { .uname = "READ_BLOCK_SHARED", .udesc = "Read block shared", .ucode = 0x2, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read block modified", .ucode = 0x4, }, { .uname = "CHANGE_TO_DIRTY", .udesc = "Change-to-Dirty", .ucode = 0x8, }, { .uname = "LOCAL_TO_0", .udesc = "From Local node to Node 0", .ucode = 0x10, }, { .uname = "LOCAL_TO_1", .udesc = "From Local node to Node 1", .ucode = 0x20, }, { .uname = "LOCAL_TO_2", .udesc = "From Local node to Node 2", .ucode = 0x40, }, { .uname = "LOCAL_TO_3", .udesc = "From Local node to Node 3", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_cpu_read_command_latency_to_target_node_4_7[]={ { .uname = "READ_BLOCK", .udesc = "Read block", .ucode = 0x1, }, { .uname = "READ_BLOCK_SHARED", .udesc = "Read block shared", .ucode = 0x2, }, { .uname = "READ_BLOCK_MODIFIED", .udesc = "Read block modified", .ucode = 0x4, }, { .uname = "CHANGE_TO_DIRTY", .udesc = "Change-to-Dirty", .ucode = 0x8, }, { .uname = "LOCAL_TO_4", 
.udesc = "From Local node to Node 4", .ucode = 0x10, }, { .uname = "LOCAL_TO_5", .udesc = "From Local node to Node 5", .ucode = 0x20, }, { .uname = "LOCAL_TO_6", .udesc = "From Local node to Node 6", .ucode = 0x40, }, { .uname = "LOCAL_TO_7", .udesc = "From Local node to Node 7", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_cpu_command_latency_to_target_node_0_3_4_7[]={ { .uname = "READ_SIZED", .udesc = "Read Sized", .ucode = 0x1, }, { .uname = "WRITE_SIZED", .udesc = "Write Sized", .ucode = 0x2, }, { .uname = "VICTIM_BLOCK", .udesc = "Victim Block", .ucode = 0x4, }, { .uname = "NODE_GROUP_SELECT", .udesc = "Node Group Select. 0=Nodes 0-3. 1= Nodes 4-7.", .ucode = 0x8, }, { .uname = "LOCAL_TO_0_4", .udesc = "From Local node to Node 0/4", .ucode = 0x10, }, { .uname = "LOCAL_TO_1_5", .udesc = "From Local node to Node 1/5", .ucode = 0x20, }, { .uname = "LOCAL_TO_2_6", .udesc = "From Local node to Node 2/6", .ucode = 0x40, }, { .uname = "LOCAL_TO_3_7", .udesc = "From Local node to Node 3/7", .ucode = 0x80, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xff, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_hypertransport_link0[]={ { .uname = "COMMAND_DWORD_SENT", .udesc = "Command DWORD sent", .ucode = 0x1, .grpid = 0, }, { .uname = "DATA_DWORD_SENT", .udesc = "Data DWORD sent", .ucode = 0x2, .grpid = 0, }, { .uname = "BUFFER_RELEASE_DWORD_SENT", .udesc = "Buffer release DWORD sent", .ucode = 0x4, .grpid = 0, }, { .uname = "NOP_DWORD_SENT", .udesc = "Nop DW sent (idle)", .ucode = 0x8, .grpid = 0, }, { .uname = "ADDRESS_EXT_DWORD_SENT", .udesc = "Address extension DWORD sent", .ucode = 0x10, .grpid = 0, }, { .uname = "PER_PACKET_CRC_SENT", .udesc = "Per packet CRC sent", .ucode = 0x20, .grpid = 0, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= 
AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, { .uname = "SUBLINK_MASK", .udesc = "SubLink Mask", .ucode = 0x80, .uflags= AMD64_FL_OMIT, .grpid = 1, }, }; static const amd64_umask_t amd64_fam10h_hypertransport_link3[]={ { .uname = "COMMAND_DWORD_SENT", .udesc = "Command DWORD sent", .ucode = 0x1, .grpid = 0, }, { .uname = "DATA_DWORD_SENT", .udesc = "Data DWORD sent", .ucode = 0x2, .grpid = 0, }, { .uname = "BUFFER_RELEASE_DWORD_SENT", .udesc = "Buffer release DWORD sent", .ucode = 0x4, .grpid = 0, }, { .uname = "NOP_DWORD_SENT", .udesc = "Nop DW sent (idle)", .ucode = 0x8, .grpid = 0, }, { .uname = "ADDRESS_EXT_DWORD_SENT", .udesc = "Address extension DWORD sent", .ucode = 0x10, .grpid = 0, }, { .uname = "PER_PACKET_CRC_SENT", .udesc = "Per packet CRC sent", .ucode = 0x20, .grpid = 0, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x3f, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, { .uname = "SUBLINK_MASK", .udesc = "SubLink Mask", .ucode = 0x80, .uflags= AMD64_FL_OMIT, .grpid = 1, }, }; static const amd64_umask_t amd64_fam10h_read_request_to_l3_cache[]={ { .uname = "READ_BLOCK_EXCLUSIVE", .udesc = "Read Block Exclusive (Data cache read)", .ucode = 0x1, .grpid = 0, }, { .uname = "READ_BLOCK_SHARED", .udesc = "Read Block Shared (Instruction cache read)", .ucode = 0x2, .grpid = 0, }, { .uname = "READ_BLOCK_MODIFY", .udesc = "Read Block Modify", .ucode = 0x4, .grpid = 0, }, { .uname = "ANY_READ", .udesc = "Any read modes (exclusive, shared, modify)", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, { .uname = "ALL_CORES", .udesc = "All cores", .ucode = 0xf0, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 1, }, }; static const amd64_umask_t amd64_fam10h_l3_cache_misses[]={ { .uname = "READ_BLOCK_EXCLUSIVE", .udesc = "Read Block Exclusive (Data cache read)", .ucode = 0x1, .grpid = 0, }, { .uname = "READ_BLOCK_SHARED", .udesc = "Read Block Shared (Instruction cache read)", .ucode = 0x2, .grpid = 0, }, { .uname = 
"READ_BLOCK_MODIFY", .udesc = "Read Block Modify", .ucode = 0x4, .grpid = 0, }, { .uname = "ANY_READ", .udesc = "Any read modes (exclusive, shared, modify)", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, { .uname = "ALL_CORES", .udesc = "All cores", .ucode = 0xf0, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 1, }, }; static const amd64_umask_t amd64_fam10h_l3_fills_caused_by_l2_evictions[]={ { .uname = "SHARED", .udesc = "Shared", .ucode = 0x1, .grpid = 0, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x2, .grpid = 0, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x4, .grpid = 0, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x8, .grpid = 0, }, { .uname = "ANY_STATE", .udesc = "Any line state (shared, owned, exclusive, modified)", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 0, }, { .uname = "ALL_CORES", .udesc = "All cores", .ucode = 0xf0, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, .grpid = 1, }, }; static const amd64_umask_t amd64_fam10h_l3_evictions[]={ { .uname = "SHARED", .udesc = "Shared", .ucode = 0x1, }, { .uname = "EXCLUSIVE", .udesc = "Exclusive", .ucode = 0x2, }, { .uname = "OWNED", .udesc = "Owned", .ucode = 0x4, }, { .uname = "MODIFIED", .udesc = "Modified", .ucode = 0x8, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0xf, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_page_size_mismatches[]={ { .uname = "GUEST_LARGER", .udesc = "Guest page size is larger than the host page size.", .ucode = 0x1, }, { .uname = "MTRR_MISMATCH", .udesc = "MTRR mismatch.", .ucode = 0x2, }, { .uname = "HOST_LARGER", .udesc = "Host page size is larger than the guest page size.", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_umask_t amd64_fam10h_retired_x87_ops[]={ { .uname = "ADD_SUB_OPS", .udesc = "Add/subtract ops", .ucode = 0x1, }, { .uname 
= "MUL_OPS", .udesc = "Multiply ops", .ucode = 0x2, }, { .uname = "DIV_OPS", .udesc = "Divide ops", .ucode = 0x4, }, { .uname = "ALL", .udesc = "All sub-events selected", .ucode = 0x7, .uflags= AMD64_FL_NCOMBO | AMD64_FL_DFL, }, }; static const amd64_entry_t amd64_fam10h_pe[]={ { .name = "DISPATCHED_FPU", .desc = "Dispatched FPU Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_dispatched_fpu), .ngrp = 1, .umasks = amd64_fam10h_dispatched_fpu, }, { .name = "CYCLES_NO_FPU_OPS_RETIRED", .desc = "Cycles in which the FPU is Empty", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1, }, { .name = "DISPATCHED_FPU_OPS_FAST_FLAG", .desc = "Dispatched Fast Flag FPU Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2, }, { .name = "RETIRED_SSE_OPERATIONS", .desc = "Retired SSE Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_retired_sse_operations), .ngrp = 1, .umasks = amd64_fam10h_retired_sse_operations, }, { .name = "RETIRED_MOVE_OPS", .desc = "Retired Move Ops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_retired_move_ops), .ngrp = 1, .umasks = amd64_fam10h_retired_move_ops, }, { .name = "RETIRED_SERIALIZING_OPS", .desc = "Retired Serializing Ops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_retired_serializing_ops), .ngrp = 1, .umasks = amd64_fam10h_retired_serializing_ops, }, { .name = "FP_SCHEDULER_CYCLES", .desc = "Number of Cycles that a Serializing uop is in the FP Scheduler", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_fp_scheduler_cycles), .ngrp = 1, .umasks = amd64_fam10h_fp_scheduler_cycles, }, { .name = "SEGMENT_REGISTER_LOADS", .desc = "Segment Register Loads", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x20, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_segment_register_loads), .ngrp = 1, .umasks = amd64_fam10h_segment_register_loads, }, { .name = 
"PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE", .desc = "Pipeline Restart Due to Self-Modifying Code", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x21, }, { .name = "PIPELINE_RESTART_DUE_TO_PROBE_HIT", .desc = "Pipeline Restart Due to Probe Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x22, }, { .name = "LS_BUFFER_2_FULL_CYCLES", .desc = "LS Buffer 2 Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x23, }, { .name = "LOCKED_OPS", .desc = "Locked Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_locked_ops), .ngrp = 1, .umasks = amd64_fam10h_locked_ops, }, { .name = "RETIRED_CLFLUSH_INSTRUCTIONS", .desc = "Retired CLFLUSH Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x26, }, { .name = "RETIRED_CPUID_INSTRUCTIONS", .desc = "Retired CPUID Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x27, }, { .name = "CANCELLED_STORE_TO_LOAD_FORWARD_OPERATIONS", .desc = "Cancelled Store to Load Forward Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cancelled_store_to_load_forward_operations), .ngrp = 1, .umasks = amd64_fam10h_cancelled_store_to_load_forward_operations, }, { .name = "SMIS_RECEIVED", .desc = "SMIs Received", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x2b, }, { .name = "DATA_CACHE_ACCESSES", .desc = "Data Cache Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x40, }, { .name = "DATA_CACHE_MISSES", .desc = "Data Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x41, }, { .name = "DATA_CACHE_REFILLS", .desc = "Data Cache Refills from L2 or Northbridge", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_data_cache_refills), .ngrp = 1, .umasks = amd64_fam10h_data_cache_refills, }, { .name = "DATA_CACHE_REFILLS_FROM_SYSTEM", .desc = "Data Cache Refills from the Northbridge", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_data_cache_refills_from_system), .ngrp = 1, .umasks = 
amd64_fam10h_data_cache_refills_from_system, }, { .name = "DATA_CACHE_LINES_EVICTED", .desc = "Data Cache Lines Evicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x44, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_data_cache_lines_evicted), .ngrp = 1, .umasks = amd64_fam10h_data_cache_lines_evicted, }, { .name = "L1_DTLB_MISS_AND_L2_DTLB_HIT", .desc = "L1 DTLB Miss and L2 DTLB Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x45, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l1_dtlb_miss_and_l2_dtlb_hit), .ngrp = 1, .umasks = amd64_fam10h_l1_dtlb_miss_and_l2_dtlb_hit, }, { .name = "L1_DTLB_AND_L2_DTLB_MISS", .desc = "L1 DTLB and L2 DTLB Miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x46, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l1_dtlb_and_l2_dtlb_miss), .ngrp = 1, .umasks = amd64_fam10h_l1_dtlb_and_l2_dtlb_miss, }, { .name = "MISALIGNED_ACCESSES", .desc = "Misaligned Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x47, }, { .name = "MICROARCHITECTURAL_LATE_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Late Cancel of an Access", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x48, }, { .name = "MICROARCHITECTURAL_EARLY_CANCEL_OF_AN_ACCESS", .desc = "Microarchitectural Early Cancel of an Access", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x49, }, { .name = "SCRUBBER_SINGLE_BIT_ECC_ERRORS", .desc = "Single-bit ECC Errors Recorded by Scrubber", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4a, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_scrubber_single_bit_ecc_errors), .ngrp = 1, .umasks = amd64_fam10h_scrubber_single_bit_ecc_errors, }, { .name = "PREFETCH_INSTRUCTIONS_DISPATCHED", .desc = "Prefetch Instructions Dispatched", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4b, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_prefetch_instructions_dispatched), .ngrp = 1, .umasks = amd64_fam10h_prefetch_instructions_dispatched, }, { .name = "DCACHE_MISSES_BY_LOCKED_INSTRUCTIONS", .desc = "DCACHE Misses by Locked Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4c, .numasks = 
LIBPFM_ARRAY_SIZE(amd64_fam10h_dcache_misses_by_locked_instructions), .ngrp = 1, .umasks = amd64_fam10h_dcache_misses_by_locked_instructions, }, { .name = "L1_DTLB_HIT", .desc = "L1 DTLB Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l1_dtlb_hit), .ngrp = 1, .umasks = amd64_fam10h_l1_dtlb_hit, }, { .name = "INEFFECTIVE_SW_PREFETCHES", .desc = "Ineffective Software Prefetches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x52, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_ineffective_sw_prefetches), .ngrp = 1, .umasks = amd64_fam10h_ineffective_sw_prefetches, }, { .name = "GLOBAL_TLB_FLUSHES", .desc = "Global TLB Flushes", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x54, }, { .name = "MEMORY_REQUESTS", .desc = "Memory Requests by Type", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_memory_requests), .ngrp = 1, .umasks = amd64_fam10h_memory_requests, }, { .name = "DATA_PREFETCHES", .desc = "Data Prefetcher", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x67, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_data_prefetches), .ngrp = 1, .umasks = amd64_fam10h_data_prefetches, }, { .name = "MAB_REQUESTS", .desc = "Average L1 refill latency for Icache and Dcache misses (request count for cache refills)", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x68, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_mab_requests), .ngrp = 1, .umasks = amd64_fam10h_mab_requests, }, { .name = "MAB_WAIT_CYCLES", .desc = "Average L1 refill latency for Icache and Dcache misses (cycles that requests spent waiting for the refills)", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x69, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_mab_requests), .ngrp = 1, .umasks = amd64_fam10h_mab_requests, /* identical to actual umasks list for this event */ }, { .name = "SYSTEM_READ_RESPONSES", .desc = "Northbridge Read Responses by Coherency State", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_system_read_responses), .ngrp = 1, .umasks 
= amd64_fam10h_system_read_responses, }, { .name = "QUADWORDS_WRITTEN_TO_SYSTEM", .desc = "Octwords Written to System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x6d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_quadwords_written_to_system), .ngrp = 1, .umasks = amd64_fam10h_quadwords_written_to_system, }, { .name = "CPU_CLK_UNHALTED", .desc = "CPU Clocks not Halted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x76, }, { .name = "REQUESTS_TO_L2", .desc = "Requests to L2 Cache", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7d, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_requests_to_l2), .ngrp = 1, .umasks = amd64_fam10h_requests_to_l2, }, { .name = "L2_CACHE_MISS", .desc = "L2 Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7e, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l2_cache_miss), .ngrp = 1, .umasks = amd64_fam10h_l2_cache_miss, }, { .name = "L2_FILL_WRITEBACK", .desc = "L2 Fill/Writeback", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x7f, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l2_fill_writeback), .ngrp = 1, .umasks = amd64_fam10h_l2_fill_writeback, }, { .name = "INSTRUCTION_CACHE_FETCHES", .desc = "Instruction Cache Fetches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x80, }, { .name = "INSTRUCTION_CACHE_MISSES", .desc = "Instruction Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x81, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_L2", .desc = "Instruction Cache Refills from L2", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x82, }, { .name = "INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM", .desc = "Instruction Cache Refills from System", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x83, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_HIT", .desc = "L1 ITLB Miss and L2 ITLB Hit", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x84, }, { .name = "L1_ITLB_MISS_AND_L2_ITLB_MISS", .desc = "L1 ITLB Miss and L2 ITLB Miss", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x85, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l1_itlb_miss_and_l2_itlb_miss), .ngrp = 1, .umasks = amd64_fam10h_l1_itlb_miss_and_l2_itlb_miss, }, { .name = 
"PIPELINE_RESTART_DUE_TO_INSTRUCTION_STREAM_PROBE", .desc = "Pipeline Restart Due to Instruction Stream Probe", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x86, }, { .name = "INSTRUCTION_FETCH_STALL", .desc = "Instruction Fetch Stall", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x87, }, { .name = "RETURN_STACK_HITS", .desc = "Return Stack Hits", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x88, }, { .name = "RETURN_STACK_OVERFLOWS", .desc = "Return Stack Overflows", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x89, }, { .name = "INSTRUCTION_CACHE_VICTIMS", .desc = "Instruction Cache Victims", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x8b, }, { .name = "INSTRUCTION_CACHE_LINES_INVALIDATED", .desc = "Instruction Cache Lines Invalidated", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x8c, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_instruction_cache_lines_invalidated), .ngrp = 1, .umasks = amd64_fam10h_instruction_cache_lines_invalidated, }, { .name = "ITLB_RELOADS", .desc = "ITLB Reloads", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x99, }, { .name = "ITLB_RELOADS_ABORTED", .desc = "ITLB Reloads Aborted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x9a, }, { .name = "RETIRED_INSTRUCTIONS", .desc = "Retired Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc0, }, { .name = "RETIRED_UOPS", .desc = "Retired uops", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc1, }, { .name = "RETIRED_BRANCH_INSTRUCTIONS", .desc = "Retired Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc2, }, { .name = "RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS", .desc = "Retired Mispredicted Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc3, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS", .desc = "Retired Taken Branch Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc4, }, { .name = "RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED", .desc = "Retired Taken Branch Instructions Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc5, }, { .name = "RETIRED_FAR_CONTROL_TRANSFERS", .desc = "Retired Far Control 
Transfers", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc6, }, { .name = "RETIRED_BRANCH_RESYNCS", .desc = "Retired Branch Resyncs", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc7, }, { .name = "RETIRED_NEAR_RETURNS", .desc = "Retired Near Returns", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc8, }, { .name = "RETIRED_NEAR_RETURNS_MISPREDICTED", .desc = "Retired Near Returns Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xc9, }, { .name = "RETIRED_INDIRECT_BRANCHES_MISPREDICTED", .desc = "Retired Indirect Branches Mispredicted", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xca, }, { .name = "RETIRED_MMX_AND_FP_INSTRUCTIONS", .desc = "Retired MMX/FP Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_retired_mmx_and_fp_instructions), .ngrp = 1, .umasks = amd64_fam10h_retired_mmx_and_fp_instructions, }, { .name = "RETIRED_FASTPATH_DOUBLE_OP_INSTRUCTIONS", .desc = "Retired Fastpath Double Op Instructions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcc, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_retired_fastpath_double_op_instructions), .ngrp = 1, .umasks = amd64_fam10h_retired_fastpath_double_op_instructions, }, { .name = "INTERRUPTS_MASKED_CYCLES", .desc = "Interrupts-Masked Cycles", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcd, }, { .name = "INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING", .desc = "Interrupts-Masked Cycles with Interrupt Pending", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xce, }, { .name = "INTERRUPTS_TAKEN", .desc = "Interrupts Taken", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xcf, }, { .name = "DECODER_EMPTY", .desc = "Decoder Empty", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd0, }, { .name = "DISPATCH_STALLS", .desc = "Dispatch Stalls", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd1, }, { .name = "DISPATCH_STALL_FOR_BRANCH_ABORT", .desc = "Dispatch Stall for Branch Abort to Retire", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd2, }, { .name = "DISPATCH_STALL_FOR_SERIALIZATION", .desc = "Dispatch Stall for Serialization", .modmsk 
= AMD64_FAM10H_ATTRS, .code = 0xd3, }, { .name = "DISPATCH_STALL_FOR_SEGMENT_LOAD", .desc = "Dispatch Stall for Segment Load", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd4, }, { .name = "DISPATCH_STALL_FOR_REORDER_BUFFER_FULL", .desc = "Dispatch Stall for Reorder Buffer Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd5, }, { .name = "DISPATCH_STALL_FOR_RESERVATION_STATION_FULL", .desc = "Dispatch Stall for Reservation Station Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd6, }, { .name = "DISPATCH_STALL_FOR_FPU_FULL", .desc = "Dispatch Stall for FPU Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd7, }, { .name = "DISPATCH_STALL_FOR_LS_FULL", .desc = "Dispatch Stall for LS Full", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd8, }, { .name = "DISPATCH_STALL_WAITING_FOR_ALL_QUIET", .desc = "Dispatch Stall Waiting for All Quiet", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xd9, }, { .name = "DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RSYNC", .desc = "Dispatch Stall for Far Transfer or Resync to Retire", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xda, }, { .name = "FPU_EXCEPTIONS", .desc = "FPU Exceptions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_fpu_exceptions), .ngrp = 1, .umasks = amd64_fam10h_fpu_exceptions, }, { .name = "DR0_BREAKPOINT_MATCHES", .desc = "DR0 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdc, }, { .name = "DR1_BREAKPOINT_MATCHES", .desc = "DR1 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdd, }, { .name = "DR2_BREAKPOINT_MATCHES", .desc = "DR2 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xde, }, { .name = "DR3_BREAKPOINT_MATCHES", .desc = "DR3 Breakpoint Matches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xdf, }, { .name = "DRAM_ACCESSES_PAGE", .desc = "DRAM Accesses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_dram_accesses_page), .ngrp = 1, .umasks = amd64_fam10h_dram_accesses_page, }, { .name = "MEMORY_CONTROLLER_PAGE_TABLE_OVERFLOWS", .desc 
= "DRAM Controller Page Table Overflows", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_memory_controller_page_table_overflows), .ngrp = 1, .umasks = amd64_fam10h_memory_controller_page_table_overflows, }, { .name = "MEMORY_CONTROLLER_SLOT_MISSES", .desc = "Memory Controller DRAM Command Slots Missed", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe2, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_memory_controller_slot_misses), .ngrp = 1, .umasks = amd64_fam10h_memory_controller_slot_misses, }, { .name = "MEMORY_CONTROLLER_TURNAROUNDS", .desc = "Memory Controller Turnarounds", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_memory_controller_turnarounds), .ngrp = 1, .umasks = amd64_fam10h_memory_controller_turnarounds, }, { .name = "MEMORY_CONTROLLER_BYPASS", .desc = "Memory Controller Bypass Counter Saturation", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_memory_controller_bypass), .ngrp = 1, .umasks = amd64_fam10h_memory_controller_bypass, }, { .name = "THERMAL_STATUS_AND_ECC_ERRORS", .desc = "Thermal Status", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe8, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_thermal_status_and_ecc_errors), .ngrp = 1, .umasks = amd64_fam10h_thermal_status_and_ecc_errors, }, { .name = "CPU_IO_REQUESTS_TO_MEMORY_IO", .desc = "CPU/IO Requests to Memory/IO", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xe9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_io_requests_to_memory_io), .ngrp = 1, .umasks = amd64_fam10h_cpu_io_requests_to_memory_io, }, { .name = "CACHE_BLOCK", .desc = "Cache Block Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xea, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cache_block), .ngrp = 1, .umasks = amd64_fam10h_cache_block, }, { .name = "SIZED_COMMANDS", .desc = "Sized Commands", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xeb, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_sized_commands), .ngrp = 1, .umasks = 
amd64_fam10h_sized_commands, }, { .name = "PROBE", .desc = "Probe Responses and Upstream Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xec, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_probe), .ngrp = 1, .umasks = amd64_fam10h_probe, }, { .name = "GART", .desc = "GART Events", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xee, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_gart), .ngrp = 1, .umasks = amd64_fam10h_gart, }, { .name = "MEMORY_CONTROLLER_REQUESTS", .desc = "Memory Controller Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1f0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_memory_controller_requests), .ngrp = 1, .umasks = amd64_fam10h_memory_controller_requests, }, { .name = "CPU_TO_DRAM_REQUESTS_TO_TARGET_NODE", .desc = "CPU to DRAM Requests to Target Node", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e0, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_to_dram_requests_to_target_node), .ngrp = 1, .umasks = amd64_fam10h_cpu_to_dram_requests_to_target_node, }, { .name = "IO_TO_DRAM_REQUESTS_TO_TARGET_NODE", .desc = "IO to DRAM Requests to Target Node", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e1, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_to_dram_requests_to_target_node), .ngrp = 1, .umasks = amd64_fam10h_cpu_to_dram_requests_to_target_node, /* identical to actual umasks list for this event */ }, { .name = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_0_3", .desc = "CPU Read Command Latency to Target Node 0-3", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e2, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_read_command_latency_to_target_node_0_3), .ngrp = 1, .umasks = amd64_fam10h_cpu_read_command_latency_to_target_node_0_3, }, { .name = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_0_3", .desc = "CPU Read Command Requests to Target Node 0-3", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_read_command_latency_to_target_node_0_3), .ngrp = 1, .umasks = amd64_fam10h_cpu_read_command_latency_to_target_node_0_3, /* identical to actual umasks 
list for this event */ }, { .name = "CPU_READ_COMMAND_LATENCY_TO_TARGET_NODE_4_7", .desc = "CPU Read Command Latency to Target Node 4-7", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e4, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_read_command_latency_to_target_node_4_7), .ngrp = 1, .umasks = amd64_fam10h_cpu_read_command_latency_to_target_node_4_7, }, { .name = "CPU_READ_COMMAND_REQUESTS_TO_TARGET_NODE_4_7", .desc = "CPU Read Command Requests to Target Node 4-7", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e5, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_read_command_latency_to_target_node_4_7), .ngrp = 1, .umasks = amd64_fam10h_cpu_read_command_latency_to_target_node_4_7, /* identical to actual umasks list for this event */ }, { .name = "CPU_COMMAND_LATENCY_TO_TARGET_NODE_0_3_4_7", .desc = "CPU Command Latency to Target Node 0-3/4-7", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e6, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_command_latency_to_target_node_0_3_4_7), .ngrp = 1, .umasks = amd64_fam10h_cpu_command_latency_to_target_node_0_3_4_7, }, { .name = "CPU_REQUESTS_TO_TARGET_NODE_0_3_4_7", .desc = "CPU Requests to Target Node 0-3/4-7", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1e7, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_cpu_command_latency_to_target_node_0_3_4_7), .ngrp = 1, .umasks = amd64_fam10h_cpu_command_latency_to_target_node_0_3_4_7, /* identical to actual umasks list for this event */ }, { .name = "HYPERTRANSPORT_LINK0", .desc = "HyperTransport Link 0 Transmit Bandwidth", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xf6, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_hypertransport_link0), .ngrp = 2, .umasks = amd64_fam10h_hypertransport_link0, }, { .name = "HYPERTRANSPORT_LINK1", .desc = "HyperTransport Link 1 Transmit Bandwidth", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xf7, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_hypertransport_link0), .ngrp = 2, .umasks = amd64_fam10h_hypertransport_link0, /* identical to actual umasks list for this event */ }, { .name = 
"HYPERTRANSPORT_LINK2", .desc = "HyperTransport Link 2 Transmit Bandwidth", .modmsk = AMD64_FAM10H_ATTRS, .code = 0xf8, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_hypertransport_link0), .ngrp = 2, .umasks = amd64_fam10h_hypertransport_link0, /* identical to actual umasks list for this event */ }, { .name = "HYPERTRANSPORT_LINK3", .desc = "HyperTransport Link 3 Transmit Bandwidth", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1f9, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_hypertransport_link3), .ngrp = 2, .umasks = amd64_fam10h_hypertransport_link3, }, { .name = "READ_REQUEST_TO_L3_CACHE", .desc = "Read Request to L3 Cache", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4e0, .flags = AMD64_FL_TILL_FAM10H_REV_C, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_read_request_to_l3_cache), .ngrp = 2, .umasks = amd64_fam10h_read_request_to_l3_cache, }, { .name = "L3_CACHE_MISSES", .desc = "L3 Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4e1, .flags = AMD64_FL_TILL_FAM10H_REV_C, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l3_cache_misses), .ngrp = 2, .umasks = amd64_fam10h_l3_cache_misses, }, { .name = "L3_FILLS_CAUSED_BY_L2_EVICTIONS", .desc = "L3 Fills caused by L2 Evictions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4e2, .flags = AMD64_FL_TILL_FAM10H_REV_C, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l3_fills_caused_by_l2_evictions), .ngrp = 2, .umasks = amd64_fam10h_l3_fills_caused_by_l2_evictions, }, { .name = "L3_EVICTIONS", .desc = "L3 Evictions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4e3, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l3_evictions), .ngrp = 1, .umasks = amd64_fam10h_l3_evictions, }, { .name = "PAGE_SIZE_MISMATCHES", .desc = "Page Size Mismatches", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x165, .flags = AMD64_FL_FAM10H_REV_C, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_page_size_mismatches), .ngrp = 1, .umasks = amd64_fam10h_page_size_mismatches, }, { .name = "RETIRED_X87_OPS", .desc = "Retired x87 Floating Point Operations", .modmsk = AMD64_FAM10H_ATTRS, .code = 
0x1c0, .flags = AMD64_FL_FAM10H_REV_C, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_retired_x87_ops), .ngrp = 1, .umasks = amd64_fam10h_retired_x87_ops, }, { .name = "IBS_OPS_TAGGED", .desc = "IBS Ops Tagged", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1cf, .flags = AMD64_FL_FAM10H_REV_C, }, { .name = "LFENCE_INST_RETIRED", .desc = "LFENCE Instructions Retired", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1d3, .flags = AMD64_FL_FAM10H_REV_C, }, { .name = "SFENCE_INST_RETIRED", .desc = "SFENCE Instructions Retired", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1d4, .flags = AMD64_FL_FAM10H_REV_C, }, { .name = "MFENCE_INST_RETIRED", .desc = "MFENCE Instructions Retired", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x1d5, .flags = AMD64_FL_FAM10H_REV_C, }, { .name = "READ_REQUEST_TO_L3_CACHE", .desc = "Read Request to L3 Cache", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4e0, .flags = AMD64_FL_FAM10H_REV_D, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l3_cache_misses), .ngrp = 2, .umasks = amd64_fam10h_l3_cache_misses, /* identical to actual umasks list for this event */ }, { .name = "L3_CACHE_MISSES", .desc = "L3 Cache Misses", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4e1, .flags = AMD64_FL_FAM10H_REV_D, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l3_cache_misses), .ngrp = 2, .umasks = amd64_fam10h_l3_cache_misses, /* identical to actual umasks list for this event */ }, { .name = "L3_FILLS_CAUSED_BY_L2_EVICTIONS", .desc = "L3 Fills caused by L2 Evictions", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4e2, .flags = AMD64_FL_FAM10H_REV_D, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l3_fills_caused_by_l2_evictions), .ngrp = 2, .umasks = amd64_fam10h_l3_fills_caused_by_l2_evictions, /* identical to actual umasks list for this event */ }, { .name = "NON_CANCELLED_L3_READ_REQUESTS", .desc = "Non-cancelled L3 Read Requests", .modmsk = AMD64_FAM10H_ATTRS, .code = 0x4ed, .flags = AMD64_FL_FAM10H_REV_D, .numasks = LIBPFM_ARRAY_SIZE(amd64_fam10h_l3_cache_misses), .ngrp = 2, .umasks = amd64_fam10h_l3_cache_misses, 
/* identical to actual umasks list for this event */ }, }; /* papi-5.3.0/src/libpfm4/lib/events/intel_hsw_events.h */ /* * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
* * PMU: hsw (Intel Haswell) */ static const intel_x86_umask_t hsw_baclears[]={ { .uname = "ANY", .udesc = "Number of front-end re-steers due to BPU misprediction", .ucode = 0x1f00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_br_inst_exec[]={ { .uname = "NONTAKEN_CONDITIONAL", .udesc = "All macro conditional nontaken branch instructions", .ucode = 0x4100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_CONDITIONAL", .udesc = "Taken speculative and retired macro-conditional branches", .ucode = 0x8100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_JUMP", .udesc = "Taken speculative and retired macro-conditional branch instructions excluding calls and indirects", .ucode = 0x8200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "Taken speculative and retired indirect branches excluding calls and returns", .ucode = 0x8400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_NEAR_RETURN", .udesc = "Taken speculative and retired indirect branches with return mnemonic", .ucode = 0x8800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_DIRECT_NEAR_CALL", .udesc = "Taken speculative and retired direct near calls", .ucode = 0x9000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_CONDITIONAL", .udesc = "Speculative and retired macro-conditional branches", .ucode = 0xc100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_DIRECT_JMP", .udesc = "Speculative and retired macro-unconditional branches excluding calls and indirects", .ucode = 0xc200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_INDIRECT_JUMP_NON_CALL_RET", .udesc = "Speculative and retired indirect branches excluding calls and returns", .ucode = 0xc400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_INDIRECT_NEAR_RETURN", .udesc = "Speculative and retired indirect return branches", .ucode = 0xc800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_DIRECT_NEAR_CALL", .udesc = "Speculative and retired direct near calls", .ucode 
= 0xd000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_NEAR_CALL", .udesc = "All indirect calls, including both register and memory indirect", .ucode = 0xa000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "All branch instructions executed", .ucode = 0xff00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_br_inst_retired[]={ { .uname = "CONDITIONAL", .udesc = "Counts all taken and not taken macro conditional branch instructions", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_CALL", .udesc = "Counts all macro direct and indirect near calls", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ALL_BRANCHES", .udesc = "Counts all taken and not taken macro branches including far branches (architectural event)", .ucode = 0x0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL | INTEL_X86_PEBS, }, { .uname = "NEAR_RETURN", .udesc = "Counts the number of near ret instructions retired", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NOT_TAKEN", .udesc = "Counts all not taken macro branch instructions retired", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "NEAR_TAKEN", .udesc = "Counts the number of near branch taken instructions retired", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "FAR_BRANCH", .udesc = "Counts the number of far branch instructions retired", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t hsw_br_misp_exec[]={ { .uname = "NONTAKEN_CONDITIONAL", .udesc = "Not taken speculative and retired mispredicted macro conditional branches", .ucode = 0x4100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_CONDITIONAL", .udesc = "Taken speculative and retired mispredicted macro conditional branches", .ucode = 0x8100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_INDIRECT_JUMP_NON_CALL_RET", .udesc = "Taken 
speculative and retired mispredicted indirect branches excluding calls and returns", .ucode = 0x8400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "TAKEN_RETURN_NEAR", .udesc = "Taken speculative and retired mispredicted indirect branches with return mnemonic", .ucode = 0x8800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_CONDITIONAL", .udesc = "Speculative and retired mispredicted macro conditional branches", .ucode = 0xc100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_INDIRECT_JUMP_NON_CALL_RET", .udesc = "All mispredicted indirect branches that are not calls nor returns", .ucode = 0xc400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_BRANCHES", .udesc = "Speculative and retired mispredicted macro conditional branches", .ucode = 0xff00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "TAKEN_INDIRECT_NEAR_CALL", .udesc = "Taken speculative and retired mispredicted indirect calls", .ucode = 0xa000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_br_misp_retired[]={ { .uname = "CONDITIONAL", .udesc = "All mispredicted macro conditional branch instructions", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ALL_BRANCHES", .udesc = "All mispredicted macro branches (architectural event)", .ucode = 0x0, /* architectural encoding */ .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "NEAR_TAKEN", .udesc = "number of near branch instructions retired that were mispredicted and taken", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t hsw_cpl_cycles[]={ { .uname = "RING0", .udesc = "Unhalted core cycles when the thread is in ring 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RING123", .udesc = "Unhalted core cycles when thread is in rings 1, 2, or 3", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RING0_TRANS", .udesc = "Number of intervals between processor halts while thread is in ring 0", .ucode = 0x100 | 
INTEL_X86_MOD_EDGE | (1 << INTEL_X86_CMASK_BIT), /* edge=1 cnt=1 */ .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_E | _INTEL_X86_ATTR_C, }, }; static const intel_x86_umask_t hsw_cpu_clk_thread_unhalted[]={ { .uname = "REF_XCLK", .udesc = "Cases when the core is unhalted at 100MHz", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "THREAD_P", .udesc = "Cycles when thread is not halted", .ucode = 0x000, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_cycle_activity[]={ { .uname = "CYCLES_L2_PENDING", .udesc = "Cycles with pending L2 miss loads (must use with HT off only)", .ucode = 0x0100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags = INTEL_X86_NCOMBO, .ucntmsk= 0xf, }, { .uname = "CYCLES_LDM_PENDING", .udesc = "Cycles with pending memory loads", .ucode = 0x0200 | (0x2 << INTEL_X86_CMASK_BIT), .uflags = INTEL_X86_NCOMBO, }, { .uname = "CYCLES_L1D_PENDING", .udesc = "Cycles with pending L1D load cache misses", .ucode = 0x0800 | (0x8 << INTEL_X86_CMASK_BIT), .ucntmsk= 0x4, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALLS_L2_PENDING", .udesc = "Execution stalls due to L2 pending loads (must use with HT off only)", .ucode = 0x0500 | (0x5 << INTEL_X86_CMASK_BIT), .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_dtlb_load_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Misses in all DTLB levels that cause page walks", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED_4K", .udesc = "Misses in all TLB levels that cause a page walk that completes (4K)", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED_2M_4M", .udesc = "Misses in all TLB levels that cause a page walk that completes (2M/4M)", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Misses in all TLB levels that cause a page walk of any page size that completes", .ucode = 0xe00, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles when 
PMH is busy with page walks", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT_4K", .udesc = "Misses that miss the DTLB and hit the STLB (4K)", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT_2M", .udesc = "Misses that miss the DTLB and hit the STLB (2M)", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "Number of cache load STLB hits. No page walk", .ucode = 0x6000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PDE_CACHE_MISS", .udesc = "DTLB misses with low part of linear-to-physical address translation missed", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_itlb_misses[]={ { .uname = "MISS_CAUSES_A_WALK", .udesc = "Misses in all ITLB levels that cause page walks", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED_4K", .udesc = "Misses in all TLB levels that cause a page walk that completes (4K)", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED_2M_4M", .udesc = "Misses in all TLB levels that cause a page walk that completes (2M/4M)", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_COMPLETED", .udesc = "Misses in all TLB levels that cause a page walk of any page size that completes", .ucode = 0xe00, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WALK_DURATION", .udesc = "Cycles when PMH is busy with page walks", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT_4K", .udesc = "Misses that miss the ITLB and hit the STLB (4K)", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT_2M", .udesc = "Misses that miss the ITLB and hit the STLB (2M)", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STLB_HIT", .udesc = "Number of cache load STLB hits. 
No page walk", .ucode = 0x6000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_fp_assist[]={ { .uname = "X87_OUTPUT", .udesc = "Number of X87 FP assists due to output values", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "X87_INPUT", .udesc = "Number of X87 FP assists due to input values", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SIMD_OUTPUT", .udesc = "Number of SIMD FP assists due to output values", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SIMD_INPUT", .udesc = "Number of SIMD FP assists due to input values", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ANY", .udesc = "Cycles with any input/output SSE or FP assists", .ucode = 0x1e00 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "ALL", .udesc = "Cycles with any input and output SSE or FP assist", .ucode = 0x1e00 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uequiv = "ANY", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, }; static const intel_x86_umask_t hsw_icache[]={ { .uname = "MISSES", .udesc = "Number of Instruction Cache, Streaming Buffer and Victim Cache Misses. 
Includes Uncacheable accesses", .ucode = 0x200, .uflags = INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_idq[]={ { .uname = "EMPTY", .udesc = "Cycles the Instruction Decode Queue (IDQ) is empty", .ucode = 0x200, .ucntmsk= 0xf, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MITE_UOPS", .udesc = "Number of uops delivered to Instruction Decode Queue (IDQ) from MITE path", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DSB_UOPS", .udesc = "Number of uops delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MS_DSB_UOPS", .udesc = "Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MS_MITE_UOPS", .udesc = "Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MS_UOPS", .udesc = "Number of Uops delivered into Instruction Decode Queue (IDQ) from MS, initiated by Decode Stream Buffer (DSB) or MITE", .ucode = 0x3000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MS_UOPS_CYCLES", .udesc = "Number of cycles that Uops were delivered into Instruction Decode Queue (IDQ) when MS_Busy, initiated by Decode Stream Buffer (DSB) or MITE", .ucode = 0x3000 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uequiv = "MS_UOPS:c=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "MITE_UOPS_CYCLES", .udesc = "Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path", .ucode = 0x400 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uequiv = "MITE_UOPS:c=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "DSB_UOPS_CYCLES", .udesc = "Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path", .ucode = 0x800 | (1 
<< INTEL_X86_CMASK_BIT), /* cnt=1 */ .uequiv = "DSB_UOPS:c=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "MS_DSB_UOPS_CYCLES", .udesc = "Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy", .ucode = 0x1000 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uequiv = "MS_DSB_UOPS:c=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "MS_DSB_OCCUR", .udesc = "Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy", .ucode = 0x1000 | INTEL_X86_MOD_EDGE | (1 << INTEL_X86_CMASK_BIT), /* edge=1 cnt=1 */ .uequiv = "MS_DSB_UOPS:c=1:e=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_E | _INTEL_X86_ATTR_C, }, { .uname = "ALL_DSB_CYCLES_4_UOPS", .udesc = "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops", .ucode = 0x1800 | (4 << INTEL_X86_CMASK_BIT), /* cnt=4 */ .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "ALL_DSB_CYCLES_ANY_UOPS", .udesc = "Cycles Decode Stream Buffer (DSB) is delivering any Uop", .ucode = 0x1800 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "ALL_MITE_CYCLES_4_UOPS", .udesc = "Cycles MITE is delivering 4 Uops", .ucode = 0x2400 | (4 << INTEL_X86_CMASK_BIT), /* cnt=4 */ .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "ALL_MITE_CYCLES_ANY_UOPS", .udesc = "Cycles MITE is delivering any Uop", .ucode = 0x2400 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "ALL_MITE_UOPS", .udesc = "Number of uops delivered to Instruction Decode Queue (IDQ) from any path", .ucode = 0x3c00, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_idq_uops_not_delivered[]={ { .uname = "CORE", .udesc = "Count number of non-delivered uops to Resource Allocation Table (RAT)", 
.ucode = 0x100, .uflags = INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_inst_retired[]={ { .uname = "ANY_P", .udesc = "Number of instructions retired. General Counter - architectural event", .ucode = 0x000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL", .udesc = "Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution (Precise Event)", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "TOTAL_CYCLES", .udesc = "Number of cycles using always true condition", .ucode = 0x100 | INTEL_X86_MOD_INV | (10 << INTEL_X86_CMASK_BIT), /* inv=1 cnt=10 */ .uequiv = "ALL:i=1:c=10", .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C, }, { .uname = "PREC_DIST", .udesc = "Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution", .ucode = 0x100, .uequiv = "ALL", .ucntmsk= 0x2, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t hsw_int_misc[]={ { .uname = "RECOVERY_CYCLES", .udesc = "Number of cycles waiting for Machine Clears except JEClear", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RECOVERY_STALLS_COUNT", .udesc = "Number of occurrences waiting for Machine Clears", .ucode = 0x300 | INTEL_X86_MOD_EDGE | (1 << INTEL_X86_CMASK_BIT), /* edge=1 cnt=1 */ .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_E | _INTEL_X86_ATTR_C, }, }; static const intel_x86_umask_t hsw_itlb[]={ { .uname = "ITLB_FLUSH", .udesc = "Flushing of the Instruction TLB (ITLB) pages independent of page size", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_l1d[]={ { .uname = "REPLACEMENT", .udesc = "L1D Data line replacements", .ucode = 0x100, .uflags = INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_l1d_pend_miss[]={ { .uname = "PENDING", .udesc = "Cycles with L1D load misses outstanding", .ucode = 0x100, .ucntmsk = 0x4, .uflags = 
INTEL_X86_DFL, }, { .uname = "PENDING_CYCLES", .udesc = "Cycles with L1D load misses outstanding", .ucode = 0x100 | (1 << INTEL_X86_CMASK_BIT), /* cnt=1 */ .uequiv = "PENDING:c=1", .ucntmsk = 0x4, .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_C, }, { .uname = "OCCURRENCES", .udesc = "Number L1D miss outstanding", .ucode = 0x100 | INTEL_X86_MOD_EDGE | (1 << INTEL_X86_CMASK_BIT), /* edge=1 cnt=1 */ .uequiv = "PENDING:c=1:e=1", .ucntmsk = 0x4, .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_E | _INTEL_X86_ATTR_C, }, }; static const intel_x86_umask_t hsw_l2_demand_rqsts[]={ { .uname = "WB_HIT", .udesc = "WB requests that hit L2 cache", .ucode = 0x5000, .uflags = INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_l2_lines_in[]={ { .uname = "I", .udesc = "L2 cache lines in I state filling L2", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "S", .udesc = "L2 cache lines in S state filling L2", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "E", .udesc = "L2 cache lines in E state filling L2", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL", .udesc = "L2 cache lines filling L2", .ucode = 0x700, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "L2 cache lines filling L2", .uequiv = "ALL", .ucode = 0x700, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_l2_lines_out[]={ { .uname = "DEMAND_CLEAN", .udesc = "Number of clean L2 cachelines evicted by demand", .ucode = 0x500, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DIRTY", .udesc = "Number of dirty L2 cachelines evicted by demand", .ucode = 0x600, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_l2_rqsts[]={ { .uname = "DEMAND_DATA_RD_MISS", .udesc = "Demand Data Read requests that miss L2 cache", .ucode = 0x2100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD_HIT", .udesc = "Demand Data Read requests that hit L2 cache", .ucode = 0x4100, .uflags = INTEL_X86_NCOMBO, }, { .uname = 
"DEMAND_RFO_MISS", .udesc = "RFO requests that miss L2 cache", .ucode = 0x2200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO_HIT", .udesc = "RFO requests that hit L2 cache", .ucode = 0x4200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_MISS", .udesc = "L2 cache misses when fetching instructions", .ucode = 0x2400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_MISS", .udesc = "All demand requests that miss the L2 cache", .ucode = 0x2700, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CODE_RD_HIT", .udesc = "L2 cache hits when fetching instructions, code reads", .ucode = 0x4400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "L2_PF_MISS", .udesc = "Requests from the L2 hardware prefetchers that miss L2 cache", .ucode = 0x3000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MISS", .udesc = "All requests that miss the L2 cache", .ucode = 0x3f00, .uflags = INTEL_X86_NCOMBO, }, { .uname = "L2_PF_HIT", .udesc = "Requests from the L2 hardware prefetchers that hit L2 cache", .ucode = 0x5000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_DATA_RD", .udesc = "Any data read request to L2 cache", .ucode = 0xe100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_RFO", .udesc = "Any data RFO request to L2 cache", .ucode = 0xe200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_CODE_RD", .udesc = "Any code read request to L2 cache", .ucode = 0xe400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_DEMAND_REFERENCES", .udesc = "All demand requests to L2 cache ", .ucode = 0xe700, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_PF", .udesc = "Any L2 HW prefetch request to L2 cache", .ucode = 0xf800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REFERENCES", .udesc = "All requests to L2 cache", .ucode = 0xff00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_l2_trans[]={ { .uname = "DEMAND_DATA_RD", .udesc = "Demand Data Read requests that access L2 cache", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RFO", .udesc = "RFO 
requests that access L2 cache", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CODE_RD", .udesc = "L2 cache accesses when fetching instructions", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_PF", .udesc = "L2 or L3 HW prefetches that access L2 cache, including rejects", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "L1D_WB", .udesc = "L1D writebacks that access L2 cache", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "L2_FILL", .udesc = "L2 fill requests that access L2 cache", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "L2_WB", .udesc = "L2 writebacks that access L2 cache", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_REQUESTS", .udesc = "Transactions accessing L2 pipe", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_ld_blocks[]={ { .uname = "STORE_FORWARD", .udesc = "Counts the number of loads blocked by overlapping with store buffer entries that cannot be forwarded", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NO_SR", .udesc = "number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_ld_blocks_partial[]={ { .uname = "ADDRESS_ALIAS", .udesc = "False dependencies in MOB due to partial compare on address", .ucode = 0x100, .uflags = INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_load_hit_pre[]={ { .uname = "SW_PF", .udesc = "Non software-prefetch load dispatches that hit FB allocated for software prefetch", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "HW_PF", .udesc = "Non software-prefetch load dispatches that hit FB allocated for hardware prefetch", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_lock_cycles[]={ { .uname = "SPLIT_LOCK_UC_LOCK_DURATION", .udesc = "Cycles in which the L1D and 
L2 are locked, due to a UC lock or split lock", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "CACHE_LOCK_DURATION", .udesc = "cycles that the L1D is locked", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_longest_lat_cache[]={ { .uname = "MISS", .udesc = "Core-originated cacheable demand requests missed LLC - architectural event", .ucode = 0x4100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "REFERENCE", .udesc = "Core-originated cacheable demand requests that refer to LLC - architectural event", .ucode = 0x4f00, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_machine_clears[]={ { .uname = "MEMORY_ORDERING", .udesc = "Number of Memory Ordering Machine Clears detected", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SMC", .udesc = "Number of Self-modifying code (SMC) Machine Clears detected", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MASKMOV", .udesc = "This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_mem_load_uops_l3_hit_retired[]={ { .uname = "XSNP_MISS", .udesc = "Retired load uops which data sources were L3 hit and cross-core snoop missed in on-pkg core cache", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_HIT", .udesc = "Retired load uops which data sources were L3 and cross-core snoop hits in on-pkg core cache", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_HITM", .udesc = "Load had HitM Response from a core on same socket (shared L3). 
(Non PEBS)", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "XSNP_NONE", .udesc = "Retired load uops which data sources were hits in L3 without snoops required", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t hsw_mem_load_uops_l3_miss_retired[]={ { .uname = "LOCAL_DRAM", .udesc = "Retired load uops missing L3 cache but hitting local memory", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_mem_load_uops_retired[]={ { .uname = "L1_HIT", .udesc = "Retired load uops with L1 cache hits as data source", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_HIT", .udesc = "Retired load uops with L2 cache hits as data source", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_HIT", .udesc = "Retired load uops with L3 cache hits as data source", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L1_MISS", .udesc = "Retired load uops which missed the L1D", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L2_MISS", .udesc = "Retired load uops which missed the L2. 
Unknown data source excluded", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "L3_MISS", .udesc = "Retired load uops which missed the L3", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "HIT_LFB", .udesc = "Retired load uops which missed L1 but hit line fill buffer (LFB)", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t hsw_mem_trans_retired[]={ { .uname = "LOAD_LATENCY", .udesc = "Memory load instructions retired above programmed clocks, minimum threshold value is 3 (Precise Event and ldlat required)", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_LDLAT | INTEL_X86_DFL, }, { .uname = "LATENCY_ABOVE_THRESHOLD", .udesc = "Memory load instructions retired above programmed clocks, minimum threshold value is 3 (Precise Event and ldlat required)", .ucode = 0x100, .uequiv = "LOAD_LATENCY", .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_LDLAT | INTEL_X86_NO_AUTOENCODE, }, }; static const intel_x86_umask_t hsw_mem_uops_retired[]={ { .uname = "STLB_MISS_LOADS", .udesc = "Load uops with true STLB miss retired to architected path", .ucode = 0x1100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STLB_MISS_STORES", .udesc = "Store uops with true STLB miss retired to architected path", .ucode = 0x1200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCK_LOADS", .udesc = "Load uops with locked access retired", .ucode = 0x2100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "LOCK_STORES", .udesc = "Store uops with locked access retired", .ucode = 0x2200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_LOADS", .udesc = "Line-splitted load uops retired", .ucode = 0x4100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "SPLIT_STORES", .udesc = "Line-splitted store uops retired", .ucode = 0x4200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ALL_LOADS", .udesc = 
"All load uops retired", .ucode = 0x8100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ALL_STORES", .udesc = "All store uops retired", .ucode = 0x8200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, }; static const intel_x86_umask_t hsw_misalign_mem_ref[]={ { .uname = "LOADS", .udesc = "Speculative cache-line split load uops dispatched to the L1D", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STORES", .udesc = "Speculative cache-line split store-address uops dispatched to L1D", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_move_elimination[]={ { .uname = "INT_ELIMINATED", .udesc = "Number of integer Move Elimination candidate uops that were eliminated", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SIMD_ELIMINATED", .udesc = "Number of SIMD Move Elimination candidate uops that were eliminated", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "INT_NOT_ELIMINATED", .udesc = "Number of integer Move Elimination candidate uops that were not eliminated", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SIMD_NOT_ELIMINATED", .udesc = "Number of SIMD Move Elimination candidate uops that were not eliminated", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_offcore_requests[]={ { .uname = "DEMAND_DATA_RD", .udesc = "Demand data read requests sent to uncore (use with HT off only)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD", .udesc = "Demand code read requests sent to uncore (use with HT off only)", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Demand RFOs requests sent to uncore (use with HT off only)", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_RD", .udesc = "Data read requests sent to uncore (use with HT off only)", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_other_assists[]={ { .uname = "AVX_TO_SSE", 
.udesc = "Number of transitions from AVX-256 to legacy SSE when penalty applicable", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SSE_TO_AVX", .udesc = "Number of transitions from legacy SSE to AVX-256 when penalty applicable", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ANY_WB_ASSIST", .udesc = "Number of times any microcode assist is invoked by HW upon uop writeback", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_resource_stalls[]={ { .uname = "ANY", .udesc = "Cycles Allocation is stalled due to Resource Related reason", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL", .udesc = "Cycles Allocation is stalled due to Resource Related reason", .ucode = 0x100, .uequiv = "ANY", .uflags = INTEL_X86_NCOMBO, }, { .uname = "RS", .udesc = "Stall cycles caused by absence of eligible entries in Reservation Station (RS)", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SB", .udesc = "Cycles Allocator is stalled due to Store Buffer full (not including draining from synch)", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ROB", .udesc = "ROB full stall cycles", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_rob_misc_events[]={ { .uname = "LBR_INSERTS", .udesc = "Count each time a new Last Branch Record (LBR) is inserted", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_rs_events[]={ { .uname = "EMPTY_CYCLES", .udesc = "Cycles the Reservation Station (RS) is empty for this thread", .ucode = 0x100, .uflags = INTEL_X86_DFL, }, }; static const intel_x86_umask_t hsw_tlb_flush[]={ { .uname = "DTLB_THREAD", .udesc = "Count number of DTLB flushes of thread-specific entries", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STLB_ANY", .udesc = "Count number of any STLB flushes", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t 
hsw_uops_executed[]={ { .uname = "CORE", .udesc = "Number of uops executed from any thread", .ucode = 0x200, .uflags = INTEL_X86_DFL, }, { .uname = "STALL_CYCLES", .udesc = "Number of cycles with no uops executed", .ucode = 0x200 | INTEL_X86_MOD_INV | (1 << INTEL_X86_CMASK_BIT), /* inv=1 cnt=1 */ .uequiv = "CORE:c=1:i=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C, }, }; static const intel_x86_umask_t hsw_uops_executed_port[]={ { .uname = "PORT_0", .udesc = "Cycles which a Uop is executed on port 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_1", .udesc = "Cycles which a Uop is executed on port 1", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_2", .udesc = "Cycles which a Uop is executed on port 2", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_3", .udesc = "Cycles which a Uop is executed on port 3", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_4", .udesc = "Cycles which a Uop is executed on port 4", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_5", .udesc = "Cycles which a Uop is executed on port 5", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_6", .udesc = "Cycles which a Uop is executed on port 6", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_7", .udesc = "Cycles which a Uop is executed on port 7", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PORT_0_CORE", .udesc = "tbd", .ucode = 0x100 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_0:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, { .uname = "PORT_1_CORE", .udesc = "tbd", .ucode = 0x200 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_1:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, { .uname = "PORT_2_CORE", .udesc = "tbd", .ucode = 0x400 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_2:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, { .uname = "PORT_3_CORE", .udesc = "tbd", 
.ucode = 0x800 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_3:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, { .uname = "PORT_4_CORE", .udesc = "tbd", .ucode = 0x1000 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_4:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, { .uname = "PORT_5_CORE", .udesc = "tbd", .ucode = 0x2000 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_5:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, { .uname = "PORT_6_CORE", .udesc = "tbd", .ucode = 0x4000 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_6:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, { .uname = "PORT_7_CORE", .udesc = "tbd", .ucode = 0x8000 | INTEL_X86_MOD_ANY, /* any=1 */ .uequiv = "PORT_7:t=1", .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T, }, }; static const intel_x86_umask_t hsw_uops_issued[]={ { .uname = "ANY", .udesc = "Number of Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS)", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "ALL", .udesc = "Number of Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS)", .ucode = 0x100, .uequiv = "ANY", .uflags = INTEL_X86_NCOMBO, }, { .uname = "FLAGS_MERGE", .udesc = "Number of flags-merge uops being allocated. Such uops add delay", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SLOW_LEA", .udesc = "Number of slow LEA or similar uops allocated. 
Such uop has 3 sources regardless if result of LEA instruction or not", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SINGLE_MUL", .udesc = "Number of Multiply packed/scalar single precision uops allocated", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STALL_CYCLES", .udesc = "Counts the number of cycles no uops issued by this thread", .ucode = 0x100 | INTEL_X86_MOD_INV | (1 << INTEL_X86_CMASK_BIT), /* inv=1 cnt=1 */ .uequiv = "ANY:c=1:i=1", .uflags = INTEL_X86_NCOMBO, .ucntmsk = 0xf, .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C, }, { .uname = "CORE_STALL_CYCLES", .udesc = "Counts the number of cycles no uops issued on this core", .ucode = 0x100 | INTEL_X86_MOD_ANY | INTEL_X86_MOD_INV | (1 << INTEL_X86_CMASK_BIT), /* any=1 inv=1 cnt=1 */ .uequiv = "ANY:c=1:i=1:t=1", .ucntmsk = 0xf, .uflags = INTEL_X86_NCOMBO, .modhw = _INTEL_X86_ATTR_T | _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C, }, }; static const intel_x86_umask_t hsw_uops_retired[]={ { .uname = "ALL", .udesc = "All uops that actually retired", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS | INTEL_X86_DFL, }, { .uname = "ANY", .udesc = "All uops that actually retired", .ucode = 0x100, .uequiv = "ALL", .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "RETIRE_SLOTS", .udesc = "number of retirement slots used non PEBS", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "STALL_CYCLES", .udesc = "Cycles no executable uops retired (Precise Event)", .ucode = 0x100 | INTEL_X86_MOD_INV | (1 << INTEL_X86_CMASK_BIT), /* inv=1 cnt=1 */ .uequiv = "ALL:i=1:c=1", .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C, }, { .uname = "TOTAL_CYCLES", .udesc = "Number of cycles using always true condition applied to PEBS uops retired event", .ucode = 0x100 | INTEL_X86_MOD_INV | (10 << INTEL_X86_CMASK_BIT), /* inv=1 cnt=10 */ .uequiv = "ALL:i=1:c=10", .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, .modhw = _INTEL_X86_ATTR_I 
| _INTEL_X86_ATTR_C, }, { .uname = "CORE_STALL_CYCLES", .udesc = "Cycles no executable uops retired on core (Precise Event)", .ucode = 0x100 | INTEL_X86_MOD_INV | (1 << INTEL_X86_CMASK_BIT), /* inv=1 cnt=1 */ .uequiv = "ALL:i=1:c=1:t=1", .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C, }, { .uname = "STALL_OCCURRENCES", .udesc = "Number of transitions from stalled to unstalled execution (Precise Event)", .ucode = 0x100 | INTEL_X86_MOD_INV | INTEL_X86_MOD_EDGE| (1 << INTEL_X86_CMASK_BIT), /* inv=1 edge=1 cnt=1 */ .uequiv = "ALL:c=1:i=1:e=1", .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, .modhw = _INTEL_X86_ATTR_I | _INTEL_X86_ATTR_C | _INTEL_X86_ATTR_E, }, }; static const intel_x86_umask_t hsw_offcore_response[]={ { .uname = "DMND_DATA_RD", .udesc = "Request: number of demand and DCU prefetch data reads of full and partial cachelines as well as demand data page table entry cacheline reads. Does not count L2 data read prefetches or instruction fetches", .ucode = 1ULL << (0 + 8), .grpid = 0, }, { .uname = "DMND_RFO", .udesc = "Request: number of demand and DCU prefetch reads for ownership (RFO) requests generated by a write to data cacheline. Does not count L2 RFO prefetches", .ucode = 1ULL << (1 + 8), .grpid = 0, }, { .uname = "DMND_IFETCH", .udesc = "Request: number of demand and DCU prefetch instruction cacheline reads. 
Does not count L2 code read prefetches", .ucode = 1ULL << (2 + 8), .grpid = 0, }, { .uname = "WB", .udesc = "Request: number of writebacks (modified to exclusive) transactions", .ucode = 1ULL << (3 + 8), .grpid = 0, }, { .uname = "PF_DATA_RD", .udesc = "Request: number of data cacheline reads generated by L2 prefetchers", .ucode = 1ULL << (4 + 8), .grpid = 0, }, { .uname = "PF_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetchers", .ucode = 1ULL << (5 + 8), .grpid = 0, }, { .uname = "PF_IFETCH", .udesc = "Request: number of code reads generated by L2 prefetchers", .ucode = 1ULL << (6 + 8), .grpid = 0, }, { .uname = "PF_LLC_DATA_RD", .udesc = "Request: number of L3 prefetcher requests to L2 for loads", .ucode = 1ULL << (7 + 8), .grpid = 0, }, { .uname = "PF_LLC_RFO", .udesc = "Request: number of RFO requests generated by L2 prefetcher", .ucode = 1ULL << (8 + 8), .grpid = 0, }, { .uname = "PF_LLC_IFETCH", .udesc = "Request: number of L2 prefetcher requests to L3 for instruction fetches", .ucode = 1ULL << (9 + 8), .grpid = 0, }, { .uname = "BUS_LOCKS", .udesc = "Request: number of bus lock and split lock requests", .ucode = 1ULL << (10 + 8), .grpid = 0, }, { .uname = "STRM_ST", .udesc = "Request: number of streaming store requests", .ucode = 1ULL << (11 + 8), .grpid = 0, }, { .uname = "OTHER", .udesc = "Request: counts one of the following transaction types, including L3 invalidate, I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences, lock, unlock, split lock", .ucode = 1ULL << (15+8), .grpid = 0, }, { .uname = "ANY_IFETCH", .udesc = "Request: combination of PF_IFETCH | DMND_IFETCH | PF_LLC_IFETCH", .uequiv = "PF_IFETCH:DMND_IFETCH:PF_LLC_IFETCH", .ucode = 0x24400, /* (1 << (2+8)) | (1 << (6+8)) | (1 << (9+8)), matching .uequiv */ .grpid = 0, }, { .uname = "ANY_REQUEST", .udesc = "Request: combination of all request umasks", .uequiv = "DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:PF_LLC_DATA_RD:PF_LLC_RFO:PF_LLC_IFETCH:BUS_LOCKS:STRM_ST:OTHER", .ucode = 0x8fff00, .uflags= 
INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "ANY_DATA", .udesc = "Request: combination of DMND_DATA_RD | PF_DATA_RD | PF_LLC_DATA_RD", .uequiv = "DMND_DATA_RD:PF_DATA_RD:PF_LLC_DATA_RD", .ucode = 0x9100, .grpid = 0, }, { .uname = "ANY_RFO", .udesc = "Request: combination of DMND_RFO | PF_RFO | PF_LLC_RFO", .uequiv = "DMND_RFO:PF_RFO:PF_LLC_RFO", .ucode = 0x12200, /* (1 << (1+8)) | (1 << (5+8)) | (1 << (8+8)), matching .uequiv */ .grpid = 0, }, { .uname = "ANY_RESPONSE", .udesc = "Response: count any response type", .ucode = 1ULL << (16+8), .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL | INTEL_X86_EXCL_GRP_GT, .grpid = 1, }, { .uname = "NO_SUPP", .udesc = "Supplier: counts number of times supplier information is not available", .ucode = 1ULL << (17+8), .grpid = 1, }, { .uname = "LLC_HITM", .udesc = "Supplier: counts L3 hits in M-state (initial lookup)", .ucode = 1ULL << (18+8), .grpid = 1, }, { .uname = "LLC_HITE", .udesc = "Supplier: counts L3 hits in E-state", .ucode = 1ULL << (19+8), .grpid = 1, }, { .uname = "LLC_HITS", .udesc = "Supplier: counts L3 hits in S-state", .ucode = 1ULL << (20+8), .grpid = 1, }, { .uname = "LLC_HITF", .udesc = "Supplier: counts L3 hits in F-state", .ucode = 1ULL << (21+8), .grpid = 1, }, { .uname = "LLC_MISS_LOCAL", .udesc = "Supplier: counts L3 misses to local DRAM", .ucode = 1ULL << (22+8), .grpid = 1, }, { .uname = "LLC_HITMESF", .udesc = "Supplier: counts L3 hits in any state (M, E, S, F)", .ucode = 0xfULL << (18+8), .uequiv = "LLC_HITM:LLC_HITE:LLC_HITS:LLC_HITF", .grpid = 1, }, { .uname = "SNP_NONE", .udesc = "Snoop: counts number of times no snoop-related information is available", .ucode = 1ULL << (31+8), .grpid = 2, }, { .uname = "SNP_NOT_NEEDED", .udesc = "Snoop: counts the number of times no snoop was needed to satisfy the request", .ucode = 1ULL << (32+8), .grpid = 2, }, { .uname = "SNP_MISS", .udesc = "Snoop: counts number of times a snoop was needed and it missed all snooped caches", .ucode = 1ULL << (33+8), .grpid = 2, }, { .uname = "SNP_NO_FWD", .udesc = "Snoop: counts number 
of times a snoop was needed and it hit in at least one snooped cache", .ucode = 1ULL << (34+8), .grpid = 2, }, { .uname = "SNP_FWD", .udesc = "Snoop: counts number of times a snoop was needed and data was forwarded from a remote socket", .ucode = 1ULL << (35+8), .grpid = 2, }, { .uname = "HITM", .udesc = "Snoop: counts number of times a snoop was needed and it hitM-ed in local or remote cache", .ucode = 1ULL << (36+8), .grpid = 2, }, { .uname = "NON_DRAM", .udesc = "Snoop: counts number of times target was a non-DRAM system address. This includes MMIO transactions", .ucode = 1ULL << (37+8), .grpid = 2, }, { .uname = "SNP_ANY", .udesc = "Snoop: any snoop reason", .ucode = 0x7fULL << (31+8), .uequiv = "SNP_NONE:SNP_NOT_NEEDED:SNP_MISS:SNP_NO_FWD:SNP_FWD:HITM:NON_DRAM", .uflags= INTEL_X86_DFL, .grpid = 2, }, }; static const intel_x86_umask_t hsw_hle_retired[]={ { .uname = "START", .udesc = "Number of times an HLE execution started", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "COMMIT", .udesc = "Number of times an HLE execution successfully committed", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED", .udesc = "Number of times an HLE execution aborted due to any reasons (multiple categories may count as one) (Precise Event)", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ABORTED_MISC1", .udesc = "Number of times an HLE execution aborted due to various memory events", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED_MISC2", .udesc = "Number of times an HLE execution aborted due to uncommon conditions", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED_MISC3", .udesc = "Number of times an HLE execution aborted due to HLE-unfriendly instructions", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED_MISC4", .udesc = "Number of times an HLE execution aborted due to incompatible memory type", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { 
.uname = "ABORTED_MISC5", .udesc = "Number of times an HLE execution aborted due to none of the other 4 reasons (e.g., interrupt)", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_rtm_retired[]={ { .uname = "START", .udesc = "Number of times an RTM execution started", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "COMMIT", .udesc = "Number of times an RTM execution successfully committed", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED", .udesc = "Number of times an RTM execution aborted due to any reasons (multiple categories may count as one) (Precise Event)", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO | INTEL_X86_PEBS, }, { .uname = "ABORTED_MISC1", .udesc = "Number of times an RTM execution aborted due to various memory events", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED_MISC2", .udesc = "Number of times an RTM execution aborted due to uncommon conditions", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED_MISC3", .udesc = "Number of times an RTM execution aborted due to RTM-unfriendly instructions", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED_MISC4", .udesc = "Number of times an RTM execution aborted due to incompatible memory type", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORTED_MISC5", .udesc = "Number of times an RTM execution aborted due to none of the other 4 reasons (e.g., interrupt)", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_tx_mem[]={ { .uname = "ABORT_CONFLICT", .udesc = "Number of times a transactional abort was signaled due to data conflict on a transactionally accessed address", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORT_CAPACITY", .udesc = "Number of times a transactional abort was signaled due to data capacity limitation", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = 
"ABORT_HLE_STORE_TO_ELIDED_LOCK", .udesc = "Number of times an HLE transactional execution aborted due to a non xrelease prefixed instruction writing to an elided lock in the elision buffer", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORT_HLE_ELISION_BUFFER_NOT_EMPTY", .udesc = "Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORT_HLE_ELISION_BUFFER_MISMATCH", .udesc = "Number of times an HLE transaction execution aborted due to xrelease lock not satisfying the address and value requirements in the elision buffer", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMENT", .udesc = "Number of times an HLE transaction execution aborted due to an unsupported read alignment from the elision buffer", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ABORT_HLE_ELISION_BUFFER_FULL", .udesc = "Number of times an HLE lock could not be elided due to ElisionBufferAvailable being zero", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_tx_exec[]={ { .uname = "MISC1", .udesc = "Number of times a class of instructions that may cause a transactional abort was executed. 
Since this is the count of execution, it may not always cause a transactional abort", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MISC2", .udesc = "Number of times a class of instructions that may cause a transactional abort was executed inside a transactional region", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MISC3", .udesc = "Number of times an instruction execution caused the supported nest count to be exceeded", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "MISC4", .udesc = "Number of times an instruction with HLE xacquire prefix was executed inside a RTM transactional region", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_offcore_requests_outstanding[]={ { .uname = "ALL_DATA_RD_CYCLES", .udesc = "Cycles with cacheable data read transactions in the superQ (use with HT off only)", .uequiv = "ALL_DATA_RD:c=1", .ucode = 0x800 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD_CYCLES", .udesc = "Cycles with demand code reads transactions in the superQ (use with HT off only)", .uequiv = "DEMAND_CODE_RD:c=1", .ucode = 0x200 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD_CYCLES", .udesc = "Cycles with demand data read transactions in the superQ (use with HT off only)", .uequiv = "DEMAND_DATA_RD:c=1", .ucode = 0x100 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, { .uname = "ALL_DATA_RD", .udesc = "Cacheable data read transactions in the superQ every cycle (use with HT off only)", .ucode = 0x800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_CODE_RD", .udesc = "Code read transactions in the superQ every cycle (use with HT off only)", .ucode = 0x200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_DATA_RD", .udesc = "Demand data read transactions in the superQ every cycle (use with HT off only)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO", .udesc = "Outstanding RFO 
(store) transactions in the superQ every cycle (use with HT off only)", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DEMAND_RFO_CYCLES", .udesc = "Cycles with outstanding RFO (store) transactions in the superQ (use with HT off only)", .uequiv = "DEMAND_RFO:c=1", .ucode = 0x400 | (0x1 << INTEL_X86_CMASK_BIT), .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_ild_stall[]={ { .uname = "LCP", .udesc = "Stall caused by changing prefix length of the instruction", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "IQ_FULL", .udesc = "Stall cycles due to IQ full", .ucode = 0x400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_page_walker_loads[]={ { .uname = "DTLB_L1", .udesc = "Number of DTLB page walker loads that hit in the L1D and line fill buffer", .ucode = 0x1100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ITLB_L1", .udesc = "Number of ITLB page walker loads that hit in the L1I and line fill buffer", .ucode = 0x2100, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DTLB_L2", .udesc = "Number of DTLB page walker loads that hit in the L2", .ucode = 0x1200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ITLB_L2", .udesc = "Number of ITLB page walker loads that hit in the L2", .ucode = 0x2200, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DTLB_L3", .udesc = "Number of DTLB page walker loads that hit in the L3", .ucode = 0x1400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ITLB_L3", .udesc = "Number of ITLB page walker loads that hit in the L3", .ucode = 0x2400, .uflags= INTEL_X86_NCOMBO, }, { .uname = "DTLB_MEMORY", .udesc = "Number of DTLB page walker loads that hit memory", .ucode = 0x1800, .uflags= INTEL_X86_NCOMBO, }, { .uname = "ITLB_MEMORY", .udesc = "Number of ITLB page walker loads that hit memory", .ucode = 0x2800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t hsw_lsd[]={ { .uname = "UOPS", .udesc = "Number of uops delivered by the Loop Stream Detector (LSD)", .ucode = 0x100, .uflags= INTEL_X86_DFL, }, }; 
static const intel_x86_entry_t intel_hsw_pe[]={ { .name = "UNHALTED_CORE_CYCLES", .desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted)", .modmsk = INTEL_V4_ATTRS, .cntmsk = 0x20000000full, .code = 0x3c, }, { .name = "UNHALTED_REFERENCE_CYCLES", .desc = "Unhalted reference cycles", .modmsk = INTEL_FIXED3_ATTRS, .cntmsk = 0x400000000ull, .code = 0x0300, /* pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "INSTRUCTION_RETIRED", .desc = "Number of instructions at retirement", .modmsk = INTEL_V4_ATTRS, .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "INSTRUCTIONS_RETIRED", .desc = "This is an alias for INSTRUCTION_RETIRED", .modmsk = INTEL_V4_ATTRS, .equiv = "INSTRUCTION_RETIRED", .cntmsk = 0x10000000full, .code = 0xc0, }, { .name = "BRANCH_INSTRUCTIONS_RETIRED", .desc = "Count branch instructions at retirement. Specifically, this event counts the retirement of the last micro-op of a branch instruction", .modmsk = INTEL_V4_ATTRS, .equiv = "BR_INST_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc4, }, { .name = "MISPREDICTED_BRANCH_RETIRED", .desc = "Count mispredicted branch instructions at retirement. 
Specifically, this event counts at retirement of the last micro-op of a branch instruction in the architectural path of the execution and experienced misprediction in the branch prediction hardware", .modmsk = INTEL_V4_ATTRS, .equiv = "BR_MISP_RETIRED:ALL_BRANCHES", .cntmsk = 0xff, .code = 0xc5, }, { .name = "BACLEARS", .desc = "Branch resteered", .code = 0xe6, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_baclears), .umasks = hsw_baclears }, { .name = "BR_INST_EXEC", .desc = "Branch instructions executed", .code = 0x88, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_br_inst_exec), .umasks = hsw_br_inst_exec }, { .name = "BR_INST_RETIRED", .desc = "Branch instructions retired (Precise Event)", .code = 0xc4, .cntmsk = 0xff, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_br_inst_retired), .umasks = hsw_br_inst_retired }, { .name = "BR_MISP_EXEC", .desc = "Mispredicted branches executed", .code = 0x89, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_br_misp_exec), .umasks = hsw_br_misp_exec }, { .name = "BR_MISP_RETIRED", .desc = "Mispredicted retired branches (Precise Event)", .code = 0xc5, .cntmsk = 0xff, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_br_misp_retired), .umasks = hsw_br_misp_retired }, { .name = "CPL_CYCLES", .desc = "Unhalted core cycles at a specific ring level", .code = 0x5c, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_cpl_cycles), .umasks = hsw_cpl_cycles }, { .name = "CPU_CLK_THREAD_UNHALTED", .desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted)", .code = 0x3c, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_cpu_clk_thread_unhalted), .umasks = hsw_cpu_clk_thread_unhalted }, { .name = "CPU_CLK_UNHALTED", 
.desc = "Count core clock cycles whenever the clock signal on the specific core is running (not halted)", .code = 0x3c, .cntmsk = 0xff, .modmsk = INTEL_V4_ATTRS, .equiv = "CPU_CLK_THREAD_UNHALTED", }, { .name = "CYCLE_ACTIVITY", .desc = "Stalled cycles", .code = 0xa3, .cntmsk = 0xf, .ngrp = 1, .modmsk = INTEL_V4_ATTRS & ~_INTEL_X86_ATTR_C, .numasks = LIBPFM_ARRAY_SIZE(hsw_cycle_activity), .umasks = hsw_cycle_activity }, { .name = "DTLB_LOAD_MISSES", .desc = "Data TLB load misses", .code = 0x8, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_dtlb_load_misses), .umasks = hsw_dtlb_load_misses }, { .name = "DTLB_STORE_MISSES", .desc = "Data TLB store misses", .code = 0x49, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_dtlb_load_misses), .umasks = hsw_dtlb_load_misses /* shared */ }, { .name = "FP_ASSIST", .desc = "X87 floating-point assists", .code = 0xca, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_fp_assist), .umasks = hsw_fp_assist }, { .name = "HLE_RETIRED", .desc = "HLE execution (Precise Event)", .code = 0xc8, .cntmsk = 0xff, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_hle_retired), .umasks = hsw_hle_retired }, { .name = "ICACHE", .desc = "Instruction Cache", .code = 0x80, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_icache), .umasks = hsw_icache }, { .name = "IDQ", .desc = "IDQ operations", .code = 0x79, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_idq), .umasks = hsw_idq }, { .name = "IDQ_UOPS_NOT_DELIVERED", .desc = "Uops not delivered", .code = 0x9c, .cntmsk = 0xf, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_idq_uops_not_delivered), .umasks = hsw_idq_uops_not_delivered }, { .name = "INST_RETIRED", .desc = "Number of instructions retired (Precise Event)", .code = 0xc0, .cntmsk = 0xff, 
.ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_inst_retired), .umasks = hsw_inst_retired }, { .name = "INT_MISC", .desc = "Miscellaneous interruptions", .code = 0xd, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_int_misc), .umasks = hsw_int_misc }, { .name = "ITLB", .desc = "Instruction TLB", .code = 0xae, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_itlb), .umasks = hsw_itlb }, { .name = "ITLB_MISSES", .desc = "Instruction TLB misses", .code = 0x85, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_itlb_misses), .umasks = hsw_itlb_misses }, { .name = "L1D", .desc = "L1D cache", .code = 0x51, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_l1d), .umasks = hsw_l1d }, { .name = "L1D_PEND_MISS", .desc = "L1D pending misses", .code = 0x48, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_l1d_pend_miss), .umasks = hsw_l1d_pend_miss }, { .name = "L2_DEMAND_RQSTS", .desc = "Demand Data Read requests to L2", .code = 0x27, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_l2_demand_rqsts), .umasks = hsw_l2_demand_rqsts }, { .name = "L2_LINES_IN", .desc = "L2 lines allocated", .code = 0xf1, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_l2_lines_in), .umasks = hsw_l2_lines_in }, { .name = "L2_LINES_OUT", .desc = "L2 lines evicted", .code = 0xf2, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_l2_lines_out), .umasks = hsw_l2_lines_out }, { .name = "L2_RQSTS", .desc = "L2 requests", .code = 0x24, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_l2_rqsts), .umasks = hsw_l2_rqsts }, { .name = "L2_TRANS", .desc = "L2 transactions", .code = 0xf0, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, 
.numasks = LIBPFM_ARRAY_SIZE(hsw_l2_trans), .umasks = hsw_l2_trans }, { .name = "LD_BLOCKS", .desc = "Blocking loads", .code = 0x3, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_ld_blocks), .umasks = hsw_ld_blocks }, { .name = "LD_BLOCKS_PARTIAL", .desc = "Partial load blocks", .code = 0x7, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_ld_blocks_partial), .umasks = hsw_ld_blocks_partial }, { .name = "LOAD_HIT_PRE", .desc = "Load dispatches", .code = 0x4c, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_load_hit_pre), .umasks = hsw_load_hit_pre }, { .name = "LOCK_CYCLES", .desc = "Locked cycles in L1D and L2", .code = 0x63, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_lock_cycles), .umasks = hsw_lock_cycles }, { .name = "LONGEST_LAT_CACHE", .desc = "L3 cache", .code = 0x2e, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_longest_lat_cache), .umasks = hsw_longest_lat_cache }, { .name = "MACHINE_CLEARS", .desc = "Machine clear asserted", .code = 0xc3, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_machine_clears), .umasks = hsw_machine_clears }, { .name = "MEM_LOAD_UOPS_L3_HIT_RETIRED", .desc = "L3 hit load uops retired (Precise Event)", .code = 0xd2, .cntmsk = 0xf, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_mem_load_uops_l3_hit_retired), .umasks = hsw_mem_load_uops_l3_hit_retired }, { .name = "MEM_LOAD_UOPS_L3_MISS_RETIRED", .desc = "Load uops retired that missed the L3 (Precise Event)", .code = 0xd3, .cntmsk = 0xf, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_mem_load_uops_l3_miss_retired), .umasks = hsw_mem_load_uops_l3_miss_retired }, { .name = "MEM_LOAD_UOPS_RETIRED", .desc = "Retired load uops (Precise Event)", .code = 0xd1, 
.cntmsk = 0xf, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_mem_load_uops_retired), .umasks = hsw_mem_load_uops_retired }, { .name = "MEM_TRANS_RETIRED", .desc = "Memory transactions retired (Precise Event)", .code = 0xcd, .cntmsk = 0x8, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS | _INTEL_X86_ATTR_LDLAT, .numasks = LIBPFM_ARRAY_SIZE(hsw_mem_trans_retired), .umasks = hsw_mem_trans_retired }, { .name = "MEM_UOPS_RETIRED", .desc = "Memory uops retired (Precise Event)", .code = 0xd0, .cntmsk = 0xf, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_mem_uops_retired), .umasks = hsw_mem_uops_retired }, { .name = "MISALIGN_MEM_REF", .desc = "Misaligned memory references", .code = 0x5, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_misalign_mem_ref), .umasks = hsw_misalign_mem_ref }, { .name = "MOVE_ELIMINATION", .desc = "Move Elimination", .code = 0x58, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_move_elimination), .umasks = hsw_move_elimination }, { .name = "OFFCORE_REQUESTS", .desc = "Demand Data Read requests sent to uncore", .code = 0xb0, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_offcore_requests), .umasks = hsw_offcore_requests }, { .name = "OTHER_ASSISTS", .desc = "Software assist", .code = 0xc1, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_other_assists), .umasks = hsw_other_assists }, { .name = "RESOURCE_STALLS", .desc = "Cycles Allocation is stalled due to Resource Related reason", .code = 0xa2, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_resource_stalls), .umasks = hsw_resource_stalls }, { .name = "ROB_MISC_EVENTS", .desc = "ROB miscellaneous events", .code = 0xcc, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = 
LIBPFM_ARRAY_SIZE(hsw_rob_misc_events), .umasks = hsw_rob_misc_events }, { .name = "RS_EVENTS", .desc = "Reservation Station", .code = 0x5e, .cntmsk = 0xf, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_rs_events), .umasks = hsw_rs_events }, { .name = "RTM_RETIRED", .desc = "Restricted Transactional Memory execution (Precise Event)", .code = 0xc9, .cntmsk = 0xf, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_rtm_retired), .umasks = hsw_rtm_retired }, { .name = "TLB_FLUSH", .desc = "TLB flushes", .code = 0xbd, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_tlb_flush), .umasks = hsw_tlb_flush }, { .name = "UOPS_EXECUTED", .desc = "Uops executed", .code = 0xb1, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_uops_executed), .umasks = hsw_uops_executed }, { .name = "LSD", .desc = "Loop stream detector", .code = 0xa8, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_lsd), .umasks = hsw_lsd, }, { .name = "UOPS_EXECUTED_PORT", .desc = "Uops dispatch to specific ports", .code = 0xa1, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_uops_executed_port), .umasks = hsw_uops_executed_port }, { .name = "UOPS_ISSUED", .desc = "Uops issued", .code = 0xe, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_uops_issued), .umasks = hsw_uops_issued }, { .name = "UOPS_RETIRED", .desc = "Uops retired (Precise Event)", .code = 0xc2, .cntmsk = 0xff, .ngrp = 1, .flags = INTEL_X86_PEBS, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_uops_retired), .umasks = hsw_uops_retired }, { .name = "TX_MEM", .desc = "Transactional memory aborts", .code = 0x54, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_tx_mem), .umasks = hsw_tx_mem, }, { .name = "TX_EXEC", .desc = "Transactional execution", .code = 
0x5d, .cntmsk = 0xff, .ngrp = 1, .modmsk = INTEL_V4_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(hsw_tx_exec), .umasks = hsw_tx_exec }, { .name = "OFFCORE_REQUESTS_OUTSTANDING", .desc = "Outstanding offcore requests", .modmsk = INTEL_V4_ATTRS, .cntmsk = 0xff, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(hsw_offcore_requests_outstanding), .ngrp = 1, .umasks = hsw_offcore_requests_outstanding, }, { .name = "ILD_STALL", .desc = "Instruction Length Decoder stalls", .modmsk = INTEL_V4_ATTRS, .cntmsk = 0xff, .code = 0x87, .numasks = LIBPFM_ARRAY_SIZE(hsw_ild_stall), .ngrp = 1, .umasks = hsw_ild_stall, }, { .name = "PAGE_WALKER_LOADS", .desc = "Page walker loads", .modmsk = INTEL_V4_ATTRS, .cntmsk = 0xff, .code = 0xbc, .numasks = LIBPFM_ARRAY_SIZE(hsw_page_walker_loads), .ngrp = 1, .umasks = hsw_page_walker_loads, }, { .name = "OFFCORE_RESPONSE_0", .desc = "Offcore response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V4_ATTRS, .cntmsk = 0xf, .code = 0x1b7, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(hsw_offcore_response), .ngrp = 3, .umasks = hsw_offcore_response, }, { .name = "OFFCORE_RESPONSE_1", .desc = "Offcore response event (must provide at least one request type and either any_response or any combination of supplier + snoop)", .modmsk = INTEL_V4_ATTRS, .cntmsk = 0xf, .code = 0x1bb, .flags= INTEL_X86_NHM_OFFCORE, .numasks = LIBPFM_ARRAY_SIZE(hsw_offcore_response), .ngrp = 3, .umasks = hsw_offcore_response, /* identical to actual umasks list for this event */ }, };

/* papi-5.3.0/src/libpfm4/lib/events/s390x_cpumf_events.h */
#ifndef __S390X_CPUMF_EVENTS_H__
#define __S390X_CPUMF_EVENTS_H__

#define __stringify(x) #x
#define STRINGIFY(x) __stringify(x)

/* CPUMF counter sets */
#define CPUMF_CTRSET_BASIC 0
#define CPUMF_CTRSET_PROBLEM_STATE 1
#define CPUMF_CTRSET_CRYPTO 2
#define CPUMF_CTRSET_EXTENDED 3

static const 
pme_cpumf_ctr_t cpumf_generic_ctr[] = { /* Basic counter set */ { .ctrnum = 0, .ctrset = CPUMF_CTRSET_BASIC, .name = "CPU_CYCLES", .desc = "Cycle Count", }, { .ctrnum = 1, .ctrset = CPUMF_CTRSET_BASIC, .name = "INSTRUCTIONS", .desc = "Instruction Count", }, { .ctrnum = 2, .ctrset = CPUMF_CTRSET_BASIC, .name = "L1I_DRCT_WRITES", .desc = "Level-1 I-Cache Directory Write Count", }, { .ctrnum = 3, .ctrset = CPUMF_CTRSET_BASIC, .name = "L1I_PENALTY_CYCLES", .desc = "Level-1 I-Cache Penalty Cycle Count", }, { .ctrnum = 4, .ctrset = CPUMF_CTRSET_BASIC, .name = "L1D_DRCT_WRITES", .desc = "Level-1 D-Cache Directory Write Count", }, { .ctrnum = 5, .ctrset = CPUMF_CTRSET_BASIC, .name = "L1D_PENALTY_CYCLES", .desc = "Level-1 D-Cache Penalty Cycle Count", }, /* Problem-state counter set */ { .ctrnum = 32, .ctrset = CPUMF_CTRSET_PROBLEM_STATE, .name = "PROBLEM_STATE_CPU_CYCLES", .desc = "Problem-State Cycle Count", }, { .ctrnum = 33, .ctrset = CPUMF_CTRSET_PROBLEM_STATE, .name = "PROBLEM_STATE_INSTRUCTIONS", .desc = "Problem-State Instruction Count", }, { .ctrnum = 34, .ctrset = CPUMF_CTRSET_PROBLEM_STATE, .name = "PROBLEM_STATE_L1I_DRCT_WRITES", .desc = "Problem-State Level-1 I-Cache Directory Write Count", }, { .ctrnum = 35, .ctrset = CPUMF_CTRSET_PROBLEM_STATE, .name = "PROBLEM_STATE_L1I_PENALTY_CYCLES", .desc = "Problem-State Level-1 I-Cache Penalty Cycle Count", }, { .ctrnum = 36, .ctrset = CPUMF_CTRSET_PROBLEM_STATE, .name = "PROBLEM_STATE_L1D_DRCT_WRITES", .desc = "Problem-State Level-1 D-Cache Directory Write Count", }, { .ctrnum = 37, .ctrset = CPUMF_CTRSET_PROBLEM_STATE, .name = "PROBLEM_STATE_L1D_PENALTY_CYCLES", .desc = "Problem-State Level-1 D-Cache Penalty Cycle Count", }, /* Crypto-activity counter set */ { .ctrnum = 64, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "PRNG_FUNCTIONS", .desc = "Total number of the PRNG functions issued by the CPU", }, { .ctrnum = 65, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "PRNG_CYCLES", .desc = "Total number of CPU cycles when the DEA/AES 
" "coprocessor is busy performing PRNG functions " "issued by the CPU", }, { .ctrnum = 66, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "PRNG_BLOCKED_FUNCTIONS", .desc = "Total number of the PRNG functions that are issued " "by the CPU and are blocked because the DEA/AES " "coprocessor is busy performing a function issued " "by another CPU", }, { .ctrnum = 67, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "PRNG_BLOCKED_CYCLES", .desc = "Total number of CPU cycles blocked for the PRNG " "functions issued by the CPU because the DEA/AES " "coprocessor is busy performing a function issued " "by another CPU", }, { .ctrnum = 68, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "SHA_FUNCTIONS", .desc = "Total number of SHA functions issued by the CPU", }, { .ctrnum = 69, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "SHA_CYCLES", .desc = "Total number of CPU cycles when the SHA coprocessor " "is busy performing the SHA functions issued by the " "CPU", }, { .ctrnum = 70, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "SHA_BLOCKED_FUNCTIONS", .desc = "Total number of the SHA functions that are issued by " "the CPU and are blocked because the SHA coprocessor " "is busy performing a function issued by another CPU", }, { .ctrnum = 71, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "SHA_BLOCKED_CYCLES", .desc = "Total number of CPU cycles blocked for the SHA " "functions issued by the CPU because the SHA " "coprocessor is busy performing a function issued by " "another CPU", }, { .ctrnum = 72, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "DEA_FUNCTIONS", .desc = "Total number of the DEA functions issued by the CPU", }, { .ctrnum = 73, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "DEA_CYCLES", .desc = "Total number of CPU cycles when the DEA/AES coprocessor" " is busy performing the DEA functions issued by the CPU", }, { .ctrnum = 74, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "DEA_BLOCKED_FUNCTIONS", .desc = "Total number of the DEA functions that are issued by " "the CPU and are blocked because the DEA/AES coprocessor" " is busy 
performing a function issued by another CPU", }, { .ctrnum = 75, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "DEA_BLOCKED_CYCLES", .desc = "Total number of CPU cycles blocked for the DEA functions" " issued by the CPU because the DEA/AES coprocessor is " "busy performing a function issued by another CPU", }, { .ctrnum = 76, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "AES_FUNCTIONS", .desc = "Total number of AES functions issued by the CPU", }, { .ctrnum = 77, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "AES_CYCLES", .desc = "Total number of CPU cycles when the DEA/AES coprocessor" " is busy performing the AES functions issued by the CPU", }, { .ctrnum = 78, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "AES_BLOCKED_FUNCTIONS", .desc = "Total number of AES functions that are issued by the CPU" " and are blocked because the DEA/AES coprocessor is" " busy performing a function issued by another CPU", }, { .ctrnum = 79, .ctrset = CPUMF_CTRSET_CRYPTO, .name = "AES_BLOCKED_CYCLES", .desc = "Total number of CPU cycles blocked for the AES functions" " issued by the CPU because the DEA/AES coprocessor is" " busy performing a function issued by another CPU", }, }; /* Extended counter set for IBM System z10 */ static const pme_cpumf_ctr_t cpumf_ctr_set_ext_z10[] = { { .ctrnum = 128, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_L2_SOURCED_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the returned cache line was sourced from the" " Level-2 (L1.5) cache", }, { .ctrnum = 129, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_L2_SOURCED_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the installed cache line was sourced from the" " Level-2 (L1.5) cache", }, { .ctrnum = 130, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_L3_LOCAL_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the installed cache line was sourced from the" " Level-3 cache that is on the same book as the" " Instruction cache (Local L2 cache)", }, { 
.ctrnum = 131, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_L3_LOCAL_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the installed cache line was sourced from the" " Level-3 cache that is on the same book as the Data" " cache (Local L2 cache)", }, { .ctrnum = 132, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_L3_REMOTE_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the installed cache line was sourced from a" " Level-3 cache that is not on the same book as the" " Instruction cache (Remote L2 cache)", }, { .ctrnum = 133, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_L3_REMOTE_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the installed cache line was sourced from a" " Level-3 cache that is not on the same book as the" " Data cache (Remote L2 cache)", }, { .ctrnum = 134, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_LMEM_SOURCED_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the installed cache line was sourced from" " memory that is attached to the same book as the" " Data cache (Local Memory)", }, { .ctrnum = 135, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_LMEM_SOURCED_WRITES", .desc = "A directory write to the Level-1 I-Cache where the" " installed cache line was sourced from memory that" " is attached to the same book as the Instruction" " cache (Local Memory)", }, { .ctrnum = 136, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_RO_EXCL_WRITES", .desc = "A directory write to the Level-1 D-Cache where the" " line was originally in a Read-Only state in the" " cache but has been updated to be in the Exclusive" " state that allows stores to the cache line", }, { .ctrnum = 137, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_CACHELINE_INVALIDATES", .desc = "A cache line in the Level-1 I-Cache has been" " invalidated by a store on the same CPU as the Level-1" " I-Cache", }, { .ctrnum = 138, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "ITLB1_WRITES",
.desc = "A translation entry has been written into the Level-1" " Instruction Translation Lookaside Buffer", }, { .ctrnum = 139, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "DTLB1_WRITES", .desc = "A translation entry has been written to the Level-1" " Data Translation Lookaside Buffer", }, { .ctrnum = 140, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "TLB2_PTE_WRITES", .desc = "A translation entry has been written to the Level-2" " TLB Page Table Entry arrays", }, { .ctrnum = 141, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "TLB2_CRSTE_WRITES", .desc = "A translation entry has been written to the Level-2" " TLB Common Region Segment Table Entry arrays", }, { .ctrnum = 142, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "TLB2_CRSTE_HPAGE_WRITES", .desc = "A translation entry has been written to the Level-2" " TLB Common Region Segment Table Entry arrays for a" " one-megabyte large page translation", }, { .ctrnum = 145, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "ITLB1_MISSES", .desc = "Level-1 Instruction TLB miss in progress. Incremented" " by one for every cycle an ITLB1 miss is in progress", }, { .ctrnum = 146, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "DTLB1_MISSES", .desc = "Level-1 Data TLB miss in progress. 
Incremented by one" " for every cycle a DTLB1 miss is in progress", }, { .ctrnum = 147, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L2C_STORES_SENT", .desc = "Incremented by one for every store sent to" " Level-2 (L1.5) cache", }, }; /* Extended counter set for IBM zEnterprise 196 */ static const pme_cpumf_ctr_t cpumf_ctr_set_ext_z196[] = { { .ctrnum = 128, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_L2_SOURCED_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the returned cache line was sourced from the" " Level-2 cache", }, { .ctrnum = 129, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_L2_SOURCED_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the returned cache line was sourced from" " the Level-2 cache", }, { .ctrnum = 130, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "DTLB1_MISSES", .desc = "Level-1 Data TLB miss in progress. Incremented by one" " for every cycle a DTLB1 miss is in progress.", }, { .ctrnum = 131, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "ITLB1_MISSES", .desc = "Level-1 Instruction TLB miss in progress. 
Incremented" " by one for every cycle an ITLB1 miss is in progress.", }, { .ctrnum = 133, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L2C_STORES_SENT", .desc = "Incremented by one for every store sent to" " Level-2 cache", }, { .ctrnum = 134, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_OFFBOOK_L3_SOURCED_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the returned cache line was sourced from an" " Off Book Level-3 cache", }, { .ctrnum = 135, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_ONBOOK_L4_SOURCED_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the returned cache line was sourced from an" " On Book Level-4 cache", }, { .ctrnum = 136, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_ONBOOK_L4_SOURCED_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the returned cache line was sourced from an" " On Book Level-4 cache", }, { .ctrnum = 137, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_RO_EXCL_WRITES", .desc = "A directory write to the Level-1 D-Cache where the" " line was originally in a Read-Only state in the" " cache but has been updated to be in the Exclusive" " state that allows stores to the cache line", }, { .ctrnum = 138, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_OFFBOOK_L4_SOURCED_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the returned cache line was sourced from an" " Off Book Level-4 cache", }, { .ctrnum = 139, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_OFFBOOK_L4_SOURCED_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the returned cache line was sourced from an" " Off Book Level-4 cache", }, { .ctrnum = 140, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "DTLB1_HPAGE_WRITES", .desc = "A translation entry has been written to the Level-1" " Data Translation Lookaside Buffer for a one-megabyte" " page", }, { .ctrnum = 141, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_LMEM_SOURCED_WRITES", .desc = "A 
directory write to the Level-1 D-Cache where the" " installed cache line was sourced from memory that" " is attached to the same book as the Data cache" " (Local Memory)", }, { .ctrnum = 142, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_LMEM_SOURCED_WRITES", .desc = "A directory write to the Level-1 I-Cache where the" " installed cache line was sourced from memory that" " is attached to the same book as the Instruction" " cache (Local Memory)", }, { .ctrnum = 143, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_OFFBOOK_L3_SOURCED_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the returned cache line was sourced from an" " Off Book Level-3 cache", }, { .ctrnum = 144, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "DTLB1_WRITES", .desc = "A translation entry has been written to the Level-1" " Data Translation Lookaside Buffer", }, { .ctrnum = 145, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "ITLB1_WRITES", .desc = "A translation entry has been written to the Level-1" " Instruction Translation Lookaside Buffer", }, { .ctrnum = 146, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "TLB2_PTE_WRITES", .desc = "A translation entry has been written to the Level-2" " TLB Page Table Entry arrays", }, { .ctrnum = 147, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "TLB2_CRSTE_HPAGE_WRITES", .desc = "A translation entry has been written to the Level-2" " TLB Common Region Segment Table Entry arrays for a" " one-megabyte large page translation", }, { .ctrnum = 148, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "TLB2_CRSTE_WRITES", .desc = "A translation entry has been written to the Level-2" " TLB Common Region Segment Table Entry arrays", }, { .ctrnum = 150, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_ONCHIP_L3_WRITES", .desc = "A directory write to the Level-1 D-Cache directory" " where the returned cache line was sourced from an" " On Chip Level-3 cache", }, { .ctrnum = 152, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1D_OFFCHIP_L3_WRITES", .desc = "A directory write to 
the Level-1 D-Cache directory" " where the returned cache line was sourced from an" " Off Chip/On Book Level-3 cache", }, { .ctrnum = 153, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_ONCHIP_L3_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the returned cache line was sourced from an" " On Chip Level-3 cache", }, { .ctrnum = 155, .ctrset = CPUMF_CTRSET_EXTENDED, .name = "L1I_OFFCHIP_L3_WRITES", .desc = "A directory write to the Level-1 I-Cache directory" " where the returned cache line was sourced from an" " Off Chip/On Book Level-3 cache", }, }; #if 0 { .ctrnum = , .ctrset = CPUMF_CTRSET_EXTENDED, .name = "", .desc = "", }, #endif #endif /* __S390X_CPUMF_EVENTS_H__ */ papi-5.3.0/src/libpfm4/lib/events/intel_wsm_unc_events.h0000600003276200002170000011444012247131123022743 0ustar ralphundrgrad/* * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: wsm_unc (Intel Westmere uncore) */ static const intel_x86_umask_t wsm_unc_unc_dram_open[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 open commands issued for read or write", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 open commands issued for read or write", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 open commands issued for read or write", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_gc_occupancy[]={ { .uname = "READ_TRACKER", .udesc = "In the read tracker", .ucode = 0x100, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_dram_page_close[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 page close", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 page close", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 page close", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_dram_page_miss[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 page miss", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 page miss", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 page miss", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_dram_pre_all[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 precharge all commands", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 precharge all commands", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 precharge all commands", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_dram_read_cas[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 read CAS commands", .ucode = 0x100, }, { .uname = "AUTOPRE_CH0", .udesc = "DRAM Channel 0 read CAS auto page close commands", .ucode = 0x200, }, { .uname = "CH1", .udesc = "DRAM Channel 1 read CAS commands", .ucode = 0x400, }, { .uname = "AUTOPRE_CH1", .udesc = 
"DRAM Channel 1 read CAS auto page close commands", .ucode = 0x800, }, { .uname = "CH2", .udesc = "DRAM Channel 2 read CAS commands", .ucode = 0x1000, }, { .uname = "AUTOPRE_CH2", .udesc = "DRAM Channel 2 read CAS auto page close commands", .ucode = 0x2000, }, }; static const intel_x86_umask_t wsm_unc_unc_dram_refresh[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 refresh commands", .ucode = 0x100, }, { .uname = "CH1", .udesc = "DRAM Channel 1 refresh commands", .ucode = 0x200, }, { .uname = "CH2", .udesc = "DRAM Channel 2 refresh commands", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_dram_write_cas[]={ { .uname = "CH0", .udesc = "DRAM Channel 0 write CAS commands", .ucode = 0x100, }, { .uname = "AUTOPRE_CH0", .udesc = "DRAM Channel 0 write CAS auto page close commands", .ucode = 0x200, }, { .uname = "CH1", .udesc = "DRAM Channel 1 write CAS commands", .ucode = 0x400, }, { .uname = "AUTOPRE_CH1", .udesc = "DRAM Channel 1 write CAS auto page close commands", .ucode = 0x800, }, { .uname = "CH2", .udesc = "DRAM Channel 2 write CAS commands", .ucode = 0x1000, }, { .uname = "AUTOPRE_CH2", .udesc = "DRAM Channel 2 write CAS auto page close commands", .ucode = 0x2000, }, }; static const intel_x86_umask_t wsm_unc_unc_gq_alloc[]={ { .uname = "READ_TRACKER", .udesc = "GQ read tracker requests", .ucode = 0x100, }, { .uname = "RT_LLC_MISS", .udesc = "GQ read tracker LLC misses", .ucode = 0x200, }, { .uname = "RT_TO_LLC_RESP", .udesc = "GQ read tracker LLC requests", .ucode = 0x400, }, { .uname = "RT_TO_RTID_ACQUIRED", .udesc = "GQ read tracker LLC miss to RTID acquired", .ucode = 0x800, }, { .uname = "WT_TO_RTID_ACQUIRED", .udesc = "GQ write tracker LLC miss to RTID acquired", .ucode = 0x1000, }, { .uname = "WRITE_TRACKER", .udesc = "GQ write tracker LLC misses", .ucode = 0x2000, }, { .uname = "PEER_PROBE_TRACKER", .udesc = "GQ peer probe tracker requests", .ucode = 0x4000, }, }; static const intel_x86_umask_t wsm_unc_unc_gq_cycles_full[]={ { .uname = 
"READ_TRACKER", .udesc = "Cycles GQ read tracker is full.", .ucode = 0x100, }, { .uname = "WRITE_TRACKER", .udesc = "Cycles GQ write tracker is full.", .ucode = 0x200, }, { .uname = "PEER_PROBE_TRACKER", .udesc = "Cycles GQ peer probe tracker is full.", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_gq_cycles_not_empty[]={ { .uname = "READ_TRACKER", .udesc = "Cycles GQ read tracker is busy", .ucode = 0x100, }, { .uname = "WRITE_TRACKER", .udesc = "Cycles GQ write tracker is busy", .ucode = 0x200, }, { .uname = "PEER_PROBE_TRACKER", .udesc = "Cycles GQ peer probe tracker is busy", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_gq_data_from[]={ { .uname = "QPI", .udesc = "Cycles GQ data is imported from Quickpath interface", .ucode = 0x100, }, { .uname = "QMC", .udesc = "Cycles GQ data is imported from Quickpath memory interface", .ucode = 0x200, }, { .uname = "LLC", .udesc = "Cycles GQ data is imported from LLC", .ucode = 0x400, }, { .uname = "CORES_02", .udesc = "Cycles GQ data is imported from Cores 0 and 2", .ucode = 0x800, }, { .uname = "CORES_13", .udesc = "Cycles GQ data is imported from Cores 1 and 3", .ucode = 0x1000, }, }; static const intel_x86_umask_t wsm_unc_unc_gq_data_to[]={ { .uname = "QPI_QMC", .udesc = "Cycles GQ data sent to the QPI or QMC", .ucode = 0x100, }, { .uname = "LLC", .udesc = "Cycles GQ data sent to LLC", .ucode = 0x200, }, { .uname = "CORES", .udesc = "Cycles GQ data sent to cores", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_llc_hits[]={ { .uname = "READ", .udesc = "Number of LLC read hits", .ucode = 0x100, }, { .uname = "WRITE", .udesc = "Number of LLC write hits", .ucode = 0x200, }, { .uname = "PROBE", .udesc = "Number of LLC peer probe hits", .ucode = 0x400, }, { .uname = "ANY", .udesc = "Number of LLC hits", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_llc_lines_in[]={ { .uname = "M_STATE", .udesc = "LLC lines 
allocated in M state", .ucode = 0x100, }, { .uname = "E_STATE", .udesc = "LLC lines allocated in E state", .ucode = 0x200, }, { .uname = "S_STATE", .udesc = "LLC lines allocated in S state", .ucode = 0x400, }, { .uname = "F_STATE", .udesc = "LLC lines allocated in F state", .ucode = 0x800, }, { .uname = "ANY", .udesc = "LLC lines allocated", .ucode = 0xf00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_llc_lines_out[]={ { .uname = "M_STATE", .udesc = "LLC lines victimized in M state", .ucode = 0x100, }, { .uname = "E_STATE", .udesc = "LLC lines victimized in E state", .ucode = 0x200, }, { .uname = "S_STATE", .udesc = "LLC lines victimized in S state", .ucode = 0x400, }, { .uname = "I_STATE", .udesc = "LLC lines victimized in I state", .ucode = 0x800, }, { .uname = "F_STATE", .udesc = "LLC lines victimized in F state", .ucode = 0x1000, }, { .uname = "ANY", .udesc = "LLC lines victimized", .ucode = 0x1f00, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_llc_miss[]={ { .uname = "READ", .udesc = "Number of LLC read misses", .ucode = 0x100, }, { .uname = "WRITE", .udesc = "Number of LLC write misses", .ucode = 0x200, }, { .uname = "PROBE", .udesc = "Number of LLC peer probe misses", .ucode = 0x400, }, { .uname = "ANY", .udesc = "Number of LLC misses", .ucode = 0x300, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qhl_address_conflicts[]={ { .uname = "2WAY", .udesc = "QHL 2 way address conflicts", .ucode = 0x200, }, { .uname = "3WAY", .udesc = "QHL 3 way address conflicts", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_qhl_conflict_cycles[]={ { .uname = "IOH", .udesc = "QHL IOH Tracker conflict cycles", .ucode = 0x100, }, { .uname = "REMOTE", .udesc = "QHL Remote Tracker conflict cycles", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "QHL Local Tracker conflict cycles", .ucode = 0x400, }, }; static const 
intel_x86_umask_t wsm_unc_unc_qhl_cycles_full[]={ { .uname = "REMOTE", .udesc = "Cycles QHL Remote Tracker is full", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "Cycles QHL Local Tracker is full", .ucode = 0x400, }, { .uname = "IOH", .udesc = "Cycles QHL IOH Tracker is full", .ucode = 0x100, }, }; static const intel_x86_umask_t wsm_unc_unc_qhl_cycles_not_empty[]={ { .uname = "IOH", .udesc = "Cycles QHL IOH is busy", .ucode = 0x100, }, { .uname = "REMOTE", .udesc = "Cycles QHL Remote Tracker is busy", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "Cycles QHL Local Tracker is busy", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_qhl_frc_ack_cnflts[]={ { .uname = "LOCAL", .udesc = "QHL FrcAckCnflts sent to local home", .ucode = 0x400, .uflags= INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qhl_sleeps[]={ { .uname = "IOH_ORDER", .udesc = "Due to IOH ordering (write after read) conflicts", .ucode = 0x100, }, { .uname = "REMOTE_ORDER", .udesc = "Due to remote socket ordering (write after read) conflicts", .ucode = 0x200, }, { .uname = "LOCAL_ORDER", .udesc = "Due to local socket ordering (write after read) conflicts", .ucode = 0x400, }, { .uname = "IOH_CONFLICT", .udesc = "Due to IOH address conflicts", .ucode = 0x800, }, { .uname = "REMOTE_CONFLICT", .udesc = "Due to remote socket address conflicts", .ucode = 0x1000, }, { .uname = "LOCAL_CONFLICT", .udesc = "Due to local socket address conflicts", .ucode = 0x2000, }, }; static const intel_x86_umask_t wsm_unc_unc_qhl_occupancy[]={ { .uname = "IOH", .udesc = "Cycles QHL IOH Tracker Allocate to Deallocate Read Occupancy", .ucode = 0x100, }, { .uname = "REMOTE", .udesc = "Cycles QHL Remote Tracker Allocate to Deallocate Read Occupancy", .ucode = 0x200, }, { .uname = "LOCAL", .udesc = "Cycles QHL Local Tracker Allocate to Deallocate Read Occupancy", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_qhl_requests[]={ { .uname = "LOCAL_READS", .udesc = "Quickpath Home 
Logic local read requests", .ucode = 0x1000, }, { .uname = "LOCAL_WRITES", .udesc = "Quickpath Home Logic local write requests", .ucode = 0x2000, }, { .uname = "REMOTE_READS", .udesc = "Quickpath Home Logic remote read requests", .ucode = 0x400, }, { .uname = "IOH_READS", .udesc = "Quickpath Home Logic IOH read requests", .ucode = 0x100, }, { .uname = "IOH_WRITES", .udesc = "Quickpath Home Logic IOH write requests", .ucode = 0x200, }, { .uname = "REMOTE_WRITES", .udesc = "Quickpath Home Logic remote write requests", .ucode = 0x800, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_busy[]={ { .uname = "READ_CH0", .udesc = "Cycles QMC channel 0 busy with a read request", .ucode = 0x100, }, { .uname = "READ_CH1", .udesc = "Cycles QMC channel 1 busy with a read request", .ucode = 0x200, }, { .uname = "READ_CH2", .udesc = "Cycles QMC channel 2 busy with a read request", .ucode = 0x400, }, { .uname = "WRITE_CH0", .udesc = "Cycles QMC channel 0 busy with a write request", .ucode = 0x800, }, { .uname = "WRITE_CH1", .udesc = "Cycles QMC channel 1 busy with a write request", .ucode = 0x1000, }, { .uname = "WRITE_CH2", .udesc = "Cycles QMC channel 2 busy with a write request", .ucode = 0x2000, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_cancel[]={ { .uname = "CH0", .udesc = "QMC channel 0 cancels", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 cancels", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 cancels", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC cancels", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_critical_priority_reads[]={ { .uname = "CH0", .udesc = "QMC channel 0 critical priority read requests", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 critical priority read requests", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 critical priority read requests", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC critical priority 
read requests", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_high_priority_reads[]={ { .uname = "CH0", .udesc = "QMC channel 0 high priority read requests", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 high priority read requests", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 high priority read requests", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC high priority read requests", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_isoc_full[]={ { .uname = "READ_CH0", .udesc = "Cycles DRAM channel 0 full with isochronous read requests", .ucode = 0x100, }, { .uname = "READ_CH1", .udesc = "Cycles DRAM channel 1 full with isochronous read requests", .ucode = 0x200, }, { .uname = "READ_CH2", .udesc = "Cycles DRAM channel 2 full with isochronous read requests", .ucode = 0x400, }, { .uname = "WRITE_CH0", .udesc = "Cycles DRAM channel 0 full with isochronous write requests", .ucode = 0x800, }, { .uname = "WRITE_CH1", .udesc = "Cycles DRAM channel 1 full with isochronous write requests", .ucode = 0x1000, }, { .uname = "WRITE_CH2", .udesc = "Cycles DRAM channel 2 full with isochronous write requests", .ucode = 0x2000, }, }; static const intel_x86_umask_t wsm_unc_unc_imc_isoc_occupancy[]={ { .uname = "CH0", .udesc = "IMC channel 0 isochronous read request occupancy", .ucode = 0x100, }, { .uname = "CH1", .udesc = "IMC channel 1 isochronous read request occupancy", .ucode = 0x200, }, { .uname = "CH2", .udesc = "IMC channel 2 isochronous read request occupancy", .ucode = 0x400, }, { .uname = "ANY", .udesc = "IMC isochronous read request occupancy", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_normal_reads[]={ { .uname = "CH0", .udesc = "QMC channel 0 normal read requests", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 normal read 
requests", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 normal read requests", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC normal read requests", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_occupancy[]={ { .uname = "CH0", .udesc = "IMC channel 0 normal read request occupancy", .ucode = 0x100, }, { .uname = "CH1", .udesc = "IMC channel 1 normal read request occupancy", .ucode = 0x200, }, { .uname = "CH2", .udesc = "IMC channel 2 normal read request occupancy", .ucode = 0x400, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_priority_updates[]={ { .uname = "CH0", .udesc = "QMC channel 0 priority updates", .ucode = 0x100, }, { .uname = "CH1", .udesc = "QMC channel 1 priority updates", .ucode = 0x200, }, { .uname = "CH2", .udesc = "QMC channel 2 priority updates", .ucode = 0x400, }, { .uname = "ANY", .udesc = "QMC priority updates", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_imc_retry[]={ { .uname = "CH0", .udesc = "Channel 0", .ucode = 0x100, }, { .uname = "CH1", .udesc = "Channel 1", .ucode = 0x200, }, { .uname = "CH2", .udesc = "Channel 2", .ucode = 0x400, }, { .uname = "ANY", .udesc = "Any channel", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t wsm_unc_unc_qmc_writes[]={ { .uname = "FULL_CH0", .udesc = "QMC channel 0 full cache line writes", .ucode = 0x100, .grpid = 0, }, { .uname = "FULL_CH1", .udesc = "QMC channel 1 full cache line writes", .ucode = 0x200, .grpid = 0, }, { .uname = "FULL_CH2", .udesc = "QMC channel 2 full cache line writes", .ucode = 0x400, .grpid = 0, }, { .uname = "FULL_ANY", .udesc = "QMC full cache line writes", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 0, }, { .uname = "PARTIAL_CH0", .udesc = "QMC channel 0 partial cache line writes", .ucode = 0x800, .grpid = 1, }, { .uname = "PARTIAL_CH1", .udesc = "QMC 
channel 1 partial cache line writes", .ucode = 0x1000, .grpid = 1, }, { .uname = "PARTIAL_CH2", .udesc = "QMC channel 2 partial cache line writes", .ucode = 0x2000, .grpid = 1, }, { .uname = "PARTIAL_ANY", .udesc = "QMC partial cache line writes", .ucode = 0x3800, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, .grpid = 1, }, }; static const intel_x86_umask_t wsm_unc_unc_qpi_rx_no_ppt_credit[]={ { .uname = "STALLS_LINK_0", .udesc = "Link 0 snoop stalls due to no PPT entry", .ucode = 0x100, }, { .uname = "STALLS_LINK_1", .udesc = "Link 1 snoop stalls due to no PPT entry", .ucode = 0x200, }, }; static const intel_x86_umask_t wsm_unc_unc_qpi_tx_header[]={ { .uname = "BUSY_LINK_0", .udesc = "Cycles link 0 outbound header busy", .ucode = 0x200, }, { .uname = "BUSY_LINK_1", .udesc = "Cycles link 1 outbound header busy", .ucode = 0x800, }, }; static const intel_x86_umask_t wsm_unc_unc_qpi_tx_stalled_multi_flit[]={ { .uname = "DRS_LINK_0", .udesc = "Cycles QPI outbound link 0 DRS stalled", .ucode = 0x100, }, { .uname = "NCB_LINK_0", .udesc = "Cycles QPI outbound link 0 NCB stalled", .ucode = 0x200, }, { .uname = "NCS_LINK_0", .udesc = "Cycles QPI outbound link 0 NCS stalled", .ucode = 0x400, }, { .uname = "DRS_LINK_1", .udesc = "Cycles QPI outbound link 1 DRS stalled", .ucode = 0x800, }, { .uname = "NCB_LINK_1", .udesc = "Cycles QPI outbound link 1 NCB stalled", .ucode = 0x1000, }, { .uname = "NCS_LINK_1", .udesc = "Cycles QPI outbound link 1 NCS stalled", .ucode = 0x2000, }, { .uname = "LINK_0", .udesc = "Cycles QPI outbound link 0 multi flit stalled", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LINK_1", .udesc = "Cycles QPI outbound link 1 multi flit stalled", .ucode = 0x3800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_unc_unc_qpi_tx_stalled_single_flit[]={ { .uname = "HOME_LINK_0", .udesc = "Cycles QPI outbound link 0 HOME stalled", .ucode = 0x100, }, { .uname = "SNOOP_LINK_0", .udesc = "Cycles QPI outbound link 0 SNOOP stalled", .ucode 
= 0x200, }, { .uname = "NDR_LINK_0", .udesc = "Cycles QPI outbound link 0 NDR stalled", .ucode = 0x400, }, { .uname = "HOME_LINK_1", .udesc = "Cycles QPI outbound link 1 HOME stalled", .ucode = 0x800, }, { .uname = "SNOOP_LINK_1", .udesc = "Cycles QPI outbound link 1 SNOOP stalled", .ucode = 0x1000, }, { .uname = "NDR_LINK_1", .udesc = "Cycles QPI outbound link 1 NDR stalled", .ucode = 0x2000, }, { .uname = "LINK_0", .udesc = "Cycles QPI outbound link 0 single flit stalled", .ucode = 0x700, .uflags= INTEL_X86_NCOMBO, }, { .uname = "LINK_1", .udesc = "Cycles QPI outbound link 1 single flit stalled", .ucode = 0x3800, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_unc_unc_snp_resp_to_local_home[]={ { .uname = "I_STATE", .udesc = "Local home snoop response - LLC does not have cache line", .ucode = 0x100, }, { .uname = "S_STATE", .udesc = "Local home snoop response - LLC has cache line in S state", .ucode = 0x200, }, { .uname = "FWD_S_STATE", .udesc = "Local home snoop response - LLC forwarding cache line in S state.", .ucode = 0x400, }, { .uname = "FWD_I_STATE", .udesc = "Local home snoop response - LLC has forwarded a modified cache line", .ucode = 0x800, }, { .uname = "CONFLICT", .udesc = "Local home conflict snoop response", .ucode = 0x1000, }, { .uname = "WB", .udesc = "Local home snoop response - LLC has cache line in the M state", .ucode = 0x2000, }, }; static const intel_x86_umask_t wsm_unc_unc_snp_resp_to_remote_home[]={ { .uname = "I_STATE", .udesc = "Remote home snoop response - LLC does not have cache line", .ucode = 0x100, }, { .uname = "S_STATE", .udesc = "Remote home snoop response - LLC has cache line in S state", .ucode = 0x200, }, { .uname = "FWD_S_STATE", .udesc = "Remote home snoop response - LLC forwarding cache line in S state.", .ucode = 0x400, }, { .uname = "FWD_I_STATE", .udesc = "Remote home snoop response - LLC has forwarded a modified cache line", .ucode = 0x800, }, { .uname = "CONFLICT", .udesc = "Remote home conflict 
snoop response", .ucode = 0x1000, }, { .uname = "WB", .udesc = "Remote home snoop response - LLC has cache line in the M state", .ucode = 0x2000, }, { .uname = "HITM", .udesc = "Remote home snoop response - LLC HITM", .ucode = 0x2400, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t wsm_unc_unc_thermal_throttling_temp[]={ { .uname = "CORE_0", .udesc = "Core 0", .ucode = 0x100, }, { .uname = "CORE_1", .udesc = "Core 1", .ucode = 0x200, }, { .uname = "CORE_2", .udesc = "Core 2", .ucode = 0x400, }, { .uname = "CORE_3", .udesc = "Core 3", .ucode = 0x800, }, }; static const intel_x86_entry_t intel_wsm_unc_pe[]={ { .name = "UNC_CLK_UNHALTED", .desc = "Uncore clockticks.", .modmsk =0x0, .cntmsk = 0x100000, .code = 0xff, .flags = INTEL_X86_FIXED, }, { .name = "UNC_DRAM_OPEN", .desc = "DRAM open commands issued for read or write", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x60, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_dram_open), .ngrp = 1, .umasks = wsm_unc_unc_dram_open, }, { .name = "UNC_GC_OCCUPANCY", .desc = "Number of queue entries", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_gc_occupancy), .ngrp = 1, .umasks = wsm_unc_unc_gc_occupancy, }, { .name = "UNC_DRAM_PAGE_CLOSE", .desc = "DRAM page close due to idle timer expiration", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x61, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_dram_page_close), .ngrp = 1, .umasks = wsm_unc_unc_dram_page_close, }, { .name = "UNC_DRAM_PAGE_MISS", .desc = "DRAM Channel 0 page miss", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x62, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_dram_page_miss), .ngrp = 1, .umasks = wsm_unc_unc_dram_page_miss, }, { .name = "UNC_DRAM_PRE_ALL", .desc = "DRAM Channel 0 precharge all commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x66, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_dram_pre_all), .ngrp = 1, .umasks = wsm_unc_unc_dram_pre_all, }, { .name = 
"UNC_DRAM_THERMAL_THROTTLED", .desc = "Uncore cycles DRAM was throttled due to its temperature being above thermal throttling threshold", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x67, }, { .name = "UNC_DRAM_READ_CAS", .desc = "DRAM Channel 0 read CAS commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x63, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_dram_read_cas), .ngrp = 1, .umasks = wsm_unc_unc_dram_read_cas, }, { .name = "UNC_DRAM_REFRESH", .desc = "DRAM Channel 0 refresh commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x65, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_dram_refresh), .ngrp = 1, .umasks = wsm_unc_unc_dram_refresh, }, { .name = "UNC_DRAM_WRITE_CAS", .desc = "DRAM Channel 0 write CAS commands", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x64, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_dram_write_cas), .ngrp = 1, .umasks = wsm_unc_unc_dram_write_cas, }, { .name = "UNC_GQ_ALLOC", .desc = "GQ read tracker requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x3, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_gq_alloc), .ngrp = 1, .umasks = wsm_unc_unc_gq_alloc, }, { .name = "UNC_GQ_CYCLES_FULL", .desc = "Cycles GQ read tracker is full.", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x0, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_gq_cycles_full), .ngrp = 1, .umasks = wsm_unc_unc_gq_cycles_full, }, { .name = "UNC_GQ_CYCLES_NOT_EMPTY", .desc = "Cycles GQ read tracker is busy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x1, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_gq_cycles_not_empty), .ngrp = 1, .umasks = wsm_unc_unc_gq_cycles_not_empty, }, { .name = "UNC_GQ_DATA_FROM", .desc = "Cycles GQ data is imported", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x4, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_gq_data_from), .ngrp = 1, .umasks = wsm_unc_unc_gq_data_from, }, { .name = "UNC_GQ_DATA_TO", .desc = "Cycles GQ data is exported", .modmsk = NHM_UNC_ATTRS, .cntmsk = 
0x1fe00000, .code = 0x5, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_gq_data_to), .ngrp = 1, .umasks = wsm_unc_unc_gq_data_to, }, { .name = "UNC_LLC_HITS", .desc = "Number of LLC read hits", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x8, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_llc_hits), .ngrp = 1, .umasks = wsm_unc_unc_llc_hits, }, { .name = "UNC_LLC_LINES_IN", .desc = "LLC lines allocated in M state", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0xa, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_llc_lines_in), .ngrp = 1, .umasks = wsm_unc_unc_llc_lines_in, }, { .name = "UNC_LLC_LINES_OUT", .desc = "LLC lines victimized in M state", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0xb, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_llc_lines_out), .ngrp = 1, .umasks = wsm_unc_unc_llc_lines_out, }, { .name = "UNC_LLC_MISS", .desc = "Number of LLC read misses", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x9, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_llc_miss), .ngrp = 1, .umasks = wsm_unc_unc_llc_miss, }, { .name = "UNC_QHL_ADDRESS_CONFLICTS", .desc = "QHL 2 way address conflicts", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x24, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_address_conflicts), .ngrp = 1, .umasks = wsm_unc_unc_qhl_address_conflicts, }, { .name = "UNC_QHL_CONFLICT_CYCLES", .desc = "QHL IOH Tracker conflict cycles", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x25, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_conflict_cycles), .ngrp = 1, .umasks = wsm_unc_unc_qhl_conflict_cycles, }, { .name = "UNC_QHL_CYCLES_FULL", .desc = "Cycles QHL Remote Tracker is full", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x21, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_cycles_full), .ngrp = 1, .umasks = wsm_unc_unc_qhl_cycles_full, }, { .name = "UNC_QHL_CYCLES_NOT_EMPTY", .desc = "Cycles QHL Tracker is not empty", .modmsk =0x0, .cntmsk = 0x1fe00000, .code = 0x22, .numasks = 
LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_cycles_not_empty), .ngrp = 1, .umasks = wsm_unc_unc_qhl_cycles_not_empty, }, { .name = "UNC_QHL_FRC_ACK_CNFLTS", .desc = "QHL FrcAckCnflts sent to local home", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x33, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_frc_ack_cnflts), .ngrp = 1, .umasks = wsm_unc_unc_qhl_frc_ack_cnflts, }, { .name = "UNC_QHL_SLEEPS", .desc = "Number of occurrences a request was put to sleep", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x34, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_sleeps), .ngrp = 1, .umasks = wsm_unc_unc_qhl_sleeps, }, { .name = "UNC_QHL_OCCUPANCY", .desc = "Cycles QHL Tracker Allocate to Deallocate Read Occupancy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x23, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_occupancy), .ngrp = 1, .umasks = wsm_unc_unc_qhl_occupancy, }, { .name = "UNC_QHL_REQUESTS", .desc = "Quickpath Home Logic local read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x20, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qhl_requests), .ngrp = 1, .umasks = wsm_unc_unc_qhl_requests, }, { .name = "UNC_QHL_TO_QMC_BYPASS", .desc = "Number of requests to QMC that bypass QHL", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x26, }, { .name = "UNC_QMC_BUSY", .desc = "Cycles QMC busy with a read request", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x29, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_busy), .ngrp = 1, .umasks = wsm_unc_unc_qmc_busy, }, { .name = "UNC_QMC_CANCEL", .desc = "QMC cancels", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x30, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_cancel), .ngrp = 1, .umasks = wsm_unc_unc_qmc_cancel, }, { .name = "UNC_QMC_CRITICAL_PRIORITY_READS", .desc = "QMC critical priority read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2e, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_critical_priority_reads), .ngrp = 1, .umasks = 
wsm_unc_unc_qmc_critical_priority_reads, }, { .name = "UNC_QMC_HIGH_PRIORITY_READS", .desc = "QMC high priority read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2d, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_high_priority_reads), .ngrp = 1, .umasks = wsm_unc_unc_qmc_high_priority_reads, }, { .name = "UNC_QMC_ISOC_FULL", .desc = "Cycles DRAM full with isochronous (ISOC) read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x28, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_isoc_full), .ngrp = 1, .umasks = wsm_unc_unc_qmc_isoc_full, }, { .name = "UNC_IMC_ISOC_OCCUPANCY", .desc = "IMC isochronous (ISOC) Read Occupancy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2b, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_imc_isoc_occupancy), .ngrp = 1, .umasks = wsm_unc_unc_imc_isoc_occupancy, }, { .name = "UNC_QMC_NORMAL_READS", .desc = "QMC normal read requests", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2c, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_normal_reads), .ngrp = 1, .umasks = wsm_unc_unc_qmc_normal_reads, }, { .name = "UNC_QMC_OCCUPANCY", .desc = "QMC Occupancy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2a, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_occupancy), .ngrp = 1, .umasks = wsm_unc_unc_qmc_occupancy, }, { .name = "UNC_QMC_PRIORITY_UPDATES", .desc = "QMC priority updates", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x31, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_priority_updates), .ngrp = 1, .umasks = wsm_unc_unc_qmc_priority_updates, }, { .name = "UNC_IMC_RETRY", .desc = "Number of IMC DRAM channel retries (retries occur in RAS mode only)", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x32, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_imc_retry), .ngrp = 1, .umasks = wsm_unc_unc_imc_retry, }, { .name = "UNC_QMC_WRITES", .desc = "QMC cache line writes", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x2f, .flags= INTEL_X86_GRP_EXCL, .numasks = 
LIBPFM_ARRAY_SIZE(wsm_unc_unc_qmc_writes), .ngrp = 2, .umasks = wsm_unc_unc_qmc_writes, }, { .name = "UNC_QPI_RX_NO_PPT_CREDIT", .desc = "Link 0 snoop stalls due to no PPT entry", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x43, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qpi_rx_no_ppt_credit), .ngrp = 1, .umasks = wsm_unc_unc_qpi_rx_no_ppt_credit, }, { .name = "UNC_QPI_TX_HEADER", .desc = "Cycles link 0 outbound header busy", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x42, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qpi_tx_header), .ngrp = 1, .umasks = wsm_unc_unc_qpi_tx_header, }, { .name = "UNC_QPI_TX_STALLED_MULTI_FLIT", .desc = "Cycles QPI outbound stalls", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x41, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qpi_tx_stalled_multi_flit), .ngrp = 1, .umasks = wsm_unc_unc_qpi_tx_stalled_multi_flit, }, { .name = "UNC_QPI_TX_STALLED_SINGLE_FLIT", .desc = "Cycles QPI outbound link stalls", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x40, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_qpi_tx_stalled_single_flit), .ngrp = 1, .umasks = wsm_unc_unc_qpi_tx_stalled_single_flit, }, { .name = "UNC_SNP_RESP_TO_LOCAL_HOME", .desc = "Local home snoop response", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x6, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_snp_resp_to_local_home), .ngrp = 1, .umasks = wsm_unc_unc_snp_resp_to_local_home, }, { .name = "UNC_SNP_RESP_TO_REMOTE_HOME", .desc = "Remote home snoop response", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x7, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_snp_resp_to_remote_home), .ngrp = 1, .umasks = wsm_unc_unc_snp_resp_to_remote_home, }, { .name = "UNC_THERMAL_THROTTLING_TEMP", .desc = "Uncore cycles that the PCU records core temperature above threshold", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_thermal_throttling_temp), .ngrp = 1, .umasks = 
wsm_unc_unc_thermal_throttling_temp, }, { .name = "UNC_THERMAL_THROTTLED_TEMP", .desc = "Uncore cycles that the PCU records that core is in power throttled state due to temperature being above threshold", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x81, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_thermal_throttling_temp), .ngrp = 1, .umasks = wsm_unc_unc_thermal_throttling_temp, /* identical to actual umasks list for this event */ }, { .name = "UNC_PROCHOT_ASSERTION", .desc = "Number of system assertions of PROCHOT indicating the entire processor has exceeded the thermal limit", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x82, }, { .name = "UNC_THERMAL_THROTTLING_PROCHOT", .desc = "Uncore cycles that the PCU records that core is in power throttled state due to PROCHOT assertions", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x83, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_thermal_throttling_temp), .ngrp = 1, .umasks = wsm_unc_unc_thermal_throttling_temp, /* identical to actual umasks list for this event */ }, { .name = "UNC_TURBO_MODE", .desc = "Uncore cycles that a core is operating in turbo mode", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x84, .numasks = LIBPFM_ARRAY_SIZE(wsm_unc_unc_thermal_throttling_temp), .ngrp = 1, .umasks = wsm_unc_unc_thermal_throttling_temp, /* identical to actual umasks list for this event */ }, { .name = "UNC_CYCLES_UNHALTED_L3_FLL_ENABLE", .desc = "Uncore cycles where at least one core is unhalted and all L3 ways are enabled", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x85, }, { .name = "UNC_CYCLES_UNHALTED_L3_FLL_DISABLE", .desc = "Uncore cycles where at least one core is unhalted and all L3 ways are disabled", .modmsk = NHM_UNC_ATTRS, .cntmsk = 0x1fe00000, .code = 0x86, }, }; papi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_imc_events.h0000600003276200002170000002370412247131123024076 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * 
Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * PMU: snbep_unc_imc (Intel SandyBridge-EP IMC uncore PMU) */ static const intel_x86_umask_t snbep_unc_m_cas_count[]={ { .uname = "ALL", .udesc = "Counts total number of DRAM CAS commands issued on this channel", .ucode = 0xf00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "RD", .udesc = "Counts all DRAM reads on this channel, incl. underfills", .ucode = 0x300, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RD_REG", .udesc = "Counts number of DRAM read CAS commands issued on this channel, incl. 
regular read CAS and those with implicit precharge", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RD_UNDERFILL", .udesc = "Counts number of underfill reads issued by the memory controller", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WR", .udesc = "Counts number of DRAM write CAS commands on this channel", .ucode = 0xc00, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WR_RMM", .udesc = "Counts number of opportunistic DRAM write CAS commands issued on this channel", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WR_WMM", .udesc = "Counts number of DRAM write CAS commands issued on this channel while in Write-Major mode", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_m_dram_refresh[]={ { .uname = "HIGH", .udesc = "TBD", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PANIC", .udesc = "TBD", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_m_major_modes[]={ { .uname = "ISOCH", .udesc = "Counts cycles in ISOCH Major mode", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PARTIAL", .udesc = "Counts cycles in Partial Major mode", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "READ", .udesc = "Counts cycles in Read Major mode", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WRITE", .udesc = "Counts cycles in Write Major mode", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_m_power_cke_cycles[]={ { .uname = "RANK0", .udesc = "Count cycles for rank 0", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RANK1", .udesc = "Count cycles for rank 1", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RANK2", .udesc = "Count cycles for rank 2", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RANK3", .udesc = "Count cycles for rank 3", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RANK4", .udesc = "Count cycles for rank 4", .ucode 
= 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RANK5", .udesc = "Count cycles for rank 5", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RANK6", .udesc = "Count cycles for rank 6", .ucode = 0x4000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RANK7", .udesc = "Count cycles for rank 7", .ucode = 0x8000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_m_preemption[]={ { .uname = "RD_PREEMPT_RD", .udesc = "Counts read over read preemptions", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "RD_PREEMPT_WR", .udesc = "Counts read over write preemptions", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_m_pre_count[]={ { .uname = "PAGE_CLOSE", .udesc = "Counts number of DRAM precharge commands sent on this channel as a result of the page close counter expiring", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "PAGE_MISS", .udesc = "Counts number of DRAM precharge commands sent on this channel as a result of page misses", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snbep_unc_m_pe[]={ { .name = "UNC_M_CLOCKTICKS", .desc = "IMC Uncore clockticks", .modmsk = 0x0, .cntmsk = 0x100000000ull, .code = 0xff, /* perf pseudo encoding for fixed counter */ .flags = INTEL_X86_FIXED, }, { .name = "UNC_M_ACT_COUNT", .desc = "DRAM Activate Count", .code = 0x1, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_CAS_COUNT", .desc = "DRAM RD_CAS and WR_CAS Commands.", .code = 0x4, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_IMC_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_m_cas_count), .umasks = snbep_unc_m_cas_count }, { .name = "UNC_M_DRAM_PRE_ALL", .desc = "DRAM Precharge All Commands", .code = 0x6, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_DRAM_REFRESH", .desc = "Number of DRAM Refreshes Issued", .code = 0x5, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_IMC_ATTRS, .numasks = 
LIBPFM_ARRAY_SIZE(snbep_unc_m_dram_refresh), .umasks = snbep_unc_m_dram_refresh }, { .name = "UNC_M_ECC_CORRECTABLE_ERRORS", .desc = "ECC Correctable Errors", .code = 0x9, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_MAJOR_MODES", .desc = "Cycles in a Major Mode", .code = 0x7, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_IMC_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_m_major_modes), .umasks = snbep_unc_m_major_modes }, { .name = "UNC_M_POWER_CHANNEL_DLLOFF", .desc = "Channel DLLOFF Cycles", .code = 0x84, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_POWER_CHANNEL_PPD", .desc = "Channel PPD Cycles", .code = 0x85, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_POWER_CKE_CYCLES", .desc = "CKE_ON_CYCLES by Rank", .code = 0x83, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_IMC_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_m_power_cke_cycles), .umasks = snbep_unc_m_power_cke_cycles }, { .name = "UNC_M_POWER_CRITICAL_THROTTLE_CYCLES", .desc = "Critical Throttle Cycles", .code = 0x86, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_POWER_SELF_REFRESH", .desc = "Clock-Enabled Self-Refresh", .code = 0x43, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_POWER_THROTTLE_CYCLES", .desc = "Throttle Cycles for Rank 0", .code = 0x41, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_IMC_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_m_power_cke_cycles), .umasks = snbep_unc_m_power_cke_cycles /* identical to snbep_unc_m_power_cke_cycles */ }, { .name = "UNC_M_PREEMPTION", .desc = "Read Preemption Count", .code = 0x8, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_IMC_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_m_preemption), .umasks = snbep_unc_m_preemption }, { .name = "UNC_M_PRE_COUNT", .desc = "DRAM Precharge commands.", .code = 0x2, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_IMC_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_m_pre_count), .umasks = snbep_unc_m_pre_count }, { .name = 
"UNC_M_RPQ_CYCLES_FULL", .desc = "Read Pending Queue Full Cycles", .code = 0x12, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_RPQ_CYCLES_NE", .desc = "Read Pending Queue Not Empty", .code = 0x11, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_RPQ_INSERTS", .desc = "Read Pending Queue Allocations", .code = 0x10, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_RPQ_OCCUPANCY", .desc = "Read Pending Queue Occupancy", .code = 0x80, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_WPQ_CYCLES_FULL", .desc = "Write Pending Queue Full Cycles", .code = 0x22, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_WPQ_CYCLES_NE", .desc = "Write Pending Queue Not Empty", .code = 0x21, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_WPQ_INSERTS", .desc = "Write Pending Queue Allocations", .code = 0x20, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_WPQ_OCCUPANCY", .desc = "Write Pending Queue Occupancy", .code = 0x81, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_WPQ_READ_HIT", .desc = "Write Pending Queue CAM Match", .code = 0x23, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, { .name = "UNC_M_WPQ_WRITE_HIT", .desc = "Write Pending Queue CAM Match", .code = 0x24, .cntmsk = 0xf, .modmsk = SNBEP_UNC_IMC_ATTRS, }, }; papi-5.3.0/src/libpfm4/lib/events/intel_snb_unc_events.h0000600003276200002170000002010712247131123022713 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following 
conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * PMU: snb_unc (Intel Sandy Bridge uncore PMU) */ static const intel_x86_umask_t snb_unc_cbo_xsnp_response[]={ { .uname = "MISS", .udesc = "Number of snoop misses", .ucode = 0x100, .grpid = 0, }, { .uname = "INVAL", .udesc = "Number of snoop invalidates of a non-modified line", .ucode = 0x200, .grpid = 0, }, { .uname = "HIT", .udesc = "Number of snoop hits of a non-modified line", .ucode = 0x400, .grpid = 0, }, { .uname = "HITM", .udesc = "Number of snoop hits of a modified line", .ucode = 0x800, .grpid = 0, }, { .uname = "INVAL_M", .udesc = "Number of snoop invalidates of a modified line", .ucode = 0x1000, .grpid = 0, }, { .uname = "ANY_SNP", .udesc = "Number of snoops", .ucode = 0x1f00, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "EXTERNAL_FILTER", .udesc = "Filter on cross-core snoops initiated by this Cbox due to external snoop request", .ucode = 0x2000, .grpid = 1, .uflags = INTEL_X86_NCOMBO, }, { .uname = "XCORE_FILTER", .udesc = "Filter on cross-core snoops initiated by this Cbox due to processor core memory request", .ucode = 0x4000, .grpid = 1, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "EVICTION_FILTER", .udesc = "Filter on cross-core snoops initiated by this Cbox due to LLC eviction", .ucode = 0x8000, 
.grpid = 1, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_unc_cbo_cache_lookup[]={ { .uname = "STATE_M", .udesc = "Number of LLC lookup requests for a line in modified state", .ucode = 0x100, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STATE_E", .udesc = "Number of LLC lookup requests for a line in exclusive state", .ucode = 0x200, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STATE_S", .udesc = "Number of LLC lookup requests for a line in shared state", .ucode = 0x400, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STATE_I", .udesc = "Number of LLC lookup requests for a line in invalid state", .ucode = 0x800, .grpid = 0, .uflags = INTEL_X86_NCOMBO, }, { .uname = "STATE_MESI", .udesc = "Number of LLC lookup requests for a line", .ucode = 0xf00, .grpid = 0, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "READ_FILTER", .udesc = "Filter on processor core initiated cacheable read requests", .ucode = 0x1000, .grpid = 1, .uflags = INTEL_X86_NCOMBO, }, { .uname = "WRITE_FILTER", .udesc = "Filter on processor core initiated cacheable write requests", .ucode = 0x2000, .grpid = 1, .uflags = INTEL_X86_NCOMBO, }, { .uname = "EXTSNP_FILTER", .udesc = "Filter on external snoop requests", .ucode = 0x4000, .grpid = 1, .uflags = INTEL_X86_NCOMBO, }, { .uname = "ANY_FILTER", .udesc = "Filter on any IRQ or IPQ initiated requests including uncacheable, non-coherent requests", .ucode = 0x8000, .grpid = 1, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_unc_arb_trk_occupancy[]={ { .uname = "ALL", .udesc = "Counts cycles weighted by the number of requests waiting for data returning from the memory controller, (includes coherent and non-coherent requests initiated by cores, processor graphic units, or LLC)", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_unc_arb_trk[]={ { .uname = "ALL", .udesc = "Counts number of coherent and non-coherent 
requests initiated by cores, processor graphic units, or LLC", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, { .uname = "WRITES", .udesc = "Counts the number of allocated write entries, include full, partial, and LLC evictions", .ucode = 0x2000, .uflags= INTEL_X86_NCOMBO, }, { .uname = "EVICTIONS", .udesc = "Counts the number of LLC evictions allocated", .ucode = 0x8000, .uflags= INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snb_unc_arb_coh_trk_occupancy[]={ { .uname = "ALL", .udesc = "Cycles weighted by number of requests pending in Coherency Tracker", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snb_unc_arb_coh_trk_request[]={ { .uname = "ALL", .udesc = "Number of requests allocated in Coherency Tracker", .ucode = 0x100, .uflags= INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_entry_t intel_snb_unc_cbo0_pe[]={ { .name = "UNC_CLOCKTICKS", .desc = "uncore clock ticks", .cntmsk = 1ULL << 32, .code = 0xff, /* perf_event pseudo encoding */ .flags = INTEL_X86_FIXED, }, { .name = "UNC_CBO_XSNP_RESPONSE", .desc = "Snoop responses", .modmsk = INTEL_SNB_UNC_ATTRS, .cntmsk = 0xff, .code = 0x22, .numasks = LIBPFM_ARRAY_SIZE(snb_unc_cbo_xsnp_response), .ngrp = 2, .umasks = snb_unc_cbo_xsnp_response, }, { .name = "UNC_CBO_CACHE_LOOKUP", .desc = "LLC cache lookups", .modmsk = INTEL_SNB_UNC_ATTRS, .cntmsk = 0xff, .code = 0x34, .numasks = LIBPFM_ARRAY_SIZE(snb_unc_cbo_cache_lookup), .ngrp = 2, .umasks = snb_unc_cbo_cache_lookup, }, }; static const intel_x86_entry_t intel_snb_unc_cbo_pe[]={ { .name = "UNC_CBO_XSNP_RESPONSE", .desc = "Snoop responses (must provide a snoop type and filter)", .modmsk = INTEL_SNB_UNC_ATTRS, .cntmsk = 0xff, .code = 0x22, .numasks = LIBPFM_ARRAY_SIZE(snb_unc_cbo_xsnp_response), .ngrp = 2, .umasks = snb_unc_cbo_xsnp_response, }, { .name = "UNC_CBO_CACHE_LOOKUP", .desc = "LLC cache lookups", .modmsk = INTEL_SNB_UNC_ATTRS, .cntmsk = 0xff, .code = 0x34, .numasks = 
LIBPFM_ARRAY_SIZE(snb_unc_cbo_cache_lookup), .ngrp = 2, .umasks = snb_unc_cbo_cache_lookup, }, }; static const intel_x86_entry_t intel_snb_unc_arb_pe[]={ { .name = "UNC_ARB_TRK_OCCUPANCY", .desc = "ARB tracker occupancy", .modmsk = INTEL_SNB_UNC_ATTRS, .cntmsk = 0x1, .code = 0x80, .numasks = LIBPFM_ARRAY_SIZE(snb_unc_arb_trk_occupancy), .ngrp = 1, .umasks = snb_unc_arb_trk_occupancy, }, { .name = "UNC_ARB_COH_TRK_OCCUPANCY", .desc = "Coherency traffic occupancy", .modmsk = INTEL_SNB_UNC_ATTRS, .cntmsk = 0x1, .code = 0x83, .flags= INTEL_X86_PEBS, .numasks = LIBPFM_ARRAY_SIZE(snb_unc_arb_coh_trk_occupancy), .ngrp = 1, .umasks = snb_unc_arb_coh_trk_occupancy, }, { .name = "UNC_ARB_COH_TRK_REQUEST", .desc = "Coherency traffic requests", .modmsk = INTEL_SNB_UNC_ATTRS, .cntmsk = 0x1, .code = 0x84, .numasks = LIBPFM_ARRAY_SIZE(snb_unc_arb_coh_trk_request), .ngrp = 1, .umasks = snb_unc_arb_coh_trk_request, }, }; papi-5.3.0/src/libpfm4/lib/events/mips_74k_events.h0000600003276200002170000004713012247131123021533 0ustar ralphundrgrad/* * Copyright (c) 2011 Samara Technology Group, Inc * Contributed by Philip Mucci * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * Based on: * MIPS32 74KTM Processor Core Family Software Users' Manual * Document Number: MD00519 Revision 01.05 March 30, 2011 */ static const mips_entry_t mips_74k_pe []={ { .name = "CYCLES", /* BOTH */ .code = 0x0, .desc = "Cycles", }, { .name = "INSTRUCTIONS", /* BOTH */ .code = 0x1, .desc = "Instructions graduated", }, { .name = "PREDICTED_JR_31", .code = 0x2, .desc = "jr $31 (return) instructions whose target is predicted", }, { .name = "JR_31_MISPREDICTIONS", .code = 0x82, .desc = "jr $31 (return) predicted but guessed wrong", }, { .name = "REDIRECT_STALLS", .code = 0x3, .desc = "Cycles where no instruction is fetched because it has no next address candidate. This includes stalls due to register indirect jumps such as jr, stalls following a wait or eret and stalls due to exceptions from instruction fetch", }, { .name = "JR_31_NO_PREDICTIONS", .code = 0x83, .desc = "jr $31 (return) instructions fetched and not predicted using RPS", }, { .name = "ITLB_ACCESSES", .code = 0x4, .desc = "ITLB accesses", }, { .name = "ITLB_MISSES", .code = 0x84, .desc = "ITLB misses, which result in a JTLB access", }, { .name = "JTLB_INSN_MISSES", .code = 0x85, .desc = "JTLB instruction access misses (will lead to an exception)", }, { .name = "ICACHE_ACCESSES", .code = 0x6, .desc = "Instruction cache accesses. 74K cores have a 128-bit connection to the I-cache and fetch 4 instructions every access. This counts every such access, including accesses for instructions which are eventually discarded. For example, following a branch which is incorrectly predicted, the 74K core will continue to fetch instructions, which will eventually get thrown away", }, { .name = "ICACHE_MISSES", .code = 0x86, .desc = "I-cache misses. 
Includes misses resulting from fetch-ahead and speculation", }, { .name = "ICACHE_MISS_STALLS", .code = 0x7, .desc = "Cycles where no instruction is fetched because we missed in the I-cache", }, { .name = "UNCACHED_IFETCH_STALLS", .code = 0x8, .desc = "Cycles where no instruction is fetched because we're waiting for an I-fetch from uncached memory", }, { .name = "PDTRACE_BACK_STALLS", .code = 0x88, .desc = "PDTrace back stalls", }, { .name = "IFU_REPLAYS", .code = 0x9, .desc = "Number of times the instruction fetch pipeline is flushed and replayed because the IFU buffers are full and unable to accept any instructions", }, { .name = "KILLED_FETCH_SLOTS", .code = 0x89, .desc = "Valid fetch slots killed due to taken branches/jumps or stalling instructions", }, { .name = "DDQ0_FULL_DR_STALLS", .code = 0xd, .desc = "Cycles where no instructions are brought into the IDU because the ALU instruction candidate pool is full", }, { .name = "DDQ1_FULL_DR_STALLS", .code = 0x8d, .desc = "Cycles where no instructions are brought into the IDU because the AGEN instruction candidate pool is full", }, { .name = "ALCB_FULL_DR_STALLS", .code = 0xe, .desc = "Cycles where no instructions can be added to the issue pool, because we have run out of ALU completion buffers (CBs)", }, { .name = "AGCB_FULL_DR_STALLS", .code = 0x8e, .desc = "Cycles where no instructions can be added to the issue pool, because we have run out of AGEN completion buffers (CBs)", }, { .name = "CLDQ_FULL_DR_STALLS", .code = 0xf, .desc = "Cycles where no instructions can be added to the issue pool, because we've used all the FIFO entries in the CLDQ which keep track of data coming back from the FPU", }, { .name = "IODQ_FULL_DR_STALLS", .code = 0x8f, .desc = "Cycles where no instructions can be added to the issue pool, because we've filled the in order FIFO used for coprocessor 1 instructions (IOIQ)", }, { .name = "ALU_EMPTY_CYCLES", .code = 0x10, .desc = "Cycles with no ALU-pipe issue; no instructions available", }, { 
.name = "AGEN_EMPTY_CYCLES", .code = 0x90, .desc = "Cycles with no AGEN-pipe issue; no instructions available", }, { .name = "ALU_OPERANDS_NOT_READY_CYCLES", .code = 0x11, .desc = "Cycles with no ALU-pipe issue; we have instructions, but operands not ready", }, { .name = "AGEN_OPERANDS_NOT_READY_CYCLES", .code = 0x91, .desc = "Cycles with no AGEN-pipe issue; we have instructions, but operands not ready", }, { .name = "ALU_NO_ISSUE_CYCLES", .code = 0x12, .desc = "Cycles with no ALU-pipe issue; we have instructions, but some resource is unavailable. This includes, operands are not ready (same as event 17), div in progress inhibits MDU instructions, CorExtend resource limitation", }, { .name = "AGEN_NO_ISSUE_CYCLES", .code = 0x92, .desc = "Cycles with no AGEN-pipe issue; we have instructions, but some resource is unavailable. This includes, operands are not ready (same as event 17), Non-issued stores blocking ready to issue loads, issued cacheops blocking ready to issue loads", }, { .name = "ALU_BUBBLE_CYCLES", .code = 0x13, .desc = "ALU-pipe bubble issued. The resulting empty pipe stage guarantees that some resource will be unused for a cycle, sometime soon. Used, for example, to guarantee an opportunity to write mfc1 data into a CB", }, { .name = "AGEN_BUBBLE_CYCLES", .code = 0x93, .desc = "AGEN-pipe bubble issued. The resulting empty pipe stage guarantees that some resource will be unused for a cycle, sometime soon. Used, for example, to allow access to the data cache for refill or eviction", }, { .name = "SINGLE_ISSUE_CYCLES", .code = 0x14, .desc = "Cycles when one instruction is issued", }, { .name = "DUAL_ISSUE_CYCLES", .code = 0x94, .desc = "Cycles when two instructions are issued (one ALU, one AGEN)", }, { .name = "OOO_ALU_ISSUE_CYCLES", .code = 0x15, .desc = "Cycles when instructions are issued out of order into the ALU pipe. i.e. 
instruction issued is not the oldest in the pool", }, { .name = "OOO_AGEN_ISSUE_CYCLES", .code = 0x95, .desc = "Cycles when instructions are issued out of order into the AGEN pipe. i.e. instruction issued is not the oldest in the pool", }, { .name = "JALR_JALR_HB_INSNS", .code = 0x16, .desc = "Graduated JALR/JALR.HB", }, { .name = "DCACHE_LINE_REFILL_REQUESTS", .code = 0x96, .desc = "D-Cache line refill (not LD/ST misses)", }, { .name = "DCACHE_LOAD_ACCESSES", .code = 0x17, .desc = "Cacheable loads - Counts all accesses to the D-cache caused by load instructions. This count includes instructions that do not graduate", }, { .name = "DCACHE_ACCESSES", .code = 0x97, .desc = "All D-cache accesses (loads, stores, prefetch, cacheop etc). This count includes instructions that do not graduate", }, { .name = "DCACHE_WRITEBACKS", .code = 0x18, .desc = "D-Cache writebacks", }, { .name = "DCACHE_MISSES", .code = 0x98, .desc = "D-cache misses. This count is per instruction at graduation and includes load, store, prefetch, synci and address based cacheops", }, { .name = "JTLB_DATA_ACCESSES", .code = 0x19, .desc = "JTLB d-side (data side as opposed to instruction side) accesses", }, { .name = "JTLB_DATA_MISSES", .code = 0x99, .desc = "JTLB translation fails on d-side (data side as opposed to instruction side) accesses. This count includes instructions that do not graduate", }, { .name = "LOAD_STORE_REPLAYS", .code = 0x1a, .desc = "Load/store instruction redirects, which happen when the load/store follows too closely on a possibly matching cacheop", }, { .name = "DCACHE_VTAG_MISMATCH", .code = 0x9a, .desc = "The 74K core's D-cache has an auxiliary virtual tag, used to pick the right line early. When (occasionally) the physical tag match and virtual tag match do not line up, it is treated as a cache miss - in processing the miss the virtual tag is corrected for future accesses. 
This event counts those bogus misses", }, { .name = "L2_CACHE_WRITEBACKS", .code = 0x1c, .desc = "L2 cache writebacks", }, { .name = "L2_CACHE_ACCESSES", .code = 0x9c, .desc = "L2 cache accesses", }, { .name = "L2_CACHE_MISSES", .code = 0x1d, .desc = "L2 cache misses", }, { .name = "L2_CACHE_MISS_CYCLES", .code = 0x9d, .desc = "L2 cache miss cycles", }, { .name = "FSB_FULL_STALLS", .code = 0x1e, .desc = "Cycles Fill Store Buffer(FSB) are full and cause a pipe stall", }, { .name = "FSB_OVER_50_FULL", .code = 0x9e, .desc = "Cycles Fill Store Buffer(FSB) > 1/2 full", }, { .name = "LDQ_FULL_STALLS", .code = 0x1f, .desc = "Cycles Load Data Queue (LDQ) are full and cause a pipe stall", }, { .name = "LDQ_OVER_50_FULL", .code = 0x9f, .desc = "Cycles Load Data Queue(LDQ) > 1/2 full", }, { .name = "WBB_FULL_STALLS", .code = 0x20, .desc = "Cycles Writeback Buffer(WBB) are full and cause a pipe stall", }, { .name = "WBB_OVER_50_FULL", .code = 0xa0, .desc = "Cycles Writeback Buffer(WBB) > 1/2 full", }, { .name = "LOAD_MISS_CONSUMER_REPLAYS", .code = 0x23, .desc = "Replays following optimistic issue of instruction dependent on load which missed. 
Counted only when the dependent instruction graduates", }, { .name = "FPU_LOAD_INSNS", .code = 0xa3, .desc = "Floating Point Load instructions graduated", }, { .name = "JR_NON_31_INSNS", .code = 0x24, .desc = "jr (not $31) instructions graduated", }, { .name = "MISPREDICTED_JR_31_INSNS", .code = 0xa4, .desc = "jr $31 mispredicted at graduation", }, { .name = "INT_BRANCH_INSNS", .code = 0x25, .desc = "Integer branch instructions graduated", }, { .name = "FPU_BRANCH_INSNS", .code = 0xa5, .desc = "Floating point branch instructions graduated", }, { .name = "BRANCH_LIKELY_INSNS", .code = 0x26, .desc = "Branch-likely instructions graduated", }, { .name = "MISPREDICTED_BRANCH_LIKELY_INSNS", .code = 0xa6, .desc = "Mispredicted branch-likely instructions graduated", }, { .name = "COND_BRANCH_INSNS", .code = 0x27, .desc = "Conditional branches graduated", }, { .name = "MISPREDICTED_BRANCH_INSNS", .code = 0xa7, .desc = "Mispredicted conditional branches graduated", }, { .name = "INTEGER_INSNS", .code = 0x28, .desc = "Integer instructions graduated (includes nop, ssnop, ehb as well as all arithmetic, logical, shift and extract type operations)", }, { .name = "FPU_INSNS", .code = 0xa8, .desc = "Floating point instructions graduated (but not counting floating point load/store)", }, { .name = "LOAD_INSNS", .code = 0x29, .desc = "Loads graduated (includes floating point)", }, { .name = "STORE_INSNS", .code = 0xa9, .desc = "Stores graduated (includes floating point). 
Of sc instructions, only successful ones are counted", }, { .name = "J_JAL_INSNS", .code = 0x2a, .desc = "j/jal graduated", }, { .name = "MIPS16_INSNS", .code = 0xaa, .desc = "MIPS16e instructions graduated", }, { .name = "NOP_INSNS", .code = 0x2b, .desc = "no-ops graduated - included (sll, nop, ssnop, ehb)", }, { .name = "NT_MUL_DIV_INSNS", .code = 0xab, .desc = "integer multiply/divides graduated", }, { .name = "DSP_INSNS", .code = 0x2c, .desc = "DSP instructions graduated", }, { .name = "ALU_DSP_SATURATION_INSNS", .code = 0xac, .desc = "ALU-DSP instructions graduated, result was saturated", }, { .name = "DSP_BRANCH_INSNS", .code = 0x2d, .desc = "DSP branch instructions graduated", }, { .name = "MDU_DSP_SATURATION_INSNS", .code = 0xad, .desc = "MDU-DSP instructions graduated, result was saturated", }, { .name = "UNCACHED_LOAD_INSNS", .code = 0x2e, .desc = "Uncached loads graduated", }, { .name = "UNCACHED_STORE_INSNS", .code = 0xae, .desc = "Uncached stores graduated", }, { .name = "EJTAG_INSN_TRIGGERS", .code = 0x31, .desc = "EJTAG instruction triggers", }, { .name = "EJTAG_DATA_TRIGGERS", .code = 0xb1, .desc = "EJTAG data triggers", }, { .name = "CP1_BRANCH_MISPREDICTIONS", .code = 0x32, .desc = "CP1 branches mispredicted", }, { .name = "SC_INSNS", .code = 0x33, .desc = "sc instructions graduated", }, { .name = "FAILED_SC_INSNS", .code = 0xb3, .desc = "sc instructions failed", }, { .name = "PREFETCH_INSNS", .code = 0x34, .desc = "prefetch instructions graduated at the top of LSGB", }, { .name = "CACHE_HIT_PREFETCH_INSNS", .code = 0xb4, .desc = "prefetch instructions which did nothing, because they hit in the cache", }, { .name = "NO_INSN_CYCLES", .code = 0x35, .desc = "Cycles where no instructions graduated", }, { .name = "LOAD_MISS_INSNS", .code = 0xb5, .desc = "Load misses graduated. 
Includes floating point loads", }, { .name = "ONE_INSN_CYCLES", .code = 0x36, .desc = "Cycles where one instruction graduated", }, { .name = "TWO_INSNS_CYCLES", .code = 0xb6, .desc = "Cycles where two instructions graduated", }, { .name = "GFIFO_BLOCKED_CYCLES", .code = 0x37, .desc = "GFifo blocked cycles", }, { .name = "FPU_STORE_INSNS", .code = 0xb7, .desc = "Floating point stores graduated", }, { .name = "GFIFO_BLOCKED_TLB_CACHE", .code = 0x38, .desc = "GFifo blocked due to TLB or Cacheop", }, { .name = "NO_INSTRUCTIONS_FROM_REPLAY_CYCLES", .code = 0xb8, .desc = "Number of cycles no instructions graduated from the time the pipe was flushed because of a replay until the first new instruction graduates. This is an indicator of the graduation bandwidth loss due to replay. Often this replay is a result of event 25 and therefore an indicator of bandwidth lost due to cache misses", }, { .name = "MISPREDICTION_BRANCH_NODELAY_CYCLES", .code = 0x39, /* even counters event 57 (raw 57) */ .desc = "Slot 0 misprediction branch instruction graduation cycles without the delay slot", }, { .name = "MISPREDICTION_BRANCH_DELAY_WAIT_CYCLES", .code = 0xb9, /* even counters event 57 (raw 57) */ .desc = "Cycles waiting for delay slot to graduate on a mispredicted branch", }, { .name = "EXCEPTIONS_TAKEN", .code = 0x3a, .desc = "Exceptions taken", }, { .name = "GRADUATION_REPLAYS", .code = 0xba, .desc = "Replays initiated from graduation", }, { .name = "COREEXTEND_EVENTS", .code = 0x3b, .desc = "Implementation specific CorExtend event. The integrator of this core may connect the core pin UDI_perfcnt_event to an event to be counted. This is intended for use with the CorExtend interface", }, { .name = "DSPRAM_EVENTS", .code = 0xbe, .desc = "Implementation-specific DSPRAM event. 
The integrator of this core may connect the core pin SP_prf_c13_e62_xx to the event to be counted", }, { .name = "L2_CACHE_SINGLE_BIT_ERRORS", .code = 0x3f, .desc = "L2 single-bit errors which were detected", }, { .name = "SYSTEM_EVENT_0", .code = 0x40, .desc = "SI_Event[0] - Implementation-specific system event. The integrator of this core may connect the core pin SI_PCEvent[0] to an event to be counted", }, { .name = "SYSTEM_EVENT_1", .code = 0xc0, .desc = "SI_Event[1] - Implementation-specific system event. The integrator of this core may connect the core pin SI_PCEvent[1] to an event to be counted", }, { .name = "SYSTEM_EVENT_2", .code = 0x41, .desc = "SI_Event[2] - Implementation-specific system event. The integrator of this core may connect the core pin SI_PCEvent[2] to an event to be counted", }, { .name = "SYSTEM_EVENT_3", .code = 0xc1, .desc = "SI_Event[3] - Implementation-specific system event. The integrator of this core may connect the core pin SI_PCEvent[3] to an event to be counted", }, { .name = "SYSTEM_EVENT_4", .code = 0x42, .desc = "SI_Event[4] - Implementation-specific system event. The integrator of this core may connect the core pin SI_PCEvent[4] to an event to be counted", }, { .name = "SYSTEM_EVENT_5", .code = 0xc2, .desc = "SI_Event[5] - Implementation-specific system event. The integrator of this core may connect the core pin SI_PCEvent[5] to an event to be counted", }, { .name = "SYSTEM_EVENT_6", .code = 0x43, .desc = "SI_Event[6] - Implementation-specific system event. The integrator of this core may connect the core pin SI_PCEvent[6] to an event to be counted", }, { .name = "SYSTEM_EVENT_7", .code = 0xc3, .desc = "SI_Event[7] - Implementation-specific system event. 
The integrator of this core may connect the core pin SI_PCEvent[7] to an event to be counted", }, { .name = "OCP_ALL_REQUESTS", .code = 0x44, .desc = "All OCP requests accepted", }, { .name = "OCP_ALL_CACHEABLE_REQUESTS", .code = 0xc4, .desc = "All OCP cacheable requests accepted", }, { .name = "OCP_READ_REQUESTS", .code = 0x45, .desc = "OCP read requests accepted", }, { .name = "OCP_READ_CACHEABLE_REQUESTS", .code = 0xc5, .desc = "OCP cacheable read requests accepted", }, { .name = "OCP_WRITE_REQUESTS", .code = 0x46, .desc = "OCP write requests accepted", }, { .name = "OCP_WRITE_CACHEABLE_REQUESTS", .code = 0xc6, .desc = "OCP cacheable write requests accepted", }, { .name = "OCP_WRITE_DATA_SENT", .code = 0xc7, .desc = "OCP write data sent", }, { .name = "OCP_READ_DATA_RECEIVED", .code = 0xc8, .desc = "OCP read data received", }, { .name = "FSB_LESS_25_FULL", .code = 0x4a, .desc = "Cycles fill store buffer (FSB) < 1/4 full", }, { .name = "FSB_25_50_FULL", .code = 0xca, .desc = "Cycles fill store buffer (FSB) 1/4 to 1/2 full", }, { .name = "LDQ_LESS_25_FULL", .code = 0x4b, .desc = "Cycles load data queue (LDQ) < 1/4 full", }, { .name = "LDQ_25_50_FULL", .code = 0xcb, .desc = "Cycles load data queue (LDQ) 1/4 to 1/2 full", }, { .name = "WBB_LESS_25_FULL", .code = 0x4c, .desc = "Cycles writeback buffer (WBB) < 1/4 full", }, { .name = "WBB_25_50_FULL", .code = 0xcc, .desc = "Cycles writeback buffer (WBB) 1/4 to 1/2 full", }, }; papi-5.3.0/src/libpfm4/lib/events/sparc_niagara1_events.h0000600003276200002170000000206112247131124022744 0ustar ralphundrgradstatic const sparc_entry_t niagara1_pe[] = { /* PIC1 Niagara-1 events */ { .name = "Instr_cnt", .desc = "Number of instructions completed", .ctrl = PME_CTRL_S1, .code = 0x0, }, /* PIC0 Niagara-1 events */ { .name = "SB_full", .desc = "Store-buffer full", .ctrl = PME_CTRL_S0, .code = 0x0, }, { .name = "FP_instr_cnt", .desc = "FPU instructions", .ctrl = PME_CTRL_S0, .code = 0x1, }, { .name = "IC_miss", .desc = "I-cache 
miss", .ctrl = PME_CTRL_S0, .code = 0x2, }, { .name = "DC_miss", .desc = "D-cache miss", .ctrl = PME_CTRL_S0, .code = 0x3, }, { .name = "ITLB_miss", .desc = "I-TLB miss", .ctrl = PME_CTRL_S0, .code = 0x4, }, { .name = "DTLB_miss", .desc = "D-TLB miss", .ctrl = PME_CTRL_S0, .code = 0x5, }, { .name = "L2_imiss", .desc = "E-cache instruction fetch miss", .ctrl = PME_CTRL_S0, .code = 0x6, }, { .name = "L2_dmiss_ld", .desc = "E-cache data load miss", .ctrl = PME_CTRL_S0, .code = 0x7, }, }; #define PME_SPARC_NIAGARA1_EVENT_COUNT (sizeof(niagara1_pe)/sizeof(sparc_entry_t)) papi-5.3.0/src/libpfm4/lib/events/ppc970_events.h0000600003276200002170000020032112247131124021112 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __PPC970_EVENTS_H__ #define __PPC970_EVENTS_H__ /* * File: ppc970_events.h * CVS: * Author: Corey Ashford * cjashfor@us.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2009. All Rights Reserved. * Contributed by Corey Ashford * * Note: This code was automatically generated and should not be modified by * hand. 
* */ #define PPC970_PME_PM_LSU_REJECT_RELOAD_CDF 0 #define PPC970_PME_PM_MRK_LSU_SRQ_INST_VALID 1 #define PPC970_PME_PM_FPU1_SINGLE 2 #define PPC970_PME_PM_FPU0_STALL3 3 #define PPC970_PME_PM_TB_BIT_TRANS 4 #define PPC970_PME_PM_GPR_MAP_FULL_CYC 5 #define PPC970_PME_PM_MRK_ST_CMPL 6 #define PPC970_PME_PM_FPU0_STF 7 #define PPC970_PME_PM_FPU1_FMA 8 #define PPC970_PME_PM_LSU1_FLUSH_ULD 9 #define PPC970_PME_PM_MRK_INST_FIN 10 #define PPC970_PME_PM_MRK_LSU0_FLUSH_UST 11 #define PPC970_PME_PM_LSU_LRQ_S0_ALLOC 12 #define PPC970_PME_PM_FPU_FDIV 13 #define PPC970_PME_PM_FPU0_FULL_CYC 14 #define PPC970_PME_PM_FPU_SINGLE 15 #define PPC970_PME_PM_FPU0_FMA 16 #define PPC970_PME_PM_MRK_LSU1_FLUSH_ULD 17 #define PPC970_PME_PM_LSU1_FLUSH_LRQ 18 #define PPC970_PME_PM_DTLB_MISS 19 #define PPC970_PME_PM_MRK_ST_MISS_L1 20 #define PPC970_PME_PM_EXT_INT 21 #define PPC970_PME_PM_MRK_LSU1_FLUSH_LRQ 22 #define PPC970_PME_PM_MRK_ST_GPS 23 #define PPC970_PME_PM_GRP_DISP_SUCCESS 24 #define PPC970_PME_PM_LSU1_LDF 25 #define PPC970_PME_PM_LSU0_SRQ_STFWD 26 #define PPC970_PME_PM_CR_MAP_FULL_CYC 27 #define PPC970_PME_PM_MRK_LSU0_FLUSH_ULD 28 #define PPC970_PME_PM_LSU_DERAT_MISS 29 #define PPC970_PME_PM_FPU0_SINGLE 30 #define PPC970_PME_PM_FPU1_FDIV 31 #define PPC970_PME_PM_FPU1_FEST 32 #define PPC970_PME_PM_FPU0_FRSP_FCONV 33 #define PPC970_PME_PM_GCT_EMPTY_SRQ_FULL 34 #define PPC970_PME_PM_MRK_ST_CMPL_INT 35 #define PPC970_PME_PM_FLUSH_BR_MPRED 36 #define PPC970_PME_PM_FXU_FIN 37 #define PPC970_PME_PM_FPU_STF 38 #define PPC970_PME_PM_DSLB_MISS 39 #define PPC970_PME_PM_FXLS1_FULL_CYC 40 #define PPC970_PME_PM_LSU_LMQ_LHR_MERGE 41 #define PPC970_PME_PM_MRK_STCX_FAIL 42 #define PPC970_PME_PM_FXU0_BUSY_FXU1_IDLE 43 #define PPC970_PME_PM_MRK_DATA_FROM_L25_SHR 44 #define PPC970_PME_PM_LSU_FLUSH_ULD 45 #define PPC970_PME_PM_MRK_BRU_FIN 46 #define PPC970_PME_PM_IERAT_XLATE_WR 47 #define PPC970_PME_PM_DATA_FROM_MEM 48 #define PPC970_PME_PM_FPR_MAP_FULL_CYC 49 #define PPC970_PME_PM_FPU1_FULL_CYC 50 
#define PPC970_PME_PM_FPU0_FIN 51 #define PPC970_PME_PM_GRP_BR_REDIR 52 #define PPC970_PME_PM_THRESH_TIMEO 53 #define PPC970_PME_PM_FPU_FSQRT 54 #define PPC970_PME_PM_MRK_LSU0_FLUSH_LRQ 55 #define PPC970_PME_PM_PMC1_OVERFLOW 56 #define PPC970_PME_PM_FXLS0_FULL_CYC 57 #define PPC970_PME_PM_FPU0_ALL 58 #define PPC970_PME_PM_DATA_TABLEWALK_CYC 59 #define PPC970_PME_PM_FPU0_FEST 60 #define PPC970_PME_PM_DATA_FROM_L25_MOD 61 #define PPC970_PME_PM_LSU0_REJECT_ERAT_MISS 62 #define PPC970_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC 63 #define PPC970_PME_PM_LSU0_REJECT_RELOAD_CDF 64 #define PPC970_PME_PM_FPU_FEST 65 #define PPC970_PME_PM_0INST_FETCH 66 #define PPC970_PME_PM_LD_MISS_L1_LSU0 67 #define PPC970_PME_PM_LSU1_REJECT_RELOAD_CDF 68 #define PPC970_PME_PM_L1_PREF 69 #define PPC970_PME_PM_FPU1_STALL3 70 #define PPC970_PME_PM_BRQ_FULL_CYC 71 #define PPC970_PME_PM_PMC8_OVERFLOW 72 #define PPC970_PME_PM_PMC7_OVERFLOW 73 #define PPC970_PME_PM_WORK_HELD 74 #define PPC970_PME_PM_MRK_LD_MISS_L1_LSU0 75 #define PPC970_PME_PM_FXU_IDLE 76 #define PPC970_PME_PM_INST_CMPL 77 #define PPC970_PME_PM_LSU1_FLUSH_UST 78 #define PPC970_PME_PM_LSU0_FLUSH_ULD 79 #define PPC970_PME_PM_LSU_FLUSH 80 #define PPC970_PME_PM_INST_FROM_L2 81 #define PPC970_PME_PM_LSU1_REJECT_LMQ_FULL 82 #define PPC970_PME_PM_PMC2_OVERFLOW 83 #define PPC970_PME_PM_FPU0_DENORM 84 #define PPC970_PME_PM_FPU1_FMOV_FEST 85 #define PPC970_PME_PM_GRP_DISP_REJECT 86 #define PPC970_PME_PM_LSU_LDF 87 #define PPC970_PME_PM_INST_DISP 88 #define PPC970_PME_PM_DATA_FROM_L25_SHR 89 #define PPC970_PME_PM_L1_DCACHE_RELOAD_VALID 90 #define PPC970_PME_PM_MRK_GRP_ISSUED 91 #define PPC970_PME_PM_FPU_FMA 92 #define PPC970_PME_PM_MRK_CRU_FIN 93 #define PPC970_PME_PM_MRK_LSU1_FLUSH_UST 94 #define PPC970_PME_PM_MRK_FXU_FIN 95 #define PPC970_PME_PM_LSU1_REJECT_ERAT_MISS 96 #define PPC970_PME_PM_BR_ISSUED 97 #define PPC970_PME_PM_PMC4_OVERFLOW 98 #define PPC970_PME_PM_EE_OFF 99 #define PPC970_PME_PM_INST_FROM_L25_MOD 100 #define PPC970_PME_PM_ITLB_MISS 101 
#define PPC970_PME_PM_FXU1_BUSY_FXU0_IDLE 102 #define PPC970_PME_PM_GRP_DISP_VALID 103 #define PPC970_PME_PM_MRK_GRP_DISP 104 #define PPC970_PME_PM_LSU_FLUSH_UST 105 #define PPC970_PME_PM_FXU1_FIN 106 #define PPC970_PME_PM_GRP_CMPL 107 #define PPC970_PME_PM_FPU_FRSP_FCONV 108 #define PPC970_PME_PM_MRK_LSU0_FLUSH_SRQ 109 #define PPC970_PME_PM_LSU_LMQ_FULL_CYC 110 #define PPC970_PME_PM_ST_REF_L1_LSU0 111 #define PPC970_PME_PM_LSU0_DERAT_MISS 112 #define PPC970_PME_PM_LSU_SRQ_SYNC_CYC 113 #define PPC970_PME_PM_FPU_STALL3 114 #define PPC970_PME_PM_LSU_REJECT_ERAT_MISS 115 #define PPC970_PME_PM_MRK_DATA_FROM_L2 116 #define PPC970_PME_PM_LSU0_FLUSH_SRQ 117 #define PPC970_PME_PM_FPU0_FMOV_FEST 118 #define PPC970_PME_PM_LD_REF_L1_LSU0 119 #define PPC970_PME_PM_LSU1_FLUSH_SRQ 120 #define PPC970_PME_PM_GRP_BR_MPRED 121 #define PPC970_PME_PM_LSU_LMQ_S0_ALLOC 122 #define PPC970_PME_PM_LSU0_REJECT_LMQ_FULL 123 #define PPC970_PME_PM_ST_REF_L1 124 #define PPC970_PME_PM_MRK_VMX_FIN 125 #define PPC970_PME_PM_LSU_SRQ_EMPTY_CYC 126 #define PPC970_PME_PM_FPU1_STF 127 #define PPC970_PME_PM_RUN_CYC 128 #define PPC970_PME_PM_LSU_LMQ_S0_VALID 129 #define PPC970_PME_PM_LSU0_LDF 130 #define PPC970_PME_PM_LSU_LRQ_S0_VALID 131 #define PPC970_PME_PM_PMC3_OVERFLOW 132 #define PPC970_PME_PM_MRK_IMR_RELOAD 133 #define PPC970_PME_PM_MRK_GRP_TIMEO 134 #define PPC970_PME_PM_FPU_FMOV_FEST 135 #define PPC970_PME_PM_GRP_DISP_BLK_SB_CYC 136 #define PPC970_PME_PM_XER_MAP_FULL_CYC 137 #define PPC970_PME_PM_ST_MISS_L1 138 #define PPC970_PME_PM_STOP_COMPLETION 139 #define PPC970_PME_PM_MRK_GRP_CMPL 140 #define PPC970_PME_PM_ISLB_MISS 141 #define PPC970_PME_PM_SUSPENDED 142 #define PPC970_PME_PM_CYC 143 #define PPC970_PME_PM_LD_MISS_L1_LSU1 144 #define PPC970_PME_PM_STCX_FAIL 145 #define PPC970_PME_PM_LSU1_SRQ_STFWD 146 #define PPC970_PME_PM_GRP_DISP 147 #define PPC970_PME_PM_L2_PREF 148 #define PPC970_PME_PM_FPU1_DENORM 149 #define PPC970_PME_PM_DATA_FROM_L2 150 #define PPC970_PME_PM_FPU0_FPSCR 151 #define 
PPC970_PME_PM_MRK_DATA_FROM_L25_MOD 152 #define PPC970_PME_PM_FPU0_FSQRT 153 #define PPC970_PME_PM_LD_REF_L1 154 #define PPC970_PME_PM_MRK_L1_RELOAD_VALID 155 #define PPC970_PME_PM_1PLUS_PPC_CMPL 156 #define PPC970_PME_PM_INST_FROM_L1 157 #define PPC970_PME_PM_EE_OFF_EXT_INT 158 #define PPC970_PME_PM_PMC6_OVERFLOW 159 #define PPC970_PME_PM_LSU_LRQ_FULL_CYC 160 #define PPC970_PME_PM_IC_PREF_INSTALL 161 #define PPC970_PME_PM_DC_PREF_OUT_OF_STREAMS 162 #define PPC970_PME_PM_MRK_LSU1_FLUSH_SRQ 163 #define PPC970_PME_PM_GCT_FULL_CYC 164 #define PPC970_PME_PM_INST_FROM_MEM 165 #define PPC970_PME_PM_FLUSH_LSU_BR_MPRED 166 #define PPC970_PME_PM_FXU_BUSY 167 #define PPC970_PME_PM_ST_REF_L1_LSU1 168 #define PPC970_PME_PM_MRK_LD_MISS_L1 169 #define PPC970_PME_PM_L1_WRITE_CYC 170 #define PPC970_PME_PM_LSU_REJECT_LMQ_FULL 171 #define PPC970_PME_PM_FPU_ALL 172 #define PPC970_PME_PM_LSU_SRQ_S0_ALLOC 173 #define PPC970_PME_PM_INST_FROM_L25_SHR 174 #define PPC970_PME_PM_GRP_MRK 175 #define PPC970_PME_PM_BR_MPRED_CR 176 #define PPC970_PME_PM_DC_PREF_STREAM_ALLOC 177 #define PPC970_PME_PM_FPU1_FIN 178 #define PPC970_PME_PM_LSU_REJECT_SRQ 179 #define PPC970_PME_PM_BR_MPRED_TA 180 #define PPC970_PME_PM_CRQ_FULL_CYC 181 #define PPC970_PME_PM_LD_MISS_L1 182 #define PPC970_PME_PM_INST_FROM_PREF 183 #define PPC970_PME_PM_STCX_PASS 184 #define PPC970_PME_PM_DC_INV_L2 185 #define PPC970_PME_PM_LSU_SRQ_FULL_CYC 186 #define PPC970_PME_PM_LSU0_FLUSH_LRQ 187 #define PPC970_PME_PM_LSU_SRQ_S0_VALID 188 #define PPC970_PME_PM_LARX_LSU0 189 #define PPC970_PME_PM_GCT_EMPTY_CYC 190 #define PPC970_PME_PM_FPU1_ALL 191 #define PPC970_PME_PM_FPU1_FSQRT 192 #define PPC970_PME_PM_FPU_FIN 193 #define PPC970_PME_PM_LSU_SRQ_STFWD 194 #define PPC970_PME_PM_MRK_LD_MISS_L1_LSU1 195 #define PPC970_PME_PM_FXU0_FIN 196 #define PPC970_PME_PM_MRK_FPU_FIN 197 #define PPC970_PME_PM_PMC5_OVERFLOW 198 #define PPC970_PME_PM_SNOOP_TLBIE 199 #define PPC970_PME_PM_FPU1_FRSP_FCONV 200 #define PPC970_PME_PM_FPU0_FDIV 201 #define 
PPC970_PME_PM_LD_REF_L1_LSU1 202 #define PPC970_PME_PM_HV_CYC 203 #define PPC970_PME_PM_LR_CTR_MAP_FULL_CYC 204 #define PPC970_PME_PM_FPU_DENORM 205 #define PPC970_PME_PM_LSU0_REJECT_SRQ 206 #define PPC970_PME_PM_LSU1_REJECT_SRQ 207 #define PPC970_PME_PM_LSU1_DERAT_MISS 208 #define PPC970_PME_PM_IC_PREF_REQ 209 #define PPC970_PME_PM_MRK_LSU_FIN 210 #define PPC970_PME_PM_MRK_DATA_FROM_MEM 211 #define PPC970_PME_PM_LSU0_FLUSH_UST 212 #define PPC970_PME_PM_LSU_FLUSH_LRQ 213 #define PPC970_PME_PM_LSU_FLUSH_SRQ 214 static const pme_power_entry_t ppc970_pe[] = { [ PPC970_PME_PM_LSU_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU_REJECT_RELOAD_CDF", .pme_code = 0x6920, .pme_short_desc = "LSU reject due to reload CDF or tag update collision", .pme_long_desc = "LSU reject due to reload CDF or tag update collision", }, [ PPC970_PME_PM_MRK_LSU_SRQ_INST_VALID ] = { .pme_name = "PM_MRK_LSU_SRQ_INST_VALID", .pme_code = 0x936, .pme_short_desc = "Marked instruction valid in SRQ", .pme_long_desc = "This signal is asserted every cycle when a marked request is resident in the Store Request Queue", }, [ PPC970_PME_PM_FPU1_SINGLE ] = { .pme_name = "PM_FPU1_SINGLE", .pme_code = 0x127, .pme_short_desc = "FPU1 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing single precision instruction.", }, [ PPC970_PME_PM_FPU0_STALL3 ] = { .pme_name = "PM_FPU0_STALL3", .pme_code = 0x121, .pme_short_desc = "FPU0 stalled in pipe3", .pme_long_desc = "This signal indicates that fp0 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
", }, [ PPC970_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x8005, .pme_short_desc = "Time Base bit transition", .pme_long_desc = "When the selected time base bit (as specified in MMCR0[TBSEL])transitions from 0 to 1 ", }, [ PPC970_PME_PM_GPR_MAP_FULL_CYC ] = { .pme_name = "PM_GPR_MAP_FULL_CYC", .pme_code = 0x335, .pme_short_desc = "Cycles GPR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the gpr mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x1003, .pme_short_desc = "Marked store instruction completed", .pme_long_desc = "A sampled store has completed (data home)", }, [ PPC970_PME_PM_FPU0_STF ] = { .pme_name = "PM_FPU0_STF", .pme_code = 0x122, .pme_short_desc = "FPU0 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a store instruction.", }, [ PPC970_PME_PM_FPU1_FMA ] = { .pme_name = "PM_FPU1_FMA", .pme_code = 0x105, .pme_short_desc = "FPU1 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970_PME_PM_LSU1_FLUSH_ULD ] = { .pme_name = "PM_LSU1_FLUSH_ULD", .pme_code = 0x804, .pme_short_desc = "LSU1 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 1 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970_PME_PM_MRK_INST_FIN ] = { .pme_name = "PM_MRK_INST_FIN", .pme_code = 0x7005, .pme_short_desc = "Marked instruction finished", .pme_long_desc = "One of the execution units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ PPC970_PME_PM_MRK_LSU0_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU0_FLUSH_UST", .pme_code = 0x711, .pme_short_desc = "LSU0 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 0 because it was unaligned", }, [ PPC970_PME_PM_LSU_LRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LRQ_S0_ALLOC", .pme_code = 0x826, .pme_short_desc = "LRQ slot 0 allocated", .pme_long_desc = "LRQ slot zero was allocated", }, [ PPC970_PME_PM_FPU_FDIV ] = { .pme_name = "PM_FPU_FDIV", .pme_code = 0x1100, .pme_short_desc = "FPU executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when FPU is executing a divide instruction. This could be fdiv, fdivs, fdiv. fdivs. Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_FPU0_FULL_CYC ] = { .pme_name = "PM_FPU0_FULL_CYC", .pme_code = 0x303, .pme_short_desc = "Cycles FPU0 issue queue full", .pme_long_desc = "The issue queue for FPU unit 0 cannot accept any more instructions. Issue is stopped", }, [ PPC970_PME_PM_FPU_SINGLE ] = { .pme_name = "PM_FPU_SINGLE", .pme_code = 0x5120, .pme_short_desc = "FPU executed single precision instruction", .pme_long_desc = "FPU is executing single precision instruction. Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_FPU0_FMA ] = { .pme_name = "PM_FPU0_FMA", .pme_code = 0x101, .pme_short_desc = "FPU0 executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing multiply-add kind of instruction. 
This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970_PME_PM_MRK_LSU1_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU1_FLUSH_ULD", .pme_code = 0x714, .pme_short_desc = "LSU1 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 1 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970_PME_PM_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_LSU1_FLUSH_LRQ", .pme_code = 0x806, .pme_short_desc = "LSU1 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970_PME_PM_DTLB_MISS ] = { .pme_name = "PM_DTLB_MISS", .pme_code = 0x704, .pme_short_desc = "Data TLB misses", .pme_long_desc = "A TLB miss for a data request occurred. Requests that miss the TLB may be retried until the instruction is in the next to complete group (unless HID4 is set to allow speculative tablewalks). 
This may result in multiple TLB misses for the same instruction.", }, [ PPC970_PME_PM_MRK_ST_MISS_L1 ] = { .pme_name = "PM_MRK_ST_MISS_L1", .pme_code = 0x723, .pme_short_desc = "Marked L1 D cache store misses", .pme_long_desc = "A marked store missed the dcache", }, [ PPC970_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x8002, .pme_short_desc = "External interrupts", .pme_long_desc = "An external interrupt occurred", }, [ PPC970_PME_PM_MRK_LSU1_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_LRQ", .pme_code = 0x716, .pme_short_desc = "LSU1 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970_PME_PM_MRK_ST_GPS ] = { .pme_name = "PM_MRK_ST_GPS", .pme_code = 0x6003, .pme_short_desc = "Marked store sent to GPS", .pme_long_desc = "A sampled store has been sent to the memory subsystem", }, [ PPC970_PME_PM_GRP_DISP_SUCCESS ] = { .pme_name = "PM_GRP_DISP_SUCCESS", .pme_code = 0x5001, .pme_short_desc = "Group dispatch success", .pme_long_desc = "Number of groups successfully dispatched (not rejected)", }, [ PPC970_PME_PM_LSU1_LDF ] = { .pme_name = "PM_LSU1_LDF", .pme_code = 0x734, .pme_short_desc = "LSU1 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 1", }, [ PPC970_PME_PM_LSU0_SRQ_STFWD ] = { .pme_name = "PM_LSU0_SRQ_STFWD", .pme_code = 0x820, .pme_short_desc = "LSU0 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 0", }, [ PPC970_PME_PM_CR_MAP_FULL_CYC ] = { .pme_name = "PM_CR_MAP_FULL_CYC", .pme_code = 0x304, .pme_short_desc = "Cycles CR logical operation mapper full", .pme_long_desc = "The ISU sends a signal indicating that the cr mapper cannot accept any more groups. Dispatch is stopped. 
Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970_PME_PM_MRK_LSU0_FLUSH_ULD ] = { .pme_name = "PM_MRK_LSU0_FLUSH_ULD", .pme_code = 0x710, .pme_short_desc = "LSU0 marked unaligned load flushes", .pme_long_desc = "A marked load was flushed from unit 0 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x6700, .pme_short_desc = "DERAT misses", .pme_long_desc = "Total D-ERAT Misses (Unit 0 + Unit 1). Requests that miss the Derat are rejected and retried until the request hits in the Erat. This may result in multiple erat misses for the same instruction.", }, [ PPC970_PME_PM_FPU0_SINGLE ] = { .pme_name = "PM_FPU0_SINGLE", .pme_code = 0x123, .pme_short_desc = "FPU0 executed single precision instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing single precision instruction.", }, [ PPC970_PME_PM_FPU1_FDIV ] = { .pme_name = "PM_FPU1_FDIV", .pme_code = 0x104, .pme_short_desc = "FPU1 executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp1 is executing a divide instruction. This could be fdiv, fdivs, fdiv. fdivs.", }, [ PPC970_PME_PM_FPU1_FEST ] = { .pme_name = "PM_FPU1_FEST", .pme_code = 0x116, .pme_short_desc = "FPU1 executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. ", }, [ PPC970_PME_PM_FPU0_FRSP_FCONV ] = { .pme_name = "PM_FPU0_FRSP_FCONV", .pme_code = 0x111, .pme_short_desc = "FPU0 executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when fp0 is executing frsp or convert kind of instruction. 
This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970_PME_PM_GCT_EMPTY_SRQ_FULL ] = { .pme_name = "PM_GCT_EMPTY_SRQ_FULL", .pme_code = 0x200b, .pme_short_desc = "GCT empty caused by SRQ full", .pme_long_desc = "GCT empty caused by SRQ full", }, [ PPC970_PME_PM_MRK_ST_CMPL_INT ] = { .pme_name = "PM_MRK_ST_CMPL_INT", .pme_code = 0x3003, .pme_short_desc = "Marked store completed with intervention", .pme_long_desc = "A marked store previously sent to the memory subsystem completed (data home) after requiring intervention", }, [ PPC970_PME_PM_FLUSH_BR_MPRED ] = { .pme_name = "PM_FLUSH_BR_MPRED", .pme_code = 0x316, .pme_short_desc = "Flush caused by branch mispredict", .pme_long_desc = "Flush caused by branch mispredict", }, [ PPC970_PME_PM_FXU_FIN ] = { .pme_name = "PM_FXU_FIN", .pme_code = 0x3330, .pme_short_desc = "FXU produced a result", .pme_long_desc = "The fixed point unit (Unit 0 + Unit 1) finished a marked instruction. Instructions that finish may not necessary complete.", }, [ PPC970_PME_PM_FPU_STF ] = { .pme_name = "PM_FPU_STF", .pme_code = 0x6120, .pme_short_desc = "FPU executed store instruction", .pme_long_desc = "FPU is executing a store instruction. Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_DSLB_MISS ] = { .pme_name = "PM_DSLB_MISS", .pme_code = 0x705, .pme_short_desc = "Data SLB misses", .pme_long_desc = "A SLB miss for a data request occurred. SLB misses trap to the operating system to resolve", }, [ PPC970_PME_PM_FXLS1_FULL_CYC ] = { .pme_name = "PM_FXLS1_FULL_CYC", .pme_code = 0x314, .pme_short_desc = "Cycles FXU1/LS1 queue full", .pme_long_desc = "The issue queue for FXU/LSU unit 0 cannot accept any more instructions. 
Issue is stopped", }, [ PPC970_PME_PM_LSU_LMQ_LHR_MERGE ] = { .pme_name = "PM_LSU_LMQ_LHR_MERGE", .pme_code = 0x935, .pme_short_desc = "LMQ LHR merges", .pme_long_desc = "A dcache miss occurred for the same real cache line address as an earlier request already in the Load Miss Queue and was merged into the LMQ entry.", }, [ PPC970_PME_PM_MRK_STCX_FAIL ] = { .pme_name = "PM_MRK_STCX_FAIL", .pme_code = 0x726, .pme_short_desc = "Marked STCX failed", .pme_long_desc = "A marked stcx (stwcx or stdcx) failed", }, [ PPC970_PME_PM_FXU0_BUSY_FXU1_IDLE ] = { .pme_name = "PM_FXU0_BUSY_FXU1_IDLE", .pme_code = 0x7002, .pme_short_desc = "FXU0 busy FXU1 idle", .pme_long_desc = "FXU0 is busy while FXU1 was idle", }, [ PPC970_PME_PM_MRK_DATA_FROM_L25_SHR ] = { .pme_name = "PM_MRK_DATA_FROM_L25_SHR", .pme_code = 0x193d, .pme_short_desc = "Marked data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a marked demand load", }, [ PPC970_PME_PM_LSU_FLUSH_ULD ] = { .pme_name = "PM_LSU_FLUSH_ULD", .pme_code = 0x1800, .pme_short_desc = "LRQ unaligned load flushes", .pme_long_desc = "A load was flushed because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970_PME_PM_MRK_BRU_FIN ] = { .pme_name = "PM_MRK_BRU_FIN", .pme_code = 0x2005, .pme_short_desc = "Marked instruction BRU processing finished", .pme_long_desc = "The branch unit finished a marked instruction. Instructions that finish may not necessarily complete", }, [ PPC970_PME_PM_IERAT_XLATE_WR ] = { .pme_name = "PM_IERAT_XLATE_WR", .pme_code = 0x430, .pme_short_desc = "Translation written to ierat", .pme_long_desc = "This signal will be asserted each time the I-ERAT is written. This indicates that an ERAT miss has been serviced. ERAT misses will initiate a sequence resulting in the ERAT being written. 
ERAT misses that are later ignored will not be counted unless the ERAT is written before the instruction stream is changed, This should be a fairly accurate count of ERAT missed (best available).", }, [ PPC970_PME_PM_DATA_FROM_MEM ] = { .pme_name = "PM_DATA_FROM_MEM", .pme_code = 0x3837, .pme_short_desc = "Data loaded from memory", .pme_long_desc = "Data loaded from memory", }, [ PPC970_PME_PM_FPR_MAP_FULL_CYC ] = { .pme_name = "PM_FPR_MAP_FULL_CYC", .pme_code = 0x301, .pme_short_desc = "Cycles FPR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the FPR mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970_PME_PM_FPU1_FULL_CYC ] = { .pme_name = "PM_FPU1_FULL_CYC", .pme_code = 0x307, .pme_short_desc = "Cycles FPU1 issue queue full", .pme_long_desc = "The issue queue for FPU unit 1 cannot accept any more instructions. Issue is stopped", }, [ PPC970_PME_PM_FPU0_FIN ] = { .pme_name = "PM_FPU0_FIN", .pme_code = 0x113, .pme_short_desc = "FPU0 produced a result", .pme_long_desc = "fp0 finished, produced a result This only indicates finish, not completion. ", }, [ PPC970_PME_PM_GRP_BR_REDIR ] = { .pme_name = "PM_GRP_BR_REDIR", .pme_code = 0x326, .pme_short_desc = "Group experienced branch redirect", .pme_long_desc = "Group experienced branch redirect", }, [ PPC970_PME_PM_THRESH_TIMEO ] = { .pme_name = "PM_THRESH_TIMEO", .pme_code = 0x2003, .pme_short_desc = "Threshold timeout", .pme_long_desc = "The threshold timer expired", }, [ PPC970_PME_PM_FPU_FSQRT ] = { .pme_name = "PM_FPU_FSQRT", .pme_code = 0x6100, .pme_short_desc = "FPU executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when FPU is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs. 
Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_MRK_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_LRQ", .pme_code = 0x712, .pme_short_desc = "LSU0 marked LRQ flushes", .pme_long_desc = "A marked load was flushed by unit 0 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970_PME_PM_PMC1_OVERFLOW ] = { .pme_name = "PM_PMC1_OVERFLOW", .pme_code = 0x200a, .pme_short_desc = "PMC1 Overflow", .pme_long_desc = "PMC1 Overflow", }, [ PPC970_PME_PM_FXLS0_FULL_CYC ] = { .pme_name = "PM_FXLS0_FULL_CYC", .pme_code = 0x310, .pme_short_desc = "Cycles FXU0/LS0 queue full", .pme_long_desc = "The issue queue for FXU/LSU unit 0 cannot accept any more instructions. Issue is stopped", }, [ PPC970_PME_PM_FPU0_ALL ] = { .pme_name = "PM_FPU0_ALL", .pme_code = 0x103, .pme_short_desc = "FPU0 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo", }, [ PPC970_PME_PM_DATA_TABLEWALK_CYC ] = { .pme_name = "PM_DATA_TABLEWALK_CYC", .pme_code = 0x707, .pme_short_desc = "Cycles doing data tablewalks", .pme_long_desc = "This signal is asserted every cycle when a tablewalk is active. While a tablewalk is active any request attempting to access the TLB will be rejected and retried.", }, [ PPC970_PME_PM_FPU0_FEST ] = { .pme_name = "PM_FPU0_FEST", .pme_code = 0x112, .pme_short_desc = "FPU0 executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. 
", }, [ PPC970_PME_PM_DATA_FROM_L25_MOD ] = { .pme_name = "PM_DATA_FROM_L25_MOD", .pme_code = 0x383d, .pme_short_desc = "Data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a demand load", }, [ PPC970_PME_PM_LSU0_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU0_REJECT_ERAT_MISS", .pme_code = 0x923, .pme_short_desc = "LSU0 reject due to ERAT miss", .pme_long_desc = "LSU0 reject due to ERAT miss", }, [ PPC970_PME_PM_LSU_LMQ_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_LMQ_SRQ_EMPTY_CYC", .pme_code = 0x2002, .pme_short_desc = "Cycles LMQ and SRQ empty", .pme_long_desc = "Cycles when both the LMQ and SRQ are empty (LSU is idle)", }, [ PPC970_PME_PM_LSU0_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU0_REJECT_RELOAD_CDF", .pme_code = 0x922, .pme_short_desc = "LSU0 reject due to reload CDF or tag update collision", .pme_long_desc = "LSU0 reject due to reload CDF or tag update collision", }, [ PPC970_PME_PM_FPU_FEST ] = { .pme_name = "PM_FPU_FEST", .pme_code = 0x3110, .pme_short_desc = "FPU executed FEST instruction", .pme_long_desc = "This signal is active for one cycle when executing one of the estimate instructions. This could be fres* or frsqrte* where XYZ* means XYZ or XYZ. 
Combined Unit 0 + Unit 1.", }, [ PPC970_PME_PM_0INST_FETCH ] = { .pme_name = "PM_0INST_FETCH", .pme_code = 0x442d, .pme_short_desc = "No instructions fetched", .pme_long_desc = "No instructions were fetched this cycles (due to IFU hold, redirect, or icache miss)", }, [ PPC970_PME_PM_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_LD_MISS_L1_LSU0", .pme_code = 0x812, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "A load, executing on unit 0, missed the dcache", }, [ PPC970_PME_PM_LSU1_REJECT_RELOAD_CDF ] = { .pme_name = "PM_LSU1_REJECT_RELOAD_CDF", .pme_code = 0x926, .pme_short_desc = "LSU1 reject due to reload CDF or tag update collision", .pme_long_desc = "LSU1 reject due to reload CDF or tag update collision", }, [ PPC970_PME_PM_L1_PREF ] = { .pme_name = "PM_L1_PREF", .pme_code = 0x731, .pme_short_desc = "L1 cache data prefetches", .pme_long_desc = "A request to prefetch data into the L1 was made", }, [ PPC970_PME_PM_FPU1_STALL3 ] = { .pme_name = "PM_FPU1_STALL3", .pme_code = 0x125, .pme_short_desc = "FPU1 stalled in pipe3", .pme_long_desc = "This signal indicates that fp1 has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
", }, [ PPC970_PME_PM_BRQ_FULL_CYC ] = { .pme_name = "PM_BRQ_FULL_CYC", .pme_code = 0x305, .pme_short_desc = "Cycles branch queue full", .pme_long_desc = "The ISU sends a signal indicating that the issue queue that feeds the ifu br unit cannot accept any more group (queue is full of groups).", }, [ PPC970_PME_PM_PMC8_OVERFLOW ] = { .pme_name = "PM_PMC8_OVERFLOW", .pme_code = 0x100a, .pme_short_desc = "PMC8 Overflow", .pme_long_desc = "PMC8 Overflow", }, [ PPC970_PME_PM_PMC7_OVERFLOW ] = { .pme_name = "PM_PMC7_OVERFLOW", .pme_code = 0x800a, .pme_short_desc = "PMC7 Overflow", .pme_long_desc = "PMC7 Overflow", }, [ PPC970_PME_PM_WORK_HELD ] = { .pme_name = "PM_WORK_HELD", .pme_code = 0x2001, .pme_short_desc = "Work held", .pme_long_desc = "RAS Unit has signaled completion to stop and there are groups waiting to complete", }, [ PPC970_PME_PM_MRK_LD_MISS_L1_LSU0 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU0", .pme_code = 0x720, .pme_short_desc = "LSU0 L1 D cache load misses", .pme_long_desc = "A marked load, executing on unit 0, missed the dcache", }, [ PPC970_PME_PM_FXU_IDLE ] = { .pme_name = "PM_FXU_IDLE", .pme_code = 0x5002, .pme_short_desc = "FXU idle", .pme_long_desc = "FXU0 and FXU1 are both idle", }, [ PPC970_PME_PM_INST_CMPL ] = { .pme_name = "PM_INST_CMPL", .pme_code = 0x1, .pme_short_desc = "Instructions completed", .pme_long_desc = "Number of Eligible Instructions that completed. 
", }, [ PPC970_PME_PM_LSU1_FLUSH_UST ] = { .pme_name = "PM_LSU1_FLUSH_UST", .pme_code = 0x805, .pme_short_desc = "LSU1 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", }, [ PPC970_PME_PM_LSU0_FLUSH_ULD ] = { .pme_name = "PM_LSU0_FLUSH_ULD", .pme_code = 0x800, .pme_short_desc = "LSU0 unaligned load flushes", .pme_long_desc = "A load was flushed from unit 0 because it was unaligned (crossed a 64byte boundary, or 32 byte if it missed the L1)", }, [ PPC970_PME_PM_LSU_FLUSH ] = { .pme_name = "PM_LSU_FLUSH", .pme_code = 0x315, .pme_short_desc = "Flush initiated by LSU", .pme_long_desc = "Flush initiated by LSU", }, [ PPC970_PME_PM_INST_FROM_L2 ] = { .pme_name = "PM_INST_FROM_L2", .pme_code = 0x1426, .pme_short_desc = "Instructions fetched from L2", .pme_long_desc = "An instruction fetch group was fetched from L2. Fetch Groups can contain up to 8 instructions", }, [ PPC970_PME_PM_LSU1_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU1_REJECT_LMQ_FULL", .pme_code = 0x925, .pme_short_desc = "LSU1 reject due to LMQ full or missed data coming", .pme_long_desc = "LSU1 reject due to LMQ full or missed data coming", }, [ PPC970_PME_PM_PMC2_OVERFLOW ] = { .pme_name = "PM_PMC2_OVERFLOW", .pme_code = 0x300a, .pme_short_desc = "PMC2 Overflow", .pme_long_desc = "PMC2 Overflow", }, [ PPC970_PME_PM_FPU0_DENORM ] = { .pme_name = "PM_FPU0_DENORM", .pme_code = 0x120, .pme_short_desc = "FPU0 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ PPC970_PME_PM_FPU1_FMOV_FEST ] = { .pme_name = "PM_FPU1_FMOV_FEST", .pme_code = 0x114, .pme_short_desc = "FPU1 executing FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when fp1 is executing a move kind of instruction or one of the estimate instructions.. 
This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ", }, [ PPC970_PME_PM_GRP_DISP_REJECT ] = { .pme_name = "PM_GRP_DISP_REJECT", .pme_code = 0x324, .pme_short_desc = "Group dispatch rejected", .pme_long_desc = "A group that previously attempted dispatch was rejected.", }, [ PPC970_PME_PM_LSU_LDF ] = { .pme_name = "PM_LSU_LDF", .pme_code = 0x8730, .pme_short_desc = "LSU executed Floating Point load instruction", .pme_long_desc = "LSU executed Floating Point load instruction", }, [ PPC970_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x320, .pme_short_desc = "Instructions dispatched", .pme_long_desc = "The ISU sends the number of instructions dispatched.", }, [ PPC970_PME_PM_DATA_FROM_L25_SHR ] = { .pme_name = "PM_DATA_FROM_L25_SHR", .pme_code = 0x183d, .pme_short_desc = "Data loaded from L2.5 shared", .pme_long_desc = "DL1 was reloaded with shared (T or SL) data from the L2 of a chip on this MCM due to a demand load", }, [ PPC970_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0x834, .pme_short_desc = "L1 reload data source valid", .pme_long_desc = "The data source information is valid", }, [ PPC970_PME_PM_MRK_GRP_ISSUED ] = { .pme_name = "PM_MRK_GRP_ISSUED", .pme_code = 0x6005, .pme_short_desc = "Marked group issued", .pme_long_desc = "A sampled instruction was issued", }, [ PPC970_PME_PM_FPU_FMA ] = { .pme_name = "PM_FPU_FMA", .pme_code = 0x2100, .pme_short_desc = "FPU executed multiply-add instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing multiply-add kind of instruction. This could be fmadd*, fnmadd*, fmsub*, fnmsub* where XYZ* means XYZ, XYZs, XYZ., XYZs. Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_MRK_CRU_FIN ] = { .pme_name = "PM_MRK_CRU_FIN", .pme_code = 0x4005, .pme_short_desc = "Marked instruction CRU processing finished", .pme_long_desc = "The Condition Register Unit finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ PPC970_PME_PM_MRK_LSU1_FLUSH_UST ] = { .pme_name = "PM_MRK_LSU1_FLUSH_UST", .pme_code = 0x715, .pme_short_desc = "LSU1 marked unaligned store flushes", .pme_long_desc = "A marked store was flushed from unit 1 because it was unaligned (crossed a 4k boundary)", }, [ PPC970_PME_PM_MRK_FXU_FIN ] = { .pme_name = "PM_MRK_FXU_FIN", .pme_code = 0x6004, .pme_short_desc = "Marked instruction FXU processing finished", .pme_long_desc = "Marked instruction FXU processing finished", }, [ PPC970_PME_PM_LSU1_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU1_REJECT_ERAT_MISS", .pme_code = 0x927, .pme_short_desc = "LSU1 reject due to ERAT miss", .pme_long_desc = "LSU1 reject due to ERAT miss", }, [ PPC970_PME_PM_BR_ISSUED ] = { .pme_name = "PM_BR_ISSUED", .pme_code = 0x431, .pme_short_desc = "Branches issued", .pme_long_desc = "This signal will be asserted each time the ISU issues a branch instruction. This signal will be asserted each time the ISU selects a branch instruction to issue.", }, [ PPC970_PME_PM_PMC4_OVERFLOW ] = { .pme_name = "PM_PMC4_OVERFLOW", .pme_code = 0x500a, .pme_short_desc = "PMC4 Overflow", .pme_long_desc = "PMC4 Overflow", }, [ PPC970_PME_PM_EE_OFF ] = { .pme_name = "PM_EE_OFF", .pme_code = 0x333, .pme_short_desc = "Cycles MSR(EE) bit off", .pme_long_desc = "The number of Cycles MSR(EE) bit was off.", }, [ PPC970_PME_PM_INST_FROM_L25_MOD ] = { .pme_name = "PM_INST_FROM_L25_MOD", .pme_code = 0x6426, .pme_short_desc = "Instruction fetched from L2.5 modified", .pme_long_desc = "Instruction fetched from L2.5 modified", }, [ PPC970_PME_PM_ITLB_MISS ] = { .pme_name = "PM_ITLB_MISS", .pme_code = 0x700, .pme_short_desc = "Instruction TLB misses", .pme_long_desc = "A TLB miss for an Instruction Fetch has occurred", }, [ PPC970_PME_PM_FXU1_BUSY_FXU0_IDLE ] = { .pme_name = "PM_FXU1_BUSY_FXU0_IDLE", .pme_code = 0x4002, .pme_short_desc = "FXU1 busy FXU0 idle", .pme_long_desc = "FXU0 was idle while FXU1 was busy", }, [ 
PPC970_PME_PM_GRP_DISP_VALID ] = { .pme_name = "PM_GRP_DISP_VALID", .pme_code = 0x323, .pme_short_desc = "Group dispatch valid", .pme_long_desc = "Dispatch has been attempted for a valid group. Some groups may be rejected. The total number of successful dispatches is the number of dispatch valid minus dispatch reject.", }, [ PPC970_PME_PM_MRK_GRP_DISP ] = { .pme_name = "PM_MRK_GRP_DISP", .pme_code = 0x1002, .pme_short_desc = "Marked group dispatched", .pme_long_desc = "A group containing a sampled instruction was dispatched", }, [ PPC970_PME_PM_LSU_FLUSH_UST ] = { .pme_name = "PM_LSU_FLUSH_UST", .pme_code = 0x2800, .pme_short_desc = "SRQ unaligned store flushes", .pme_long_desc = "A store was flushed because it was unaligned", }, [ PPC970_PME_PM_FXU1_FIN ] = { .pme_name = "PM_FXU1_FIN", .pme_code = 0x336, .pme_short_desc = "FXU1 produced a result", .pme_long_desc = "The Fixed Point unit 1 finished an instruction and produced a result", }, [ PPC970_PME_PM_GRP_CMPL ] = { .pme_name = "PM_GRP_CMPL", .pme_code = 0x7003, .pme_short_desc = "Group completed", .pme_long_desc = "A group completed. Microcoded instructions that span multiple groups will generate this event once per group.", }, [ PPC970_PME_PM_FPU_FRSP_FCONV ] = { .pme_name = "PM_FPU_FRSP_FCONV", .pme_code = 0x7110, .pme_short_desc = "FPU executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when executing frsp or convert kind of instruction. This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs. 
Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_MRK_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU0_FLUSH_SRQ", .pme_code = 0x713, .pme_short_desc = "LSU0 marked SRQ flushes", .pme_long_desc = "A marked store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ PPC970_PME_PM_LSU_LMQ_FULL_CYC ] = { .pme_name = "PM_LSU_LMQ_FULL_CYC", .pme_code = 0x837, .pme_short_desc = "Cycles LMQ full", .pme_long_desc = "The LMQ was full", }, [ PPC970_PME_PM_ST_REF_L1_LSU0 ] = { .pme_name = "PM_ST_REF_L1_LSU0", .pme_code = 0x811, .pme_short_desc = "LSU0 L1 D cache store references", .pme_long_desc = "A store executed on unit 0", }, [ PPC970_PME_PM_LSU0_DERAT_MISS ] = { .pme_name = "PM_LSU0_DERAT_MISS", .pme_code = 0x702, .pme_short_desc = "LSU0 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 0 missed the ERAT and resulted in an ERAT reload. Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ PPC970_PME_PM_LSU_SRQ_SYNC_CYC ] = { .pme_name = "PM_LSU_SRQ_SYNC_CYC", .pme_code = 0x735, .pme_short_desc = "SRQ sync duration", .pme_long_desc = "This signal is asserted every cycle when a sync is in the SRQ.", }, [ PPC970_PME_PM_FPU_STALL3 ] = { .pme_name = "PM_FPU_STALL3", .pme_code = 0x2120, .pme_short_desc = "FPU stalled in pipe3", .pme_long_desc = "FPU has generated a stall in pipe3 due to overflow, underflow, massive cancel, convert to integer (sometimes), or convert from integer (always). This signal is active during the entire duration of the stall. 
Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_LSU_REJECT_ERAT_MISS ] = { .pme_name = "PM_LSU_REJECT_ERAT_MISS", .pme_code = 0x5920, .pme_short_desc = "LSU reject due to ERAT miss", .pme_long_desc = "LSU reject due to ERAT miss", }, [ PPC970_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x1937, .pme_short_desc = "Marked data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a marked demand load", }, [ PPC970_PME_PM_LSU0_FLUSH_SRQ ] = { .pme_name = "PM_LSU0_FLUSH_SRQ", .pme_code = 0x803, .pme_short_desc = "LSU0 SRQ flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group.", }, [ PPC970_PME_PM_FPU0_FMOV_FEST ] = { .pme_name = "PM_FPU0_FMOV_FEST", .pme_code = 0x110, .pme_short_desc = "FPU0 executed FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when fp0 is executing a move kind of instruction or one of the estimate instructions.. This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ", }, [ PPC970_PME_PM_LD_REF_L1_LSU0 ] = { .pme_name = "PM_LD_REF_L1_LSU0", .pme_code = 0x810, .pme_short_desc = "LSU0 L1 D cache load references", .pme_long_desc = "A load executed on unit 0", }, [ PPC970_PME_PM_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_LSU1_FLUSH_SRQ", .pme_code = 0x807, .pme_short_desc = "LSU1 SRQ flushes", .pme_long_desc = "A store was flushed because younger load hits and older store that is already in the SRQ or in the same group. 
", }, [ PPC970_PME_PM_GRP_BR_MPRED ] = { .pme_name = "PM_GRP_BR_MPRED", .pme_code = 0x327, .pme_short_desc = "Group experienced a branch mispredict", .pme_long_desc = "Group experienced a branch mispredict", }, [ PPC970_PME_PM_LSU_LMQ_S0_ALLOC ] = { .pme_name = "PM_LSU_LMQ_S0_ALLOC", .pme_code = 0x836, .pme_short_desc = "LMQ slot 0 allocated", .pme_long_desc = "The first entry in the LMQ was allocated.", }, [ PPC970_PME_PM_LSU0_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU0_REJECT_LMQ_FULL", .pme_code = 0x921, .pme_short_desc = "LSU0 reject due to LMQ full or missed data coming", .pme_long_desc = "LSU0 reject due to LMQ full or missed data coming", }, [ PPC970_PME_PM_ST_REF_L1 ] = { .pme_name = "PM_ST_REF_L1", .pme_code = 0x7810, .pme_short_desc = "L1 D cache store references", .pme_long_desc = "Total DL1 Store references", }, [ PPC970_PME_PM_MRK_VMX_FIN ] = { .pme_name = "PM_MRK_VMX_FIN", .pme_code = 0x3005, .pme_short_desc = "Marked instruction VMX processing finished", .pme_long_desc = "Marked instruction VMX processing finished", }, [ PPC970_PME_PM_LSU_SRQ_EMPTY_CYC ] = { .pme_name = "PM_LSU_SRQ_EMPTY_CYC", .pme_code = 0x4003, .pme_short_desc = "Cycles SRQ empty", .pme_long_desc = "The Store Request Queue is empty", }, [ PPC970_PME_PM_FPU1_STF ] = { .pme_name = "PM_FPU1_STF", .pme_code = 0x126, .pme_short_desc = "FPU1 executed store instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing a store instruction.", }, [ PPC970_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x1005, .pme_short_desc = "Run cycles", .pme_long_desc = "Processor Cycles gated by the run latch", }, [ PPC970_PME_PM_LSU_LMQ_S0_VALID ] = { .pme_name = "PM_LSU_LMQ_S0_VALID", .pme_code = 0x835, .pme_short_desc = "LMQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle when the first entry in the LMQ is valid. 
The LMQ had eight entries that are allocated FIFO", }, [ PPC970_PME_PM_LSU0_LDF ] = { .pme_name = "PM_LSU0_LDF", .pme_code = 0x730, .pme_short_desc = "LSU0 executed Floating Point load instruction", .pme_long_desc = "A floating point load was executed from LSU unit 0", }, [ PPC970_PME_PM_LSU_LRQ_S0_VALID ] = { .pme_name = "PM_LSU_LRQ_S0_VALID", .pme_code = 0x822, .pme_short_desc = "LRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Load Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin.", }, [ PPC970_PME_PM_PMC3_OVERFLOW ] = { .pme_name = "PM_PMC3_OVERFLOW", .pme_code = 0x400a, .pme_short_desc = "PMC3 Overflow", .pme_long_desc = "PMC3 Overflow", }, [ PPC970_PME_PM_MRK_IMR_RELOAD ] = { .pme_name = "PM_MRK_IMR_RELOAD", .pme_code = 0x722, .pme_short_desc = "Marked IMR reloaded", .pme_long_desc = "A DL1 reload occurred due to marked load", }, [ PPC970_PME_PM_MRK_GRP_TIMEO ] = { .pme_name = "PM_MRK_GRP_TIMEO", .pme_code = 0x5005, .pme_short_desc = "Marked group completion timeout", .pme_long_desc = "The sampling timeout expired indicating that the previously sampled instruction is no longer in the processor", }, [ PPC970_PME_PM_FPU_FMOV_FEST ] = { .pme_name = "PM_FPU_FMOV_FEST", .pme_code = 0x8110, .pme_short_desc = "FPU executing FMOV or FEST instructions", .pme_long_desc = "This signal is active for one cycle when executing a move kind of instruction or one of the estimate instructions.. This could be fmr*, fneg*, fabs*, fnabs* , fres* or frsqrte* where XYZ* means XYZ or XYZ . 
Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_GRP_DISP_BLK_SB_CYC ] = { .pme_name = "PM_GRP_DISP_BLK_SB_CYC", .pme_code = 0x331, .pme_short_desc = "Cycles group dispatch blocked by scoreboard", .pme_long_desc = "The ISU sends a signal indicating that dispatch is blocked by scoreboard.", }, [ PPC970_PME_PM_XER_MAP_FULL_CYC ] = { .pme_name = "PM_XER_MAP_FULL_CYC", .pme_code = 0x302, .pme_short_desc = "Cycles XER mapper full", .pme_long_desc = "The ISU sends a signal indicating that the xer mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0x813, .pme_short_desc = "L1 D cache store misses", .pme_long_desc = "A store missed the dcache", }, [ PPC970_PME_PM_STOP_COMPLETION ] = { .pme_name = "PM_STOP_COMPLETION", .pme_code = 0x3001, .pme_short_desc = "Completion stopped", .pme_long_desc = "RAS Unit has signaled completion to stop", }, [ PPC970_PME_PM_MRK_GRP_CMPL ] = { .pme_name = "PM_MRK_GRP_CMPL", .pme_code = 0x4004, .pme_short_desc = "Marked group completed", .pme_long_desc = "A group containing a sampled instruction completed. 
Microcoded instructions that span multiple groups will generate this event once per group.", }, [ PPC970_PME_PM_ISLB_MISS ] = { .pme_name = "PM_ISLB_MISS", .pme_code = 0x701, .pme_short_desc = "Instruction SLB misses", .pme_long_desc = "An SLB miss for an instruction fetch has occurred", }, [ PPC970_PME_PM_SUSPENDED ] = { .pme_name = "PM_SUSPENDED", .pme_code = 0x0, .pme_short_desc = "Suspended", .pme_long_desc = "Suspended", }, [ PPC970_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0x7, .pme_short_desc = "Processor cycles", .pme_long_desc = "Processor cycles", }, [ PPC970_PME_PM_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_LD_MISS_L1_LSU1", .pme_code = 0x816, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "A load, executing on unit 1, missed the dcache", }, [ PPC970_PME_PM_STCX_FAIL ] = { .pme_name = "PM_STCX_FAIL", .pme_code = 0x721, .pme_short_desc = "STCX failed", .pme_long_desc = "A stcx (stwcx or stdcx) failed", }, [ PPC970_PME_PM_LSU1_SRQ_STFWD ] = { .pme_name = "PM_LSU1_SRQ_STFWD", .pme_code = 0x824, .pme_short_desc = "LSU1 SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load on unit 1", }, [ PPC970_PME_PM_GRP_DISP ] = { .pme_name = "PM_GRP_DISP", .pme_code = 0x2004, .pme_short_desc = "Group dispatches", .pme_long_desc = "A group was dispatched", }, [ PPC970_PME_PM_L2_PREF ] = { .pme_name = "PM_L2_PREF", .pme_code = 0x733, .pme_short_desc = "L2 cache prefetches", .pme_long_desc = "A request to prefetch data into L2 was made", }, [ PPC970_PME_PM_FPU1_DENORM ] = { .pme_name = "PM_FPU1_DENORM", .pme_code = 0x124, .pme_short_desc = "FPU1 received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized.", }, [ PPC970_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x1837, .pme_short_desc = "Data loaded from L2", .pme_long_desc = "DL1 was reloaded from the local L2 due to a demand load", }, [ PPC970_PME_PM_FPU0_FPSCR ] = { 
.pme_name = "PM_FPU0_FPSCR", .pme_code = 0x130, .pme_short_desc = "FPU0 executed FPSCR instruction", .pme_long_desc = "This signal is active for one cycle when fp0 is executing fpscr move related instruction. This could be mtfsfi*, mtfsb0*, mtfsb1*. mffs*, mtfsf*, mcrsf* where XYZ* means XYZ, XYZs, XYZ., XYZs", }, [ PPC970_PME_PM_MRK_DATA_FROM_L25_MOD ] = { .pme_name = "PM_MRK_DATA_FROM_L25_MOD", .pme_code = 0x393d, .pme_short_desc = "Marked data loaded from L2.5 modified", .pme_long_desc = "DL1 was reloaded with modified (M) data from the L2 of a chip on this MCM due to a marked demand load", }, [ PPC970_PME_PM_FPU0_FSQRT ] = { .pme_name = "PM_FPU0_FSQRT", .pme_code = 0x102, .pme_short_desc = "FPU0 executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp0 is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970_PME_PM_LD_REF_L1 ] = { .pme_name = "PM_LD_REF_L1", .pme_code = 0x8810, .pme_short_desc = "L1 D cache load references", .pme_long_desc = "Total DL1 Load references", }, [ PPC970_PME_PM_MRK_L1_RELOAD_VALID ] = { .pme_name = "PM_MRK_L1_RELOAD_VALID", .pme_code = 0x934, .pme_short_desc = "Marked L1 reload data source valid", .pme_long_desc = "The source information is valid and is for a marked load", }, [ PPC970_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x5003, .pme_short_desc = "One or more PPC instruction completed", .pme_long_desc = "A group containing at least one PPC instruction completed. For microcoded instructions that span multiple groups, this will only occur once.", }, [ PPC970_PME_PM_INST_FROM_L1 ] = { .pme_name = "PM_INST_FROM_L1", .pme_code = 0x142d, .pme_short_desc = "Instruction fetched from L1", .pme_long_desc = "An instruction fetch group was fetched from L1. 
Fetch Groups can contain up to 8 instructions", }, [ PPC970_PME_PM_EE_OFF_EXT_INT ] = { .pme_name = "PM_EE_OFF_EXT_INT", .pme_code = 0x337, .pme_short_desc = "Cycles MSR(EE) bit off and external interrupt pending", .pme_long_desc = "Cycles MSR(EE) bit off and external interrupt pending", }, [ PPC970_PME_PM_PMC6_OVERFLOW ] = { .pme_name = "PM_PMC6_OVERFLOW", .pme_code = 0x700a, .pme_short_desc = "PMC6 Overflow", .pme_long_desc = "PMC6 Overflow", }, [ PPC970_PME_PM_LSU_LRQ_FULL_CYC ] = { .pme_name = "PM_LSU_LRQ_FULL_CYC", .pme_code = 0x312, .pme_short_desc = "Cycles LRQ full", .pme_long_desc = "The ISU sends this signal when the LRQ is full.", }, [ PPC970_PME_PM_IC_PREF_INSTALL ] = { .pme_name = "PM_IC_PREF_INSTALL", .pme_code = 0x427, .pme_short_desc = "Instruction prefetched installed in prefetch", .pme_long_desc = "New line coming into the prefetch buffer", }, [ PPC970_PME_PM_DC_PREF_OUT_OF_STREAMS ] = { .pme_name = "PM_DC_PREF_OUT_OF_STREAMS", .pme_code = 0x732, .pme_short_desc = "D cache out of streams", .pme_long_desc = "out of streams", }, [ PPC970_PME_PM_MRK_LSU1_FLUSH_SRQ ] = { .pme_name = "PM_MRK_LSU1_FLUSH_SRQ", .pme_code = 0x717, .pme_short_desc = "LSU1 marked SRQ flushes", .pme_long_desc = "A marked store was flushed because a younger load hits an older store that is already in the SRQ or in the same group.", }, [ PPC970_PME_PM_GCT_FULL_CYC ] = { .pme_name = "PM_GCT_FULL_CYC", .pme_code = 0x300, .pme_short_desc = "Cycles GCT full", .pme_long_desc = "The ISU sends a signal indicating the gct is full. 
", }, [ PPC970_PME_PM_INST_FROM_MEM ] = { .pme_name = "PM_INST_FROM_MEM", .pme_code = 0x2426, .pme_short_desc = "Instruction fetched from memory", .pme_long_desc = "Instruction fetched from memory", }, [ PPC970_PME_PM_FLUSH_LSU_BR_MPRED ] = { .pme_name = "PM_FLUSH_LSU_BR_MPRED", .pme_code = 0x317, .pme_short_desc = "Flush caused by LSU or branch mispredict", .pme_long_desc = "Flush caused by LSU or branch mispredict", }, [ PPC970_PME_PM_FXU_BUSY ] = { .pme_name = "PM_FXU_BUSY", .pme_code = 0x6002, .pme_short_desc = "FXU busy", .pme_long_desc = "FXU0 and FXU1 are both busy", }, [ PPC970_PME_PM_ST_REF_L1_LSU1 ] = { .pme_name = "PM_ST_REF_L1_LSU1", .pme_code = 0x815, .pme_short_desc = "LSU1 L1 D cache store references", .pme_long_desc = "A store executed on unit 1", }, [ PPC970_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x1720, .pme_short_desc = "Marked L1 D cache load misses", .pme_long_desc = "Marked L1 D cache load misses", }, [ PPC970_PME_PM_L1_WRITE_CYC ] = { .pme_name = "PM_L1_WRITE_CYC", .pme_code = 0x434, .pme_short_desc = "Cycles writing to instruction L1", .pme_long_desc = "This signal is asserted each cycle a cache write is active.", }, [ PPC970_PME_PM_LSU_REJECT_LMQ_FULL ] = { .pme_name = "PM_LSU_REJECT_LMQ_FULL", .pme_code = 0x2920, .pme_short_desc = "LSU reject due to LMQ full or missed data coming", .pme_long_desc = "LSU reject due to LMQ full or missed data coming", }, [ PPC970_PME_PM_FPU_ALL ] = { .pme_name = "PM_FPU_ALL", .pme_code = 0x5100, .pme_short_desc = "FPU executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when FPU is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo. 
Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_LSU_SRQ_S0_ALLOC ] = { .pme_name = "PM_LSU_SRQ_S0_ALLOC", .pme_code = 0x825, .pme_short_desc = "SRQ slot 0 allocated", .pme_long_desc = "SRQ Slot zero was allocated", }, [ PPC970_PME_PM_INST_FROM_L25_SHR ] = { .pme_name = "PM_INST_FROM_L25_SHR", .pme_code = 0x5426, .pme_short_desc = "Instruction fetched from L2.5 shared", .pme_long_desc = "Instruction fetched from L2.5 shared", }, [ PPC970_PME_PM_GRP_MRK ] = { .pme_name = "PM_GRP_MRK", .pme_code = 0x5004, .pme_short_desc = "Group marked in IDU", .pme_long_desc = "A group was sampled (marked)", }, [ PPC970_PME_PM_BR_MPRED_CR ] = { .pme_name = "PM_BR_MPRED_CR", .pme_code = 0x432, .pme_short_desc = "Branch mispredictions due to CR bit setting", .pme_long_desc = "This signal is asserted when the branch execution unit detects a branch mispredict because the CR value is opposite of the predicted value. This signal is asserted after a branch issue event and will result in a branch redirect flush if not overridden by a flush of an older instruction.", }, [ PPC970_PME_PM_DC_PREF_STREAM_ALLOC ] = { .pme_name = "PM_DC_PREF_STREAM_ALLOC", .pme_code = 0x737, .pme_short_desc = "D cache new prefetch stream allocated", .pme_long_desc = "A new Prefetch Stream was allocated", }, [ PPC970_PME_PM_FPU1_FIN ] = { .pme_name = "PM_FPU1_FIN", .pme_code = 0x117, .pme_short_desc = "FPU1 produced a result", .pme_long_desc = "fp1 finished, produced a result. This only indicates finish, not completion. ", }, [ PPC970_PME_PM_LSU_REJECT_SRQ ] = { .pme_name = "PM_LSU_REJECT_SRQ", .pme_code = 0x1920, .pme_short_desc = "LSU SRQ rejects", .pme_long_desc = "LSU SRQ rejects", }, [ PPC970_PME_PM_BR_MPRED_TA ] = { .pme_name = "PM_BR_MPRED_TA", .pme_code = 0x433, .pme_short_desc = "Branch mispredictions due to target address", .pme_long_desc = "branch miss predict due to a target address prediction. This signal will be asserted each time the branch execution unit detects an incorrect target address prediction. 
This signal will be asserted after a valid branch execution unit issue and will cause a branch mispredict flush unless a flush is detected from an older instruction.", }, [ PPC970_PME_PM_CRQ_FULL_CYC ] = { .pme_name = "PM_CRQ_FULL_CYC", .pme_code = 0x311, .pme_short_desc = "Cycles CR issue queue full", .pme_long_desc = "The ISU sends a signal indicating that the issue queue that feeds the ifu cr unit cannot accept any more group (queue is full of groups).", }, [ PPC970_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x3810, .pme_short_desc = "L1 D cache load misses", .pme_long_desc = "Total DL1 Load references that miss the DL1", }, [ PPC970_PME_PM_INST_FROM_PREF ] = { .pme_name = "PM_INST_FROM_PREF", .pme_code = 0x342d, .pme_short_desc = "Instructions fetched from prefetch", .pme_long_desc = "An instruction fetch group was fetched from the prefetch buffer. Fetch Groups can contain up to 8 instructions", }, [ PPC970_PME_PM_STCX_PASS ] = { .pme_name = "PM_STCX_PASS", .pme_code = 0x725, .pme_short_desc = "Stcx passes", .pme_long_desc = "A stcx (stwcx or stdcx) instruction was successful", }, [ PPC970_PME_PM_DC_INV_L2 ] = { .pme_name = "PM_DC_INV_L2", .pme_code = 0x817, .pme_short_desc = "L1 D cache entries invalidated from L2", .pme_long_desc = "A dcache invalidated was received from the L2 because a line in L2 was castout.", }, [ PPC970_PME_PM_LSU_SRQ_FULL_CYC ] = { .pme_name = "PM_LSU_SRQ_FULL_CYC", .pme_code = 0x313, .pme_short_desc = "Cycles SRQ full", .pme_long_desc = "The ISU sends this signal when the srq is full.", }, [ PPC970_PME_PM_LSU0_FLUSH_LRQ ] = { .pme_name = "PM_LSU0_FLUSH_LRQ", .pme_code = 0x802, .pme_short_desc = "LSU0 LRQ flushes", .pme_long_desc = "A load was flushed by unit 1 because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970_PME_PM_LSU_SRQ_S0_VALID ] = { 
.pme_name = "PM_LSU_SRQ_S0_VALID", .pme_code = 0x821, .pme_short_desc = "SRQ slot 0 valid", .pme_long_desc = "This signal is asserted every cycle that the Store Request Queue slot zero is valid. The SRQ is 32 entries long and is allocated round-robin.", }, [ PPC970_PME_PM_LARX_LSU0 ] = { .pme_name = "PM_LARX_LSU0", .pme_code = 0x727, .pme_short_desc = "Larx executed on LSU0", .pme_long_desc = "A larx (lwarx or ldarx) was executed on side 0 (there is no corresponding unit 1 event since larx instructions can only execute on unit 0)", }, [ PPC970_PME_PM_GCT_EMPTY_CYC ] = { .pme_name = "PM_GCT_EMPTY_CYC", .pme_code = 0x1004, .pme_short_desc = "Cycles GCT empty", .pme_long_desc = "The Global Completion Table is completely empty", }, [ PPC970_PME_PM_FPU1_ALL ] = { .pme_name = "PM_FPU1_ALL", .pme_code = 0x107, .pme_short_desc = "FPU1 executed add, mult, sub, cmp or sel instruction", .pme_long_desc = "This signal is active for one cycle when fp1 is executing an add, mult, sub, compare, or fsel kind of instruction. This could be fadd*, fmul*, fsub*, fcmp**, fsel where XYZ* means XYZ, XYZs, XYZ., XYZs. and XYZ** means XYZu, XYZo", }, [ PPC970_PME_PM_FPU1_FSQRT ] = { .pme_name = "PM_FPU1_FSQRT", .pme_code = 0x106, .pme_short_desc = "FPU1 executed FSQRT instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp1 is executing a square root instruction. This could be fsqrt* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970_PME_PM_FPU_FIN ] = { .pme_name = "PM_FPU_FIN", .pme_code = 0x4110, .pme_short_desc = "FPU produced a result", .pme_long_desc = "FPU finished, produced a result. This only indicates finish, not completion. 
Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_LSU_SRQ_STFWD ] = { .pme_name = "PM_LSU_SRQ_STFWD", .pme_code = 0x1820, .pme_short_desc = "SRQ store forwarded", .pme_long_desc = "Data from a store instruction was forwarded to a load", }, [ PPC970_PME_PM_MRK_LD_MISS_L1_LSU1 ] = { .pme_name = "PM_MRK_LD_MISS_L1_LSU1", .pme_code = 0x724, .pme_short_desc = "LSU1 L1 D cache load misses", .pme_long_desc = "A marked load, executing on unit 1, missed the dcache", }, [ PPC970_PME_PM_FXU0_FIN ] = { .pme_name = "PM_FXU0_FIN", .pme_code = 0x332, .pme_short_desc = "FXU0 produced a result", .pme_long_desc = "The Fixed Point unit 0 finished an instruction and produced a result", }, [ PPC970_PME_PM_MRK_FPU_FIN ] = { .pme_name = "PM_MRK_FPU_FIN", .pme_code = 0x7004, .pme_short_desc = "Marked instruction FPU processing finished", .pme_long_desc = "One of the Floating Point Units finished a marked instruction. Instructions that finish may not necessarily complete", }, [ PPC970_PME_PM_PMC5_OVERFLOW ] = { .pme_name = "PM_PMC5_OVERFLOW", .pme_code = 0x600a, .pme_short_desc = "PMC5 Overflow", .pme_long_desc = "PMC5 Overflow", }, [ PPC970_PME_PM_SNOOP_TLBIE ] = { .pme_name = "PM_SNOOP_TLBIE", .pme_code = 0x703, .pme_short_desc = "Snoop TLBIE", .pme_long_desc = "A TLB miss for a data request occurred. Requests that miss the TLB may be retried until the instruction is in the next to complete group (unless HID4 is set to allow speculative tablewalks). This may result in multiple TLB misses for the same instruction.", }, [ PPC970_PME_PM_FPU1_FRSP_FCONV ] = { .pme_name = "PM_FPU1_FRSP_FCONV", .pme_code = 0x115, .pme_short_desc = "FPU1 executed FRSP or FCONV instructions", .pme_long_desc = "This signal is active for one cycle when fp1 is executing frsp or convert kind of instruction. 
This could be frsp*, fcfid*, fcti* where XYZ* means XYZ, XYZs, XYZ., XYZs.", }, [ PPC970_PME_PM_FPU0_FDIV ] = { .pme_name = "PM_FPU0_FDIV", .pme_code = 0x100, .pme_short_desc = "FPU0 executed FDIV instruction", .pme_long_desc = "This signal is active for one cycle at the end of the microcode executed when fp0 is executing a divide instruction. This could be fdiv, fdivs, fdiv. fdivs.", }, [ PPC970_PME_PM_LD_REF_L1_LSU1 ] = { .pme_name = "PM_LD_REF_L1_LSU1", .pme_code = 0x814, .pme_short_desc = "LSU1 L1 D cache load references", .pme_long_desc = "A load executed on unit 1", }, [ PPC970_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x3004, .pme_short_desc = "Hypervisor Cycles", .pme_long_desc = "Cycles when the processor is executing in Hypervisor (MSR[HV] = 1 and MSR[PR]=0)", }, [ PPC970_PME_PM_LR_CTR_MAP_FULL_CYC ] = { .pme_name = "PM_LR_CTR_MAP_FULL_CYC", .pme_code = 0x306, .pme_short_desc = "Cycles LR/CTR mapper full", .pme_long_desc = "The ISU sends a signal indicating that the lr/ctr mapper cannot accept any more groups. Dispatch is stopped. Note: this condition indicates that a pool of mapper is full but the entire mapper may not be.", }, [ PPC970_PME_PM_FPU_DENORM ] = { .pme_name = "PM_FPU_DENORM", .pme_code = 0x1120, .pme_short_desc = "FPU received denormalized data", .pme_long_desc = "This signal is active for one cycle when one of the operands is denormalized. 
Combined Unit 0 + Unit 1", }, [ PPC970_PME_PM_LSU0_REJECT_SRQ ] = { .pme_name = "PM_LSU0_REJECT_SRQ", .pme_code = 0x920, .pme_short_desc = "LSU0 SRQ rejects", .pme_long_desc = "LSU0 SRQ rejects", }, [ PPC970_PME_PM_LSU1_REJECT_SRQ ] = { .pme_name = "PM_LSU1_REJECT_SRQ", .pme_code = 0x924, .pme_short_desc = "LSU1 SRQ rejects", .pme_long_desc = "LSU1 SRQ rejects", }, [ PPC970_PME_PM_LSU1_DERAT_MISS ] = { .pme_name = "PM_LSU1_DERAT_MISS", .pme_code = 0x706, .pme_short_desc = "LSU1 DERAT misses", .pme_long_desc = "A data request (load or store) from LSU Unit 1 missed the ERAT and resulted in an ERAT reload. Multiple instructions may miss the ERAT entry for the same 4K page, but only one reload will occur.", }, [ PPC970_PME_PM_IC_PREF_REQ ] = { .pme_name = "PM_IC_PREF_REQ", .pme_code = 0x426, .pme_short_desc = "Instruction prefetch requests", .pme_long_desc = "Asserted when a non-canceled prefetch is made to the cache interface unit (CIU).", }, [ PPC970_PME_PM_MRK_LSU_FIN ] = { .pme_name = "PM_MRK_LSU_FIN", .pme_code = 0x8004, .pme_short_desc = "Marked instruction LSU processing finished", .pme_long_desc = "One of the Load/Store Units finished a marked instruction. 
Instructions that finish may not necessarily complete", }, [ PPC970_PME_PM_MRK_DATA_FROM_MEM ] = { .pme_name = "PM_MRK_DATA_FROM_MEM", .pme_code = 0x3937, .pme_short_desc = "Marked data loaded from memory", .pme_long_desc = "Marked data loaded from memory", }, [ PPC970_PME_PM_LSU0_FLUSH_UST ] = { .pme_name = "PM_LSU0_FLUSH_UST", .pme_code = 0x801, .pme_short_desc = "LSU0 unaligned store flushes", .pme_long_desc = "A store was flushed from unit 0 because it was unaligned (crossed a 4k boundary)", }, [ PPC970_PME_PM_LSU_FLUSH_LRQ ] = { .pme_name = "PM_LSU_FLUSH_LRQ", .pme_code = 0x6800, .pme_short_desc = "LRQ flushes", .pme_long_desc = "A load was flushed because a younger load executed before an older store executed and they had overlapping data OR two loads executed out of order and they have byte overlap and there was a snoop in between to an overlapped byte.", }, [ PPC970_PME_PM_LSU_FLUSH_SRQ ] = { .pme_name = "PM_LSU_FLUSH_SRQ", .pme_code = 0x5800, .pme_short_desc = "SRQ flushes", .pme_long_desc = "A store was flushed because a younger load hits an older store that is already in the SRQ or in the same group.", } }; #endif papi-5.3.0/src/libpfm4/lib/events/arm_1176_events.h0000600003276200002170000000741112247131123021331 0ustar ralphundrgrad/* * Copyright (c) 2013 by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ /* * the various event names are the same as those given in the * file linux-2.6/arch/arm/kernel/perf_event_v6.c */ /* * ARM1176 Event Table */ static const arm_entry_t arm_1176_pe []={ {.name = "ICACHE_MISS", .code = 0x00, .desc = "Instruction cache miss (includes speculative accesses)" }, {.name = "IBUF_STALL", .code = 0x01, .desc = "Stall because instruction buffer cannot deliver an instruction" }, {.name = "DDEP_STALL", .code = 0x02, .desc = "Stall because of data dependency" }, {.name = "ITLB_MISS", .code = 0x03, .desc = "Instruction MicroTLB miss" }, {.name = "DTLB_MISS", .code = 0x04, .desc = "Data MicroTLB miss" }, {.name = "BR_EXEC", .code = 0x05, .desc = "Branch instruction executed" }, {.name = "BR_MISPREDICT", .code = 0x06, .desc = "Branch mispredicted" }, {.name = "INSTR_EXEC", .code = 0x07, .desc = "Instruction executed" }, {.name = "DCACHE_HIT", .code = 0x09, .desc = "Data cache hit" }, {.name = "DCACHE_ACCESS", .code = 0x0a, .desc = "Data cache access" }, {.name = "DCACHE_MISS", .code = 0x0b, .desc = "Data cache miss" }, {.name = "DCACHE_WBACK", .code = 0x0c, .desc = "Data cache writeback" }, {.name = "SW_PC_CHANGE", .code = 0x0d, .desc = "Software changed the PC." 
}, {.name = "MAIN_TLB_MISS", .code = 0x0f, .desc = "Main TLB miss" }, {.name = "EXPL_D_ACCESS", .code = 0x10, .desc = "Explicit external data cache access" }, {.name = "LSU_FULL_STALL", .code = 0x11, .desc = "Stall because of a full Load Store Unit request queue." }, {.name = "WBUF_DRAINED", .code = 0x12, .desc = "Write buffer drained due to data synchronization barrier or strongly ordered operation" }, {.name = "ETMEXTOUT_0", .code = 0x20, .desc = "ETMEXTOUT[0] was asserted" }, {.name = "ETMEXTOUT_1", .code = 0x21, .desc = "ETMEXTOUT[1] was asserted" }, {.name = "ETMEXTOUT", .code = 0x22, .desc = "Increment once for each of ETMEXTOUT[0] or ETMEXTOUT[1]" }, {.name = "PROC_CALL_EXEC", .code = 0x23, .desc = "Procedure call instruction executed" }, {.name = "PROC_RET_EXEC", .code = 0x24, .desc = "Procedure return instruction executed" }, {.name = "PROC_RET_EXEC_PRED", .code = 0x25, .desc = "Procedure return instruction executed and address predicted" }, {.name = "PROC_RET_EXEC_PRED_INCORRECT", .code = 0x26, .desc = "Procedure return instruction executed and address predicted incorrectly" }, {.name = "CPU_CYCLES", .code = 0xff, .desc = "CPU cycles" }, }; #define ARM_1176_EVENT_COUNT (sizeof(arm_1176_pe)/sizeof(arm_entry_t)) papi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_r2pcie_events.h0000600003276200002170000001301212247131123024501 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial 
portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. * * PMU: snbep_unc_r2pcie (Intel SandyBridge-EP R2PCIe uncore) */ static const intel_x86_umask_t snbep_unc_r2_ring_ad_used[]={ { .uname = "CCW_EVEN", .udesc = "Counter-clockwise and even ring polarity", .ucode = 0x400, }, { .uname = "CCW_ODD", .udesc = "Counter-clockwise and odd ring polarity", .ucode = 0x800, }, { .uname = "CW_EVEN", .udesc = "Clockwise and even ring polarity", .ucode = 0x100, }, { .uname = "CW_ODD", .udesc = "Clockwise and odd ring polarity", .ucode = 0x200, }, { .uname = "CW_ANY", .udesc = "Clockwise with any polarity", .ucode = 0x300, }, { .uname = "CCW_ANY", .udesc = "Counter-clockwise with any polarity", .ucode = 0xc00, }, { .uname = "ANY", .udesc = "any direction and any polarity", .ucode = 0xf00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_unc_r2_ring_iv_used[]={ { .uname = "ANY", .udesc = "R2 IV Ring in Use", .ucode = 0xf00, .uflags = INTEL_X86_NCOMBO | INTEL_X86_DFL, }, }; static const intel_x86_umask_t snbep_unc_r2_rxr_cycles_ne[]={ { .uname = "DRS", .udesc = "DRS Ingress queue", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NCB", .udesc = "NCB Ingress queue", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NCS", .udesc = "NCS Ingress queue", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t 
snbep_unc_r2_txr_cycles_full[]={ { .uname = "AD", .udesc = "AD Egress queue", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "AK", .udesc = "AK Egress queue", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "BL", .udesc = "BL Egress queue", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snbep_unc_r2_pe[]={ { .name = "UNC_R2_CLOCKTICKS", .desc = "Number of uclks in domain", .code = 0x1, .cntmsk = 0xf, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, }, { .name = "UNC_R2_RING_AD_USED", .desc = "R2 AD Ring in Use", .code = 0x7, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r2_ring_ad_used), .umasks = snbep_unc_r2_ring_ad_used }, { .name = "UNC_R2_RING_AK_USED", .desc = "R2 AK Ring in Use", .code = 0x8, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r2_ring_ad_used), .umasks = snbep_unc_r2_ring_ad_used /* shared */ }, { .name = "UNC_R2_RING_BL_USED", .desc = "R2 BL Ring in Use", .code = 0x9, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r2_ring_ad_used), .umasks = snbep_unc_r2_ring_ad_used /* shared */ }, { .name = "UNC_R2_RING_IV_USED", .desc = "R2 IV Ring in Use", .code = 0xa, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r2_ring_iv_used), .umasks = snbep_unc_r2_ring_iv_used }, { .name = "UNC_R2_RXR_AK_BOUNCES", .desc = "AK Ingress Bounced", .code = 0x12, .cntmsk = 0x1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, }, { .name = "UNC_R2_RXR_CYCLES_NE", .desc = "Ingress Cycles Not Empty", .code = 0x10, .cntmsk = 0x3, .ngrp = 1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r2_rxr_cycles_ne), .umasks = snbep_unc_r2_rxr_cycles_ne }, { .name = "UNC_R2_TXR_CYCLES_FULL", .desc = "Egress Cycles Full", .code = 0x25, .cntmsk = 0x1, .ngrp = 1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, .numasks = 
LIBPFM_ARRAY_SIZE(snbep_unc_r2_txr_cycles_full), .umasks = snbep_unc_r2_txr_cycles_full }, { .name = "UNC_R2_TXR_CYCLES_NE", .desc = "Egress Cycles Not Empty", .code = 0x23, .cntmsk = 0x1, .ngrp = 1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_r2_txr_cycles_full), .umasks = snbep_unc_r2_txr_cycles_full /* shared */ }, { .name = "UNC_R2_TXR_INSERTS", .desc = "Egress allocations", .code = 0x24, .cntmsk = 0x1, .modmsk = SNBEP_UNC_R2PCIE_ATTRS, }, }; papi-5.3.0/src/libpfm4/lib/events/power8_events.h0000600003276200002170000012346212247131124021326 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __POWER8_EVENTS_H__ #define __POWER8_EVENTS_H__ /* * File: power8_events.h * CVS: Author: Carl Love * carll.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2013. All Rights Reserved. * Contributed by * * Note: This code was automatically generated and should not be modified by * hand. * * Documentation on the PMU events will be published at: * http://www.power.org/documentation */ #define POWER8_PME_PM_1PLUS_PPC_CMPL 0 #define POWER8_PME_PM_1PLUS_PPC_DISP 1 #define POWER8_PME_PM_ANY_THRD_RUN_CYC 2 #define POWER8_PME_PM_BR_MPRED_CMPL 3 #define POWER8_PME_PM_BR_TAKEN_CMPL 4 #define POWER8_PME_PM_CYC 5 #define POWER8_PME_PM_DATA_FROM_L2MISS 6 #define POWER8_PME_PM_DATA_FROM_L3MISS 7 #define POWER8_PME_PM_DATA_FROM_MEM 8 #define POWER8_PME_PM_DTLB_MISS 9 #define POWER8_PME_PM_EXT_INT 10 #define POWER8_PME_PM_FLOP 11 #define POWER8_PME_PM_FLUSH 12 #define POWER8_PME_PM_GCT_NOSLOT_CYC 13 #define POWER8_PME_PM_IERAT_MISS 14 #define POWER8_PME_PM_INST_DISP 15 #define POWER8_PME_PM_INST_FROM_L3MISS 16 #define POWER8_PME_PM_ITLB_MISS 17 #define POWER8_PME_PM_L1_DCACHE_RELOAD_VALID 18 #define POWER8_PME_PM_L1_ICACHE_MISS 19 #define POWER8_PME_PM_LD_MISS_L1 20 #define POWER8_PME_PM_LSU_DERAT_MISS 21 #define POWER8_PME_PM_MRK_BR_MPRED_CMPL 22 #define POWER8_PME_PM_MRK_BR_TAKEN_CMPL 23 
#define POWER8_PME_PM_MRK_DATA_FROM_L2MISS 24 #define POWER8_PME_PM_MRK_DATA_FROM_L3MISS 25 #define POWER8_PME_PM_MRK_DATA_FROM_MEM 26 #define POWER8_PME_PM_MRK_DERAT_MISS 27 #define POWER8_PME_PM_MRK_DTLB_MISS 28 #define POWER8_PME_PM_MRK_INST_CMPL 29 #define POWER8_PME_PM_MRK_INST_DISP 30 #define POWER8_PME_PM_MRK_INST_FROM_L3MISS 31 #define POWER8_PME_PM_MRK_L1_ICACHE_MISS 32 #define POWER8_PME_PM_MRK_L1_RELOAD_VALID 33 #define POWER8_PME_PM_MRK_LD_MISS_L1 34 #define POWER8_PME_PM_MRK_ST_CMPL 35 #define POWER8_PME_PM_RUN_CYC 36 #define POWER8_PME_PM_RUN_INST_CMPL 37 #define POWER8_PME_PM_RUN_PURR 38 #define POWER8_PME_PM_ST_FIN 39 #define POWER8_PME_PM_ST_MISS_L1 40 #define POWER8_PME_PM_TB_BIT_TRANS 41 #define POWER8_PME_PM_THRD_CONC_RUN_INST 42 #define POWER8_PME_PM_THRESH_EXC_1024 43 #define POWER8_PME_PM_THRESH_EXC_128 44 #define POWER8_PME_PM_THRESH_EXC_2048 45 #define POWER8_PME_PM_THRESH_EXC_256 46 #define POWER8_PME_PM_THRESH_EXC_32 47 #define POWER8_PME_PM_THRESH_EXC_4096 48 #define POWER8_PME_PM_THRESH_EXC_512 49 #define POWER8_PME_PM_THRESH_EXC_64 50 #define POWER8_PME_PM_THRESH_MET 51 #define POWER8_PME_PM_BR_2PATH 52 #define POWER8_PME_PM_BR_CMPL 53 #define POWER8_PME_PM_BR_MRK_2PATH 54 #define POWER8_PME_PM_CMPLU_STALL 55 #define POWER8_PME_PM_CMPLU_STALL_BRU 56 #define POWER8_PME_PM_CMPLU_STALL_BRU_CRU 57 #define POWER8_PME_PM_CMPLU_STALL_COQ_FULL 58 #define POWER8_PME_PM_CMPLU_STALL_DCACHE_MISS 59 #define POWER8_PME_PM_CMPLU_STALL_DMISS_L21_L31 60 #define POWER8_PME_PM_CMPLU_STALL_DMISS_L2L3 61 #define POWER8_PME_PM_CMPLU_STALL_DMISS_L2L3_CONFLICT 62 #define POWER8_PME_PM_CMPLU_STALL_DMISS_L3MISS 63 #define POWER8_PME_PM_CMPLU_STALL_DMISS_LMEM 64 #define POWER8_PME_PM_CMPLU_STALL_DMISS_REMOTE 65 #define POWER8_PME_PM_CMPLU_STALL_ERAT_MISS 66 #define POWER8_PME_PM_CMPLU_STALL_FLUSH 67 #define POWER8_PME_PM_CMPLU_STALL_FXLONG 68 #define POWER8_PME_PM_CMPLU_STALL_FXU 69 #define POWER8_PME_PM_CMPLU_STALL_HWSYNC 70 #define 
POWER8_PME_PM_CMPLU_STALL_LOAD_FINISH 71 #define POWER8_PME_PM_CMPLU_STALL_LSU 72 #define POWER8_PME_PM_CMPLU_STALL_LWSYNC 73 #define POWER8_PME_PM_CMPLU_STALL_MEM_ECC_DELAY 74 #define POWER8_PME_PM_CMPLU_STALL_NTCG_FLUSH 75 #define POWER8_PME_PM_CMPLU_STALL_OTHER_CMPL 76 #define POWER8_PME_PM_CMPLU_STALL_REJECT 77 #define POWER8_PME_PM_CMPLU_STALL_REJECT_LHS 78 #define POWER8_PME_PM_CMPLU_STALL_REJ_LMQ_FULL 79 #define POWER8_PME_PM_CMPLU_STALL_SCALAR 80 #define POWER8_PME_PM_CMPLU_STALL_SCALAR_LONG 81 #define POWER8_PME_PM_CMPLU_STALL_STORE 82 #define POWER8_PME_PM_CMPLU_STALL_ST_FWD 83 #define POWER8_PME_PM_CMPLU_STALL_THRD 84 #define POWER8_PME_PM_CMPLU_STALL_VECTOR 85 #define POWER8_PME_PM_CMPLU_STALL_VECTOR_LONG 86 #define POWER8_PME_PM_CMPLU_STALL_VSU 87 #define POWER8_PME_PM_DATA_FROM_L2 88 #define POWER8_PME_PM_DATA_FROM_L2_NO_CONFLICT 89 #define POWER8_PME_PM_DATA_FROM_L3 90 #define POWER8_PME_PM_DATA_FROM_L3MISS_MOD 91 #define POWER8_PME_PM_DATA_FROM_L3_NO_CONFLICT 92 #define POWER8_PME_PM_DATA_FROM_LMEM 93 #define POWER8_PME_PM_DATA_FROM_MEMORY 94 #define POWER8_PME_PM_DC_PREF_STREAM_STRIDED_CONF 95 #define POWER8_PME_PM_GCT_NOSLOT_BR_MPRED 96 #define POWER8_PME_PM_GCT_NOSLOT_BR_MPRED_ICMISS 97 #define POWER8_PME_PM_GCT_NOSLOT_DISP_HELD_ISSQ 98 #define POWER8_PME_PM_GCT_NOSLOT_DISP_HELD_OTHER 99 #define POWER8_PME_PM_GCT_NOSLOT_DISP_HELD_SRQ 100 #define POWER8_PME_PM_GCT_NOSLOT_IC_L3MISS 101 #define POWER8_PME_PM_GCT_NOSLOT_IC_MISS 102 #define POWER8_PME_PM_GRP_DISP 103 #define POWER8_PME_PM_GRP_MRK 104 #define POWER8_PME_PM_HV_CYC 105 #define POWER8_PME_PM_INST_CMPL 106 #define POWER8_PME_PM_IOPS_CMPL 107 #define POWER8_PME_PM_LD_CMPL 108 #define POWER8_PME_PM_LD_L3MISS_PEND_CYC 109 #define POWER8_PME_PM_MRK_DATA_FROM_L2 110 #define POWER8_PME_PM_MRK_DATA_FROM_L2MISS_CYC 111 #define POWER8_PME_PM_MRK_DATA_FROM_L2_CYC 112 #define POWER8_PME_PM_MRK_DATA_FROM_L2_NO_CONFLICT 113 #define POWER8_PME_PM_MRK_DATA_FROM_L2_NO_CONFLICT_CYC 114 #define 
POWER8_PME_PM_MRK_DATA_FROM_L3 115 #define POWER8_PME_PM_MRK_DATA_FROM_L3MISS_CYC 116 #define POWER8_PME_PM_MRK_DATA_FROM_L3_CYC 117 #define POWER8_PME_PM_MRK_DATA_FROM_L3_NO_CONFLICT 118 #define POWER8_PME_PM_MRK_DATA_FROM_L3_NO_CONFLICT_CYC 119 #define POWER8_PME_PM_MRK_DATA_FROM_LL4 120 #define POWER8_PME_PM_MRK_DATA_FROM_LL4_CYC 121 #define POWER8_PME_PM_MRK_DATA_FROM_LMEM 122 #define POWER8_PME_PM_MRK_DATA_FROM_LMEM_CYC 123 #define POWER8_PME_PM_MRK_DATA_FROM_MEMORY 124 #define POWER8_PME_PM_MRK_DATA_FROM_MEMORY_CYC 125 #define POWER8_PME_PM_MRK_GRP_CMPL 126 #define POWER8_PME_PM_MRK_INST_DECODED 127 #define POWER8_PME_PM_MRK_L2_RC_DISP 128 #define POWER8_PME_PM_MRK_LD_MISS_L1_CYC 129 #define POWER8_PME_PM_MRK_STALL_CMPLU_CYC 130 #define POWER8_PME_PM_NEST_REF_CLK 131 #define POWER8_PME_PM_PMC1_OVERFLOW 132 #define POWER8_PME_PM_PMC2_OVERFLOW 133 #define POWER8_PME_PM_PMC3_OVERFLOW 134 #define POWER8_PME_PM_PMC4_OVERFLOW 135 #define POWER8_PME_PM_PMC6_OVERFLOW 136 #define POWER8_PME_PM_PPC_CMPL 137 #define POWER8_PME_PM_THRD_ALL_RUN_CYC 138 #define POWER8_PME_PM_THRESH_NOT_MET 139 static const pme_power_entry_t power8_pe[] = { [ POWER8_PME_PM_1PLUS_PPC_CMPL ] = { .pme_name = "PM_1PLUS_PPC_CMPL", .pme_code = 0x100f2, .pme_short_desc = "one or more ppc instructions completed", .pme_long_desc = "one or more ppc instructions finished", }, [ POWER8_PME_PM_1PLUS_PPC_DISP ] = { .pme_name = "PM_1PLUS_PPC_DISP", .pme_code = 0x400f2, .pme_short_desc = "Cycles at least one Instr Dispatched", .pme_long_desc = "Cycles at least one Instr Dispatched", }, [ POWER8_PME_PM_ANY_THRD_RUN_CYC ] = { .pme_name = "PM_ANY_THRD_RUN_CYC", .pme_code = 0x100fa, .pme_short_desc = "Any thread in run_cycles (was one thread in run_cycles)", .pme_long_desc = "One of threads in run_cycles", }, [ POWER8_PME_PM_BR_MPRED_CMPL ] = { .pme_name = "PM_BR_MPRED_CMPL", .pme_code = 0x400f6, .pme_short_desc = "Number of Branch Mispredicts", .pme_long_desc = "Number of Branch Mispredicts", }, [ 
POWER8_PME_PM_BR_TAKEN_CMPL ] = { .pme_name = "PM_BR_TAKEN_CMPL", .pme_code = 0x200fa, .pme_short_desc = "Branch Taken", .pme_long_desc = "New event for Branch Taken", }, [ POWER8_PME_PM_CYC ] = { .pme_name = "PM_CYC", .pme_code = 0x100f0, .pme_short_desc = "Cycles", .pme_long_desc = "Cycles", }, [ POWER8_PME_PM_DATA_FROM_L2MISS ] = { .pme_name = "PM_DATA_FROM_L2MISS", .pme_code = 0x200fe, .pme_short_desc = "Demand LD - L2 Miss (not L2 hit)", .pme_long_desc = "Demand LD - L2 Miss (not L2 hit)", }, [ POWER8_PME_PM_DATA_FROM_L3MISS ] = { .pme_name = "PM_DATA_FROM_L3MISS", .pme_code = 0x300fe, .pme_short_desc = "Demand LD - L3 Miss (not L2 hit and not L3 hit)", .pme_long_desc = "Demand LD - L3 Miss (not L2 hit and not L3 hit)", }, [ POWER8_PME_PM_DATA_FROM_MEM ] = { .pme_name = "PM_DATA_FROM_MEM", .pme_code = 0x400fe, .pme_short_desc = "Data cache reload from memory (including L4)", .pme_long_desc = "data from Memory", }, [ POWER8_PME_PM_DTLB_MISS ] = { .pme_name = "PM_DTLB_MISS", .pme_code = 0x300fc, .pme_short_desc = "Data PTEG Reloaded (DTLB Miss)", .pme_long_desc = "Data PTEG reload", }, [ POWER8_PME_PM_EXT_INT ] = { .pme_name = "PM_EXT_INT", .pme_code = 0x200f8, .pme_short_desc = "external interrupt", .pme_long_desc = "external interrupt", }, [ POWER8_PME_PM_FLOP ] = { .pme_name = "PM_FLOP", .pme_code = 0x100f4, .pme_short_desc = "Floating Point Operations Finished", .pme_long_desc = "Floating Point Operations Finished", }, [ POWER8_PME_PM_FLUSH ] = { .pme_name = "PM_FLUSH", .pme_code = 0x400f8, .pme_short_desc = "Flush (any type)", .pme_long_desc = "Flush (any type)", }, [ POWER8_PME_PM_GCT_NOSLOT_CYC ] = { .pme_name = "PM_GCT_NOSLOT_CYC", .pme_code = 0x100f8, .pme_short_desc = "Pipeline empty (No itags assigned , no GCT slots used)", .pme_long_desc = "No itags assigned", }, [ POWER8_PME_PM_IERAT_MISS ] = { .pme_name = "PM_IERAT_MISS", .pme_code = 0x100f6, .pme_short_desc = "IERAT Reloaded (Miss)", .pme_long_desc = "Cycles Instruction ERAT was reloaded", }, [ 
POWER8_PME_PM_INST_DISP ] = { .pme_name = "PM_INST_DISP", .pme_code = 0x200f2, .pme_short_desc = "Number of PPC Dispatched", .pme_long_desc = "Number of PPC Dispatched", }, [ POWER8_PME_PM_INST_FROM_L3MISS ] = { .pme_name = "PM_INST_FROM_L3MISS", .pme_code = 0x300fa, .pme_short_desc = "Inst from L3 miss", .pme_long_desc = "A Instruction cacheline request resolved from a location that was beyond the local L3 cache", }, [ POWER8_PME_PM_ITLB_MISS ] = { .pme_name = "PM_ITLB_MISS", .pme_code = 0x400fc, .pme_short_desc = "ITLB Reloaded", .pme_long_desc = "ITLB Reloaded (always zero on POWER6)", }, [ POWER8_PME_PM_L1_DCACHE_RELOAD_VALID ] = { .pme_name = "PM_L1_DCACHE_RELOAD_VALID", .pme_code = 0x300f6, .pme_short_desc = "DL1 reloaded due to Demand Load", .pme_long_desc = "DL1 reloaded due to Demand Load", }, [ POWER8_PME_PM_L1_ICACHE_MISS ] = { .pme_name = "PM_L1_ICACHE_MISS", .pme_code = 0x200fc, .pme_short_desc = "Demand iCache Miss", .pme_long_desc = "Demand iCache Miss", }, [ POWER8_PME_PM_LD_MISS_L1 ] = { .pme_name = "PM_LD_MISS_L1", .pme_code = 0x400f0, .pme_short_desc = "Load Missed L1", .pme_long_desc = "Load Missed L1", }, [ POWER8_PME_PM_LSU_DERAT_MISS ] = { .pme_name = "PM_LSU_DERAT_MISS", .pme_code = 0x200f6, .pme_short_desc = "DERAT Reloaded (Miss)", .pme_long_desc = "DERAT Reloaded due to a DERAT miss", }, [ POWER8_PME_PM_MRK_BR_MPRED_CMPL ] = { .pme_name = "PM_MRK_BR_MPRED_CMPL", .pme_code = 0x300e4, .pme_short_desc = "Marked Branch Mispredicted", .pme_long_desc = "Marked Branch Mispredicted", }, [ POWER8_PME_PM_MRK_BR_TAKEN_CMPL ] = { .pme_name = "PM_MRK_BR_TAKEN_CMPL", .pme_code = 0x100e2, .pme_short_desc = "Marked Branch Taken", .pme_long_desc = "Marked Branch Taken completed", }, [ POWER8_PME_PM_MRK_DATA_FROM_L2MISS ] = { .pme_name = "PM_MRK_DATA_FROM_L2MISS", .pme_code = 0x400e8, .pme_short_desc = "Data cache reload L2 miss", .pme_long_desc = "sampled load resolved beyond L2", }, [ POWER8_PME_PM_MRK_DATA_FROM_L3MISS ] = { .pme_name = 
"PM_MRK_DATA_FROM_L3MISS", .pme_code = 0x200e4, .pme_short_desc = "The processor's data cache was reloaded from a location other than the local core's L3 due to a marked load", .pme_long_desc = "sampled load resolved beyond L3", }, [ POWER8_PME_PM_MRK_DATA_FROM_MEM ] = { .pme_name = "PM_MRK_DATA_FROM_MEM", .pme_code = 0x200e0, .pme_short_desc = "The processor's data cache was reloaded from a memory location including L4 from local remote or distant due to a marked load", .pme_long_desc = "sampled load resolved from memory", }, [ POWER8_PME_PM_MRK_DERAT_MISS ] = { .pme_name = "PM_MRK_DERAT_MISS", .pme_code = 0x300e6, .pme_short_desc = "Erat Miss (TLB Access) All page sizes", .pme_long_desc = "Erat Miss (TLB Access) All page sizes", }, [ POWER8_PME_PM_MRK_DTLB_MISS ] = { .pme_name = "PM_MRK_DTLB_MISS", .pme_code = 0x400e4, .pme_short_desc = "Marked dtlb miss", .pme_long_desc = "sampled Instruction dtlb miss", }, [ POWER8_PME_PM_MRK_INST_CMPL ] = { .pme_name = "PM_MRK_INST_CMPL", .pme_code = 0x400e0, .pme_short_desc = "marked instruction completed", .pme_long_desc = "Marked group complete", }, [ POWER8_PME_PM_MRK_INST_DISP ] = { .pme_name = "PM_MRK_INST_DISP", .pme_code = 0x100e0, .pme_short_desc = "Marked Instruction dispatched", .pme_long_desc = "The thread has dispatched a randomly sampled marked instruction", }, [ POWER8_PME_PM_MRK_INST_FROM_L3MISS ] = { .pme_name = "PM_MRK_INST_FROM_L3MISS", .pme_code = 0x400e6, .pme_short_desc = "sampled instruction missed icache and came from beyond L3 A Instruction cacheline request for a marked/sampled instruction resolved from a location that was beyond the local L3 cache", .pme_long_desc = "sampled instruction missed icache and came from beyond L3 A Instruction cacheline request for a marked/sampled instruction resolved from a location that was beyond the local L3 cache", }, [ POWER8_PME_PM_MRK_L1_ICACHE_MISS ] = { .pme_name = "PM_MRK_L1_ICACHE_MISS", .pme_code = 0x100e4, .pme_short_desc = "Marked L1 Icache Miss", 
.pme_long_desc = "sampled Instruction suffered an icache Miss", }, [ POWER8_PME_PM_MRK_L1_RELOAD_VALID ] = { .pme_name = "PM_MRK_L1_RELOAD_VALID", .pme_code = 0x100ea, .pme_short_desc = "Marked demand reload", .pme_long_desc = "Sampled Instruction had a data reload", }, [ POWER8_PME_PM_MRK_LD_MISS_L1 ] = { .pme_name = "PM_MRK_LD_MISS_L1", .pme_code = 0x200e2, .pme_short_desc = "Marked DL1 Demand Miss counted at exec time", .pme_long_desc = "Marked DL1 Demand Miss", }, [ POWER8_PME_PM_MRK_ST_CMPL ] = { .pme_name = "PM_MRK_ST_CMPL", .pme_code = 0x300e2, .pme_short_desc = "Marked store completed", .pme_long_desc = "marked store completed and sent to nest", }, [ POWER8_PME_PM_RUN_CYC ] = { .pme_name = "PM_RUN_CYC", .pme_code = 0x600f4, .pme_short_desc = "Run_cycles", .pme_long_desc = "Run_cycles", }, [ POWER8_PME_PM_RUN_INST_CMPL ] = { .pme_name = "PM_RUN_INST_CMPL", .pme_code = 0x500fa, .pme_short_desc = "Run_Instructions", .pme_long_desc = "Run_Instructions", }, [ POWER8_PME_PM_RUN_PURR ] = { .pme_name = "PM_RUN_PURR", .pme_code = 0x400f4, .pme_short_desc = "Run_PURR", .pme_long_desc = "Run_PURR", }, [ POWER8_PME_PM_ST_FIN ] = { .pme_name = "PM_ST_FIN", .pme_code = 0x200f0, .pme_short_desc = "Store Instructions Finished (store sent to nest)", .pme_long_desc = "Store Instructions Finished", }, [ POWER8_PME_PM_ST_MISS_L1 ] = { .pme_name = "PM_ST_MISS_L1", .pme_code = 0x300f0, .pme_short_desc = "Store Missed L1", .pme_long_desc = "Store Missed L1", }, [ POWER8_PME_PM_TB_BIT_TRANS ] = { .pme_name = "PM_TB_BIT_TRANS", .pme_code = 0x300f8, .pme_short_desc = "timebase event", .pme_long_desc = "timebase event", }, [ POWER8_PME_PM_THRD_CONC_RUN_INST ] = { .pme_name = "PM_THRD_CONC_RUN_INST", .pme_code = 0x300f4, .pme_short_desc = "Concurrent Run Instructions", .pme_long_desc = "PPC Instructions Finished when both threads in run_cycles", }, [ POWER8_PME_PM_THRESH_EXC_1024 ] = { .pme_name = "PM_THRESH_EXC_1024", .pme_code = 0x300ea, .pme_short_desc = "Threshold counter exceeded 
a value of 1024 Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 1024", .pme_long_desc = "Threshold counter exceeded a value of 1024 Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 1024", }, [ POWER8_PME_PM_THRESH_EXC_128 ] = { .pme_name = "PM_THRESH_EXC_128", .pme_code = 0x400ea, .pme_short_desc = "Threshold counter exceeded a value of 128", .pme_long_desc = "Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 128", }, [ POWER8_PME_PM_THRESH_EXC_2048 ] = { .pme_name = "PM_THRESH_EXC_2048", .pme_code = 0x400ec, .pme_short_desc = "Threshold counter exceeded a value of 2048", .pme_long_desc = "Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 2048", }, [ POWER8_PME_PM_THRESH_EXC_256 ] = { .pme_name = "PM_THRESH_EXC_256", .pme_code = 0x100e8, .pme_short_desc = "Threshold counter exceed a count of 256", .pme_long_desc = "Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 256", }, [ POWER8_PME_PM_THRESH_EXC_32 ] = { .pme_name = "PM_THRESH_EXC_32", .pme_code = 0x200e6, .pme_short_desc = "Threshold counter exceeded a value of 32", .pme_long_desc = "Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 32", }, [ POWER8_PME_PM_THRESH_EXC_4096 ] = { 
.pme_name = "PM_THRESH_EXC_4096", .pme_code = 0x100e6, .pme_short_desc = "Threshold counter exceed a count of 4096", .pme_long_desc = "Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 4096", }, [ POWER8_PME_PM_THRESH_EXC_512 ] = { .pme_name = "PM_THRESH_EXC_512", .pme_code = 0x200e8, .pme_short_desc = "Threshold counter exceeded a value of 512 Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 512", .pme_long_desc = "Threshold counter exceeded a value of 512 Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 512", }, [ POWER8_PME_PM_THRESH_EXC_64 ] = { .pme_name = "PM_THRESH_EXC_64", .pme_code = 0x300e8, .pme_short_desc = "Threshold counter exceeded a value of 64 Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 64", .pme_long_desc = "Threshold counter exceeded a value of 64 Architecture provides a thresholding counter in MMCRA, it has a start and stop events to configure and a programmable threshold, this event increments when the threshold exceeded a count of 64", }, [ POWER8_PME_PM_THRESH_MET ] = { .pme_name = "PM_THRESH_MET", .pme_code = 0x100ec, .pme_short_desc = "threshold exceeded", .pme_long_desc = "Threshold exceeded", }, [ POWER8_PME_PM_BR_2PATH ] = { .pme_name = "PM_BR_2PATH", .pme_code = 0x40036, .pme_short_desc = "two path branch", .pme_long_desc = "two path branch.", }, [ POWER8_PME_PM_BR_CMPL ] = { .pme_name = "PM_BR_CMPL", .pme_code = 0x40060, .pme_short_desc = "Branch Instruction completed", 
.pme_long_desc = "Branch Instruction completed.", }, [ POWER8_PME_PM_BR_MRK_2PATH ] = { .pme_name = "PM_BR_MRK_2PATH", .pme_code = 0x40138, .pme_short_desc = "marked two path branch", .pme_long_desc = "marked two path branch.", }, [ POWER8_PME_PM_CMPLU_STALL ] = { .pme_name = "PM_CMPLU_STALL", .pme_code = 0x1e054, .pme_short_desc = "Completion stall", .pme_long_desc = "Completion stall.", }, [ POWER8_PME_PM_CMPLU_STALL_BRU ] = { .pme_name = "PM_CMPLU_STALL_BRU", .pme_code = 0x4d018, .pme_short_desc = "Completion stall due to a Branch Unit", .pme_long_desc = "Completion stall due to a Branch Unit.", }, [ POWER8_PME_PM_CMPLU_STALL_BRU_CRU ] = { .pme_name = "PM_CMPLU_STALL_BRU_CRU", .pme_code = 0x2d018, .pme_short_desc = "Completion stall due to IFU", .pme_long_desc = "Completion stall due to IFU.", }, [ POWER8_PME_PM_CMPLU_STALL_COQ_FULL ] = { .pme_name = "PM_CMPLU_STALL_COQ_FULL", .pme_code = 0x30026, .pme_short_desc = "Completion stall due to CO q full", .pme_long_desc = "Completion stall due to CO q full.", }, [ POWER8_PME_PM_CMPLU_STALL_DCACHE_MISS ] = { .pme_name = "PM_CMPLU_STALL_DCACHE_MISS", .pme_code = 0x2c012, .pme_short_desc = "Completion stall by Dcache miss", .pme_long_desc = "Completion stall by Dcache miss.", }, [ POWER8_PME_PM_CMPLU_STALL_DMISS_L21_L31 ] = { .pme_name = "PM_CMPLU_STALL_DMISS_L21_L31", .pme_code = 0x2c018, .pme_short_desc = "Completion stall by Dcache miss which resolved on chip ( excluding local L2/L3)", .pme_long_desc = "Completion stall by Dcache miss which resolved on chip ( excluding local L2/L3).", }, [ POWER8_PME_PM_CMPLU_STALL_DMISS_L2L3 ] = { .pme_name = "PM_CMPLU_STALL_DMISS_L2L3", .pme_code = 0x2c016, .pme_short_desc = "Completion stall by Dcache miss which resolved in L2/L3", .pme_long_desc = "Completion stall by Dcache miss which resolved in L2/L3.", }, [ POWER8_PME_PM_CMPLU_STALL_DMISS_L2L3_CONFLICT ] = { .pme_name = "PM_CMPLU_STALL_DMISS_L2L3_CONFLICT", .pme_code = 0x4c016, .pme_short_desc = "Completion stall due to 
cache miss due to L2 l3 conflict", .pme_long_desc = "Completion stall due to cache miss resolving in core's L2/L3 with a conflict.", }, [ POWER8_PME_PM_CMPLU_STALL_DMISS_L3MISS ] = { .pme_name = "PM_CMPLU_STALL_DMISS_L3MISS", .pme_code = 0x4c01a, .pme_short_desc = "Completion stall due to cache miss resolving missed the L3", .pme_long_desc = "Completion stall due to cache miss resolving missed the L3.", }, [ POWER8_PME_PM_CMPLU_STALL_DMISS_LMEM ] = { .pme_name = "PM_CMPLU_STALL_DMISS_LMEM", .pme_code = 0x4c018, .pme_short_desc = "Completion stall due to cache miss resolving in core's Local Memory", .pme_long_desc = "Completion stall due to cache miss resolving in core's Local Memory.", }, [ POWER8_PME_PM_CMPLU_STALL_DMISS_REMOTE ] = { .pme_name = "PM_CMPLU_STALL_DMISS_REMOTE", .pme_code = 0x2c01c, .pme_short_desc = "Completion stall by Dcache miss which resolved from remote chip (cache or memory)", .pme_long_desc = "Completion stall by Dcache miss which resolved from remote chip (cache or memory).", }, [ POWER8_PME_PM_CMPLU_STALL_ERAT_MISS ] = { .pme_name = "PM_CMPLU_STALL_ERAT_MISS", .pme_code = 0x4c012, .pme_short_desc = "Completion stall due to LSU reject ERAT miss", .pme_long_desc = "Completion stall due to LSU reject ERAT miss.", }, [ POWER8_PME_PM_CMPLU_STALL_FLUSH ] = { .pme_name = "PM_CMPLU_STALL_FLUSH", .pme_code = 0x30038, .pme_short_desc = "completion stall due to flush by own thread", .pme_long_desc = "completion stall due to flush by own thread.", }, [ POWER8_PME_PM_CMPLU_STALL_FXLONG ] = { .pme_name = "PM_CMPLU_STALL_FXLONG", .pme_code = 0x4d016, .pme_short_desc = "Completion stall due to a long latency fixed point instruction", .pme_long_desc = "Completion stall due to a long latency fixed point instruction.", }, [ POWER8_PME_PM_CMPLU_STALL_FXU ] = { .pme_name = "PM_CMPLU_STALL_FXU", .pme_code = 0x2d016, .pme_short_desc = "Completion stall due to FXU", .pme_long_desc = "Completion stall due to FXU.", }, [ POWER8_PME_PM_CMPLU_STALL_HWSYNC ] = { .pme_name = 
"PM_CMPLU_STALL_HWSYNC", .pme_code = 0x30036, .pme_short_desc = "completion stall due to hwsync", .pme_long_desc = "completion stall due to hwsync.", }, [ POWER8_PME_PM_CMPLU_STALL_LOAD_FINISH ] = { .pme_name = "PM_CMPLU_STALL_LOAD_FINISH", .pme_code = 0x4d014, .pme_short_desc = "Completion stall due to a Load finish", .pme_long_desc = "Completion stall due to a Load finish.", }, [ POWER8_PME_PM_CMPLU_STALL_LSU ] = { .pme_name = "PM_CMPLU_STALL_LSU", .pme_code = 0x2c010, .pme_short_desc = "Completion stall by LSU instruction", .pme_long_desc = "Completion stall by LSU instruction.", }, [ POWER8_PME_PM_CMPLU_STALL_LWSYNC ] = { .pme_name = "PM_CMPLU_STALL_LWSYNC", .pme_code = 0x10036, .pme_short_desc = "completion stall due to isync/lwsync", .pme_long_desc = "completion stall due to isync/lwsync.", }, [ POWER8_PME_PM_CMPLU_STALL_MEM_ECC_DELAY ] = { .pme_name = "PM_CMPLU_STALL_MEM_ECC_DELAY", .pme_code = 0x30028, .pme_short_desc = "Completion stall due to mem ECC delay", .pme_long_desc = "Completion stall due to mem ECC delay.", }, [ POWER8_PME_PM_CMPLU_STALL_NTCG_FLUSH ] = { .pme_name = "PM_CMPLU_STALL_NTCG_FLUSH", .pme_code = 0x2e01e, .pme_short_desc = "Completion stall due to ntcg flush", .pme_long_desc = "Completion stall due to reject (load hit store).", }, [ POWER8_PME_PM_CMPLU_STALL_OTHER_CMPL ] = { .pme_name = "PM_CMPLU_STALL_OTHER_CMPL", .pme_code = 0x30006, .pme_short_desc = "Instructions core completed while this thread was stalled.", .pme_long_desc = "Instructions core completed while this thread was stalled.", }, [ POWER8_PME_PM_CMPLU_STALL_REJECT ] = { .pme_name = "PM_CMPLU_STALL_REJECT", .pme_code = 0x4c010, .pme_short_desc = "Completion stall due to LSU reject", .pme_long_desc = "Completion stall due to LSU reject.", }, [ POWER8_PME_PM_CMPLU_STALL_REJECT_LHS ] = { .pme_name = "PM_CMPLU_STALL_REJECT_LHS", .pme_code = 0x2c01a, .pme_short_desc = "Completion stall due to reject (load hit store)", .pme_long_desc = "Completion stall due to reject (load hit 
store).", }, [ POWER8_PME_PM_CMPLU_STALL_REJ_LMQ_FULL ] = { .pme_name = "PM_CMPLU_STALL_REJ_LMQ_FULL", .pme_code = 0x4c014, .pme_short_desc = "Completion stall due to LSU reject LMQ full", .pme_long_desc = "Completion stall due to LSU reject LMQ full.", }, [ POWER8_PME_PM_CMPLU_STALL_SCALAR ] = { .pme_name = "PM_CMPLU_STALL_SCALAR", .pme_code = 0x4d010, .pme_short_desc = "Completion stall due to VSU scalar instruction", .pme_long_desc = "Completion stall due to VSU scalar instruction.", }, [ POWER8_PME_PM_CMPLU_STALL_SCALAR_LONG ] = { .pme_name = "PM_CMPLU_STALL_SCALAR_LONG", .pme_code = 0x2d010, .pme_short_desc = "Completion stall due to VSU scalar long latency instruction", .pme_long_desc = "Completion stall due to VSU scalar long latency instruction.", }, [ POWER8_PME_PM_CMPLU_STALL_STORE ] = { .pme_name = "PM_CMPLU_STALL_STORE", .pme_code = 0x2c014, .pme_short_desc = "Completion stall by stores this includes store agent finishes in pipe LS0/LS1 and store data finishes in LS2/LS3", .pme_long_desc = "Completion stall by stores.", }, [ POWER8_PME_PM_CMPLU_STALL_ST_FWD ] = { .pme_name = "PM_CMPLU_STALL_ST_FWD", .pme_code = 0x4c01c, .pme_short_desc = "Completion stall due to store forward", .pme_long_desc = "Completion stall due to store forward.", }, [ POWER8_PME_PM_CMPLU_STALL_THRD ] = { .pme_name = "PM_CMPLU_STALL_THRD", .pme_code = 0x1001c, .pme_short_desc = "Completion Stalled due to thread conflict. 
Group ready to complete but it was another thread's turn", .pme_long_desc = "Completion stall due to thread conflict.", }, [ POWER8_PME_PM_CMPLU_STALL_VECTOR ] = { .pme_name = "PM_CMPLU_STALL_VECTOR", .pme_code = 0x2d014, .pme_short_desc = "Completion stall due to VSU vector instruction", .pme_long_desc = "Completion stall due to VSU vector instruction.", }, [ POWER8_PME_PM_CMPLU_STALL_VECTOR_LONG ] = { .pme_name = "PM_CMPLU_STALL_VECTOR_LONG", .pme_code = 0x4d012, .pme_short_desc = "Completion stall due to VSU vector long instruction", .pme_long_desc = "Completion stall due to VSU vector long instruction.", }, [ POWER8_PME_PM_CMPLU_STALL_VSU ] = { .pme_name = "PM_CMPLU_STALL_VSU", .pme_code = 0x2d012, .pme_short_desc = "Completion stall due to VSU instruction", .pme_long_desc = "Completion stall due to VSU instruction.", }, [ POWER8_PME_PM_DATA_FROM_L2 ] = { .pme_name = "PM_DATA_FROM_L2", .pme_code = 0x1c042, .pme_short_desc = "The processor's data cache was reloaded from local core's L2 due to a demand load or demand load plus prefetch controlled by MMCR1[16]", .pme_long_desc = "The processor's data cache was reloaded from local core's L2 due to a demand load or demand load plus prefetch controlled by MMCR1[20].", }, [ POWER8_PME_PM_DATA_FROM_L2_NO_CONFLICT ] = { .pme_name = "PM_DATA_FROM_L2_NO_CONFLICT", .pme_code = 0x1c040, .pme_short_desc = "The processor's data cache was reloaded from local core's L2 without conflict due to a demand load or demand load plus prefetch controlled by MMCR1[16]", .pme_long_desc = "The processor's data cache was reloaded from local core's L2 without conflict due to a demand load or demand load plus prefetch controlled by MMCR1[20] .", }, [ POWER8_PME_PM_DATA_FROM_L3 ] = { .pme_name = "PM_DATA_FROM_L3", .pme_code = 0x4c042, .pme_short_desc = "The processor's data cache was reloaded from local core's L3 due to a demand load", .pme_long_desc = "The processor's data cache was reloaded from local core's L3 due to a demand load.", }, [ 
POWER8_PME_PM_DATA_FROM_L3MISS_MOD ] = { .pme_name = "PM_DATA_FROM_L3MISS_MOD", .pme_code = 0x4c04e, .pme_short_desc = "The processor's data cache was reloaded from a location other than the local core's L3 due to a demand load", .pme_long_desc = "The processor's data cache was reloaded from a location other than the local core's L3 due to a demand load.", }, [ POWER8_PME_PM_DATA_FROM_L3_NO_CONFLICT ] = { .pme_name = "PM_DATA_FROM_L3_NO_CONFLICT", .pme_code = 0x1c044, .pme_short_desc = "The processor's data cache was reloaded from local core's L3 without conflict due to a demand load or demand load plus prefetch controlled by MMCR1[16]", .pme_long_desc = "The processor's data cache was reloaded from local core's L3 without conflict due to a demand load or demand load plus prefetch controlled by MMCR1[20].", }, [ POWER8_PME_PM_DATA_FROM_LMEM ] = { .pme_name = "PM_DATA_FROM_LMEM", .pme_code = 0x2c048, .pme_short_desc = "The processor's data cache was reloaded from the local chip's Memory due to a demand load", .pme_long_desc = "The processor's data cache was reloaded from the local chip's Memory due to a demand load.", }, [ POWER8_PME_PM_DATA_FROM_MEMORY ] = { .pme_name = "PM_DATA_FROM_MEMORY", .pme_code = 0x2c04c, .pme_short_desc = "The processor's data cache was reloaded from a memory location including L4 from local remote or distant due to a demand load", .pme_long_desc = "The processor's data cache was reloaded from a memory location including L4 from local remote or distant due to a demand load.", }, [ POWER8_PME_PM_DC_PREF_STREAM_STRIDED_CONF ] = { .pme_name = "PM_DC_PREF_STREAM_STRIDED_CONF", .pme_code = 0x3e050, .pme_short_desc = "A demand load referenced a line in an active strided prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software.", .pme_long_desc = "A demand load referenced a line in an active strided prefetch stream. 
The stream could have been allocated through the hardware prefetch mechanism or through software.", }, [ POWER8_PME_PM_GCT_NOSLOT_BR_MPRED ] = { .pme_name = "PM_GCT_NOSLOT_BR_MPRED", .pme_code = 0x4d01e, .pme_short_desc = "Gct empty for this thread due to branch misprediction", .pme_long_desc = "Gct empty for this thread due to branch misprediction.", }, [ POWER8_PME_PM_GCT_NOSLOT_BR_MPRED_ICMISS ] = { .pme_name = "PM_GCT_NOSLOT_BR_MPRED_ICMISS", .pme_code = 0x4d01a, .pme_short_desc = "Gct empty for this thread due to Icache Miss and branch mispred", .pme_long_desc = "Gct empty for this thread due to Icache Miss and branch mispred.", }, [ POWER8_PME_PM_GCT_NOSLOT_DISP_HELD_ISSQ ] = { .pme_name = "PM_GCT_NOSLOT_DISP_HELD_ISSQ", .pme_code = 0x2d01e, .pme_short_desc = "Gct empty for this thread due to dispatch hold on this thread due to Issue q full", .pme_long_desc = "Gct empty for this thread due to dispatch hold on this thread due to Issue q full.", }, [ POWER8_PME_PM_GCT_NOSLOT_DISP_HELD_OTHER ] = { .pme_name = "PM_GCT_NOSLOT_DISP_HELD_OTHER", .pme_code = 0x2e010, .pme_short_desc = "Gct empty for this thread due to dispatch hold on this thread due to sync", .pme_long_desc = "Gct empty for this thread due to dispatch hold on this thread due to sync.", }, [ POWER8_PME_PM_GCT_NOSLOT_DISP_HELD_SRQ ] = { .pme_name = "PM_GCT_NOSLOT_DISP_HELD_SRQ", .pme_code = 0x2d01c, .pme_short_desc = "Gct empty for this thread due to dispatch hold on this thread due to SRQ full", .pme_long_desc = "Gct empty for this thread due to dispatch hold on this thread due to SRQ full.", }, [ POWER8_PME_PM_GCT_NOSLOT_IC_L3MISS ] = { .pme_name = "PM_GCT_NOSLOT_IC_L3MISS", .pme_code = 0x4e010, .pme_short_desc = "Gct empty for this thread due to icache L3 miss", .pme_long_desc = "Gct empty for this thread due to icache L3 miss.", }, [ POWER8_PME_PM_GCT_NOSLOT_IC_MISS ] = { .pme_name = "PM_GCT_NOSLOT_IC_MISS", .pme_code = 0x2d01a, .pme_short_desc = "Gct empty for this thread due to Icache Miss", 
.pme_long_desc = "Gct empty for this thread due to Icache Miss.", }, [ POWER8_PME_PM_GRP_DISP ] = { .pme_name = "PM_GRP_DISP", .pme_code = 0x3000a, .pme_short_desc = "group dispatch", .pme_long_desc = "dispatch_success (Group Dispatched).", }, [ POWER8_PME_PM_GRP_MRK ] = { .pme_name = "PM_GRP_MRK", .pme_code = 0x10130, .pme_short_desc = "Instruction Marked", .pme_long_desc = "Instruction marked in idu.", }, [ POWER8_PME_PM_HV_CYC ] = { .pme_name = "PM_HV_CYC", .pme_code = 0x2000a, .pme_short_desc = "cycles in hypervisor mode", .pme_long_desc = "cycles in hypervisor mode .", }, [ POWER8_PME_PM_INST_CMPL ] = { .pme_name = "PM_INST_CMPL", .pme_code = 0x10002, .pme_short_desc = "Number of PowerPC Instructions that completed.", .pme_long_desc = "PPC Instructions Finished (completed).", }, [ POWER8_PME_PM_IOPS_CMPL ] = { .pme_name = "PM_IOPS_CMPL", .pme_code = 0x10014, .pme_short_desc = "Internal Operations completed", .pme_long_desc = "IOPS Completed.", }, [ POWER8_PME_PM_LD_CMPL ] = { .pme_name = "PM_LD_CMPL", .pme_code = 0x1002e, .pme_short_desc = "count of Loads completed", .pme_long_desc = "count of Loads completed.", }, [ POWER8_PME_PM_LD_L3MISS_PEND_CYC ] = { .pme_name = "PM_LD_L3MISS_PEND_CYC", .pme_code = 0x10062, .pme_short_desc = "Cycles L3 miss was pending for this thread", .pme_long_desc = "Cycles L3 miss was pending for this thread.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L2 ] = { .pme_name = "PM_MRK_DATA_FROM_L2", .pme_code = 0x1d142, .pme_short_desc = "The processor's data cache was reloaded from local core's L2 due to a marked load", .pme_long_desc = "The processor's data cache was reloaded from local core's L2 due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L2MISS_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L2MISS_CYC", .pme_code = 0x4c12e, .pme_short_desc = "Duration in cycles to reload from a location other than the local core's L2 due to a marked load", .pme_long_desc = "Duration in cycles to reload from a location other than the local core's L2 due 
to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L2_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L2_CYC", .pme_code = 0x4c122, .pme_short_desc = "Duration in cycles to reload from local core's L2 due to a marked load", .pme_long_desc = "Duration in cycles to reload from local core's L2 due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L2_NO_CONFLICT ] = { .pme_name = "PM_MRK_DATA_FROM_L2_NO_CONFLICT", .pme_code = 0x1d140, .pme_short_desc = "The processor's data cache was reloaded from local core's L2 without conflict due to a marked load", .pme_long_desc = "The processor's data cache was reloaded from local core's L2 without conflict due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L2_NO_CONFLICT_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L2_NO_CONFLICT_CYC", .pme_code = 0x4c120, .pme_short_desc = "Duration in cycles to reload from local core's L2 without conflict due to a marked load", .pme_long_desc = "Duration in cycles to reload from local core's L2 without conflict due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L3 ] = { .pme_name = "PM_MRK_DATA_FROM_L3", .pme_code = 0x4d142, .pme_short_desc = "The processor's data cache was reloaded from local core's L3 due to a marked load", .pme_long_desc = "The processor's data cache was reloaded from local core's L3 due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L3MISS_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L3MISS_CYC", .pme_code = 0x2d12e, .pme_short_desc = "Duration in cycles to reload from a location other than the local core's L3 due to a marked load", .pme_long_desc = "Duration in cycles to reload from a location other than the local core's L3 due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L3_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L3_CYC", .pme_code = 0x2d122, .pme_short_desc = "Duration in cycles to reload from local core's L3 due to a marked load", .pme_long_desc = "Duration in cycles to reload from local core's L3 due to a marked load.", }, [ 
POWER8_PME_PM_MRK_DATA_FROM_L3_NO_CONFLICT ] = { .pme_name = "PM_MRK_DATA_FROM_L3_NO_CONFLICT", .pme_code = 0x1d144, .pme_short_desc = "The processor's data cache was reloaded from local core's L3 without conflict due to a marked load", .pme_long_desc = "The processor's data cache was reloaded from local core's L3 without conflict due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_L3_NO_CONFLICT_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_L3_NO_CONFLICT_CYC", .pme_code = 0x4c124, .pme_short_desc = "Duration in cycles to reload from local core's L3 without conflict due to a marked load", .pme_long_desc = "Duration in cycles to reload from local core's L3 without conflict due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_LL4 ] = { .pme_name = "PM_MRK_DATA_FROM_LL4", .pme_code = 0x1d14c, .pme_short_desc = "The processor's data cache was reloaded from the local chip's L4 cache due to a marked load", .pme_long_desc = "The processor's data cache was reloaded from the local chip's L4 cache due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_LL4_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_LL4_CYC", .pme_code = 0x4c12c, .pme_short_desc = "Duration in cycles to reload from the local chip's L4 cache due to a marked load", .pme_long_desc = "Duration in cycles to reload from the local chip's L4 cache due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_LMEM ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM", .pme_code = 0x2d148, .pme_short_desc = "The processor's data cache was reloaded from the local chip's Memory due to a marked load", .pme_long_desc = "The processor's data cache was reloaded from the local chip's Memory due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_LMEM_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_LMEM_CYC", .pme_code = 0x4d128, .pme_short_desc = "Duration in cycles to reload from the local chip's Memory due to a marked load", .pme_long_desc = "Duration in cycles to reload from the local chip's Memory due to a marked load.", }, [ 
POWER8_PME_PM_MRK_DATA_FROM_MEMORY ] = { .pme_name = "PM_MRK_DATA_FROM_MEMORY", .pme_code = 0x2d14c, .pme_short_desc = "The processor's data cache was reloaded from a memory location including L4 from local remote or distant due to a marked load", .pme_long_desc = "The processor's data cache was reloaded from a memory location including L4 from local remote or distant due to a marked load.", }, [ POWER8_PME_PM_MRK_DATA_FROM_MEMORY_CYC ] = { .pme_name = "PM_MRK_DATA_FROM_MEMORY_CYC", .pme_code = 0x4d12c, .pme_short_desc = "Duration in cycles to reload from a memory location including L4 from local remote or distant due to a marked load", .pme_long_desc = "Duration in cycles to reload from a memory location including L4 from local remote or distant due to a marked load.", }, [ POWER8_PME_PM_MRK_GRP_CMPL ] = { .pme_name = "PM_MRK_GRP_CMPL", .pme_code = 0x40130, .pme_short_desc = "marked instruction finished (completed)", .pme_long_desc = "marked instruction finished (completed).", }, [ POWER8_PME_PM_MRK_INST_DECODED ] = { .pme_name = "PM_MRK_INST_DECODED", .pme_code = 0x20130, .pme_short_desc = "marked instruction decoded", .pme_long_desc = "marked instruction decoded. 
Name from ISU?", }, [ POWER8_PME_PM_MRK_L2_RC_DISP ] = { .pme_name = "PM_MRK_L2_RC_DISP", .pme_code = 0x20114, .pme_short_desc = "Marked Instruction RC dispatched in L2", .pme_long_desc = "Marked Instruction RC dispatched in L2.", }, [ POWER8_PME_PM_MRK_LD_MISS_L1_CYC ] = { .pme_name = "PM_MRK_LD_MISS_L1_CYC", .pme_code = 0x4013e, .pme_short_desc = "Marked ld latency", .pme_long_desc = "Marked ld latency.", }, [ POWER8_PME_PM_MRK_STALL_CMPLU_CYC ] = { .pme_name = "PM_MRK_STALL_CMPLU_CYC", .pme_code = 0x3013e, .pme_short_desc = "Marked Group completion Stall", .pme_long_desc = "Marked Group Completion Stall cycles (use edge detect to count).", }, [ POWER8_PME_PM_NEST_REF_CLK ] = { .pme_name = "PM_NEST_REF_CLK", .pme_code = 0x3006e, .pme_short_desc = "Nest reference clocks", .pme_long_desc = "Nest reference clocks.", }, [ POWER8_PME_PM_PMC1_OVERFLOW ] = { .pme_name = "PM_PMC1_OVERFLOW", .pme_code = 0x20010, .pme_short_desc = "Overflow from counter 1", .pme_long_desc = "Overflow from counter 1.", }, [ POWER8_PME_PM_PMC2_OVERFLOW ] = { .pme_name = "PM_PMC2_OVERFLOW", .pme_code = 0x30010, .pme_short_desc = "Overflow from counter 2", .pme_long_desc = "Overflow from counter 2.", }, [ POWER8_PME_PM_PMC3_OVERFLOW ] = { .pme_name = "PM_PMC3_OVERFLOW", .pme_code = 0x40010, .pme_short_desc = "Overflow from counter 3", .pme_long_desc = "Overflow from counter 3.", }, [ POWER8_PME_PM_PMC4_OVERFLOW ] = { .pme_name = "PM_PMC4_OVERFLOW", .pme_code = 0x10010, .pme_short_desc = "Overflow from counter 4", .pme_long_desc = "Overflow from counter 4.", }, [ POWER8_PME_PM_PMC6_OVERFLOW ] = { .pme_name = "PM_PMC6_OVERFLOW", .pme_code = 0x30024, .pme_short_desc = "Overflow from counter 6", .pme_long_desc = "Overflow from counter 6.", }, [ POWER8_PME_PM_PPC_CMPL ] = { .pme_name = "PM_PPC_CMPL", .pme_code = 0x40002, .pme_short_desc = "PPC Instructions Finished (completed)", .pme_long_desc = "PPC Instructions Finished (completed).", }, [ POWER8_PME_PM_THRD_ALL_RUN_CYC ] = { .pme_name = 
"PM_THRD_ALL_RUN_CYC", .pme_code = 0x2000c, .pme_short_desc = "All Threads in Run_cycles (was both threads in run_cycles)", .pme_long_desc = "All Threads in Run_cycles (was both threads in run_cycles).", }, [ POWER8_PME_PM_THRESH_NOT_MET ] = { .pme_name = "PM_THRESH_NOT_MET", .pme_code = 0x4016e, .pme_short_desc = "Threshold counter did not meet threshold", .pme_long_desc = "Threshold counter did not meet threshold.", }, }; #endif papi-5.3.0/src/libpfm4/lib/events/intel_snbep_unc_qpi_events.h0000600003276200002170000003154412247131123024120 0ustar ralphundrgrad/* * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * This file has been automatically generated. 
* * PMU: snbep_unc_qpi (Intel SandyBridge-EP QPI uncore) */ static const intel_x86_umask_t snbep_unc_q_direct2core[]={ { .uname = "FAILURE_CREDITS", .udesc = "Number of spawn failures due to lack of Egress credits", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "FAILURE_CREDITS_RBT", .udesc = "Number of spawn failures due to lack of Egress credit and route-back table (RBT) bit was not set", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "FAILURE_RBT", .udesc = "Number of spawn failures because route-back table (RBT) specified that the transaction should not trigger a direct2core transaction", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SUCCESS", .udesc = "Number of spawn successes", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_q_rxl_credits_consumed_vn0[]={ { .uname = "DRS", .udesc = "Number of times VN0 consumed for DRS message class", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "HOM", .udesc = "Number of times VN0 consumed for HOM message class", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NCB", .udesc = "Number of times VN0 consumed for NCB message class", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NCS", .udesc = "Number of times VN0 consumed for NCS message class", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NDR", .udesc = "Number of times VN0 consumed for NDR message class", .ucode = 0x2000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SNP", .udesc = "Number of times VN0 consumed for SNP message class", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_q_rxl_flits_g0[]={ { .uname = "DATA", .udesc = "Number of data flits over QPI", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "IDLE", .udesc = "Number of flits over QPI that do not hold protocol payload", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NON_DATA", .udesc = "Number of non-NULL non-data flits 
over QPI", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_q_rxl_flits_g1[]={ { .uname = "DRS", .udesc = "Number of flits over QPI on the Data Response (DRS) channel", .ucode = 0x1800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DRS_DATA", .udesc = "Number of data flits over QPI on the Data Response (DRS) channel", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "DRS_NONDATA", .udesc = "Number of protocol flits over QPI on the Data Response (DRS) channel", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "HOM", .udesc = "Number of flits over QPI on the home channel", .ucode = 0x600, .uflags = INTEL_X86_NCOMBO, }, { .uname = "HOM_NONREQ", .udesc = "Number of non-request flits over QPI on the home channel", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "HOM_REQ", .udesc = "Number of data requests over QPI on the home channel", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, { .uname = "SNP", .udesc = "Number of snoop request flits over QPI", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_umask_t snbep_unc_q_rxl_flits_g2[]={ { .uname = "NCB", .udesc = "Number of non-coherent bypass flits", .ucode = 0xc00, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NCB_DATA", .udesc = "Number of non-coherent data flits", .ucode = 0x400, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NCB_NONDATA", .udesc = "Number of bypass non-data flits", .ucode = 0x800, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NCS", .udesc = "Number of non-coherent standard (NCS) flits", .ucode = 0x1000, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NDR_AD", .udesc = "Number of flits received over Non-data response (NDR) channel", .ucode = 0x100, .uflags = INTEL_X86_NCOMBO, }, { .uname = "NDR_AK", .udesc = "Number of flits received on the Non-data response (NDR) channel", .ucode = 0x200, .uflags = INTEL_X86_NCOMBO, }, }; static const intel_x86_entry_t intel_snbep_unc_q_pe[]={ { .name = "UNC_Q_CLOCKTICKS", .desc = "Number 
of qfclks", .code = 0x14, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_CTO_COUNT", .desc = "Count of CTO Events", .code = 0x38, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_DIRECT2CORE", .desc = "Direct 2 Core Spawning", .code = 0x13, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_direct2core), .umasks = snbep_unc_q_direct2core }, { .name = "UNC_Q_L1_POWER_CYCLES", .desc = "Cycles in L1", .code = 0x12, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL0P_POWER_CYCLES", .desc = "Cycles in L0p", .code = 0x10, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL0_POWER_CYCLES", .desc = "Cycles in L0", .code = 0xf, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_BYPASSED", .desc = "Rx Flit Buffer Bypassed", .code = 0x9, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_CREDITS_CONSUMED_VN0", .desc = "VN0 Credit Consumed", .code = 0x1e | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_rxl_credits_consumed_vn0), .umasks = snbep_unc_q_rxl_credits_consumed_vn0 }, { .name = "UNC_Q_RXL_CREDITS_CONSUMED_VNA", .desc = "VNA Credit Consumed", .code = 0x1d | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_CYCLES_NE", .desc = "RxQ Cycles Not Empty", .code = 0xa, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_FLITS_G0", .desc = "Flits Received - Group 0", .code = 0x1, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_rxl_flits_g0), .umasks = snbep_unc_q_rxl_flits_g0 }, { .name = "UNC_Q_RXL_FLITS_G1", .desc = "Flits Received - Group 1", .code = 0x2 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_rxl_flits_g1), .umasks = 
snbep_unc_q_rxl_flits_g1 }, { .name = "UNC_Q_RXL_FLITS_G2", .desc = "Flits Received - Group 2", .code = 0x3 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_rxl_flits_g2), .umasks = snbep_unc_q_rxl_flits_g2 }, { .name = "UNC_Q_RXL_INSERTS", .desc = "Rx Flit Buffer Allocations", .code = 0x8, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_INSERTS_DRS", .desc = "Rx Flit Buffer Allocations - DRS", .code = 0x9 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_INSERTS_HOM", .desc = "Rx Flit Buffer Allocations - HOM", .code = 0xc | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_INSERTS_NCB", .desc = "Rx Flit Buffer Allocations - NCB", .code = 0xa | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_INSERTS_NCS", .desc = "Rx Flit Buffer Allocations - NCS", .code = 0xb | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_INSERTS_NDR", .desc = "Rx Flit Buffer Allocations - NDR", .code = 0xe | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_INSERTS_SNP", .desc = "Rx Flit Buffer Allocations - SNP", .code = 0xd | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_OCCUPANCY", .desc = "RxQ Occupancy - All Packets", .code = 0xb, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_OCCUPANCY_DRS", .desc = "RxQ Occupancy - DRS", .code = 0x15 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_OCCUPANCY_HOM", .desc = "RxQ Occupancy - HOM", .code = 0x18 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_OCCUPANCY_NCB", .desc = "RxQ Occupancy - NCB", .code = 0x16 | (1ULL << 21), /* 
sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_OCCUPANCY_NCS", .desc = "RxQ Occupancy - NCS", .code = 0x17 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_OCCUPANCY_NDR", .desc = "RxQ Occupancy - NDR", .code = 0x1a | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_RXL_OCCUPANCY_SNP", .desc = "RxQ Occupancy - SNP", .code = 0x19 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_TXL0P_POWER_CYCLES", .desc = "Cycles in L0p", .code = 0xd, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_TXL0_POWER_CYCLES", .desc = "Cycles in L0", .code = 0xc, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_TXL_BYPASSED", .desc = "Tx Flit Buffer Bypassed", .code = 0x5, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_TXL_CYCLES_NE", .desc = "Tx Flit Buffer Cycles not Empty", .code = 0x6, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_TXL_FLITS_G0", .desc = "Flits Transferred - Group 0", .code = 0x0, .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_rxl_flits_g0), .umasks = snbep_unc_q_rxl_flits_g0 /* shared with rxl_flits_g0 */ }, { .name = "UNC_Q_TXL_FLITS_G1", .desc = "Flits Transferred - Group 1", .code = 0x0 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_rxl_flits_g1), .umasks = snbep_unc_q_rxl_flits_g1 /* shared with rxl_flits_g1 */ }, { .name = "UNC_Q_TXL_FLITS_G2", .desc = "Flits Transferred - Group 2", .code = 0x1 | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .ngrp = 1, .modmsk = SNBEP_UNC_QPI_ATTRS, .numasks = LIBPFM_ARRAY_SIZE(snbep_unc_q_rxl_flits_g2), .umasks = snbep_unc_q_rxl_flits_g2 /* shared with rxl_flits_g2 */ }, { .name = "UNC_Q_TXL_INSERTS", .desc = "Tx Flit Buffer Allocations", .code = 0x4, 
.cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_TXL_OCCUPANCY", .desc = "Tx Flit Buffer Occupancy", .code = 0x7, .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_VNA_CREDIT_RETURNS", .desc = "VNA Credits Returned", .code = 0x1c | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, { .name = "UNC_Q_VNA_CREDIT_RETURN_OCCUPANCY", .desc = "VNA Credits Pending Return - Occupancy", .code = 0x1b | (1ULL << 21), /* sel_ext */ .cntmsk = 0xf, .modmsk = SNBEP_UNC_QPI_ATTRS, }, }; papi-5.3.0/src/libpfm4/lib/pfmlib_sparc.c0000600003276200002170000001706312247131124017645 0ustar ralphundrgrad/* * pfmlib_sparc.c : support for SPARC processors * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. 
* */ #include <sys/types.h> #include <string.h> #include <stdlib.h> #include <stdio.h> #include <stdarg.h> /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_sparc_priv.h" const pfmlib_attr_desc_t sparc_mods[]={ PFM_ATTR_B("k", "monitor at priv level 0"), /* monitor priv level 0 */ PFM_ATTR_B("u", "monitor at priv level 1, 2, 3"), /* monitor priv level 1, 2, 3 */ PFM_ATTR_B("h", "monitor in hypervisor"), /* monitor in hypervisor*/ PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */ }; #define SPARC_NUM_MODS (sizeof(sparc_mods)/sizeof(pfmlib_attr_desc_t) - 1) #ifdef CONFIG_PFMLIB_OS_LINUX /* * helper function to retrieve one value from /proc/cpuinfo * for internal libpfm use only * attr: the attribute (line) to look for * ret_buf: a buffer to store the value of the attribute (as a string) * maxlen : number of bytes of capacity in ret_buf * * ret_buf is null terminated. * * Return: * 0 : attribute found, ret_buf populated * -1: attribute not found */ static int pfmlib_getcpuinfo_attr(const char *attr, char *ret_buf, size_t maxlen) { FILE *fp = NULL; int ret = -1; size_t attr_len, buf_len = 0; char *p, *value = NULL; char *buffer = NULL; if (attr == NULL || ret_buf == NULL || maxlen < 1) return -1; attr_len = strlen(attr); fp = fopen("/proc/cpuinfo", "r"); if (fp == NULL) return -1; while(pfmlib_getl(&buffer, &buf_len, fp) != -1){ /* skip blank lines */ if (*buffer == '\n') continue; p = strchr(buffer, ':'); if (p == NULL) goto error; /* * p+2: +1 = space, +2= first character * strlen()-1 gets rid of \n */ *p = '\0'; value = p+2; value[strlen(value)-1] = '\0'; /* copy the value out only when the attribute actually matched */ if (!strncmp(attr, buffer, attr_len)) { strncpy(ret_buf, value, maxlen-1); ret_buf[maxlen-1] = '\0'; ret = 0; break; } } error: free(buffer); fclose(fp); return ret; } #else static int pfmlib_getcpuinfo_attr(const char *attr, char *ret_buf, size_t maxlen) { return -1; } #endif static pfm_pmu_t pmu_name_to_pmu_type(char *name) { if (!strcmp(name, "ultra12")) return PFM_PMU_SPARC_ULTRA12; if (!strcmp(name, "ultra3")) return 
PFM_PMU_SPARC_ULTRA3; if (!strcmp(name, "ultra3i")) return PFM_PMU_SPARC_ULTRA3I; if (!strcmp(name, "ultra3+")) return PFM_PMU_SPARC_ULTRA3PLUS; if (!strcmp(name, "ultra4+")) return PFM_PMU_SPARC_ULTRA4PLUS; if (!strcmp(name, "niagara2")) return PFM_PMU_SPARC_NIAGARA2; if (!strcmp(name, "niagara")) return PFM_PMU_SPARC_NIAGARA1; return PFM_PMU_NONE; } int pfm_sparc_detect(void *this) { pfmlib_pmu_t *pmu = this; pfm_pmu_t model; int ret; char buffer[32]; ret = pfmlib_getcpuinfo_attr("pmu", buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; model = pmu_name_to_pmu_type(buffer); return model == pmu->pmu ? PFM_SUCCESS : PFM_ERR_NOTSUPP; } void pfm_sparc_display_reg(void *this, pfmlib_event_desc_t *e, pfm_sparc_reg_t reg) { __pfm_vbprintf("[0x%x umask=0x%x code=0x%x ctrl_s1=%d ctrl_s0=%d] %s\n", reg.val, reg.config.umask, reg.config.code, reg.config.ctrl_s1, reg.config.ctrl_s0, e->fstr); } int pfm_sparc_get_encoding(void *this, pfmlib_event_desc_t *e) { const sparc_entry_t *pe = this_pe(this); pfm_event_attr_info_t *a; pfm_sparc_reg_t reg; int i; /* reg.val = pe[e->event].code << 16 | pe[e->event].ctrl; */ reg.val = pe[e->event].code; for (i = 0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) reg.config.umask |= 1 << pe[e->event].umasks[a->idx].ubit; } e->count = 2; e->codes[0] = reg.val; e->codes[1] = pe[e->event].ctrl; evt_strcat(e->fstr, "%s", pe[e->event].name); pfmlib_sort_attr(e); for (i = 0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) evt_strcat(e->fstr, ":%s", pe[e->event].umasks[a->idx].uname); } pfm_sparc_display_reg(this, e, reg); return PFM_SUCCESS; } int pfm_sparc_get_event_first(void *this) { return 0; } int pfm_sparc_get_event_next(void *this, int idx) { pfmlib_pmu_t *p = this; if (idx >= (p->pme_count-1)) return -1; return idx+1; } int pfm_sparc_event_is_valid(void *this, int pidx) { pfmlib_pmu_t *p = this; 
return pidx >= 0 && pidx < p->pme_count; } int pfm_sparc_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const sparc_entry_t *pe = this_pe(this); int i, j, error = 0; for(i=0; i < pmu->pme_count; i++) { if (!pe[i].name) { fprintf(fp, "pmu: %s event%d: :: no name (prev event was %s)\n", pmu->name, i, i > 0 ? pe[i-1].name : "??"); error++; } if (!pe[i].desc) { fprintf(fp, "pmu: %s event%d: %s :: no description\n", pmu->name, i, pe[i].name); error++; } for(j=i+1; j < pmu->pme_count; j++) { if (pe[i].code == pe[j].code && pe[i].ctrl == pe[j].ctrl) { fprintf(fp, "pmu: %s event%d: %s code: 0x%x is duplicated in event%d : %s\n", pmu->name, i, pe[i].name, pe[i].code, j, pe[j].name); error++; } } } return error ? PFM_ERR_INVAL : PFM_SUCCESS; } int pfm_sparc_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info) { const sparc_entry_t *pe = this_pe(this); int idx; if (attr_idx < pe[pidx].numasks) { info->name = pe[pidx].umasks[attr_idx].uname; info->desc = pe[pidx].umasks[attr_idx].udesc; info->equiv= NULL; info->code = 1 << pe[pidx].umasks[attr_idx].ubit; info->type = PFM_ATTR_UMASK; info->idx = attr_idx; } else { /* * all mods implemented by ALL events */ idx = attr_idx - pe[pidx].numasks; info->name = sparc_mods[idx].name; info->desc = sparc_mods[idx].desc; info->type = sparc_mods[idx].type; info->code = idx; } info->is_dfl = 0; info->is_precise = 0; info->ctrl = PFM_ATTR_CTRL_PMU; return PFM_SUCCESS; } int pfm_sparc_get_event_info(void *this, int idx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; const sparc_entry_t *pe = this_pe(this); /* * pmu and idx filled out by caller */ info->name = pe[idx].name; info->desc = pe[idx].desc; info->code = pe[idx].code; info->equiv = NULL; info->idx = idx; /* private index */ info->pmu = pmu->pmu; info->is_precise = 0; info->nattrs = pe[idx].numasks; return PFM_SUCCESS; } unsigned int 
pfm_sparc_get_event_nattrs(void *this, int pidx) { const sparc_entry_t *pe = this_pe(this); return SPARC_NUM_MODS + pe[pidx].numasks; } papi-5.3.0/src/libpfm4/lib/pfmlib_mips_perf_event.c0000600003276200002170000000640412247131124021717 0ustar ralphundrgrad/* * pfmlib_mips_perf_event.c : perf_event MIPS functions * * Copyright (c) 2011 Samara Technology Group, Inc * Contributed by Philip Mucci * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include "pfmlib_priv.h" #include "pfmlib_mips_priv.h" #include "pfmlib_perf_event_priv.h" int pfm_mips_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { struct perf_event_attr *attr = e->os_data; int ret; ret = pfm_mips_get_encoding(this, e); if (ret != PFM_SUCCESS) return ret; if (e->count != 2) { DPRINT("unexpected encoding count=%d\n", e->count); return PFM_ERR_INVAL; } attr->type = PERF_TYPE_RAW; /* * priv levels are ignored because they are managed * directly through perf excl_*. 
*/ attr->config = e->codes[0] >> 5; /* * codes[1] contains counter mask supported by the event. * Events support either odd or even indexed counters * except for cycles (code = 0) and instructions (code =1) * which work on all counters. * * The kernel expects bit 7 of config to indicate whether * the event works only on odd-indexed counters */ if ((e->codes[1] & 0x2) && attr->config > 1) attr->config |= 1ULL << 7; return PFM_SUCCESS; } void pfm_mips_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { int i, compact; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; /* * remove PMU-provided attributes which are either * not accessible under perf_events or fully controlled * by perf_events, e.g., priv levels filters */ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PMU) { /* * with perf_event, priv levels under full * control of perf_event. */ if ( e->pattrs[i].idx == MIPS_ATTR_K ||e->pattrs[i].idx == MIPS_ATTR_U ||e->pattrs[i].idx == MIPS_ATTR_S ||e->pattrs[i].idx == MIPS_ATTR_E) compact = 1; } /* * remove perf_event generic attributes not supported * by MIPS */ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { /* no precise sampling on MIPS */ if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } papi-5.3.0/src/libpfm4/lib/pfmlib_intel_x86_priv.h0000600003276200002170000003033212247131124021414 0ustar ralphundrgrad/* * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above 
copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #ifndef __PFMLIB_INTEL_X86_PRIV_H__ #define __PFMLIB_INTEL_X86_PRIV_H__ /* * This file contains the definitions used for all Intel X86 processors */ /* * maximum number of unit masks groups per event */ #define INTEL_X86_NUM_GRP 8 #define INTEL_X86_MAX_FILTERS 2 /* * unit mask description */ typedef struct { const char *uname; /* unit mask name */ const char *udesc; /* unit umask description */ const char *uequiv;/* name of event from which this one is derived, NULL if none */ uint64_t ucntmsk;/* supported counters for umask (if set, supersedes cntmsk) */ uint64_t ucode; /* unit mask code */ uint64_t ufilters[INTEL_X86_MAX_FILTERS]; /* extra encoding for event */ unsigned int uflags; /* unit mask flags */ unsigned int umodel; /* only available on this PMU model */ unsigned int grpid; /* unit mask group id */ unsigned int modhw; /* hardwired modifiers, cannot be changed */ unsigned int umodmsk_req; /* bitmask of required modifiers */ } intel_x86_umask_t; #define INTEL_X86_MAX_GRPID (~0U) /* * event description */ typedef struct { const char *name; /* event name */ const char *desc; /* event description */ const char *equiv; /* name of event from which this one is derived, NULL if none */ uint64_t cntmsk; /* supported counters */ unsigned int code; /* event code */ unsigned int 
numasks;/* number of umasks */ unsigned int flags; /* flags */ unsigned int modmsk; /* bitmask of modifiers for this event */ unsigned int modmsk_req; /* bitmask of required modifiers */ unsigned int ngrp; /* number of unit masks groups */ const intel_x86_umask_t *umasks; /* umask desc */ } intel_x86_entry_t; /* * pme_flags value (event and unit mask) */ #define INTEL_X86_NCOMBO 0x01 /* unit masks within group cannot be combined */ #define INTEL_X86_FALLBACK_GEN 0x02 /* fallback from fixed to generic counter possible */ #define INTEL_X86_PEBS 0x04 /* event supports PEBS or at least one umask supports PEBS */ #define INTEL_X86_DFL 0x08 /* unit mask is default choice */ #define INTEL_X86_GRP_EXCL 0x10 /* only one unit mask group can be selected */ #define INTEL_X86_NHM_OFFCORE 0x20 /* Nehalem/Westmere offcore_response */ #define INTEL_X86_EXCL_GRP_GT 0x40 /* exclude use of grp with id > own grp */ #define INTEL_X86_FIXED 0x80 /* fixed counter only event */ #define INTEL_X86_NO_AUTOENCODE 0x100 /* does not support auto encoding validation */ #define INTEL_X86_CODE_OVERRIDE 0x200 /* umask overrides event code */ #define INTEL_X86_LDLAT 0x400 /* needs load latency modifier (ldlat) */ typedef union pfm_intel_x86_reg { unsigned long long val; /* complete register value */ struct { unsigned long sel_event_select:8; /* event mask */ unsigned long sel_unit_mask:8; /* unit mask */ unsigned long sel_usr:1; /* user level */ unsigned long sel_os:1; /* system level */ unsigned long sel_edge:1; /* edge detect */ unsigned long sel_pc:1; /* pin control */ unsigned long sel_int:1; /* enable APIC intr */ unsigned long sel_anythr:1; /* measure any thread */ unsigned long sel_en:1; /* enable */ unsigned long sel_inv:1; /* invert counter mask */ unsigned long sel_cnt_mask:8; /* counter mask */ unsigned long sel_intx:1; /* only in tx region */ unsigned long sel_intxcp:1; /* excl. 
aborted tx region */ unsigned long sel_res2:30; } perfevtsel; struct { unsigned long usel_event:8; /* event select */ unsigned long usel_umask:8; /* event unit mask */ unsigned long usel_res1:1; /* reserved */ unsigned long usel_occ:1; /* occupancy reset */ unsigned long usel_edge:1; /* edge detection */ unsigned long usel_res2:1; /* reserved */ unsigned long usel_int:1; /* PMI enable */ unsigned long usel_res3:1; /* reserved */ unsigned long usel_en:1; /* enable */ unsigned long usel_inv:1; /* invert */ unsigned long usel_cnt_mask:8; /* counter mask */ unsigned long usel_res4:32; /* reserved */ } nhm_unc; struct { unsigned long usel_en:1; /* enable */ unsigned long usel_res1:1; unsigned long usel_int:1; /* PMI enable */ unsigned long usel_res2:32; unsigned long usel_res3:29; } nhm_unc_fixed; struct { unsigned long cpl_eq0:1; /* filter out branches at pl0 */ unsigned long cpl_neq0:1; /* filter out branches at pl1-pl3 */ unsigned long jcc:1; /* filter out conditional branches */ unsigned long near_rel_call:1; /* filter out near relative calls */ unsigned long near_ind_call:1; /* filter out near indirect calls */ unsigned long near_ret:1; /* filter out near returns */ unsigned long near_ind_jmp:1; /* filter out near unconditional jmp/calls */ unsigned long near_rel_jmp:1; /* filter out near unconditional relative jmp */ unsigned long far_branch:1; /* filter out far branches */ unsigned long reserved1:23; /* reserved */ unsigned long reserved2:32; /* reserved */ } nhm_lbr_select; } pfm_intel_x86_reg_t; #define INTEL_X86_ATTR_K 0 /* kernel (0) */ #define INTEL_X86_ATTR_U 1 /* user (1, 2, 3) */ #define INTEL_X86_ATTR_E 2 /* edge */ #define INTEL_X86_ATTR_I 3 /* invert */ #define INTEL_X86_ATTR_C 4 /* counter mask */ #define INTEL_X86_ATTR_T 5 /* any thread */ #define INTEL_X86_ATTR_LDLAT 6 /* load latency threshold */ #define INTEL_X86_ATTR_INTX 7 /* in transaction */ #define INTEL_X86_ATTR_INTXCP 8 /* not aborted transaction */ #define _INTEL_X86_ATTR_U (1 << 
INTEL_X86_ATTR_U) #define _INTEL_X86_ATTR_K (1 << INTEL_X86_ATTR_K) #define _INTEL_X86_ATTR_I (1 << INTEL_X86_ATTR_I) #define _INTEL_X86_ATTR_E (1 << INTEL_X86_ATTR_E) #define _INTEL_X86_ATTR_C (1 << INTEL_X86_ATTR_C) #define _INTEL_X86_ATTR_T (1 << INTEL_X86_ATTR_T) #define _INTEL_X86_ATTR_INTX (1 << INTEL_X86_ATTR_INTX) #define _INTEL_X86_ATTR_INTXCP (1 << INTEL_X86_ATTR_INTXCP) #define _INTEL_X86_ATTR_LDLAT (1 << INTEL_X86_ATTR_LDLAT) #define INTEL_X86_ATTRS \ (_INTEL_X86_ATTR_I|_INTEL_X86_ATTR_E|_INTEL_X86_ATTR_C|_INTEL_X86_ATTR_U|_INTEL_X86_ATTR_K) #define INTEL_V1_ATTRS INTEL_X86_ATTRS #define INTEL_V2_ATTRS INTEL_X86_ATTRS #define INTEL_FIXED2_ATTRS (_INTEL_X86_ATTR_U|_INTEL_X86_ATTR_K) #define INTEL_FIXED3_ATTRS (INTEL_FIXED2_ATTRS|_INTEL_X86_ATTR_T) #define INTEL_V3_ATTRS (INTEL_V2_ATTRS|_INTEL_X86_ATTR_T) #define INTEL_V4_ATTRS (INTEL_V3_ATTRS | _INTEL_X86_ATTR_INTX | _INTEL_X86_ATTR_INTXCP) /* let's define some handy shortcuts! */ #define sel_event_select perfevtsel.sel_event_select #define sel_unit_mask perfevtsel.sel_unit_mask #define sel_usr perfevtsel.sel_usr #define sel_os perfevtsel.sel_os #define sel_edge perfevtsel.sel_edge #define sel_pc perfevtsel.sel_pc #define sel_int perfevtsel.sel_int #define sel_en perfevtsel.sel_en #define sel_inv perfevtsel.sel_inv #define sel_cnt_mask perfevtsel.sel_cnt_mask #define sel_anythr perfevtsel.sel_anythr #define sel_intx perfevtsel.sel_intx #define sel_intxcp perfevtsel.sel_intxcp /* * shift relative to start of register */ #define INTEL_X86_EDGE_BIT 18 #define INTEL_X86_ANY_BIT 21 #define INTEL_X86_INV_BIT 23 #define INTEL_X86_CMASK_BIT 24 #define INTEL_X86_MOD_EDGE (1 << INTEL_X86_EDGE_BIT) #define INTEL_X86_MOD_ANY (1 << INTEL_X86_ANY_BIT) #define INTEL_X86_MOD_INV (1 << INTEL_X86_INV_BIT) /* intel x86 core PMU supported plm */ #define INTEL_X86_PLM (PFM_PLM0|PFM_PLM3) /* * Intel x86 specific pmu flags (pmu->flags 16 MSB) */ #define INTEL_X86_PMU_FL_ECMASK 0x10000 /* edge requires cmask >=1 */ /* * default 
ldlat value for PEBS-LL events. Used when ldlat= is missing */ #define INTEL_X86_LDLAT_DEFAULT 3 /* default ldlat value in core cycles */ typedef struct { unsigned int version:8; unsigned int num_cnt:8; unsigned int cnt_width:8; unsigned int ebx_length:8; } intel_x86_pmu_eax_t; typedef struct { unsigned int num_cnt:6; unsigned int cnt_width:6; unsigned int reserved:20; } intel_x86_pmu_edx_t; typedef struct { unsigned int no_core_cycle:1; unsigned int no_inst_retired:1; unsigned int no_ref_cycle:1; unsigned int no_llc_ref:1; unsigned int no_llc_miss:1; unsigned int no_br_retired:1; unsigned int no_br_mispred_retired:1; unsigned int reserved:25; } intel_x86_pmu_ebx_t; typedef struct { int model; int family; /* 0 means nothing detected yet */ int arch_version; int stepping; } pfm_intel_x86_config_t; extern pfm_intel_x86_config_t pfm_intel_x86_cfg; extern const pfmlib_attr_desc_t intel_x86_mods[]; static inline int intel_x86_eflag(void *this, int idx, int flag) { const intel_x86_entry_t *pe = this_pe(this); return !!(pe[idx].flags & flag); } static inline int is_model_umask(void *this, int pidx, int attr) { pfmlib_pmu_t *pmu = this; const intel_x86_entry_t *pe = this_pe(this); const intel_x86_entry_t *ent; unsigned int model; ent = pe + pidx; model = ent->umasks[attr].umodel; return model == 0 || model == pmu->pmu; } static inline int intel_x86_uflag(void *this, int idx, int attr, int flag) { const intel_x86_entry_t *pe = this_pe(this); return !!(pe[idx].umasks[attr].uflags & flag); } static inline unsigned int intel_x86_num_umasks(void *this, int pidx) { pfmlib_pmu_t *pmu = this; const intel_x86_entry_t *pe = this_pe(this); unsigned int i, n = 0, model; /* * some umasks may be model specific */ for (i = 0; i < pe[pidx].numasks; i++) { model = pe[pidx].umasks[i].umodel; if (model && model != pmu->pmu) continue; n++; } return n; } /* * find actual index of umask based on attr_idx */ static inline int intel_x86_attr2umask(void *this, int pidx, int attr_idx) { const 
intel_x86_entry_t *pe = this_pe(this); unsigned int i; for (i = 0; i < pe[pidx].numasks; i++) { if (!is_model_umask(this, pidx, i)) continue; if (attr_idx == 0) break; attr_idx--; } return i; } extern int pfm_intel_x86_detect(void); extern int pfm_intel_x86_add_defaults(void *this, pfmlib_event_desc_t *e, unsigned int msk, uint64_t *umask, unsigned int max_grpid); extern int pfm_intel_x86_event_is_valid(void *this, int pidx); extern int pfm_intel_x86_get_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_intel_x86_get_event_first(void *this); extern int pfm_intel_x86_get_event_next(void *this, int idx); extern int pfm_intel_x86_get_event_umask_first(void *this, int idx); extern int pfm_intel_x86_get_event_umask_next(void *this, int idx, int attr); extern int pfm_intel_x86_validate_table(void *this, FILE *fp); extern int pfm_intel_x86_get_event_attr_info(void *this, int idx, int attr_idx, pfm_event_attr_info_t *info); extern int pfm_intel_x86_get_event_info(void *this, int idx, pfm_event_info_t *info); extern int pfm_intel_x86_valid_pebs(pfmlib_event_desc_t *e); extern int pfm_intel_x86_perf_event_encoding(pfmlib_event_desc_t *e, void *data); extern int pfm_intel_x86_perf_detect(void *this); extern unsigned int pfm_intel_x86_get_event_nattrs(void *this, int pidx); extern int intel_x86_attr2mod(void *this, int pidx, int attr_idx); extern int pfm_intel_x86_get_perf_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_intel_nhm_unc_get_perf_encoding(void *this, pfmlib_event_desc_t *e); extern void pfm_intel_x86_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); extern int pfm_intel_x86_can_auto_encode(void *this, int pidx, int uidx); #endif /* __PFMLIB_INTEL_X86_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_pcu.c0000600003276200002170000000663012247131124022551 0ustar ralphundrgrad/* * pfmlib_intel_snbep_unc_pcu.c : Intel SandyBridge-EP Power Control Unit (PCU) uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by 
Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_pcu_events.h" static void display_pcu(void *this, pfmlib_event_desc_t *e, void *val) { const intel_x86_entry_t *pe = this_pe(this); pfm_snbep_unc_reg_t *reg = val; pfm_snbep_unc_reg_t f; __pfm_vbprintf("[UNC_PCU=0x%"PRIx64" event=0x%x occ_sel=0x%x en=%d " "inv=%d edge=%d thres=%d occ_inv=%d occ_edge=%d] %s\n", reg->val, reg->pcu.unc_event, reg->pcu.unc_occ, reg->pcu.unc_en, reg->pcu.unc_inv, reg->pcu.unc_edge, reg->pcu.unc_thres, reg->pcu.unc_occ_inv, reg->pcu.unc_occ_edge, pe[e->event].name); if (e->count == 1) return; f.val = e->codes[1]; __pfm_vbprintf("[UNC_PCU_FILTER=0x%"PRIx64" band0=%u band1=%u band2=%u band3=%u]\n", f.val, f.pcu_filt.filt0, f.pcu_filt.filt1, f.pcu_filt.filt2, f.pcu_filt.filt3); } pfmlib_pmu_t intel_snbep_unc_pcu_support = { .desc = "Intel Sandy Bridge-EP PCU uncore", .name = "snbep_unc_pcu", .perf_name = "uncore_pcu", .pmu = PFM_PMU_INTEL_SNBEP_UNC_PCU, .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_p_pe), .type = PFM_PMU_TYPE_UNCORE, .num_cntrs = 4, .num_fixed_cntrs = 0, .max_encoding = 2, .pe = intel_snbep_unc_p_pe, .atdesc = snbep_unc_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_PMU_FL_UNC_OCC | PFMLIB_PMU_FL_NO_SMPL, .pmu_detect = pfm_intel_snbep_unc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = 
pfm_intel_snbep_unc_can_auto_encode, .display_reg = display_pcu, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_netburst.c0000600003276200002170000003150512247131124021573 0ustar ralphundrgrad/* * Copyright (c) 2005-2006 Hewlett-Packard Development Company, L.P. * Copyright (c) 2006 IBM Corp. * Contributed by Kevin Corry * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * * pfmlib_intel_netburst.c * * Support for the Pentium4/Xeon/EM64T processor family (family=15). 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_netburst_priv.h" #include "pfmlib_intel_x86_priv.h" #include "events/intel_netburst_events.h" const pfmlib_attr_desc_t netburst_mods[]={ PFM_ATTR_B("u", "monitor at priv level 1, 2, 3"), /* monitor priv level 1, 2, 3 */ PFM_ATTR_B("k", "monitor at priv level 0"), /* monitor priv level 0 */ PFM_ATTR_B("cmpl", "complement"), /* set: <=, clear: > */ PFM_ATTR_B("e", "edge"), /* edge */ PFM_ATTR_I("thr", "event threshold in range [0-15]"), /* threshold */ }; #define NETBURST_MODS_COUNT (sizeof(netburst_mods)/sizeof(pfmlib_attr_desc_t)) extern pfmlib_pmu_t netburst_support; static inline int netburst_get_numasks(int pidx) { int i = 0; /* * name = NULL is end-marker */ while (netburst_events[pidx].event_masks[i].name) i++; return i; } static void netburst_display_reg(pfmlib_event_desc_t *e) { netburst_escr_value_t escr; netburst_cccr_value_t cccr; escr.val = e->codes[0]; cccr.val = e->codes[1]; __pfm_vbprintf("[0x%"PRIx64" 0x%"PRIx64" 0x%"PRIx64" usr=%d os=%d tag_ena=%d tag_val=%d " "evmask=0x%x evsel=0x%x escr_sel=0x%x comp=%d cmpl=%d thr=%d e=%d", escr.val, /* pass the 64-bit value, not the union itself */ cccr.val, e->codes[2], /* perf_event code */ escr.bits.t0_usr, /* t1 is identical */ escr.bits.t0_os, /* t1 is identical */ escr.bits.tag_enable, escr.bits.tag_value, escr.bits.event_mask, escr.bits.event_select, cccr.bits.escr_select, cccr.bits.compare, cccr.bits.complement, cccr.bits.threshold, cccr.bits.edge); __pfm_vbprintf("] %s\n", e->fstr); } static int netburst_add_defaults(pfmlib_event_desc_t *e, unsigned int *evmask) { int i, n; n = netburst_get_numasks(e->event); for (i = 0; i < n; i++) { if (netburst_events[e->event].event_masks[i].flags & NETBURST_FL_DFL) goto found; } return PFM_ERR_ATTR; found: *evmask = 1 << netburst_events[e->event].event_masks[i].bit; n = e->nattrs; e->attrs[n].id = i; e->attrs[n].ival = i; e->nattrs = n+1; return PFM_SUCCESS; } int pfm_netburst_get_encoding(void *this, pfmlib_event_desc_t *e) { pfm_event_attr_info_t 
*a; netburst_escr_value_t escr; netburst_cccr_value_t cccr; unsigned int evmask = 0; unsigned int plmmsk = 0; int umask_done = 0; const char *n; int k, id, bit, ret; int tag_enable = 0, tag_value = 0; e->fstr[0] = '\0'; escr.val = 0; cccr.val = 0; for(k=0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { bit = netburst_events[e->event].event_masks[a->idx].bit; n = netburst_events[e->event].event_masks[a->idx].name; /* * umask combination seems possible, although it does * not always make sense, e.g., BOGUS vs. NBOGUS */ if (bit < EVENT_MASK_BITS && n) { evmask |= (1 << bit); } else if (bit >= EVENT_MASK_BITS && n) { tag_value |= (1 << (bit - EVENT_MASK_BITS)); tag_enable = 1; } umask_done = 1; } else if (a->type == PFM_ATTR_RAW_UMASK) { /* should not happen */ return PFM_ERR_ATTR; } else { uint64_t ival = e->attrs[k].ival; switch (a->idx) { case NETBURST_ATTR_U: escr.bits.t1_usr = !!ival; escr.bits.t0_usr = !!ival; plmmsk |= _NETBURST_ATTR_U; break; case NETBURST_ATTR_K: escr.bits.t1_os = !!ival; escr.bits.t0_os = !!ival; plmmsk |= _NETBURST_ATTR_K; break; case NETBURST_ATTR_E: if (ival) { cccr.bits.compare = 1; cccr.bits.edge = 1; } break; case NETBURST_ATTR_C: if (ival) { cccr.bits.compare = 1; cccr.bits.complement = 1; } break; case NETBURST_ATTR_T: if (ival > 15) return PFM_ERR_ATTR_VAL; if (ival) { cccr.bits.compare = 1; cccr.bits.threshold = ival; } break; default: return PFM_ERR_ATTR; } } } /* * handle case where no priv level mask was passed. 
* then we use the dfl_plm */ if (!(plmmsk & (_NETBURST_ATTR_K|_NETBURST_ATTR_U))) { if (e->dfl_plm & PFM_PLM0) { escr.bits.t1_os = 1; escr.bits.t0_os = 1; } if (e->dfl_plm & PFM_PLM3) { escr.bits.t1_usr = 1; escr.bits.t0_usr = 1; } } if (!umask_done) { ret = netburst_add_defaults(e, &evmask); if (ret != PFM_SUCCESS) return ret; } escr.bits.tag_enable = tag_enable; escr.bits.tag_value = tag_value; escr.bits.event_mask = evmask; escr.bits.event_select = netburst_events[e->event].event_select; cccr.bits.enable = 1; cccr.bits.escr_select = netburst_events[e->event].escr_select; cccr.bits.active_thread = 3; if (e->event == PME_REPLAY_EVENT) escr.bits.event_mask &= P4_REPLAY_REAL_MASK; /* remove virtual mask bits */ /* * reorder all the attributes such that the fstr appears always * the same regardless of how the attributes were submitted. */ evt_strcat(e->fstr, "%s", netburst_events[e->event].name); pfmlib_sort_attr(e); for(k=0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { id = e->attrs[k].id; evt_strcat(e->fstr, ":%s", netburst_events[e->event].event_masks[id].name); } } evt_strcat(e->fstr, ":%s=%lu", netburst_mods[NETBURST_ATTR_K].name, escr.bits.t0_os); evt_strcat(e->fstr, ":%s=%lu", netburst_mods[NETBURST_ATTR_U].name, escr.bits.t0_usr); evt_strcat(e->fstr, ":%s=%lu", netburst_mods[NETBURST_ATTR_E].name, cccr.bits.edge); evt_strcat(e->fstr, ":%s=%lu", netburst_mods[NETBURST_ATTR_C].name, cccr.bits.complement); evt_strcat(e->fstr, ":%s=%lu", netburst_mods[NETBURST_ATTR_T].name, cccr.bits.threshold); e->count = 2; e->codes[0] = escr.val; e->codes[1] = cccr.val; netburst_display_reg(e); return PFM_SUCCESS; } static int pfm_netburst_detect(void *this) { int ret; int model; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 15) return PFM_ERR_NOTSUPP; model = pfm_intel_x86_cfg.model; if (model == 3 || model == 4 || model == 6) return PFM_ERR_NOTSUPP; 
return PFM_SUCCESS; } static int pfm_netburst_detect_prescott(void *this) { int ret; int model; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 15) return PFM_ERR_NOTSUPP; /* * prescott has one more event (instr_completed) */ model = pfm_intel_x86_cfg.model; if (model != 3 && model != 4 && model != 6) return PFM_ERR_NOTSUPP; return PFM_SUCCESS; } static int pfm_netburst_get_event_first(void *this) { pfmlib_pmu_t *p = this; return p->pme_count ? 0 : -1; } static int pfm_netburst_get_event_next(void *this, int idx) { pfmlib_pmu_t *p = this; if (idx >= (p->pme_count-1)) return -1; return idx+1; } static int pfm_netburst_event_is_valid(void *this, int pidx) { pfmlib_pmu_t *p = this; return pidx >= 0 && pidx < p->pme_count; } static int pfm_netburst_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info) { const netburst_entry_t *pe = this_pe(this); int numasks, idx; numasks = netburst_get_numasks(pidx); if (attr_idx < numasks) { /* idx = pfm_intel_x86_attr2umask(this, pidx, attr_idx); */ idx = attr_idx; info->name = pe[pidx].event_masks[idx].name; info->desc = pe[pidx].event_masks[idx].desc; info->equiv= NULL; info->code = pe[pidx].event_masks[idx].bit; info->type = PFM_ATTR_UMASK; info->is_dfl = !!(pe[pidx].event_masks[idx].flags & NETBURST_FL_DFL); } else { idx = attr_idx - numasks; info->name = netburst_mods[idx].name; info->desc = netburst_mods[idx].desc; info->equiv= NULL; info->code = idx; info->type = netburst_mods[idx].type; info->is_dfl = 0; } info->ctrl = PFM_ATTR_CTRL_PMU; info->idx = idx; /* namespace specific index */ info->dfl_val64 = 0; return PFM_SUCCESS; } static int pfm_netburst_get_event_info(void *this, int idx, pfm_event_info_t *info) { const netburst_entry_t *pe = this_pe(this); pfmlib_pmu_t *pmu = this; /* * pmu and idx filled out by caller */ info->name = pe[idx].name; info->desc = pe[idx].desc; info->code = pe[idx].event_select | (pe[idx].escr_select << 8); info->equiv = 
NULL; info->idx = idx; /* private index */ info->pmu = pmu->pmu; info->nattrs = netburst_get_numasks(idx); info->nattrs += NETBURST_MODS_COUNT; return PFM_SUCCESS; } static int pfm_netburst_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const netburst_entry_t *pe = netburst_events; const char *name = pmu->name; int i, j, noname, ndfl; int error = 0; for(i=0; i < pmu->pme_count; i++) { if (!pe[i].name) { fprintf(fp, "pmu: %s event%d: :: no name (prev event was %s)\n", pmu->name, i, i > 0 ? pe[i-1].name : "??"); error++; } if (!pe[i].desc) { fprintf(fp, "pmu: %s event%d: %s :: no description\n", name, i, pe[i].name); error++; } noname = ndfl = 0; /* name = NULL is end-marker, verify there is at least one */ for(j= 0; j < EVENT_MASK_BITS; j++) { if (!pe[i].event_masks[j].name) noname++; if (pe[i].event_masks[j].name) { if (!pe[i].event_masks[j].desc) { fprintf(fp, "pmu: %s event%d:%s umask%d: %s :: no description\n", name, i, pe[i].name, j, pe[i].event_masks[j].name); error++; } if (pe[i].event_masks[j].bit >= (EVENT_MASK_BITS+4)) { fprintf(fp, "pmu: %s event%d:%s umask%d: %s :: invalid bit field\n", name, i, pe[i].name, j, pe[i].event_masks[j].name); error++; } if (pe[i].event_masks[j].flags & NETBURST_FL_DFL) ndfl++; } } if (ndfl > 1) { fprintf(fp, "pmu: %s event%d:%s :: more than one default umask\n", name, i, pe[i].name); error++; } if (!noname) { fprintf(fp, "pmu: %s event%d:%s :: no event mask end-marker\n", name, i, pe[i].name); error++; } } return error ? 
PFM_ERR_INVAL : PFM_SUCCESS; } static unsigned int pfm_netburst_get_event_nattrs(void *this, int pidx) { unsigned int nattrs; nattrs = netburst_get_numasks(pidx); nattrs += NETBURST_MODS_COUNT; return nattrs; } pfmlib_pmu_t netburst_support = { .desc = "Pentium4", .name = "netburst", .pmu = PFM_PMU_INTEL_NETBURST, .pme_count = LIBPFM_ARRAY_SIZE(netburst_events) - 1, .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .atdesc = netburst_mods, .pe = netburst_events, .max_encoding = 3, .num_cntrs = 18, .pmu_detect = pfm_netburst_detect, .get_event_encoding[PFM_OS_NONE] = pfm_netburst_get_encoding, PFMLIB_ENCODE_PERF(pfm_netburst_get_perf_encoding), .get_event_first = pfm_netburst_get_event_first, .get_event_next = pfm_netburst_get_event_next, .event_is_valid = pfm_netburst_event_is_valid, .validate_table = pfm_netburst_validate_table, .get_event_info = pfm_netburst_get_event_info, .get_event_attr_info = pfm_netburst_get_event_attr_info, .get_event_nattrs = pfm_netburst_get_event_nattrs, PFMLIB_VALID_PERF_PATTRS(pfm_netburst_perf_validate_pattrs), }; pfmlib_pmu_t netburst_p_support = { .desc = "Pentium4 (Prescott)", .name = "netburst_p", .pmu = PFM_PMU_INTEL_NETBURST_P, .pme_count = LIBPFM_ARRAY_SIZE(netburst_events), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .atdesc = netburst_mods, .pe = netburst_events, .max_encoding = 3, .num_cntrs = 18, .pmu_detect = pfm_netburst_detect_prescott, .get_event_encoding[PFM_OS_NONE] = pfm_netburst_get_encoding, PFMLIB_ENCODE_PERF(pfm_netburst_get_perf_encoding), .get_event_first = pfm_netburst_get_event_first, .get_event_next = pfm_netburst_get_event_next, .event_is_valid = pfm_netburst_event_is_valid, .validate_table = pfm_netburst_validate_table, .get_event_info = pfm_netburst_get_event_info, .get_event_attr_info = pfm_netburst_get_event_attr_info, .get_event_nattrs = pfm_netburst_get_event_nattrs, PFMLIB_VALID_PERF_PATTRS(pfm_netburst_perf_validate_pattrs), }; 
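The unit-mask handling in pfm_netburst_get_encoding() above can be sketched in isolation: a unit-mask bit below EVENT_MASK_BITS lands in the ESCR event mask, while a higher bit is folded into the tag value and switches tagging on. The sketch below is illustrative only. The struct, the helper name, and the assumed width of 16 bits are hypothetical and not part of libpfm4; the upper bound of EVENT_MASK_BITS+4 is borrowed from the check in pfm_netburst_validate_table().

```c
#include <stdint.h>

/* Assumption: the ESCR event mask is 16 bits wide. The real value is
 * EVENT_MASK_BITS from pfmlib_intel_netburst_priv.h. */
#define SKETCH_EVENT_MASK_BITS 16

/* Hypothetical container for the three ESCR fields touched by a umask. */
struct sketch_escr_fields {
	uint32_t event_mask;  /* bits below SKETCH_EVENT_MASK_BITS */
	uint32_t tag_value;   /* the 4 bits above the event mask */
	int tag_enable;       /* set as soon as any tag bit is used */
};

/*
 * Fold one unit-mask bit position into the ESCR fields.
 * Returns 0 on success, -1 when the bit is outside the range that
 * the table validator accepts.
 */
static int sketch_add_umask_bit(struct sketch_escr_fields *f, unsigned int bit)
{
	if (bit < SKETCH_EVENT_MASK_BITS) {
		f->event_mask |= 1u << bit;
	} else if (bit < SKETCH_EVENT_MASK_BITS + 4) {
		f->tag_value |= 1u << (bit - SKETCH_EVENT_MASK_BITS);
		f->tag_enable = 1;
	} else {
		return -1;
	}
	return 0;
}
```

Unlike the library code, the sketch rejects out-of-range bits at encoding time; libpfm4 leaves that check to the table validator.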
papi-5.3.0/src/libpfm4/lib/pfmlib_intel_coreduo.c0000600003276200002170000000540012247131124021360 0ustar ralphundrgrad/* * pfmlib_intel_coreduo.c : Intel Core Duo/Solo (Yonah) * * Copyright (c) 2009, Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_intel_x86_priv.h" /* architecture private */ #include "events/intel_coreduo_events.h" static int pfm_coreduo_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; /* * check for core solo/core duo */ if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; if (pfm_intel_x86_cfg.model != 14) return PFM_ERR_NOTSUPP; return PFM_SUCCESS; } static int pfm_coreduo_init(void *this) { pfm_intel_x86_cfg.arch_version = 1; return PFM_SUCCESS; } pfmlib_pmu_t intel_coreduo_support={ .desc = "Intel Core Duo/Core Solo", .name = "coreduo", .pmu = PFM_PMU_COREDUO, .pme_count = LIBPFM_ARRAY_SIZE(intel_coreduo_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 2, .max_encoding = 1, .pe = intel_coreduo_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .supported_plm = INTEL_X86_PLM, .pmu_detect = pfm_coreduo_detect, .pmu_init = pfm_coreduo_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_r2pcie.c0000600003276200002170000000506112247131124023143 0ustar ralphundrgrad/* * pfmlib_intel_snbep_unc_r2pcie.c : Intel SandyBridge-EP R2PCIe uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including 
without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_r2pcie_events.h" pfmlib_pmu_t intel_snbep_unc_r2pcie_support = { .desc = "Intel Sandy Bridge-EP R2PCIe uncore", .name = "snbep_unc_r2pcie", .perf_name = "uncore_r2pcie", .pmu = PFM_PMU_INTEL_SNBEP_UNC_R2PCIE, .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_r2_pe), .type = PFM_PMU_TYPE_UNCORE, .num_cntrs = 4, .num_fixed_cntrs = 0, .max_encoding = 1, .pe = intel_snbep_unc_r2_pe, .atdesc = snbep_unc_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | PFMLIB_PMU_FL_NO_SMPL, .pmu_detect = pfm_intel_snbep_unc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, 
PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_sparc_ultra4.c0000600003276200002170000000424712247131124021140 0ustar ralphundrgrad/* * pfmlib_sparc_ultra4.c : SPARC Ultra 4+ * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_sparc_priv.h" #include "events/sparc_ultra4plus_events.h" pfmlib_pmu_t sparc_ultra4plus_support={ .desc = "Ultra Sparc 4+", .name = "ultra4p", .pmu = PFM_PMU_SPARC_ULTRA4PLUS, .pme_count = LIBPFM_ARRAY_SIZE(ultra4plus_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = SPARC_PLM, .max_encoding = 2, .num_cntrs = 2, .pe = ultra4plus_pe, .atdesc = NULL, .flags = 0, .pmu_detect = pfm_sparc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_sparc_get_encoding, PFMLIB_ENCODE_PERF(pfm_sparc_get_perf_encoding), .get_event_first = pfm_sparc_get_event_first, .get_event_next = pfm_sparc_get_event_next, .event_is_valid = pfm_sparc_event_is_valid, .validate_table = pfm_sparc_validate_table, .get_event_info = pfm_sparc_get_event_info, .get_event_attr_info = pfm_sparc_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_sparc_perf_validate_pattrs), .get_event_nattrs = pfm_sparc_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_ivb_unc.c0000600003276200002170000000574212247131124021356 0ustar ralphundrgrad/* * pfmlib_intel_ivb_unc.c : Intel IvyBridge C-Box uncore PMU * * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #define INTEL_SNB_UNC_ATTRS \ (_INTEL_X86_ATTR_I|_INTEL_X86_ATTR_E|_INTEL_X86_ATTR_C) /* same event table as SNB */ #include "events/intel_snb_unc_events.h" static int pfm_ivb_unc_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 58: /* IvyBridge */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } #define IVB_UNC_CBOX(n, p) \ pfmlib_pmu_t intel_ivb_unc_cbo##n##_support={ \ .desc = "Intel Ivy Bridge C-box"#n" uncore", \ .name = "ivb_unc_cbo"#n, \ .perf_name = "uncore_cbox_"#n, \ .pmu = PFM_PMU_INTEL_IVB_UNC_CB##n, \ .pme_count = LIBPFM_ARRAY_SIZE(intel_snb_unc_##p##_pe), \ .type = PFM_PMU_TYPE_UNCORE, \ .num_cntrs = 2, \ .num_fixed_cntrs = 1, \ .max_encoding = 1,\ .pe = intel_snb_unc_##p##_pe, \ .atdesc = intel_x86_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK\ | PFMLIB_PMU_FL_NO_SMPL,\ .pmu_detect = pfm_ivb_unc_detect, \ .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, \ PFMLIB_ENCODE_PERF(pfm_intel_nhm_unc_get_perf_encoding), \ PFMLIB_OS_DETECT(pfm_intel_x86_perf_detect), \ .get_event_first = pfm_intel_x86_get_event_first, \ .get_event_next = pfm_intel_x86_get_event_next, \ .event_is_valid = pfm_intel_x86_event_is_valid, \ .validate_table = pfm_intel_x86_validate_table, \ .get_event_info = pfm_intel_x86_get_event_info, \ .get_event_attr_info = pfm_intel_x86_get_event_attr_info, \ PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs),\ .get_event_nattrs = pfm_intel_x86_get_event_nattrs,\ } IVB_UNC_CBOX(0, cbo0); IVB_UNC_CBOX(1, cbo); IVB_UNC_CBOX(2, 
cbo); IVB_UNC_CBOX(3, cbo); papi-5.3.0/src/libpfm4/lib/pfmlib_itanium2.c0000600003276200002170000017343012247131124020266 0ustar ralphundrgrad/* * pfmlib_itanium2.c : support for the Itanium2 PMU family * * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include /* public headers */ #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_priv_ia64.h" /* architecture private */ #include "pfmlib_itanium2_priv.h" /* PMU private */ #include "itanium2_events.h" /* PMU private */ #define is_ear(i) event_is_ear(itanium2_pe+(i)) #define is_ear_tlb(i) event_is_ear_tlb(itanium2_pe+(i)) #define is_ear_alat(i) event_is_ear_alat(itanium2_pe+(i)) #define is_ear_cache(i) event_is_ear_cache(itanium2_pe+(i)) #define is_iear(i) event_is_iear(itanium2_pe+(i)) #define is_dear(i) event_is_dear(itanium2_pe+(i)) #define is_btb(i) event_is_btb(itanium2_pe+(i)) #define has_opcm(i) event_opcm_ok(itanium2_pe+(i)) #define has_iarr(i) event_iarr_ok(itanium2_pe+(i)) #define has_darr(i) event_darr_ok(itanium2_pe+(i)) #define evt_use_opcm(e) ((e)->pfp_ita2_pmc8.opcm_used != 0 || (e)->pfp_ita2_pmc9.opcm_used !=0) #define evt_use_irange(e) ((e)->pfp_ita2_irange.rr_used) #define evt_use_drange(e) ((e)->pfp_ita2_drange.rr_used) #define evt_grp(e) (int)itanium2_pe[e].pme_qualifiers.pme_qual.pme_group #define evt_set(e) (int)itanium2_pe[e].pme_qualifiers.pme_qual.pme_set #define evt_umask(e) itanium2_pe[e].pme_umask #define FINE_MODE_BOUNDARY_BITS 12 #define FINE_MODE_MASK ~((1U<<12)-1) /* let's define some handy shortcuts! */ #define pmc_plm pmc_ita2_counter_reg.pmc_plm #define pmc_ev pmc_ita2_counter_reg.pmc_ev #define pmc_oi pmc_ita2_counter_reg.pmc_oi #define pmc_pm pmc_ita2_counter_reg.pmc_pm #define pmc_es pmc_ita2_counter_reg.pmc_es #define pmc_umask pmc_ita2_counter_reg.pmc_umask #define pmc_thres pmc_ita2_counter_reg.pmc_thres #define pmc_ism pmc_ita2_counter_reg.pmc_ism static char * pfm_ita2_get_event_name(unsigned int i); /* * Description of the PMC register mappings use by * this module (as reported in pfmlib_reg_t.reg_num): * * 0 -> PMC0 * 1 -> PMC1 * n -> PMCn * * The following are in the model specific rr_br[]: * IBR0 -> 0 * IBR1 -> 1 * ... 
* IBR7 -> 7 * DBR0 -> 0 * DBR1 -> 1 * ... * DBR7 -> 7 * * We do not use a mapping table, instead we make up the * values on the fly given the base. */ /* * The Itanium2 PMU has a bug in the fine mode implementation. * It only sees ranges with a granularity of two bundles. * So we prepare for the day they fix it. */ static int has_fine_mode_bug; static int pfm_ita2_detect(void) { int tmp; int ret = PFMLIB_ERR_NOTSUPP; tmp = pfm_ia64_get_cpu_family(); if (tmp == 0x1f) { has_fine_mode_bug = 1; ret = PFMLIB_SUCCESS; } return ret; } /* * Check the event for incompatibilities. This is useful * for L1 and L2 related events. Due to wire limitations, * some caches events are separated into sets. There * are 5 sets for the L1D cache group and 6 sets for L2 group. * It is NOT possible to simultaneously measure events from * differents sets within a group. For instance, you cannot * measure events from set0 and set1 in L1D cache group. However * it is possible to measure set0 in L1D and set1 in L2 at the same * time. * * This function verifies that the set constraint are respected. */ static int check_cross_groups_and_umasks(pfmlib_input_param_t *inp) { unsigned long ref_umask, umask; int g, s; unsigned int cnt = inp->pfp_event_count; pfmlib_event_t *e = inp->pfp_events; unsigned int i, j; /* * XXX: could possibly be optimized */ for (i=0; i < cnt; i++) { g = evt_grp(e[i].event); s = evt_set(e[i].event); if (g == PFMLIB_ITA2_EVT_NO_GRP) continue; ref_umask = evt_umask(e[i].event); for (j=i+1; j < cnt; j++) { if (evt_grp(e[j].event) != g) continue; if (evt_set(e[j].event) != s) return PFMLIB_ERR_EVTSET; /* only care about L2 cache group */ if (g != PFMLIB_ITA2_EVT_L2_CACHE_GRP || (s == 1 || s == 2)) continue; umask = evt_umask(e[j].event); /* * there is no assignement possible if the event in PMC4 * has a umask (ref_umask) and an event (from the same * set) also has a umask AND it is different. 
For some * sets, the umasks are shared, therefore the value * programmed into PMC4 determines the umask for all * the other events (with umask) from the set. */ if (umask && ref_umask != umask) return PFMLIB_ERR_NOASSIGN; } } return PFMLIB_SUCCESS; } /* * Certain prefetch events must be treated specially when instruction range restriction * is in use because they can only be constrained by IBRP1 in fine-mode. Other events * will use IBRP0 if tagged as a demand fetch OR IBPR1 if tagged as a prefetch match. * From the library's point of view there is no way of distinguishing this, so we leave * it up to the user to interpret the results. * * Events which can be qualified by the two pairs depending on their tag: * - IBP_BUNPAIRS_IN * - L1I_FETCH_RAB_HIT * - L1I_FETCH_ISB_HIT * - L1I_FILLS * * This function returns the number of qualifying prefetch events found * * XXX: not clear which events do qualify as prefetch events. */ static int prefetch_events[]={ PME_ITA2_L1I_PREFETCHES, PME_ITA2_L1I_STRM_PREFETCHES, PME_ITA2_L2_INST_PREFETCHES }; #define NPREFETCH_EVENTS sizeof(prefetch_events)/sizeof(int) static int check_prefetch_events(pfmlib_input_param_t *inp) { int code; int prefetch_codes[NPREFETCH_EVENTS]; unsigned int i, j, count; int c; int found = 0; for(i=0; i < NPREFETCH_EVENTS; i++) { pfm_get_event_code(prefetch_events[i], &code); prefetch_codes[i] = code; } count = inp->pfp_event_count; for(i=0; i < count; i++) { pfm_get_event_code(inp->pfp_events[i].event, &c); for(j=0; j < NPREFETCH_EVENTS; j++) { if (c == prefetch_codes[j]) found++; } } return found; } /* * IA64_INST_RETIRED (and subevents) is the only event which can be measured on all * 4 IBR when non-fine mode is not possible. 
* * This function returns: * - the number of events matching the IA64_INST_RETIRED code * - in retired_mask the bottom 4 bits indicates which of the 4 INST_RETIRED event * is present */ static unsigned int check_inst_retired_events(pfmlib_input_param_t *inp, unsigned long *retired_mask) { int code; int c; unsigned int i, count, found = 0; unsigned long umask, mask; pfm_get_event_code(PME_ITA2_IA64_INST_RETIRED_THIS, &code); count = inp->pfp_event_count; mask = 0; for(i=0; i < count; i++) { pfm_get_event_code(inp->pfp_events[i].event, &c); if (c == code) { pfm_ita2_get_event_umask(inp->pfp_events[i].event, &umask); switch(umask) { case 0: mask |= 1; break; case 1: mask |= 2; break; case 2: mask |= 4; break; case 3: mask |= 8; break; } found++; } } if (retired_mask) *retired_mask = mask; return found; } static int check_fine_mode_possible(pfmlib_ita2_input_rr_t *rr, int n) { pfmlib_ita2_input_rr_desc_t *lim = rr->rr_limits; int i; for(i=0; i < n; i++) { if ((lim[i].rr_start & FINE_MODE_MASK) != (lim[i].rr_end & FINE_MODE_MASK)) return 0; } return 1; } /* * mode = 0 -> check code (enforce bundle alignment) * mode = 1 -> check data */ static int check_intervals(pfmlib_ita2_input_rr_t *irr, int mode, unsigned int *n_intervals) { unsigned int i; pfmlib_ita2_input_rr_desc_t *lim = irr->rr_limits; for(i=0; i < 4; i++) { /* end marker */ if (lim[i].rr_start == 0 && lim[i].rr_end == 0) break; /* invalid entry */ if (lim[i].rr_start >= lim[i].rr_end) return PFMLIB_ERR_IRRINVAL; if (mode == 0 && (lim[i].rr_start & 0xf || lim[i].rr_end & 0xf)) return PFMLIB_ERR_IRRALIGN; } *n_intervals = i; return PFMLIB_SUCCESS; } static int valid_assign(pfmlib_event_t *e, unsigned int *as, pfmlib_regmask_t *r_pmcs, unsigned int cnt) { unsigned long pmc4_umask = 0, umask; char *name; int l1_grp_present = 0, l2_grp_present = 0; unsigned int i; int c, failure; int need_pmc5, need_pmc4; int pmc5_evt = -1, pmc4_evt = -1; if (PFMLIB_DEBUG()) { unsigned int j; for(j=0;jpfp_event_count; for(i=0; i < 
count; i++) { for (j=0; j < NCANCEL_EVENTS; j++) { pfm_get_event_code(inp->pfp_events[i].event, &code); if (code == cancel_codes[j]) { if (idx != -1) { return PFMLIB_ERR_INVAL; } idx = inp->pfp_events[i].event; } } } return PFMLIB_SUCCESS; } /* * Automatically dispatch events to corresponding counters following constraints. * Upon return the pfarg_regt structure is ready to be submitted to kernel */ static int pfm_ita2_dispatch_counters(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in, pfmlib_output_param_t *outp) { #define has_counter(e,b) (itanium2_pe[e].pme_counters & (1 << (b)) ? (b) : 0) pfmlib_ita2_input_param_t *param = mod_in; pfm_ita2_pmc_reg_t reg; pfmlib_event_t *e; pfmlib_reg_t *pc, *pd; pfmlib_regmask_t *r_pmcs; unsigned int i,j,k,l; int ret; unsigned int max_l0, max_l1, max_l2, max_l3; unsigned int assign[PMU_ITA2_NUM_COUNTERS]; unsigned int m, cnt; e = inp->pfp_events; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; cnt = inp->pfp_event_count; r_pmcs = &inp->pfp_unavail_pmcs; if (PFMLIB_DEBUG()) for (m=0; m < cnt; m++) { DPRINT("ev[%d]=%s counters=0x%lx\n", m, itanium2_pe[e[m].event].pme_name, itanium2_pe[e[m].event].pme_counters); } if (cnt > PMU_ITA2_NUM_COUNTERS) return PFMLIB_ERR_TOOMANY; ret = check_cross_groups_and_umasks(inp); if (ret != PFMLIB_SUCCESS) return ret; ret = check_cancel_events(inp); if (ret != PFMLIB_SUCCESS) return ret; max_l0 = PMU_ITA2_FIRST_COUNTER + PMU_ITA2_NUM_COUNTERS; max_l1 = PMU_ITA2_FIRST_COUNTER + PMU_ITA2_NUM_COUNTERS*(cnt>1); max_l2 = PMU_ITA2_FIRST_COUNTER + PMU_ITA2_NUM_COUNTERS*(cnt>2); max_l3 = PMU_ITA2_FIRST_COUNTER + PMU_ITA2_NUM_COUNTERS*(cnt>3); DPRINT("max_l0=%u max_l1=%u max_l2=%u max_l3=%u\n", max_l0, max_l1, max_l2, max_l3); /* * For now, worst case in the loop nest: 4! 
(factorial) */ for (i=PMU_ITA2_FIRST_COUNTER; i < max_l0; i++) { assign[0] = has_counter(e[0].event,i); if (max_l1 == PMU_ITA2_FIRST_COUNTER && valid_assign(e, assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; for (j=PMU_ITA2_FIRST_COUNTER; j < max_l1; j++) { if (j == i) continue; assign[1] = has_counter(e[1].event,j); if (max_l2 == PMU_ITA2_FIRST_COUNTER && valid_assign(e, assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; for (k=PMU_ITA2_FIRST_COUNTER; k < max_l2; k++) { if(k == i || k == j) continue; assign[2] = has_counter(e[2].event,k); if (max_l3 == PMU_ITA2_FIRST_COUNTER && valid_assign(e, assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; for (l=PMU_ITA2_FIRST_COUNTER; l < max_l3; l++) { if(l == i || l == j || l == k) continue; assign[3] = has_counter(e[3].event,l); if (valid_assign(e, assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; } } } } /* we cannot satisfy the constraints */ return PFMLIB_ERR_NOASSIGN; done: for (j=0; j < cnt ; j++ ) { reg.pmc_val = 0; /* clear all, bits 26-27 must be zero for proper operations */ /* if plm is 0, then assume not specified per-event and use default */ reg.pmc_plm = inp->pfp_events[j].plm ? inp->pfp_events[j].plm : inp->pfp_dfl_plm; reg.pmc_oi = 1; /* overflow interrupt */ reg.pmc_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc_thres = param ? param->pfp_ita2_counters[j].thres: 0; reg.pmc_ism = param ? param->pfp_ita2_counters[j].ism : PFMLIB_ITA2_ISM_BOTH; reg.pmc_umask = is_ear(e[j].event) ? 0x0 : itanium2_pe[e[j].event].pme_umask; reg.pmc_es = itanium2_pe[e[j].event].pme_code; /* * Note that we don't force PMC4.pmc_ena = 1 because the kernel takes care of this for us. 
* This way we don't have to program something in PMC4 even when we don't use it */ pc[j].reg_num = assign[j]; pc[j].reg_value = reg.pmc_val; pc[j].reg_addr = pc[j].reg_alt_addr = assign[j]; pd[j].reg_num = assign[j]; pd[j].reg_addr = pd[j].reg_alt_addr = assign[j]; __pfm_vbprintf("[PMC%u(pmc%u)=0x%06lx thres=%d es=0x%02x plm=%d umask=0x%x pm=%d ism=0x%x oi=%d] %s\n", assign[j], assign[j], reg.pmc_val, reg.pmc_thres, reg.pmc_es,reg.pmc_plm, reg.pmc_umask, reg.pmc_pm, reg.pmc_ism, reg.pmc_oi, itanium2_pe[e[j].event].pme_name); __pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[j].reg_num, pd[j].reg_num); } /* number of PMC registers programmed */ outp->pfp_pmc_count = cnt; outp->pfp_pmd_count = cnt; return PFMLIB_SUCCESS; } static int pfm_dispatch_iear(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfm_ita2_pmc_reg_t reg; pfmlib_ita2_input_param_t *param = mod_in; pfmlib_reg_t *pc, *pd; pfmlib_ita2_input_param_t fake_param; unsigned int pos1, pos2; unsigned int i, count; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; count = inp->pfp_event_count; for (i=0; i < count; i++) { if (is_iear(inp->pfp_events[i].event)) break; } if (param == NULL || param->pfp_ita2_iear.ear_used == 0) { /* * case 3: no I-EAR event, no (or nothing) in param->pfp_ita2_iear.ear_used */ if (i == count) return PFMLIB_SUCCESS; memset(&fake_param, 0, sizeof(fake_param)); param = &fake_param; /* * case 1: extract all information for event (name) */ pfm_ita2_get_ear_mode(inp->pfp_events[i].event, &param->pfp_ita2_iear.ear_mode); param->pfp_ita2_iear.ear_umask = evt_umask(inp->pfp_events[i].event); param->pfp_ita2_iear.ear_ism = PFMLIB_ITA2_ISM_BOTH; /* force both instruction sets */ DPRINT("I-EAR event with no info\n"); } /* * case 2: ear_used=1, event is defined, we use the param info as it is more precise * case 4: ear_used=1, no event (free running I-EAR), use param info */ reg.pmc_val = 0; if
(param->pfp_ita2_iear.ear_mode == PFMLIB_ITA2_EAR_TLB_MODE) { /* if plm is 0, then assume not specified per-event and use default */ reg.pmc10_ita2_tlb_reg.iear_plm = param->pfp_ita2_iear.ear_plm ? param->pfp_ita2_iear.ear_plm : inp->pfp_dfl_plm; reg.pmc10_ita2_tlb_reg.iear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc10_ita2_tlb_reg.iear_ct = 0x0; reg.pmc10_ita2_tlb_reg.iear_umask = param->pfp_ita2_iear.ear_umask; reg.pmc10_ita2_tlb_reg.iear_ism = param->pfp_ita2_iear.ear_ism; } else if (param->pfp_ita2_iear.ear_mode == PFMLIB_ITA2_EAR_CACHE_MODE) { /* if plm is 0, then assume not specified per-event and use default */ reg.pmc10_ita2_cache_reg.iear_plm = param->pfp_ita2_iear.ear_plm ? param->pfp_ita2_iear.ear_plm : inp->pfp_dfl_plm; reg.pmc10_ita2_cache_reg.iear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc10_ita2_cache_reg.iear_ct = 0x1; reg.pmc10_ita2_cache_reg.iear_umask = param->pfp_ita2_iear.ear_umask; reg.pmc10_ita2_cache_reg.iear_ism = param->pfp_ita2_iear.ear_ism; } else { DPRINT("ALAT mode not supported in I-EAR mode\n"); return PFMLIB_ERR_INVAL; } if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 10)) return PFMLIB_ERR_NOASSIGN; pc[pos1].reg_num = 10; /* PMC10 is I-EAR config register */ pc[pos1].reg_value = reg.pmc_val; pc[pos1].reg_addr = pc[pos1].reg_alt_addr = 10; pos1++; pd[pos2].reg_num = 0; pd[pos2].reg_addr = pd[pos2].reg_alt_addr= 0; pos2++; pd[pos2].reg_num = 1; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 1; pos2++; if (param->pfp_ita2_iear.ear_mode == PFMLIB_ITA2_EAR_TLB_MODE) { __pfm_vbprintf("[PMC10(pmc10)=0x%lx ctb=tlb plm=%d pm=%d ism=0x%x umask=0x%x]\n", reg.pmc_val, reg.pmc10_ita2_tlb_reg.iear_plm, reg.pmc10_ita2_tlb_reg.iear_pm, reg.pmc10_ita2_tlb_reg.iear_ism, reg.pmc10_ita2_tlb_reg.iear_umask); } else { __pfm_vbprintf("[PMC10(pmc10)=0x%lx ctb=cache plm=%d pm=%d ism=0x%x umask=0x%x]\n", reg.pmc_val, reg.pmc10_ita2_cache_reg.iear_plm, reg.pmc10_ita2_cache_reg.iear_pm, reg.pmc10_ita2_cache_reg.iear_ism, 
reg.pmc10_ita2_cache_reg.iear_umask); } __pfm_vbprintf("[PMD0(pmd0)]\n[PMD1(pmd1)]\n"); /* update final number of entries used */ outp->pfp_pmc_count = pos1; outp->pfp_pmd_count = pos2; return PFMLIB_SUCCESS; } static int pfm_dispatch_dear(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfm_ita2_pmc_reg_t reg; pfmlib_ita2_input_param_t *param = mod_in; pfmlib_reg_t *pc, *pd; pfmlib_ita2_input_param_t fake_param; unsigned int pos1, pos2; unsigned int i, count; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; count = inp->pfp_event_count; for (i=0; i < count; i++) { if (is_dear(inp->pfp_events[i].event)) break; } if (param == NULL || param->pfp_ita2_dear.ear_used == 0) { /* * case 3: no D-EAR event, no (or nothing) in param->pfp_ita2_dear.ear_used */ if (i == count) return PFMLIB_SUCCESS; memset(&fake_param, 0, sizeof(fake_param)); param = &fake_param; /* * case 1: extract all information for event (name) */ pfm_ita2_get_ear_mode(inp->pfp_events[i].event, &param->pfp_ita2_dear.ear_mode); param->pfp_ita2_dear.ear_umask = evt_umask(inp->pfp_events[i].event); param->pfp_ita2_dear.ear_ism = PFMLIB_ITA2_ISM_BOTH; /* force both instruction sets */ DPRINT("D-EAR event with no info\n"); } /* sanity check on the mode */ if ( param->pfp_ita2_dear.ear_mode != PFMLIB_ITA2_EAR_CACHE_MODE && param->pfp_ita2_dear.ear_mode != PFMLIB_ITA2_EAR_TLB_MODE && param->pfp_ita2_dear.ear_mode != PFMLIB_ITA2_EAR_ALAT_MODE) return PFMLIB_ERR_INVAL; /* * case 2: ear_used=1, event is defined, we use the param info as it is more precise * case 4: ear_used=1, no event (free running D-EAR), use param info */ reg.pmc_val = 0; /* if plm is 0, then assume not specified per-event and use default */ reg.pmc11_ita2_reg.dear_plm = param->pfp_ita2_dear.ear_plm ? param->pfp_ita2_dear.ear_plm : inp->pfp_dfl_plm; reg.pmc11_ita2_reg.dear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ?
1 : 0; reg.pmc11_ita2_reg.dear_mode = param->pfp_ita2_dear.ear_mode; reg.pmc11_ita2_reg.dear_umask = param->pfp_ita2_dear.ear_umask; reg.pmc11_ita2_reg.dear_ism = param->pfp_ita2_dear.ear_ism; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 11)) return PFMLIB_ERR_NOASSIGN; pc[pos1].reg_num = 11; /* PMC11 is D-EAR config register */ pc[pos1].reg_value = reg.pmc_val; pc[pos1].reg_addr = pc[pos1].reg_alt_addr = 11; pos1++; pd[pos2].reg_num = 2; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 2; pos2++; pd[pos2].reg_num = 3; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 3; pos2++; pd[pos2].reg_num = 17; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 17; pos2++; __pfm_vbprintf("[PMC11(pmc11)=0x%lx mode=%s plm=%d pm=%d ism=0x%x umask=0x%x]\n", reg.pmc_val, reg.pmc11_ita2_reg.dear_mode == 0 ? "L1D" : (reg.pmc11_ita2_reg.dear_mode == 1 ? "L1DTLB" : "ALAT"), reg.pmc11_ita2_reg.dear_plm, reg.pmc11_ita2_reg.dear_pm, reg.pmc11_ita2_reg.dear_ism, reg.pmc11_ita2_reg.dear_umask); __pfm_vbprintf("[PMD2(pmd2)]\n[PMD3(pmd3)\nPMD17(pmd17)\n"); /* update final number of entries used */ outp->pfp_pmc_count = pos1; outp->pfp_pmd_count = pos2; return PFMLIB_SUCCESS; } static int pfm_dispatch_opcm(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_ita2_output_param_t *mod_out) { pfmlib_ita2_input_param_t *param = mod_in; pfmlib_reg_t *pc = outp->pfp_pmcs; pfm_ita2_pmc_reg_t reg, pmc15; unsigned int i, has_1st_pair, has_2nd_pair, count; unsigned int pos = outp->pfp_pmc_count; if (param == NULL) return PFMLIB_SUCCESS; /* not constrained by PMC8 nor PMC9 */ pmc15.pmc_val = 0xffffffff; /* XXX: use PAL instead. PAL value is 0xfffffff0 */ if (param->pfp_ita2_irange.rr_used && mod_out == NULL) return PFMLIB_ERR_INVAL; if (param->pfp_ita2_pmc8.opcm_used || (param->pfp_ita2_irange.rr_used && mod_out->pfp_ita2_irange.rr_nbr_used!=0) ) { reg.pmc_val = param->pfp_ita2_pmc8.opcm_used ? 
param->pfp_ita2_pmc8.pmc_val : 0xffffffff3fffffff; if (param->pfp_ita2_irange.rr_used) { reg.pmc8_9_ita2_reg.opcm_ig_ad = 0; reg.pmc8_9_ita2_reg.opcm_inv = param->pfp_ita2_irange.rr_flags & PFMLIB_ITA2_RR_INV ? 1 : 0; } else { /* clear range restriction fields when none is used */ reg.pmc8_9_ita2_reg.opcm_ig_ad = 1; reg.pmc8_9_ita2_reg.opcm_inv = 0; } /* force bit 2 to 1 */ reg.pmc8_9_ita2_reg.opcm_bit2 = 1; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 8)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 8; pc[pos].reg_value = reg.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 8; pos++; /* * will be constrained by PMC8 */ if (param->pfp_ita2_pmc8.opcm_used) { has_1st_pair = has_2nd_pair = 0; count = inp->pfp_event_count; for(i=0; i < count; i++) { if (inp->pfp_events[i].event == PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP0_PMC8) has_1st_pair=1; if (inp->pfp_events[i].event == PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP2_PMC8) has_2nd_pair=1; } if (has_1st_pair || has_2nd_pair == 0) pmc15.pmc15_ita2_reg.opcmc_ibrp0_pmc8 = 0; if (has_2nd_pair || has_1st_pair == 0) pmc15.pmc15_ita2_reg.opcmc_ibrp2_pmc8 = 0; } __pfm_vbprintf("[PMC8(pmc8)=0x%lx m=%d i=%d f=%d b=%d match=0x%x mask=0x%x inv=%d ig_ad=%d]\n", reg.pmc_val, reg.pmc8_9_ita2_reg.opcm_m, reg.pmc8_9_ita2_reg.opcm_i, reg.pmc8_9_ita2_reg.opcm_f, reg.pmc8_9_ita2_reg.opcm_b, reg.pmc8_9_ita2_reg.opcm_match, reg.pmc8_9_ita2_reg.opcm_mask, reg.pmc8_9_ita2_reg.opcm_inv, reg.pmc8_9_ita2_reg.opcm_ig_ad); } if (param->pfp_ita2_pmc9.opcm_used) { /* * PMC9 can only be used to qualify IA64_INST_RETIRED_* events */ if (check_inst_retired_events(inp, NULL) != inp->pfp_event_count) return PFMLIB_ERR_FEATCOMB; reg.pmc_val = param->pfp_ita2_pmc9.pmc_val; /* ig_ad, inv are ignored for PMC9, to avoid confusion we force default values */ reg.pmc8_9_ita2_reg.opcm_ig_ad = 1; reg.pmc8_9_ita2_reg.opcm_inv = 0; /* force bit 2 to 1 */ reg.pmc8_9_ita2_reg.opcm_bit2 = 1; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 9)) return PFMLIB_ERR_NOASSIGN;
pc[pos].reg_num = 9; pc[pos].reg_value = reg.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 9; pos++; /* * will be constrained by PMC9 */ has_1st_pair = has_2nd_pair = 0; count = inp->pfp_event_count; for(i=0; i < count; i++) { if (inp->pfp_events[i].event == PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP1_PMC9) has_1st_pair=1; if (inp->pfp_events[i].event == PME_ITA2_IA64_TAGGED_INST_RETIRED_IBRP3_PMC9) has_2nd_pair=1; } if (has_1st_pair || has_2nd_pair == 0) pmc15.pmc15_ita2_reg.opcmc_ibrp1_pmc9 = 0; if (has_2nd_pair || has_1st_pair == 0) pmc15.pmc15_ita2_reg.opcmc_ibrp3_pmc9 = 0; __pfm_vbprintf("[PMC9(pmc9)=0x%lx m=%d i=%d f=%d b=%d match=0x%x mask=0x%x]\n", reg.pmc_val, reg.pmc8_9_ita2_reg.opcm_m, reg.pmc8_9_ita2_reg.opcm_i, reg.pmc8_9_ita2_reg.opcm_f, reg.pmc8_9_ita2_reg.opcm_b, reg.pmc8_9_ita2_reg.opcm_match, reg.pmc8_9_ita2_reg.opcm_mask); } if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 15)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 15; pc[pos].reg_value = pmc15.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 15; pos++; __pfm_vbprintf("[PMC15(pmc15)=0x%lx ibrp0_pmc8=%d ibrp1_pmc9=%d ibrp2_pmc8=%d ibrp3_pmc9=%d]\n", pmc15.pmc_val, pmc15.pmc15_ita2_reg.opcmc_ibrp0_pmc8, pmc15.pmc15_ita2_reg.opcmc_ibrp1_pmc9, pmc15.pmc15_ita2_reg.opcmc_ibrp2_pmc8, pmc15.pmc15_ita2_reg.opcmc_ibrp3_pmc9); outp->pfp_pmc_count = pos; return PFMLIB_SUCCESS; } static int pfm_dispatch_btb(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfmlib_event_t *e= inp->pfp_events; pfm_ita2_pmc_reg_t reg; pfmlib_ita2_input_param_t *param = mod_in; pfmlib_reg_t *pc, *pd; pfmlib_ita2_input_param_t fake_param; int found_btb = 0, found_bad_dear = 0; int has_btb_param; unsigned int i, pos1, pos2; unsigned int count; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; /* * explicit BTB settings */ has_btb_param = param && param->pfp_ita2_btb.btb_used; reg.pmc_val = 0UL; /* * we need to scan all events looking 
for DEAR ALAT/TLB due to incompatibility */ count = inp->pfp_event_count; for (i=0; i < count; i++) { /* * note whether any BTB event is present */ if (is_btb(e[i].event)) found_btb = 1; /* look for D-EAR TLB or ALAT events */ if (is_dear(e[i].event) && (is_ear_tlb(e[i].event) || is_ear_alat(e[i].event))) { found_bad_dear = 1; } } DPRINT("found_btb=%d found_bad_dear=%d\n", found_btb, found_bad_dear); /* * did not find D-EAR TLB/ALAT event, need to check param structure */ if (found_bad_dear == 0 && param && param->pfp_ita2_dear.ear_used == 1) { if ( param->pfp_ita2_dear.ear_mode == PFMLIB_ITA2_EAR_TLB_MODE || param->pfp_ita2_dear.ear_mode == PFMLIB_ITA2_EAR_ALAT_MODE) found_bad_dear = 1; } /* * no explicit BTB event and no special case to deal with (cover part of case 3) */ if (found_btb == 0 && has_btb_param == 0 && found_bad_dear == 0) return PFMLIB_SUCCESS; if (has_btb_param == 0) { /* * case 3: no BTB event, btb_used=0 but found_bad_dear=1, need to clean up PMC12 */ if (found_btb == 0) goto assign_zero; /* * case 1: we have a BTB event but no param, default setting is to capture * all branches. */ memset(&fake_param, 0, sizeof(fake_param)); param = &fake_param; param->pfp_ita2_btb.btb_ds = 0; /* capture branch targets */ param->pfp_ita2_btb.btb_tm = 0x3; /* all branches */ param->pfp_ita2_btb.btb_ptm = 0x3; /* all branches */ param->pfp_ita2_btb.btb_ppm = 0x3; /* all branches */ param->pfp_ita2_btb.btb_brt = 0x0; /* all branches */ DPRINT("BTB event with no info\n"); } /* * case 2: BTB event in the list, param provided * case 4: no BTB event, param provided (free running mode) */ reg.pmc12_ita2_reg.btbc_plm = param->pfp_ita2_btb.btb_plm ? param->pfp_ita2_btb.btb_plm : inp->pfp_dfl_plm; reg.pmc12_ita2_reg.btbc_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ?
1 : 0; reg.pmc12_ita2_reg.btbc_ds = param->pfp_ita2_btb.btb_ds & 0x1; reg.pmc12_ita2_reg.btbc_tm = param->pfp_ita2_btb.btb_tm & 0x3; reg.pmc12_ita2_reg.btbc_ptm = param->pfp_ita2_btb.btb_ptm & 0x3; reg.pmc12_ita2_reg.btbc_ppm = param->pfp_ita2_btb.btb_ppm & 0x3; reg.pmc12_ita2_reg.btbc_brt = param->pfp_ita2_btb.btb_brt & 0x3; /* * if DEAR-ALAT or DEAR-TLB is set then PMC12 must be set to zero (see documentation p. 87) * * D-EAR ALAT/TLB and BTB cannot be used at the same time. * From documentation: PMC12 must be zero in this mode; else the wrong IP for misses * coming right after a mispredicted branch. * * D-EAR cache is fine. */ assign_zero: if (found_bad_dear && reg.pmc_val != 0UL) return PFMLIB_ERR_EVTINCOMP; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 12)) return PFMLIB_ERR_NOASSIGN; memset(pc+pos1, 0, sizeof(pfmlib_reg_t)); pc[pos1].reg_num = 12; pc[pos1].reg_value = reg.pmc_val; pc[pos1].reg_addr = pc[pos1].reg_alt_addr = 12; pos1++; __pfm_vbprintf("[PMC12(pmc12)=0x%lx plm=%d pm=%d ds=%d tm=%d ptm=%d ppm=%d brt=%d]\n", reg.pmc_val, reg.pmc12_ita2_reg.btbc_plm, reg.pmc12_ita2_reg.btbc_pm, reg.pmc12_ita2_reg.btbc_ds, reg.pmc12_ita2_reg.btbc_tm, reg.pmc12_ita2_reg.btbc_ptm, reg.pmc12_ita2_reg.btbc_ppm, reg.pmc12_ita2_reg.btbc_brt); /* * only add BTB PMD when actually using BTB. 
* Not needed when dealing with D-EAR TLB and DEAR-ALAT * PMC12 restriction */ if (found_btb || has_btb_param) { /* * PMD16 is included in list of used PMD */ for(i=8; i < 17; i++, pos2++) { pd[pos2].reg_num = i; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = i; __pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[pos2].reg_num, pd[pos2].reg_num); } } /* update final number of entries used */ outp->pfp_pmc_count = pos1; outp->pfp_pmd_count = pos2; return PFMLIB_SUCCESS; } static void do_normal_rr(unsigned long start, unsigned long end, pfmlib_reg_t *br, int nbr, int dir, int *idx, int *reg_idx, int plm) { unsigned long size, l_addr, c; unsigned long l_offs = 0, r_offs = 0; unsigned long l_size, r_size; dbreg_t db; int p2; if (nbr < 1 || end <= start) return; size = end - start; DPRINT("start=0x%016lx end=0x%016lx size=0x%lx bytes (%lu bundles) nbr=%d dir=%d\n", start, end, size, size >> 4, nbr, dir); p2 = pfm_ia64_fls(size); c = ALIGN_DOWN(end, p2); DPRINT("largest power of two possible: 2^%d=0x%lx, crossing=0x%016lx\n", p2, 1UL << p2, c); if ((c - (1UL<= start) { l_addr = c - (1UL << p2); } else { p2--; if ((c + (1UL<>l_offs: 0x%lx\n", l_offs); } } else if (dir == 1 && r_size != 0 && nbr == 1) { p2++; l_addr = start; if (PFMLIB_DEBUG()) { r_offs = l_addr+(1UL<>r_offs: 0x%lx\n", r_offs); } } l_size = l_addr - start; r_size = end - l_addr-(1UL<>largest chunk: 2^%d @0x%016lx-0x%016lx\n", p2, l_addr, l_addr+(1UL<>before: 0x%016lx-0x%016lx\n", start, l_addr); if (r_size && !r_offs) printf(">>after : 0x%016lx-0x%016lx\n", l_addr+(1UL<>1; if (nbr & 0x1) { /* * our simple heuristic is: * we assign the largest number of registers to the largest * of the two chunks */ if (l_size > r_size) { l_nbr++; } else { r_nbr++; } } do_normal_rr(start, l_addr, br, l_nbr, 0, idx, reg_idx, plm); do_normal_rr(l_addr+(1UL<rr_start, in_rr->rr_end, n_pairs, fine_mode ? ", fine_mode" : "", rr_flags & PFMLIB_ITA2_RR_INV ? 
", inversed" : ""); __pfm_vbprintf("start offset: -0x%lx end_offset: +0x%lx\n", out_rr->rr_soff, out_rr->rr_eoff); for (j=0; j < n_pairs; j++, base_idx+=2) { d.val = dbr[base_idx+1].reg_value; r_end = dbr[base_idx].reg_value+((~(d.db.db_mask)) & ~(0xffUL << 56)); if (fine_mode) __pfm_vbprintf("brp%u: db%u: 0x%016lx db%u: plm=0x%x mask=0x%016lx\n", dbr[base_idx].reg_num>>1, dbr[base_idx].reg_num, dbr[base_idx].reg_value, dbr[base_idx+1].reg_num, d.db.db_plm, d.db.db_mask); else __pfm_vbprintf("brp%u: db%u: 0x%016lx db%u: plm=0x%x mask=0x%016lx end=0x%016lx\n", dbr[base_idx].reg_num>>1, dbr[base_idx].reg_num, dbr[base_idx].reg_value, dbr[base_idx+1].reg_num, d.db.db_plm, d.db.db_mask, r_end); } } /* * base_idx = base register index to use (for IBRP1, base_idx = 2) */ static int compute_fine_rr(pfmlib_ita2_input_rr_t *irr, int dfl_plm, int n, int *base_idx, pfmlib_ita2_output_rr_t *orr) { int i; pfmlib_reg_t *br; pfmlib_ita2_input_rr_desc_t *in_rr; pfmlib_ita2_output_rr_desc_t *out_rr; unsigned long addr; int reg_idx; dbreg_t db; in_rr = irr->rr_limits; out_rr = orr->rr_infos; br = orr->rr_br+orr->rr_nbr_used; reg_idx = *base_idx; db.val = 0; db.db.db_mask = FINE_MODE_MASK; if (n > 2) return PFMLIB_ERR_IRRTOOMANY; for (i=0; i < n; i++, reg_idx += 2, in_rr++, br+= 4) { /* * setup lower limit pair * * because of the PMU bug, we must align down to the closest bundle-pair * aligned address. 5 => 32-byte aligned address */ addr = has_fine_mode_bug ? ALIGN_DOWN(in_rr->rr_start, 5) : in_rr->rr_start; out_rr->rr_soff = in_rr->rr_start - addr; /* * adjust plm for each range */ db.db.db_plm = in_rr->rr_plm ? 
in_rr->rr_plm : (unsigned long)dfl_plm; br[0].reg_num = reg_idx; br[0].reg_value = addr; br[0].reg_addr = br[0].reg_alt_addr = reg_idx; br[1].reg_num = reg_idx+1; br[1].reg_value = db.val; br[1].reg_addr = br[1].reg_alt_addr = reg_idx+1; /* * setup upper limit pair * * * In fine mode, the bundle address stored in the upper limit debug * registers is included in the count, so we substract 0x10 to exclude it. * * because of the PMU bug, we align the (corrected) end to the nearest * 32-byte aligned address + 0x10. With this correction and depending * on the correction, we may count one * * */ addr = in_rr->rr_end - 0x10; if (has_fine_mode_bug && (addr & 0x1f) == 0) addr += 0x10; out_rr->rr_eoff = addr - in_rr->rr_end + 0x10; br[2].reg_num = reg_idx+4; br[2].reg_value = addr; br[2].reg_addr = br[2].reg_alt_addr = reg_idx+4; br[3].reg_num = reg_idx+5; br[3].reg_value = db.val; br[3].reg_addr = br[3].reg_alt_addr = reg_idx+5; if (PFMLIB_VERBOSE()) print_one_range(in_rr, out_rr, br, 0, 2, 1, irr->rr_flags); } orr->rr_nbr_used += i<<2; /* update base_idx, for subsequent calls */ *base_idx = reg_idx; return PFMLIB_SUCCESS; } /* * base_idx = base register index to use (for IBRP1, base_idx = 2) */ static int compute_single_rr(pfmlib_ita2_input_rr_t *irr, int dfl_plm, int *base_idx, pfmlib_ita2_output_rr_t *orr) { unsigned long size, end, start; unsigned long p_start, p_end; pfmlib_ita2_input_rr_desc_t *in_rr; pfmlib_ita2_output_rr_desc_t *out_rr; pfmlib_reg_t *br; dbreg_t db; int reg_idx; int l, m; in_rr = irr->rr_limits; out_rr = orr->rr_infos; br = orr->rr_br+orr->rr_nbr_used; start = in_rr->rr_start; end = in_rr->rr_end; size = end - start; reg_idx = *base_idx; l = pfm_ia64_fls(size); m = l; if (size & ((1UL << l)-1)) { if (l>62) { printf("range: [0x%lx-0x%lx] too big\n", start, end); return PFMLIB_ERR_IRRTOOBIG; } m++; } DPRINT("size=%ld, l=%d m=%d, internal: 0x%lx full: 0x%lx\n", size, l, m, 1UL << l, 1UL << m); for (; m < 64; m++) { p_start = ALIGN_DOWN(start, m); p_end 
= p_start+(1UL<<m);
		if (p_end >= end) goto found;
	}
	return PFMLIB_ERR_IRRINVAL;
found:
	DPRINT("m=%d p_start=0x%lx p_end=0x%lx\n", m, p_start,p_end);

	/* when the event is not IA64_INST_RETIRED, then we MUST use ibrp0 */
	br[0].reg_num = reg_idx;
	br[0].reg_value = p_start;
	br[0].reg_addr = br[0].reg_alt_addr = reg_idx;

	db.val = 0;
	db.db.db_mask = ~((1UL << m)-1);
	db.db.db_plm = in_rr->rr_plm ? in_rr->rr_plm : (unsigned long)dfl_plm;

	br[1].reg_num = reg_idx + 1;
	br[1].reg_value = db.val;
	br[1].reg_addr = br[1].reg_alt_addr = reg_idx + 1;

	out_rr->rr_soff = start - p_start;
	out_rr->rr_eoff = p_end - end;

	if (PFMLIB_VERBOSE()) print_one_range(in_rr, out_rr, br, 0, 1, 0, irr->rr_flags);

	orr->rr_nbr_used += 2;

	/* update base_idx, for subsequent calls */
	*base_idx = reg_idx;

	return PFMLIB_SUCCESS;
}

static int
compute_normal_rr(pfmlib_ita2_input_rr_t *irr, int dfl_plm, int n, int *base_idx, pfmlib_ita2_output_rr_t *orr)
{
	pfmlib_ita2_input_rr_desc_t *in_rr;
	pfmlib_ita2_output_rr_desc_t *out_rr;
	unsigned long r_end;
	pfmlib_reg_t *br;
	dbreg_t d;
	int i, j;
	int br_index, reg_idx, prev_index;

	in_rr = irr->rr_limits;
	out_rr = orr->rr_infos;
	br = orr->rr_br+orr->rr_nbr_used;
	reg_idx = *base_idx;
	br_index = 0;

	for (i=0; i < n; i++, in_rr++, out_rr++) {
		/*
		 * running out of registers
		 */
		if (br_index == 8) break;

		prev_index = br_index;

		do_normal_rr( in_rr->rr_start,
			      in_rr->rr_end,
			      br,
			      4 - (reg_idx>>1), /* how many pairs available */
			      0,
			      &br_index,
			      &reg_idx,
			      in_rr->rr_plm ?
in_rr->rr_plm : dfl_plm);

		DPRINT("br_index=%d reg_idx=%d\n", br_index, reg_idx);

		/*
		 * compute offsets
		 */
		out_rr->rr_soff = out_rr->rr_eoff = 0;

		for(j=prev_index; j < br_index; j+=2) {

			d.val = br[j+1].reg_value;
			r_end = br[j].reg_value+((~(d.db.db_mask)+1) & ~(0xffUL << 56));

			if (br[j].reg_value <= in_rr->rr_start)
				out_rr->rr_soff = in_rr->rr_start - br[j].reg_value;

			if (r_end >= in_rr->rr_end)
				out_rr->rr_eoff = r_end - in_rr->rr_end;
		}

		if (PFMLIB_VERBOSE()) print_one_range(in_rr, out_rr, br, prev_index, (br_index-prev_index)>>1, 0, irr->rr_flags);
	}

	/* do not have enough registers to cover all the ranges */
	if (br_index == 8 && i < n) return PFMLIB_ERR_TOOMANY;

	orr->rr_nbr_used += br_index;

	/* update base_idx, for subsequent calls */
	*base_idx = reg_idx;

	return PFMLIB_SUCCESS;
}

static int
pfm_dispatch_irange(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_ita2_output_param_t *mod_out)
{
	pfm_ita2_pmc_reg_t reg;
	pfmlib_ita2_input_param_t *param = mod_in;
	pfmlib_ita2_input_rr_t *irr;
	pfmlib_ita2_output_rr_t *orr;
	pfmlib_reg_t *pc = outp->pfp_pmcs;
	unsigned int i, pos = outp->pfp_pmc_count, count;
	int ret;
	unsigned int retired_only, retired_count, fine_mode, prefetch_count;
	unsigned int n_intervals;
	int base_idx = 0;
	unsigned long retired_mask;

	if (param == NULL) return PFMLIB_SUCCESS;

	if (param->pfp_ita2_irange.rr_used == 0) return PFMLIB_SUCCESS;

	if (mod_out == NULL) return PFMLIB_ERR_INVAL;

	irr = &param->pfp_ita2_irange;
	orr = &mod_out->pfp_ita2_irange;

	ret = check_intervals(irr, 0, &n_intervals);
	if (ret != PFMLIB_SUCCESS) return ret;

	if (n_intervals < 1) return PFMLIB_ERR_IRRINVAL;

	retired_count = check_inst_retired_events(inp, &retired_mask);
	retired_only = retired_count == inp->pfp_event_count;
	prefetch_count = check_prefetch_events(inp);
	fine_mode = irr->rr_flags & PFMLIB_ITA2_RR_NO_FINE_MODE ?
0 : check_fine_mode_possible(irr, n_intervals);

	DPRINT("n_intervals=%d retired_only=%d retired_count=%d prefetch_count=%d fine_mode=%d\n",
		n_intervals, retired_only, retired_count, prefetch_count, fine_mode);

	/*
	 * On Itanium2, there are more constraints on what can be measured with irange.
	 *
	 * - The fine mode is the best because you directly set the lower and upper limits of
	 *   the range. This uses 2 ibr pairs per range (ibrp0/ibrp2 and ibrp1/ibrp3). Therefore
	 *   at most 2 fine mode ranges can be defined. There is a limit on the size and alignment
	 *   of the range to allow fine mode: the range must be less than 4KB in size AND the lower
	 *   and upper limits must NOT cross a 4KB page boundary. The fine mode works with all events.
	 *
	 * - if the fine mode fails, then for all events, except IA64_TAGGED_INST_RETIRED_*, only
	 *   the first pair of ibr is available: ibrp0. This imposes some severe restrictions on the
	 *   size and alignment of the range. It can be bigger than 4KB and must be properly aligned
	 *   on its size. The library relaxes these constraints by allowing the covered areas to be
	 *   larger than the expected range. It may start before and end after. You can determine how
	 *   far off the range is in either direction for each range by looking at the rr_soff (start
	 *   offset) and rr_eoff (end offset).
	 *
	 * - if the events include certain prefetch events then only IBRP1 can be used in fine mode.
	 *   See 10.3.5.1 Exception 1.
	 *
	 * - Finally, when the events are ONLY IA64_TAGGED_INST_RETIRED_* then all IBR pairs can be used
	 *   to cover the range, giving us more flexibility to approximate the range when it is not
	 *   properly aligned on its size (see 10.3.5.2 Exception 2).
*/ if (fine_mode == 0 && retired_only == 0 && n_intervals > 1) return PFMLIB_ERR_IRRTOOMANY; /* we do not default to non-fine mode to support more ranges */ if (n_intervals > 2 && fine_mode == 1) return PFMLIB_ERR_IRRTOOMANY; if (fine_mode == 0) { if (retired_only) { ret = compute_normal_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr); } else { /* unless we have only prefetch and instruction retired events, * we cannot satisfy the request because the other events cannot * be measured on anything but IBRP0. */ if (prefetch_count && (prefetch_count+retired_count) != inp->pfp_event_count) return PFMLIB_ERR_FEATCOMB; base_idx = prefetch_count ? 2 : 0; ret = compute_single_rr(irr, inp->pfp_dfl_plm, &base_idx, orr); } } else { if (prefetch_count && n_intervals != 1) return PFMLIB_ERR_IRRTOOMANY; base_idx = prefetch_count ? 2 : 0; ret = compute_fine_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr); } if (ret != PFMLIB_SUCCESS) { return ret == PFMLIB_ERR_TOOMANY ? PFMLIB_ERR_IRRTOOMANY : ret; } reg.pmc_val = 0xdb6; /* default value */ count = orr->rr_nbr_used; for (i=0; i < count; i++) { switch(orr->rr_br[i].reg_num) { case 0: reg.pmc14_ita2_reg.iarc_ibrp0 = 0; break; case 2: reg.pmc14_ita2_reg.iarc_ibrp1 = 0; break; case 4: reg.pmc14_ita2_reg.iarc_ibrp2 = 0; break; case 6: reg.pmc14_ita2_reg.iarc_ibrp3 = 0; break; } } if (retired_only && (param->pfp_ita2_pmc8.opcm_used ||param->pfp_ita2_pmc9.opcm_used)) { /* * PMC8 + IA64_INST_RETIRED only works if irange on IBRP0 and/or IBRP2 * PMC9 + IA64_INST_RETIRED only works if irange on IBRP1 and/or IBRP3 */ count = orr->rr_nbr_used; for (i=0; i < count; i++) { if (orr->rr_br[i].reg_num == 0 && param->pfp_ita2_pmc9.opcm_used) return PFMLIB_ERR_FEATCOMB; if (orr->rr_br[i].reg_num == 2 && param->pfp_ita2_pmc8.opcm_used) return PFMLIB_ERR_FEATCOMB; if (orr->rr_br[i].reg_num == 4 && param->pfp_ita2_pmc9.opcm_used) return PFMLIB_ERR_FEATCOMB; if (orr->rr_br[i].reg_num == 6 && param->pfp_ita2_pmc8.opcm_used) return 
PFMLIB_ERR_FEATCOMB; } } if (fine_mode) { reg.pmc14_ita2_reg.iarc_fine = 1; } else if (retired_only) { /* * we need to check that the user provided all the events needed to cover * all the ibr pairs used to cover the range */ if ((retired_mask & 0x1) == 0 && reg.pmc14_ita2_reg.iarc_ibrp0 == 0) return PFMLIB_ERR_IRRINVAL; if ((retired_mask & 0x2) == 0 && reg.pmc14_ita2_reg.iarc_ibrp1 == 0) return PFMLIB_ERR_IRRINVAL; if ((retired_mask & 0x4) == 0 && reg.pmc14_ita2_reg.iarc_ibrp2 == 0) return PFMLIB_ERR_IRRINVAL; if ((retired_mask & 0x8) == 0 && reg.pmc14_ita2_reg.iarc_ibrp3 == 0) return PFMLIB_ERR_IRRINVAL; } /* initialize pmc request slot */ memset(pc+pos, 0, sizeof(pfmlib_reg_t)); if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 14)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 14; pc[pos].reg_value = reg.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 14; pos++; __pfm_vbprintf("[PMC14(pmc14)=0x%lx ibrp0=%d ibrp1=%d ibrp2=%d ibrp3=%d fine=%d]\n", reg.pmc_val, reg.pmc14_ita2_reg.iarc_ibrp0, reg.pmc14_ita2_reg.iarc_ibrp1, reg.pmc14_ita2_reg.iarc_ibrp2, reg.pmc14_ita2_reg.iarc_ibrp3, reg.pmc14_ita2_reg.iarc_fine); outp->pfp_pmc_count = pos; return PFMLIB_SUCCESS; } static const unsigned long iod_tab[8]={ /* --- */ 3, /* --D */ 2, /* -O- */ 3, /* should not be used */ /* -OD */ 0, /* =IOD safe because default IBR is harmless */ /* I-- */ 1, /* =IO safe because by defaut OPC is turned off */ /* I-D */ 0, /* =IOD safe because by default opc is turned off */ /* IO- */ 1, /* IOD */ 0 }; /* * IMPORTANT: MUST BE CALLED *AFTER* pfm_dispatch_irange() to make sure we see * the irange programming to adjust pmc13. 
 */
static int
pfm_dispatch_drange(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_ita2_output_param_t *mod_out)
{
	pfmlib_ita2_input_param_t *param = mod_in;
	pfmlib_reg_t *pc = outp->pfp_pmcs;
	pfmlib_ita2_input_rr_t *irr;
	pfmlib_ita2_output_rr_t *orr, *orr2;
	pfm_ita2_pmc_reg_t pmc13;
	pfm_ita2_pmc_reg_t pmc14;
	unsigned int i, pos = outp->pfp_pmc_count;
	int iod_codes[4], dfl_val_pmc8, dfl_val_pmc9;
	unsigned int n_intervals;
	int ret;
	int base_idx = 0;
	int fine_mode = 0;
#define DR_USED	0x1 /* data range is used */
#define OP_USED	0x2 /* opcode matching is used */
#define IR_USED	0x4 /* code range is used */

	if (param == NULL) return PFMLIB_SUCCESS;
	/*
	 * if only pmc8/pmc9 opcode matching is used, we do not need to change
	 * the default value of pmc13 regardless of the events being measured.
	 */
	if ( param->pfp_ita2_drange.rr_used == 0
	  && param->pfp_ita2_irange.rr_used == 0) return PFMLIB_SUCCESS;

	/*
	 * it seems like the ignored bits need to have special values
	 * otherwise this does not work.
	 */
	pmc13.pmc_val = 0x2078fefefefe;

	/*
	 * initialize iod codes
	 */
	iod_codes[0] = iod_codes[1] = iod_codes[2] = iod_codes[3] = 0;

	/*
	 * setup default iod value, we need to separate because
	 * if drange is used we do not know in advance which DBR will be used
	 * therefore we need to apply dfl_val later
	 */
	dfl_val_pmc8 = param->pfp_ita2_pmc8.opcm_used ? OP_USED : 0;
	dfl_val_pmc9 = param->pfp_ita2_pmc9.opcm_used ? OP_USED : 0;

	if (param->pfp_ita2_drange.rr_used == 1) {

		if (mod_out == NULL) return PFMLIB_ERR_INVAL;

		irr = &param->pfp_ita2_drange;
		orr = &mod_out->pfp_ita2_drange;

		ret = check_intervals(irr, 1, &n_intervals);
		if (ret != PFMLIB_SUCCESS) return ret;

		if (n_intervals < 1) return PFMLIB_ERR_DRRINVAL;

		ret = compute_normal_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr);
		if (ret != PFMLIB_SUCCESS) {
			return ret == PFMLIB_ERR_TOOMANY ? PFMLIB_ERR_DRRTOOMANY : ret;
		}
		/*
		 * Update iod_codes to reflect the use of the DBR constraint.
		 */
		for (i=0; i < orr->rr_nbr_used; i++) {
			if (orr->rr_br[i].reg_num == 0) iod_codes[0] |= DR_USED | dfl_val_pmc8;
			if (orr->rr_br[i].reg_num == 2) iod_codes[1] |= DR_USED | dfl_val_pmc9;
			if (orr->rr_br[i].reg_num == 4) iod_codes[2] |= DR_USED | dfl_val_pmc8;
			if (orr->rr_br[i].reg_num == 6) iod_codes[3] |= DR_USED | dfl_val_pmc9;
		}
	}

	/*
	 * XXX: assume dispatch_irange executed before calling this function
	 */
	if (param->pfp_ita2_irange.rr_used == 1) {

		/* check mod_out before taking the address of one of its members */
		if (mod_out == NULL) return PFMLIB_ERR_INVAL;

		orr2 = &mod_out->pfp_ita2_irange;

		/*
		 * we need to find out whether or not the irange is using
		 * fine mode. If this is the case, then we only need to
		 * program pmc13 for the ibr pairs which designate the lower
		 * bounds of a range. For instance, if IBRP0/IBRP2 are used,
		 * then we only need to program pmc13.cfg_dbrp0 and pmc13.ena_dbrp0,
		 * the PMU will automatically use IBRP2, even though pmc13.ena_dbrp2=0.
		 *
		 * only the first pos entries of pc[] have been filled in so far
		 */
		for(i=0; i < pos; i++) {
			if (pc[i].reg_num == 14) {
				pmc14.pmc_val = pc[i].reg_value;
				if (pmc14.pmc14_ita2_reg.iarc_fine == 1) fine_mode = 1;
				break;
			}
		}

		/*
		 * Update to reflect the use of the IBR constraint
		 */
		for (i=0; i < orr2->rr_nbr_used; i++) {
			if (orr2->rr_br[i].reg_num == 0) iod_codes[0] |= IR_USED | dfl_val_pmc8;
			if (orr2->rr_br[i].reg_num == 2) iod_codes[1] |= IR_USED | dfl_val_pmc9;
			if (fine_mode == 0 && orr2->rr_br[i].reg_num == 4) iod_codes[2] |= IR_USED | dfl_val_pmc8;
			if (fine_mode == 0 && orr2->rr_br[i].reg_num == 6) iod_codes[3] |= IR_USED | dfl_val_pmc9;
		}
	}

	if (param->pfp_ita2_irange.rr_used == 0 && param->pfp_ita2_drange.rr_used ==0) {
		iod_codes[0] = iod_codes[2] = dfl_val_pmc8;
		iod_codes[1] = iod_codes[3] = dfl_val_pmc9;
	}

	/*
	 * update the cfg dbrpX field. If we put a constraint on a cfg dbrp, then
	 * we must enable it in the corresponding ena_dbrpX
	 */
	pmc13.pmc13_ita2_reg.darc_ena_dbrp0 = iod_codes[0] ? 1 : 0;
	pmc13.pmc13_ita2_reg.darc_cfg_dbrp0 = iod_tab[iod_codes[0]];

	pmc13.pmc13_ita2_reg.darc_ena_dbrp1 = iod_codes[1] ?
1 : 0; pmc13.pmc13_ita2_reg.darc_cfg_dbrp1 = iod_tab[iod_codes[1]]; pmc13.pmc13_ita2_reg.darc_ena_dbrp2 = iod_codes[2] ? 1 : 0; pmc13.pmc13_ita2_reg.darc_cfg_dbrp2 = iod_tab[iod_codes[2]]; pmc13.pmc13_ita2_reg.darc_ena_dbrp3 = iod_codes[3] ? 1 : 0; pmc13.pmc13_ita2_reg.darc_cfg_dbrp3 = iod_tab[iod_codes[3]]; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 13)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 13; pc[pos].reg_value = pmc13.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 13; pos++; __pfm_vbprintf("[PMC13(pmc13)=0x%lx cfg_dbrp0=%d cfg_dbrp1=%d cfg_dbrp2=%d cfg_dbrp3=%d ena_dbrp0=%d ena_dbrp1=%d ena_dbrp2=%d ena_dbrp3=%d]\n", pmc13.pmc_val, pmc13.pmc13_ita2_reg.darc_cfg_dbrp0, pmc13.pmc13_ita2_reg.darc_cfg_dbrp1, pmc13.pmc13_ita2_reg.darc_cfg_dbrp2, pmc13.pmc13_ita2_reg.darc_cfg_dbrp3, pmc13.pmc13_ita2_reg.darc_ena_dbrp0, pmc13.pmc13_ita2_reg.darc_ena_dbrp1, pmc13.pmc13_ita2_reg.darc_ena_dbrp2, pmc13.pmc13_ita2_reg.darc_ena_dbrp3); outp->pfp_pmc_count = pos; return PFMLIB_SUCCESS; } static int check_qualifier_constraints(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in) { pfmlib_ita2_input_param_t *param = mod_in; pfmlib_event_t *e = inp->pfp_events; unsigned int i, count; count = inp->pfp_event_count; for(i=0; i < count; i++) { /* * skip check for counter which requested it. Use at your own risk. * No all counters have necessarily been validated for use with * qualifiers. Typically the event is counted as if no constraint * existed. 
*/ if (param->pfp_ita2_counters[i].flags & PFMLIB_ITA2_FL_EVT_NO_QUALCHECK) continue; if (evt_use_irange(param) && has_iarr(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB; if (evt_use_drange(param) && has_darr(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB; if (evt_use_opcm(param) && has_opcm(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB; } return PFMLIB_SUCCESS; } static int check_range_plm(pfmlib_input_param_t *inp, pfmlib_ita2_input_param_t *mod_in) { pfmlib_ita2_input_param_t *param = mod_in; unsigned int i, count; if (param->pfp_ita2_drange.rr_used == 0 && param->pfp_ita2_irange.rr_used == 0) return PFMLIB_SUCCESS; /* * range restriction applies to all events, therefore we must have a consistent * set of plm and they must match the pfp_dfl_plm which is used to setup the debug * registers */ count = inp->pfp_event_count; for(i=0; i < count; i++) { if (inp->pfp_events[i].plm && inp->pfp_events[i].plm != inp->pfp_dfl_plm) return PFMLIB_ERR_FEATCOMB; } return PFMLIB_SUCCESS; } static int pfm_ita2_dispatch_events(pfmlib_input_param_t *inp, void *model_in, pfmlib_output_param_t *outp, void *model_out) { int ret; pfmlib_ita2_input_param_t *mod_in = (pfmlib_ita2_input_param_t *)model_in; pfmlib_ita2_output_param_t *mod_out = (pfmlib_ita2_output_param_t *)model_out; /* * nothing will come out of this combination */ if (mod_out && mod_in == NULL) return PFMLIB_ERR_INVAL; /* check opcode match, range restriction qualifiers */ if (mod_in && check_qualifier_constraints(inp, mod_in) != PFMLIB_SUCCESS) return PFMLIB_ERR_FEATCOMB; /* check for problems with raneg restriction and per-event plm */ if (mod_in && check_range_plm(inp, mod_in) != PFMLIB_SUCCESS) return PFMLIB_ERR_FEATCOMB; ret = pfm_ita2_dispatch_counters(inp, mod_in, outp); if (ret != PFMLIB_SUCCESS) return ret; /* now check for I-EAR */ ret = pfm_dispatch_iear(inp, mod_in, outp); if (ret != PFMLIB_SUCCESS) return ret; /* now check for D-EAR */ ret = pfm_dispatch_dear(inp, mod_in, outp); if (ret != PFMLIB_SUCCESS) 
return ret; /* XXX: must be done before dispatch_opcm() and dispatch_drange() */ ret = pfm_dispatch_irange(inp, mod_in, outp, mod_out);; if (ret != PFMLIB_SUCCESS) return ret; ret = pfm_dispatch_drange(inp, mod_in, outp, mod_out);; if (ret != PFMLIB_SUCCESS) return ret; /* now check for Opcode matchers */ ret = pfm_dispatch_opcm(inp, mod_in, outp, mod_out); if (ret != PFMLIB_SUCCESS) return ret; ret = pfm_dispatch_btb(inp, mod_in, outp); return ret; } /* XXX: return value is also error code */ int pfm_ita2_get_event_maxincr(unsigned int i, unsigned int *maxincr) { if (i >= PME_ITA2_EVENT_COUNT || maxincr == NULL) return PFMLIB_ERR_INVAL; *maxincr = itanium2_pe[i].pme_maxincr; return PFMLIB_SUCCESS; } int pfm_ita2_is_ear(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_ear(i); } int pfm_ita2_is_dear(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_dear(i); } int pfm_ita2_is_dear_tlb(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_dear(i) && is_ear_tlb(i); } int pfm_ita2_is_dear_cache(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_dear(i) && is_ear_cache(i); } int pfm_ita2_is_dear_alat(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_ear_alat(i); } int pfm_ita2_is_iear(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_iear(i); } int pfm_ita2_is_iear_tlb(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_iear(i) && is_ear_tlb(i); } int pfm_ita2_is_iear_cache(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_iear(i) && is_ear_cache(i); } int pfm_ita2_is_btb(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && is_btb(i); } int pfm_ita2_support_iarr(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && has_iarr(i); } int pfm_ita2_support_darr(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && has_darr(i); } int pfm_ita2_support_opcm(unsigned int i) { return i < PME_ITA2_EVENT_COUNT && has_opcm(i); } int pfm_ita2_get_ear_mode(unsigned int i, pfmlib_ita2_ear_mode_t *m) { pfmlib_ita2_ear_mode_t r; if (!is_ear(i) || m 
== NULL) return PFMLIB_ERR_INVAL; r = PFMLIB_ITA2_EAR_TLB_MODE; if (is_ear_tlb(i)) goto done; r = PFMLIB_ITA2_EAR_CACHE_MODE; if (is_ear_cache(i)) goto done; r = PFMLIB_ITA2_EAR_ALAT_MODE; if (is_ear_alat(i)) goto done; return PFMLIB_ERR_INVAL; done: *m = r; return PFMLIB_SUCCESS; } static int pfm_ita2_get_event_code(unsigned int i, unsigned int cnt, int *code) { if (cnt != PFMLIB_CNT_FIRST && (cnt < 4 || cnt > 7)) return PFMLIB_ERR_INVAL; *code = (int)itanium2_pe[i].pme_code; return PFMLIB_SUCCESS; } /* * This function is accessible directly to the user */ int pfm_ita2_get_event_umask(unsigned int i, unsigned long *umask) { if (i >= PME_ITA2_EVENT_COUNT || umask == NULL) return PFMLIB_ERR_INVAL; *umask = evt_umask(i); return PFMLIB_SUCCESS; } int pfm_ita2_get_event_group(unsigned int i, int *grp) { if (i >= PME_ITA2_EVENT_COUNT || grp == NULL) return PFMLIB_ERR_INVAL; *grp = evt_grp(i); return PFMLIB_SUCCESS; } int pfm_ita2_get_event_set(unsigned int i, int *set) { if (i >= PME_ITA2_EVENT_COUNT || set == NULL) return PFMLIB_ERR_INVAL; *set = evt_set(i) == 0xf ? PFMLIB_ITA2_EVT_NO_SET : evt_set(i); return PFMLIB_SUCCESS; } /* external interface */ int pfm_ita2_irange_is_fine(pfmlib_output_param_t *outp, pfmlib_ita2_output_param_t *mod_out) { pfmlib_ita2_output_param_t *param = mod_out; pfm_ita2_pmc_reg_t reg; unsigned int i, count; /* some sanity checks */ if (outp == NULL || param == NULL) return 0; if (outp->pfp_pmc_count >= PFMLIB_MAX_PMCS) return 0; if (param->pfp_ita2_irange.rr_nbr_used == 0) return 0; /* * we look for pmc14 as it contains the bit indicating if fine mode is used */ count = outp->pfp_pmc_count; for(i=0; i < count; i++) { if (outp->pfp_pmcs[i].reg_num == 14) goto found; } return 0; found: reg.pmc_val = outp->pfp_pmcs[i].reg_value; return reg.pmc14_ita2_reg.iarc_fine ? 
1 : 0; } static char * pfm_ita2_get_event_name(unsigned int i) { return itanium2_pe[i].pme_name; } static void pfm_ita2_get_event_counters(unsigned int j, pfmlib_regmask_t *counters) { unsigned int i; unsigned long m; memset(counters, 0, sizeof(*counters)); m =itanium2_pe[j].pme_counters; for(i=0; m ; i++, m>>=1) { if (m & 0x1) pfm_regmask_set(counters, i); } } static void pfm_ita2_get_impl_pmcs(pfmlib_regmask_t *impl_pmcs) { unsigned int i = 0; /* all pmcs are contiguous */ for(i=0; i < PMU_ITA2_NUM_PMCS; i++) pfm_regmask_set(impl_pmcs, i); } static void pfm_ita2_get_impl_pmds(pfmlib_regmask_t *impl_pmds) { unsigned int i = 0; /* all pmds are contiguous */ for(i=0; i < PMU_ITA2_NUM_PMDS; i++) pfm_regmask_set(impl_pmds, i); } static void pfm_ita2_get_impl_counters(pfmlib_regmask_t *impl_counters) { unsigned int i = 0; /* counting pmds are contiguous */ for(i=4; i < 8; i++) pfm_regmask_set(impl_counters, i); } static void pfm_ita2_get_hw_counter_width(unsigned int *width) { *width = PMU_ITA2_COUNTER_WIDTH; } static int pfm_ita2_get_event_description(unsigned int ev, char **str) { char *s; s = itanium2_pe[ev].pme_desc; if (s) { *str = strdup(s); } else { *str = NULL; } return PFMLIB_SUCCESS; } static int pfm_ita2_get_cycle_event(pfmlib_event_t *e) { e->event = PME_ITA2_CPU_CYCLES; return PFMLIB_SUCCESS; } static int pfm_ita2_get_inst_retired(pfmlib_event_t *e) { e->event = PME_ITA2_IA64_INST_RETIRED; return PFMLIB_SUCCESS; } pfm_pmu_support_t itanium2_support={ .pmu_name = "itanium2", .pmu_type = PFMLIB_ITANIUM2_PMU, .pme_count = PME_ITA2_EVENT_COUNT, .pmc_count = PMU_ITA2_NUM_PMCS, .pmd_count = PMU_ITA2_NUM_PMDS, .num_cnt = PMU_ITA2_NUM_COUNTERS, .get_event_code = pfm_ita2_get_event_code, .get_event_name = pfm_ita2_get_event_name, .get_event_counters = pfm_ita2_get_event_counters, .dispatch_events = pfm_ita2_dispatch_events, .pmu_detect = pfm_ita2_detect, .get_impl_pmcs = pfm_ita2_get_impl_pmcs, .get_impl_pmds = pfm_ita2_get_impl_pmds, .get_impl_counters = 
pfm_ita2_get_impl_counters, .get_hw_counter_width = pfm_ita2_get_hw_counter_width, .get_event_desc = pfm_ita2_get_event_description, .get_cycle_event = pfm_ita2_get_cycle_event, .get_inst_retired_event = pfm_ita2_get_inst_retired }; papi-5.3.0/src/libpfm4/lib/pfmlib_perf_event_pmu.c0000600003276200002170000005437212247131124021557 0ustar ralphundrgrad/* * pfmlib_perf_pmu.c: support for perf_events event table * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #ifdef __linux__ #include /* for openat() */ #include #endif /* * looks like several distributions do not have * the latest libc with openat support, so disable * for now */ #undef HAS_OPENAT #include "pfmlib_priv.h" #include "pfmlib_perf_event_priv.h" #define PERF_MAX_UMASKS 8 typedef struct { const char *uname; /* unit mask name */ const char *udesc; /* unit mask desc */ uint64_t uid; /* unit mask id */ int uflags; /* umask options */ int grpid; /* group identifier */ } perf_umask_t; typedef struct { const char *name; /* name */ const char *desc; /* description */ const char *equiv; /* event is aliased to */ uint64_t id; /* perf_hw_id or equivalent */ int modmsk; /* modifiers bitmask */ int type; /* perf_type_id */ int numasks; /* number of unit masls */ int ngrp; /* number of umasks groups */ unsigned long umask_ovfl_idx; /* base index of overflow unit masks */ perf_umask_t umasks[PERF_MAX_UMASKS];/* first unit masks */ } perf_event_t; /* * umask options: uflags */ #define PERF_FL_DEFAULT 0x1 /* umask is default for group */ #define PERF_INVAL_OVFL_IDX ((unsigned long)-1) #define PCL_EVT(f, t, m) \ { .name = #f, \ .id = (f), \ .type = (t), \ .desc = #f, \ .equiv = NULL, \ .numasks = 0, \ .modmsk = (m), \ .ngrp = 0, \ .umask_ovfl_idx = PERF_INVAL_OVFL_IDX,\ } #define PCL_EVTA(f, t, m, a) \ { .name = #f, \ .id = a, \ .type = t, \ .desc = #a, \ .equiv = #a, \ .numasks = 0, \ .modmsk = m, \ .ngrp = 0, \ .umask_ovfl_idx = PERF_INVAL_OVFL_IDX,\ } #define PCL_EVT_HW(n) PCL_EVT(PERF_COUNT_HW_##n, PERF_TYPE_HARDWARE, PERF_ATTR_HW) #define PCL_EVT_SW(n) PCL_EVT(PERF_COUNT_SW_##n, PERF_TYPE_SOFTWARE, PERF_ATTR_SW) #define PCL_EVT_AHW(n, a) PCL_EVTA(n, PERF_TYPE_HARDWARE, PERF_ATTR_HW, PERF_COUNT_HW_##a) #define PCL_EVT_ASW(n, a) PCL_EVTA(n, PERF_TYPE_SOFTWARE, PERF_ATTR_SW, PERF_COUNT_SW_##a) #ifndef MAXPATHLEN #define MAXPATHLEN 1024 #endif static char debugfs_mnt[MAXPATHLEN]; #define PERF_ATTR_HW 
0 #define PERF_ATTR_SW 0 #include "events/perf_events.h" #define perf_nevents (perf_event_support.pme_count) static perf_event_t *perf_pe = perf_static_events; static perf_event_t *perf_pe_free, *perf_pe_end; static perf_umask_t *perf_um, *perf_um_free, *perf_um_end; static int perf_pe_count, perf_um_count; static inline int pfm_perf_pmu_supported_plm(void *this) { pfmlib_pmu_t *pmu; pmu = pfmlib_get_pmu_by_type(PFM_PMU_TYPE_CORE); if (!pmu) { DPRINT("no core CPU PMU, going with default\n"); pmu = this; } else { DPRINT("guessing plm from %s PMU plm=0x%x\n", pmu->name, pmu->supported_plm); } return pmu->supported_plm; } static inline unsigned long perf_get_ovfl_umask_idx(perf_umask_t *um) { return um - perf_um; } static inline perf_umask_t * perf_get_ovfl_umask(int pidx) { return perf_um+perf_pe[pidx].umask_ovfl_idx; } static inline perf_umask_t * perf_attridx2um(int idx, int attr_idx) { perf_umask_t *um; if (attr_idx < PERF_MAX_UMASKS) { um = &perf_pe[idx].umasks[attr_idx]; } else { um = perf_get_ovfl_umask(idx); um += attr_idx - PERF_MAX_UMASKS; } return um; } /* * figure out the mount point of the debugfs filesystem * * returns -1 if none is found */ static int get_debugfs_mnt(void) { FILE *fp; char *buffer = NULL; size_t len = 0; char *q, *mnt, *fs; int res = -1; fp = fopen("/proc/mounts", "r"); if (!fp) return -1; while(pfmlib_getl(&buffer, &len, fp) != -1) { q = strchr(buffer, ' '); if (!q) continue; mnt = ++q; q = strchr(q, ' '); if (!q) continue; *q = '\0'; fs = ++q; q = strchr(q, ' '); if (!q) continue; *q = '\0'; if (!strcmp(fs, "debugfs")) { strncpy(debugfs_mnt, mnt, MAXPATHLEN); debugfs_mnt[MAXPATHLEN-1]= '\0'; res = 0; break; } } if (buffer) free(buffer); fclose(fp); return res; } #define PERF_ALLOC_EVENT_COUNT (512) #define PERF_ALLOC_UMASK_COUNT (1024) /* * clone static event table into a dynamic * event table * * Used for tracepoints */ static perf_event_t * perf_table_clone(void) { perf_event_t *addr; perf_pe_count = perf_nevents + 
PERF_ALLOC_EVENT_COUNT;

	addr = calloc(perf_pe_count, sizeof(perf_event_t));
	if (addr) {
		memcpy(addr, perf_static_events, perf_nevents * sizeof(perf_event_t));
		perf_pe_free = addr + perf_nevents;
		perf_pe_end = perf_pe_free + PERF_ALLOC_EVENT_COUNT;
		perf_pe = addr;
	}
	return addr;
}

/*
 * allocate space for one new event in event table
 *
 * returns NULL if out-of-memory
 *
 * may realloc existing table if necessary for growth
 */
static perf_event_t *
perf_table_alloc_event(void)
{
	perf_event_t *new_pe;

retry:
	if (perf_pe_free < perf_pe_end)
		return perf_pe_free++;

	perf_pe_count += PERF_ALLOC_EVENT_COUNT;

	new_pe = realloc(perf_pe, perf_pe_count * sizeof(perf_event_t));
	if (!new_pe)
		return NULL;

	perf_pe_free = new_pe + (perf_pe_free - perf_pe);
	perf_pe_end = perf_pe_free + PERF_ALLOC_EVENT_COUNT;
	perf_pe = new_pe;

	goto retry;
}

/*
 * allocate space for new overflow unit masks
 *
 * Each event can hold up to PERF_MAX_UMASKS unit masks.
 * But given that we can dynamically add events
 * which may have more unit masks, we
 * put the extra ones into a separate overflow unit
 * mask table, which can grow on demand.
 * In that case the first PERF_MAX_UMASKS
 * are in the event, the rest in the overflow
 * table at the index pointed to by event->umask_ovfl_idx.
 * All unit masks for an event are contiguous in the
 * overflow table.
 */
static perf_umask_t *
perf_table_alloc_umask(void)
{
	perf_umask_t *new_um;

retry:
	if (perf_um_free < perf_um_end)
		return perf_um_free++;

	perf_um_count += PERF_ALLOC_UMASK_COUNT;

	new_um = realloc(perf_um, perf_um_count * sizeof(*new_um));
	if (!new_um)
		return NULL;

	perf_um_free = new_um + (perf_um_free - perf_um);
	perf_um_end = perf_um_free + PERF_ALLOC_UMASK_COUNT;
	perf_um = new_um;

	goto retry;
}

#ifdef __GNUC__
#define POTENTIALLY_UNUSED __attribute__((unused))
#else
#define POTENTIALLY_UNUSED /* expands to nothing on other compilers */
#endif

static void
gen_tracepoint_table(void)
{
	DIR *dir1, *dir2;
	struct dirent *d1, *d2;
	perf_event_t *p;
	perf_umask_t *um;
	char d2path[MAXPATHLEN];
	char idpath[MAXPATHLEN];
	char id_str[32];
	uint64_t id;
	int fd, err;
	int POTENTIALLY_UNUSED dir2_fd;
	int reuse_event = 0;
	int numasks;
	char *tracepoint_name;

	err = get_debugfs_mnt();
	if (err == -1)
		return;

	/* the third argument of strncat() is the room left, not the buffer size */
	strncat(debugfs_mnt, "/tracing/events", MAXPATHLEN - strlen(debugfs_mnt) - 1);
	debugfs_mnt[MAXPATHLEN-1]= '\0';

	dir1 = opendir(debugfs_mnt);
	if (!dir1)
		return;

	p = perf_table_clone();

	err = 0;
	while((d1 = readdir(dir1)) && err >= 0) {

		if (!strcmp(d1->d_name, "."))
			continue;

		if (!strcmp(d1->d_name, ".."))
			continue;

		snprintf(d2path, MAXPATHLEN, "%s/%s", debugfs_mnt, d1->d_name);

		/* fails if d2path is not a directory */
		dir2 = opendir(d2path);
		if (!dir2)
			continue;

		dir2_fd = dirfd(dir2);

		/*
		 * if a subdir did not fit our expected
		 * tracepoint format, then we reuse the
		 * allocated space (we have no free)
		 */
		if (!reuse_event)
			p = perf_table_alloc_event();

		if (!p) {
			/* do not leak the open directory stream */
			closedir(dir2);
			break;
		}

		p->name = tracepoint_name = strdup(d1->d_name);
		if (!p->name) {
			closedir(dir2);
			err = -1;
			continue;
		}

		p->desc = "tracepoint";
		p->id = -1;
		p->type = PERF_TYPE_TRACEPOINT;
		p->umask_ovfl_idx = PERF_INVAL_OVFL_IDX;
		p->modmsk = 0;
		p->ngrp = 1;

		numasks = 0;
		reuse_event = 0;

		while((d2 = readdir(dir2))) {
			if (!strcmp(d2->d_name, "."))
				continue;

			if (!strcmp(d2->d_name, ".."))
				continue;

#ifdef HAS_OPENAT
			snprintf(idpath, MAXPATHLEN, "%s/id", d2->d_name);
			fd = openat(dir2_fd, idpath, O_RDONLY);
#else
			snprintf(idpath,
MAXPATHLEN, "%s/%s/id", d2path, d2->d_name);
			fd = open(idpath, O_RDONLY);
#endif
			if (fd == -1)
				continue;

			/* leave room for the terminator: read() does not add one */
			err = read(fd, id_str, sizeof(id_str)-1);

			close(fd);

			if (err < 0)
				continue;

			id_str[err] = '\0';

			id = strtoull(id_str, NULL, 0);

			if (numasks < PERF_MAX_UMASKS)
				um = p->umasks+numasks;
			else {
				um = perf_table_alloc_umask();
				/* only compute the index from a valid pointer */
				if (um && numasks == PERF_MAX_UMASKS)
					p->umask_ovfl_idx = perf_get_ovfl_umask_idx(um);
			}

			if (!um) {
				err = -1;
				break;
			}

			/*
			 * tracepoints have no event codes
			 * the code is in the unit masks
			 */
			p->id = 0;

			um->uname = strdup(d2->d_name);
			if (!um->uname) {
				err = -1;
				break;
			}
			um->udesc = um->uname;
			um->uid = id;
			um->grpid = 0;
			DPRINT("idpath=%s:%s id=%"PRIu64"\n", p->name, um->uname, id);
			numasks++;
		}
		p->numasks = numasks;

		closedir(dir2);

		/*
		 * directory was not pointing
		 * to a tree structure we know about
		 */
		if (!numasks) {
			free(tracepoint_name);
			reuse_event =1;
			continue;
		}

		/*
		 * update total number of events
		 * only when no error is reported
		 */
		if (err >= 0)
			perf_nevents++;
		reuse_event = 0;
	}
	closedir(dir1);
}

static int
pfm_perf_detect(void *this)
{
#ifdef __linux__
	/* ought to find a better way of detecting PERF */
#define PERF_OLD_PROC_FILE "/proc/sys/kernel/perf_counter_paranoid"
#define PERF_PROC_FILE "/proc/sys/kernel/perf_event_paranoid"
	return !(access(PERF_PROC_FILE, F_OK)
		  && access(PERF_OLD_PROC_FILE, F_OK)) ?
PFM_SUCCESS: PFM_ERR_NOTSUPP; #else return PFM_SUCCESS; #endif } static int pfm_perf_init(void *this) { pfmlib_pmu_t *pmu = this; perf_pe = perf_static_events; /* must dynamically add tracepoints */ gen_tracepoint_table(); /* dynamically patch supported plm based on CORE PMU plm */ pmu->supported_plm = pfm_perf_pmu_supported_plm(pmu); return PFM_SUCCESS; } static int pfm_perf_get_event_first(void *this) { return 0; } static int pfm_perf_get_event_next(void *this, int idx) { if (idx < 0 || idx >= (perf_nevents-1)) return -1; return idx+1; } static int pfm_perf_add_defaults(pfmlib_event_desc_t *e, unsigned int msk, uint64_t *umask) { perf_event_t *ent; perf_umask_t *um; int i, j, k, added; k = e->nattrs; ent = perf_pe+e->event; for(i=0; msk; msk >>=1, i++) { if (!(msk & 0x1)) continue; added = 0; for(j=0; j < ent->numasks; j++) { if (j < PERF_MAX_UMASKS) { um = &perf_pe[e->event].umasks[j]; } else { um = perf_get_ovfl_umask(e->event); um += j - PERF_MAX_UMASKS; } if (um->grpid != i) continue; if (um->uflags & PERF_FL_DEFAULT) { DPRINT("added default %s for group %d\n", um->uname, i); *umask |= um->uid; e->attrs[k].id = j; e->attrs[k].ival = 0; k++; added++; } } if (!added) { DPRINT("no default found for event %s unit mask group %d\n", ent->name, i); return PFM_ERR_UMASK; } } e->nattrs = k; return PFM_SUCCESS; } static int pfmlib_perf_encode_tp(pfmlib_event_desc_t *e) { perf_umask_t *um; pfm_event_attr_info_t *a; int i, nu = 0; e->fstr[0] = '\0'; e->count = 1; evt_strcat(e->fstr, "%s", perf_pe[e->event].name); /* * look for tracepoints */ for(i=0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { /* * tracepoint unit masks cannot be combined */ if (++nu > 1) return PFM_ERR_FEATCOMB; if (a->idx < PERF_MAX_UMASKS) { e->codes[0] = perf_pe[e->event].umasks[a->idx].uid; evt_strcat(e->fstr, ":%s", perf_pe[e->event].umasks[a->idx].uname); } else { um = perf_get_ovfl_umask(e->event); e->codes[0] = um[a->idx - 
PERF_MAX_UMASKS].uid; evt_strcat(e->fstr, ":%s", um[a->idx - PERF_MAX_UMASKS].uname); } } else return PFM_ERR_ATTR; } return PFM_SUCCESS; } static int pfmlib_perf_encode_hw_cache(pfmlib_event_desc_t *e) { pfm_event_attr_info_t *a; perf_event_t *ent; unsigned int msk, grpmsk; uint64_t umask = 0; int i, ret; grpmsk = (1 << perf_pe[e->event].ngrp)-1; ent = perf_pe + e->event; e->codes[0] = ent->id; e->count = 1; e->fstr[0] = '\0'; for(i=0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { e->codes[0] |= ent->umasks[a->idx].uid; msk = 1 << ent->umasks[a->idx].grpid; /* umask cannot be combined in each group */ if ((grpmsk & msk) == 0) return PFM_ERR_UMASK; grpmsk &= ~msk; } else return PFM_ERR_ATTR; /* no mod, no raw umask */ } /* check for missing default umasks */ if (grpmsk) { ret = pfm_perf_add_defaults(e, grpmsk, &umask); if (ret != PFM_SUCCESS) return ret; e->codes[0] |= umask; } /* * reorder all the attributes such that the fstr appears always * the same regardless of how the attributes were submitted. 
* * cannot sort attr until after we have added the default umasks */ evt_strcat(e->fstr, "%s", ent->name); pfmlib_sort_attr(e); for(i=0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) evt_strcat(e->fstr, ":%s", ent->umasks[a->idx].uname); } return PFM_SUCCESS; } static int pfm_perf_get_encoding(void *this, pfmlib_event_desc_t *e) { int ret; switch(perf_pe[e->event].type) { case PERF_TYPE_TRACEPOINT: ret = pfmlib_perf_encode_tp(e); break; case PERF_TYPE_HW_CACHE: ret = pfmlib_perf_encode_hw_cache(e); break; case PERF_TYPE_HARDWARE: case PERF_TYPE_SOFTWARE: ret = PFM_SUCCESS; e->codes[0] = perf_pe[e->event].id; e->count = 1; e->fstr[0] = '\0'; evt_strcat(e->fstr, "%s", perf_pe[e->event].name); break; default: DPRINT("unsupported event type=%d\n", perf_pe[e->event].type); return PFM_ERR_NOTSUPP; } return ret; } static int pfm_perf_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { struct perf_event_attr *attr; int ret; switch(perf_pe[e->event].type) { case PERF_TYPE_TRACEPOINT: ret = pfmlib_perf_encode_tp(e); break; case PERF_TYPE_HW_CACHE: ret = pfmlib_perf_encode_hw_cache(e); break; case PERF_TYPE_HARDWARE: case PERF_TYPE_SOFTWARE: ret = PFM_SUCCESS; e->codes[0] = perf_pe[e->event].id; e->count = 1; e->fstr[0] = '\0'; evt_strcat(e->fstr, "%s", perf_pe[e->event].name); break; default: DPRINT("unsupported event type=%d\n", perf_pe[e->event].type); return PFM_ERR_NOTSUPP; } attr = e->os_data; attr->type = perf_pe[e->event].type; attr->config = e->codes[0]; return ret; } static int pfm_perf_event_is_valid(void *this, int idx) { return idx >= 0 && idx < perf_nevents; } static int pfm_perf_get_event_attr_info(void *this, int idx, int attr_idx, pfm_event_attr_info_t *info) { perf_umask_t *um; /* only supports umasks, modifiers handled at OS layer */ um = perf_attridx2um(idx, attr_idx); info->name = um->uname; info->desc = um->udesc; info->equiv= NULL; info->code = um->uid; info->type = PFM_ATTR_UMASK; 
info->ctrl = PFM_ATTR_CTRL_PMU; info->is_precise = 0; info->is_dfl = 0; info->idx = attr_idx; info->dfl_val64 = 0; return PFM_SUCCESS; } static int pfm_perf_get_event_info(void *this, int idx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; info->name = perf_pe[idx].name; info->desc = perf_pe[idx].desc; info->code = perf_pe[idx].id; info->equiv = perf_pe[idx].equiv; info->idx = idx; info->pmu = pmu->pmu; info->is_precise = 0; /* unit masks + modifiers */ info->nattrs = perf_pe[idx].numasks; return PFM_SUCCESS; } static void pfm_perf_terminate(void *this) { perf_event_t *p; int i, j; if (!(perf_pe && perf_um)) return; /* * free tracepoints name + unit mask names * which are dynamically allocated */ for (i=0; i < perf_nevents; i++) { p = &perf_pe[i]; if (p->type != PERF_TYPE_TRACEPOINT) continue; /* cast to keep compiler happy, we are * freeing the dynamically allocated clone * table, not the static one. We do not want * to create a specific data type */ free((void *)p->name); /* * first PERF_MAX_UMASKS are pre-allocated * the rest is in a separate dynamic table */ for (j=0; j < p->numasks; j++) { if (j == PERF_MAX_UMASKS) break; free((void *)p->umasks[j].uname); } } /* * perf_pe is systematically allocated */ free(perf_pe); perf_pe = NULL; perf_pe_free = perf_pe_end = NULL; if (perf_um) { int n; /* * free the dynamic umasks' uname */ n = perf_um_free - perf_um; for(i=0; i < n; i++) { free((void *)(perf_um[i].uname)); } free(perf_um); perf_um = NULL; perf_um_free = perf_um_end = NULL; } } static int pfm_perf_validate_table(void *this, FILE *fp) { const char *name = perf_event_support.name; perf_umask_t *um; int i, j; int error = 0; for(i=0; i < perf_event_support.pme_count; i++) { if (!perf_pe[i].name) { fprintf(fp, "pmu: %s event%d: :: no name (prev event was %s)\n", name, i, i > 1 ? 
perf_pe[i-1].name : "??"); error++; } if (!perf_pe[i].desc) { fprintf(fp, "pmu: %s event%d: %s :: no description\n", name, i, perf_pe[i].name); error++; } if (perf_pe[i].type < PERF_TYPE_HARDWARE || perf_pe[i].type >= PERF_TYPE_MAX) { fprintf(fp, "pmu: %s event%d: %s :: invalid type\n", name, i, perf_pe[i].name); error++; } if (perf_pe[i].numasks > PERF_MAX_UMASKS && perf_pe[i].umask_ovfl_idx == PERF_INVAL_OVFL_IDX) { fprintf(fp, "pmu: %s event%d: %s :: numasks too big (<%d)\n", name, i, perf_pe[i].name, PERF_MAX_UMASKS); error++; } if (perf_pe[i].numasks < PERF_MAX_UMASKS && perf_pe[i].umask_ovfl_idx != PERF_INVAL_OVFL_IDX) { fprintf(fp, "pmu: %s event%d: %s :: overflow umask idx defined but not needed (<%d)\n", name, i, perf_pe[i].name, PERF_MAX_UMASKS); error++; } if (perf_pe[i].numasks && perf_pe[i].ngrp == 0) { fprintf(fp, "pmu: %s event%d: %s :: ngrp cannot be zero\n", name, i, perf_pe[i].name); error++; } if (perf_pe[i].numasks == 0 && perf_pe[i].ngrp) { fprintf(fp, "pmu: %s event%d: %s :: ngrp must be zero\n", name, i, perf_pe[i].name); error++; } for(j = 0; j < perf_pe[i].numasks; j++) { if (j < PERF_MAX_UMASKS){ um = perf_pe[i].umasks+j; } else { um = perf_get_ovfl_umask(i); um += j - PERF_MAX_UMASKS; } if (!um->uname) { fprintf(fp, "pmu: %s event%d: %s umask%d :: no name\n", name, i, perf_pe[i].name, j); error++; } if (!um->udesc) { fprintf(fp, "pmu: %s event%d:%s umask%d: %s :: no description\n", name, i, perf_pe[i].name, j, um->uname); error++; } if (perf_pe[i].ngrp && um->grpid >= perf_pe[i].ngrp) { fprintf(fp, "pmu: %s event%d: %s umask%d: %s :: invalid grpid %d (must be < %d)\n", name, i, perf_pe[i].name, j, um->uname, um->grpid, perf_pe[i].ngrp); error++; } } /* check for excess unit masks */ for(; j < PERF_MAX_UMASKS; j++) { if (perf_pe[i].umasks[j].uname || perf_pe[i].umasks[j].udesc) { fprintf(fp, "pmu: %s event%d: %s :: numasks (%d) invalid more events exists\n", name, i, perf_pe[i].name, perf_pe[i].numasks); error++; } } } return error ? 
PFM_ERR_INVAL : PFM_SUCCESS; } static unsigned int pfm_perf_get_event_nattrs(void *this, int idx) { return perf_pe[idx].numasks; } /* * this function tries to figure out what the underlying core PMU * priv level masks are. It looks for a TYPE_CORE PMU and uses the * first event to determine supported priv level masks. */ /* * remove attrs which are in conflicts (or duplicated) with os layer */ static void pfm_perf_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; int i, compact, type; int plm = pmu->supported_plm; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; if (e->pattrs[i].ctrl != PFM_ATTR_CTRL_PERF_EVENT) continue; /* * only PERF_TYPE_HARDWARE/HW_CACHE may have * precise mode or hypervisor mode * * there is no way to know for sure for those events * so we allow the modifiers and leave it to the kernel * to decide */ type = perf_pe[e->event].type; if (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE) { /* no hypervisor mode */ if (e->pattrs[i].idx == PERF_ATTR_H && !(plm & PFM_PLMH)) compact = 1; /* no user mode */ if (e->pattrs[i].idx == PERF_ATTR_U && !(plm & PFM_PLM3)) compact = 1; /* no kernel mode */ if (e->pattrs[i].idx == PERF_ATTR_K && !(plm & PFM_PLM0)) compact = 1; } else { if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; /* no hypervisor mode */ if (e->pattrs[i].idx == PERF_ATTR_H) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } pfmlib_pmu_t perf_event_support={ .desc = "perf_events generic PMU", .name = "perf", .pmu = PFM_PMU_PERF_EVENT, .pme_count = PME_PERF_EVENT_COUNT, .type = PFM_PMU_TYPE_OS_GENERIC, .max_encoding = 1, .supported_plm = PERF_PLM_ALL, .pmu_detect = pfm_perf_detect, .pmu_init = pfm_perf_init, .pmu_terminate = pfm_perf_terminate, .get_event_encoding[PFM_OS_NONE] = pfm_perf_get_encoding, PFMLIB_ENCODE_PERF(pfm_perf_get_perf_encoding), .get_event_first = pfm_perf_get_event_first, 
.get_event_next = pfm_perf_get_event_next, .event_is_valid = pfm_perf_event_is_valid, .get_event_info = pfm_perf_get_event_info, .get_event_attr_info = pfm_perf_get_event_attr_info, .validate_table = pfm_perf_validate_table, .get_event_nattrs = pfm_perf_get_event_nattrs, PFMLIB_VALID_PERF_PATTRS(pfm_perf_perf_validate_pattrs), }; papi-5.3.0/src/libpfm4/lib/pfmlib_perf_event.c0000600003276200002170000002567712247131124020704 0ustar ralphundrgrad/* * pfmlib_perf_events.c: encode events for perf_event API * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include "pfmlib_priv.h" #include "pfmlib_perf_event_priv.h" #define PERF_PROC_FILE "/proc/sys/kernel/perf_event_paranoid" #ifdef min #undef min #endif #define min(a, b) ((a) < (b) ? 
(a) : (b)) /* * contains ONLY attributes related to PMU features */ static const pfmlib_attr_desc_t perf_event_mods[]={ PFM_ATTR_B("u", "monitor at user level"), /* monitor user level */ PFM_ATTR_B("k", "monitor at kernel level"), /* monitor kernel level */ PFM_ATTR_B("h", "monitor at hypervisor level"), /* monitor hypervisor level */ PFM_ATTR_SKIP, PFM_ATTR_SKIP, PFM_ATTR_SKIP, PFM_ATTR_SKIP, PFM_ATTR_B("mg", "monitor guest execution"), /* monitor guest level */ PFM_ATTR_B("mh", "monitor host execution"), /* monitor host level */ PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */ }; /* * contains all attributes controlled by perf_events. That includes PMU attributes * and pure software attributes such as sampling periods */ static const pfmlib_attr_desc_t perf_event_ext_mods[]={ PFM_ATTR_B("u", "monitor at user level"), /* monitor user level */ PFM_ATTR_B("k", "monitor at kernel level"), /* monitor kernel level */ PFM_ATTR_B("h", "monitor at hypervisor level"), /* monitor hypervisor level */ PFM_ATTR_I("period", "sampling period"), /* sampling period */ PFM_ATTR_I("freq", "sampling frequency (Hz)"), /* sampling frequency */ PFM_ATTR_I("precise", "precise ip"), /* anti-skid mechanism */ PFM_ATTR_B("excl", "exclusive access"), /* exclusive PMU access */ PFM_ATTR_B("mg", "monitor guest execution"), /* monitor guest level */ PFM_ATTR_B("mh", "monitor host execution"), /* monitor host level */ PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */ }; static int pfmlib_perf_event_encode(void *this, const char *str, int dfl_plm, void *data) { pfm_perf_encode_arg_t arg; pfm_perf_encode_arg_t *uarg = data; pfmlib_os_t *os = this; struct perf_event_attr my_attr, *attr; pfmlib_pmu_t *pmu; pfmlib_event_desc_t e; pfm_event_attr_info_t *a; size_t orig_sz, asz, sz = sizeof(arg); uint64_t ival; int has_plm = 0, has_vmx_plm = 0; int i, plm = 0, ret, vmx_plm = 0; sz = pfmlib_check_struct(uarg, uarg->size, PFM_PERF_ENCODE_ABI0, sz); if (!sz) return 
		PFM_ERR_INVAL;

	/* copy input */
	memcpy(&arg, uarg, sz);

	/* pointer to our internal attr struct */
	memset(&my_attr, 0, sizeof(my_attr));
	attr = &my_attr;

	/*
	 * copy user attr to our internal version
	 * size == 0 is interpreted as the minimal possible
	 * size (ABI_VER0)
	 */

	/* size of attr struct passed by user */
	orig_sz = uarg->attr->size;

	if (orig_sz == 0)
		asz = PERF_ATTR_SIZE_VER0;
	else
		asz = min(sizeof(*attr), orig_sz);

	/*
	 * we copy the user struct to preserve whatever may
	 * have been initialized but that we do not use
	 */
	memcpy(attr, uarg->attr, asz);

	/* restore internal size (just in case we need it) */
	attr->size = sizeof(my_attr);

	/* useful for debugging (asz is a size_t, hence %zu) */
	if (asz != sizeof(*attr))
		__pfm_vbprintf("warning: mismatch attr struct size "
			       "user=%zu libpfm=%zu\n", asz, sizeof(*attr));

	memset(&e, 0, sizeof(e));

	e.osid = os->id;
	e.os_data = attr;
	e.dfl_plm = dfl_plm;

	/* after this call, need to call pfmlib_release_event() */
	ret = pfmlib_parse_event(str, &e);
	if (ret != PFM_SUCCESS)
		return ret;

	pmu = e.pmu;

	ret = PFM_ERR_NOTSUPP;
	if (!pmu->get_event_encoding[e.osid]) {
		DPRINT("PMU %s does not support PFM_OS_NONE\n", pmu->name);
		goto done;
	}

	ret = pmu->get_event_encoding[e.osid](pmu, &e);
	if (ret != PFM_SUCCESS)
		goto done;

	/*
	 * process perf_event attributes
	 *
	 * invalid values exit through the done label so that
	 * the parsed event is released
	 */
	for (i = 0; i < e.nattrs; i++) {

		a = attr(&e, i);

		if (a->ctrl != PFM_ATTR_CTRL_PERF_EVENT)
			continue;

		ival = e.attrs[i].ival;

		switch(a->idx) {
		case PERF_ATTR_U:
			if (ival)
				plm |= PFM_PLM3;
			has_plm = 1;
			break;
		case PERF_ATTR_K:
			if (ival)
				plm |= PFM_PLM0;
			has_plm = 1;
			break;
		case PERF_ATTR_H:
			if (ival)
				plm |= PFM_PLMH;
			has_plm = 1;
			break;
		case PERF_ATTR_PE:
			if (!ival || attr->freq) {
				ret = PFM_ERR_ATTR_VAL;
				goto done;
			}
			attr->sample_period = ival;
			break;
		case PERF_ATTR_FR:
			if (!ival || attr->sample_period) {
				ret = PFM_ERR_ATTR_VAL;
				goto done;
			}
			attr->sample_freq = ival;
			attr->freq = 1;
			break;
		case PERF_ATTR_PR:
			if (ival > 3) {
				ret = PFM_ERR_ATTR_VAL;
				goto done;
			}
			attr->precise_ip = ival;
			break;
		case PERF_ATTR_EX:
			if (ival && !attr->exclusive)
				attr->exclusive = 1;
			break;
		case PERF_ATTR_MG:
			vmx_plm |= PFM_PLM3;
			has_vmx_plm = 1;
			break;
		case PERF_ATTR_MH:
			vmx_plm |= PFM_PLM0;
			has_vmx_plm = 1;
			break;
		}
	}
	/*
	 * if no priv level mask was provided
	 * with the event, then use dfl_plm
	 */
	if (!has_plm)
		plm = dfl_plm;

	/* exclude_guest by default */
	if (!has_vmx_plm)
		vmx_plm = PFM_PLM0;

	/*
	 * perf_event priv levels work by exclusion, so use a bitwise OR:
	 * the goal here is to set to zero any exclude_* not supported
	 * by the underlying PMU
	 */
	plm |= (~pmu->supported_plm) & PFM_PLM_ALL;
	vmx_plm |= (~pmu->supported_plm) & PFM_PLM_ALL;

	attr->exclude_user = !(plm & PFM_PLM3);
	attr->exclude_kernel = !(plm & PFM_PLM0);
	attr->exclude_hv = !(plm & PFM_PLMH);
	attr->exclude_guest = !(vmx_plm & PFM_PLM3);
	attr->exclude_host = !(vmx_plm & PFM_PLM0);

	__pfm_vbprintf("PERF[type=%x config=0x%"PRIx64" config1=0x%"PRIx64
		       " excl=%d e_u=%d e_k=%d e_hv=%d e_host=%d e_gu=%d period=%"PRIu64" freq=%d"
		       " precise=%d] %s\n",
			attr->type,
			attr->config,
			attr->config1,
			attr->exclusive,
			attr->exclude_user,
			attr->exclude_kernel,
			attr->exclude_hv,
			attr->exclude_host,
			attr->exclude_guest,
			attr->sample_period,
			attr->freq,
			attr->precise_ip,
			str);

	/*
	 * propagate event index if necessary
	 */
	arg.idx = pfmlib_pidx2idx(e.pmu, e.event);

	/* propagate our changes, that overwrites attr->size */
	memcpy(uarg->attr, attr, asz);

	/* restore user size */
	uarg->attr->size = orig_sz;

	/*
	 * fstr not requested, stop here
	 */
	ret = PFM_SUCCESS;
	if (!arg.fstr) {
		memcpy(uarg, &arg, sz);
		goto done;
	}

	for (i=0; i < e.npattrs; i++) {
		int idx;

		if (e.pattrs[i].ctrl != PFM_ATTR_CTRL_PERF_EVENT)
			continue;

		idx = e.pattrs[i].idx;
		switch (idx) {
		/* the !! expressions are of type int, so print them with %d */
		case PERF_ATTR_K:
			evt_strcat(e.fstr, ":%s=%d", perf_event_ext_mods[idx].name, !!(plm & PFM_PLM0));
			break;
		case PERF_ATTR_U:
			evt_strcat(e.fstr, ":%s=%d", perf_event_ext_mods[idx].name, !!(plm & PFM_PLM3));
			break;
		case PERF_ATTR_H:
			evt_strcat(e.fstr, ":%s=%d", perf_event_ext_mods[idx].name, !!(plm & PFM_PLMH));
			break;
		case PERF_ATTR_PR:
			evt_strcat(e.fstr, ":%s=%d",
				   perf_event_ext_mods[idx].name, attr->precise_ip);
			break;
		case PERF_ATTR_PE:
		case PERF_ATTR_FR:
			if (attr->freq && attr->sample_period)
				evt_strcat(e.fstr, ":%s=%"PRIu64, perf_event_ext_mods[idx].name, attr->sample_period);
			else if (attr->sample_period)
				evt_strcat(e.fstr, ":%s=%"PRIu64, perf_event_ext_mods[idx].name, attr->sample_period);
			break;
		/* the ! and !! expressions below are of type int, so print them with %d */
		case PERF_ATTR_MG:
			evt_strcat(e.fstr, ":%s=%d", perf_event_ext_mods[idx].name, !attr->exclude_guest);
			break;
		case PERF_ATTR_MH:
			evt_strcat(e.fstr, ":%s=%d", perf_event_ext_mods[idx].name, !attr->exclude_host);
			break;
		case PERF_ATTR_EX:
			evt_strcat(e.fstr, ":%s=%d", perf_event_ext_mods[idx].name, !!attr->exclusive);
			break;
		}
	}

	ret = pfmlib_build_fstr(&e, arg.fstr);
	if (ret == PFM_SUCCESS)
		memcpy(uarg, &arg, sz);

done:
	pfmlib_release_event(&e);
	return ret;
}

/*
 * get OS-specific event attributes
 */
static int
perf_get_os_nattrs(void *this, pfmlib_event_desc_t *e)
{
	pfmlib_os_t *os = this;
	int i, n = 0;

	for (i = 0; os->atdesc[i].name; i++)
		if (!is_empty_attr(os->atdesc+i))
			n++;
	return n;
}

static int
perf_get_os_attr_info(void *this, pfmlib_event_desc_t *e)
{
	pfmlib_os_t *os = this;
	pfm_event_attr_info_t *info;
	int i, k, j = e->npattrs;

	for (i = k = 0; os->atdesc[i].name; i++) {
		/* skip padding entries */
		if (is_empty_attr(os->atdesc+i))
			continue;

		info = e->pattrs + j + k;

		info->name = os->atdesc[i].name;
		info->desc = os->atdesc[i].desc;
		info->equiv= NULL;
		info->code = i;
		info->idx = i; /* namespace-specific index */
		info->type = os->atdesc[i].type;
		info->is_dfl = 0;
		info->ctrl = PFM_ATTR_CTRL_PERF_EVENT;
		k++;
	}
	e->npattrs += k;

	return PFM_SUCCESS;
}

/*
 * old interface, maintained for backward compatibility with earlier versions of the library
 */
int
pfm_get_perf_event_encoding(const char *str, int dfl_plm, struct perf_event_attr *attr, char **fstr, int *idx)
{
	pfm_perf_encode_arg_t arg;
	int ret;

	if (PFMLIB_INITIALIZED() == 0)
		return PFM_ERR_NOINIT;

	/* idx and fstr can be NULL */
	if (!(attr && str))
		return PFM_ERR_INVAL;

	if
(dfl_plm & ~(PFM_PLM_ALL)) return PFM_ERR_INVAL; memset(&arg, 0, sizeof(arg)); /* do not clear attr, some fields may be initialized by caller already, e.g., size */ arg.attr = attr; arg.fstr = fstr; ret = pfm_get_os_event_encoding(str, dfl_plm, PFM_OS_PERF_EVENT_EXT, &arg); if (ret != PFM_SUCCESS) return ret; if (idx) *idx = arg.idx; return PFM_SUCCESS; } static int pfm_perf_event_os_detect(void *this) { int ret = access(PERF_PROC_FILE, F_OK); return ret ? PFM_ERR_NOTSUPP : PFM_SUCCESS; } pfmlib_os_t pfmlib_os_perf={ .name = "perf_event", .id = PFM_OS_PERF_EVENT, .atdesc = perf_event_mods, .detect = pfm_perf_event_os_detect, .get_os_attr_info = perf_get_os_attr_info, .get_os_nattrs = perf_get_os_nattrs, .encode = pfmlib_perf_event_encode, }; pfmlib_os_t pfmlib_os_perf_ext={ .name = "perf_event extended", .id = PFM_OS_PERF_EVENT_EXT, .atdesc = perf_event_ext_mods, .detect = pfm_perf_event_os_detect, .get_os_attr_info = perf_get_os_attr_info, .get_os_nattrs = perf_get_os_nattrs, .encode = pfmlib_perf_event_encode, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_netburst_priv.h0000600003276200002170000001621712247131124022643 0ustar ralphundrgrad/* * Copyright (c) 2006 IBM Corp. * Contributed by Kevin Corry * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * * pfmlib_netburst_priv.h * * Structures and definitions for use in the Pentium4/Xeon/EM64T libpfm code. */ #ifndef _PFMLIB_INTEL_NETBURST_PRIV_H_ #define _PFMLIB_INTEL_NETBURST_PRIV_H_ /* ESCR: Event Selection Control Register * * These registers are used to select which event to count along with options * for that event. There are (up to) 45 ESCRs, but each data counter is * restricted to a specific set of ESCRs. */ /** * netburst_escr_value_t * * Bit-wise breakdown of the ESCR registers. * * Bits Description * ------- ----------- * 63 - 31 Reserved * 30 - 25 Event Select * 24 - 9 Event Mask * 8 - 5 Tag Value * 4 Tag Enable * 3 T0 OS - Enable counting in kernel mode (thread 0) * 2 T0 USR - Enable counting in user mode (thread 0) * 1 T1 OS - Enable counting in kernel mode (thread 1) * 0 T1 USR - Enable counting in user mode (thread 1) **/ #define EVENT_MASK_BITS 16 #define EVENT_SELECT_BITS 6 typedef union { unsigned long long val; struct { unsigned long t1_usr:1; unsigned long t1_os:1; unsigned long t0_usr:1; unsigned long t0_os:1; unsigned long tag_enable:1; unsigned long tag_value:4; unsigned long event_mask:EVENT_MASK_BITS; unsigned long event_select:EVENT_SELECT_BITS; unsigned long reserved:1; } bits; } netburst_escr_value_t; /* CCCR: Counter Configuration Control Register * * These registers are used to configure the data counters. There are 18 * CCCRs, one for each data counter. */ /** * netburst_cccr_value_t * * Bit-wise breakdown of the CCCR registers. 
* * Bits Description * ------- ----------- * 63 - 32 Reserved * 31 OVF - The data counter overflowed. * 30 Cascade - Enable cascading of data counter when alternate * counter overflows. * 29 - 28 Reserved * 27 OVF_PMI_T1 - Generate interrupt for LP1 on counter overflow * 26 OVF_PMI_T0 - Generate interrupt for LP0 on counter overflow * 25 FORCE_OVF - Force interrupt on every counter increment * 24 Edge - Enable rising edge detection of the threshold comparison * output for filtering event counts. * 23 - 20 Threshold Value - Select the threshold value for comparing to * incoming event counts. * 19 Complement - Select how incoming event count is compared with * the threshold value. * 18 Compare - Enable filtering of event counts. * 17 - 16 Active Thread - Only used with HT enabled. * 00 - None: Count when neither LP is active. * 01 - Single: Count when only one LP is active. * 10 - Both: Count when both LPs are active. * 11 - Any: Count when either LP is active. * 15 - 13 ESCR Select - Select which ESCR to use for selecting the * event to count. * 12 Enable - Turns the data counter on or off. * 11 - 0 Reserved **/ typedef union { unsigned long long val; struct { unsigned long reserved1:12; unsigned long enable:1; unsigned long escr_select:3; unsigned long active_thread:2; unsigned long compare:1; unsigned long complement:1; unsigned long threshold:4; unsigned long edge:1; unsigned long force_ovf:1; unsigned long ovf_pmi_t0:1; unsigned long ovf_pmi_t1:1; unsigned long reserved2:2; unsigned long cascade:1; unsigned long overflow:1; } bits; } netburst_cccr_value_t; /** * netburst_event_mask_t * * Defines one bit of the event-mask for one Pentium4 event. * * @name: Event mask name * @desc: Event mask description * @bit: The bit position within the event_mask field. 
 * @flags: Event mask flags (NETBURST_FL_*), e.g. NETBURST_FL_DFL marks a default mask.
 **/
typedef struct {
	const char *name;
	const char *desc;
	unsigned int bit;
	unsigned int flags;
} netburst_event_mask_t;

/*
 * netburst_event_mask_t->flags
 */
#define NETBURST_FL_DFL 0x1 /* event mask is default */

#define MAX_ESCRS_PER_EVENT 2

/*
 * These are the unique event codes used by perf_events.
 * They need to be encoded in the ESCR.event_select field when
 * programming for perf_events
 */
enum netburst_events {
	P4_EVENT_TC_DELIVER_MODE,
	P4_EVENT_BPU_FETCH_REQUEST,
	P4_EVENT_ITLB_REFERENCE,
	P4_EVENT_MEMORY_CANCEL,
	P4_EVENT_MEMORY_COMPLETE,
	P4_EVENT_LOAD_PORT_REPLAY,
	P4_EVENT_STORE_PORT_REPLAY,
	P4_EVENT_MOB_LOAD_REPLAY,
	P4_EVENT_PAGE_WALK_TYPE,
	P4_EVENT_BSQ_CACHE_REFERENCE,
	P4_EVENT_IOQ_ALLOCATION,
	P4_EVENT_IOQ_ACTIVE_ENTRIES,
	P4_EVENT_FSB_DATA_ACTIVITY,
	P4_EVENT_BSQ_ALLOCATION,
	P4_EVENT_BSQ_ACTIVE_ENTRIES,
	P4_EVENT_SSE_INPUT_ASSIST,
	P4_EVENT_PACKED_SP_UOP,
	P4_EVENT_PACKED_DP_UOP,
	P4_EVENT_SCALAR_SP_UOP,
	P4_EVENT_SCALAR_DP_UOP,
	P4_EVENT_64BIT_MMX_UOP,
	P4_EVENT_128BIT_MMX_UOP,
	P4_EVENT_X87_FP_UOP,
	P4_EVENT_TC_MISC,
	P4_EVENT_GLOBAL_POWER_EVENTS,
	P4_EVENT_TC_MS_XFER,
	P4_EVENT_UOP_QUEUE_WRITES,
	P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE,
	P4_EVENT_RETIRED_BRANCH_TYPE,
	P4_EVENT_RESOURCE_STALL,
	P4_EVENT_WC_BUFFER,
	P4_EVENT_B2B_CYCLES,
	P4_EVENT_BNR,
	P4_EVENT_SNOOP,
	P4_EVENT_RESPONSE,
	P4_EVENT_FRONT_END_EVENT,
	P4_EVENT_EXECUTION_EVENT,
	P4_EVENT_REPLAY_EVENT,
	P4_EVENT_INSTR_RETIRED,
	P4_EVENT_UOPS_RETIRED,
	P4_EVENT_UOP_TYPE,
	P4_EVENT_BRANCH_RETIRED,
	P4_EVENT_MISPRED_BRANCH_RETIRED,
	P4_EVENT_X87_ASSIST,
	P4_EVENT_MACHINE_CLEAR,
	P4_EVENT_INSTR_COMPLETED,
};

typedef struct {
	const char *name;
	const char *desc;
	unsigned int event_select;
	unsigned int escr_select;
	enum netburst_events perf_code; /* perf_event event code, enum P4_EVENTS */
	int allowed_escrs[MAX_ESCRS_PER_EVENT];
	netburst_event_mask_t event_masks[EVENT_MASK_BITS];
} netburst_entry_t;

#define NETBURST_ATTR_U 0
#define NETBURST_ATTR_K 1
#define NETBURST_ATTR_C 2
#define NETBURST_ATTR_E 3
#define NETBURST_ATTR_T 4

#define
_NETBURST_ATTR_U (1 << NETBURST_ATTR_U) #define _NETBURST_ATTR_K (1 << NETBURST_ATTR_K) #define P4_REPLAY_REAL_MASK 0x00000003 extern int pfm_netburst_get_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_netburst_get_perf_encoding(void *this, pfmlib_event_desc_t *e); extern void pfm_netburst_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); #endif papi-5.3.0/src/libpfm4/lib/pfmlib_arm_armv7_pmuv1.c0000600003276200002170000001161612247131124021556 0ustar ralphundrgrad/* * pfmlib_arm_armv7_pmuv1.c : support for ARMV7 chips * * Copyright (c) 2010 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
 *
 */
#include
#include
#include
#include
#include

/* private headers */
#include "pfmlib_priv.h" /* library private */
#include "pfmlib_arm_priv.h"
#include "events/arm_cortex_a8_events.h" /* event tables */
#include "events/arm_cortex_a9_events.h"
#include "events/arm_cortex_a15_events.h"

static int
pfm_arm_detect_cortex_a8(void *this)
{
	int ret;

	ret = pfm_arm_detect(this);
	if (ret != PFM_SUCCESS)
		return PFM_ERR_NOTSUPP;

	if ((pfm_arm_cfg.implementer == 0x41) && /* ARM */
	    (pfm_arm_cfg.part == 0xc08)) { /* Cortex-A8 */
		return PFM_SUCCESS;
	}
	return PFM_ERR_NOTSUPP;
}

static int
pfm_arm_detect_cortex_a9(void *this)
{
	int ret;

	ret = pfm_arm_detect(this);
	if (ret != PFM_SUCCESS)
		return PFM_ERR_NOTSUPP;

	if ((pfm_arm_cfg.implementer == 0x41) && /* ARM */
	    (pfm_arm_cfg.part == 0xc09)) { /* Cortex-A9 */
		return PFM_SUCCESS;
	}
	return PFM_ERR_NOTSUPP;
}

static int
pfm_arm_detect_cortex_a15(void *this)
{
	int ret;

	ret = pfm_arm_detect(this);
	if (ret != PFM_SUCCESS)
		return PFM_ERR_NOTSUPP;

	if ((pfm_arm_cfg.implementer == 0x41) && /* ARM */
	    (pfm_arm_cfg.part == 0xc0f)) { /* Cortex-A15 */
		return PFM_SUCCESS;
	}
	return PFM_ERR_NOTSUPP;
}

/* Cortex A8 support */
pfmlib_pmu_t arm_cortex_a8_support={
	.desc = "ARM Cortex A8",
	.name = "arm_ac8",
	.pmu = PFM_PMU_ARM_CORTEX_A8,
	.pme_count = LIBPFM_ARRAY_SIZE(arm_cortex_a8_pe),
	.type = PFM_PMU_TYPE_CORE,
	.pe = arm_cortex_a8_pe,

	.pmu_detect = pfm_arm_detect_cortex_a8,
	.max_encoding = 1,
	.num_cntrs = 2,

	.get_event_encoding[PFM_OS_NONE] = pfm_arm_get_encoding,
	 PFMLIB_ENCODE_PERF(pfm_arm_get_perf_encoding),
	.get_event_first = pfm_arm_get_event_first,
	.get_event_next = pfm_arm_get_event_next,
	.event_is_valid = pfm_arm_event_is_valid,
	.validate_table = pfm_arm_validate_table,
	.get_event_info = pfm_arm_get_event_info,
	.get_event_attr_info = pfm_arm_get_event_attr_info,
	 PFMLIB_VALID_PERF_PATTRS(pfm_arm_perf_validate_pattrs),
	.get_event_nattrs = pfm_arm_get_event_nattrs,
};

/* Cortex A9 support */
pfmlib_pmu_t arm_cortex_a9_support={
	.desc = "ARM Cortex
A9", .name = "arm_ac9", .pmu = PFM_PMU_ARM_CORTEX_A9, .pme_count = LIBPFM_ARRAY_SIZE(arm_cortex_a9_pe), .type = PFM_PMU_TYPE_CORE, .pe = arm_cortex_a9_pe, .pmu_detect = pfm_arm_detect_cortex_a9, .max_encoding = 1, .num_cntrs = 2, .get_event_encoding[PFM_OS_NONE] = pfm_arm_get_encoding, PFMLIB_ENCODE_PERF(pfm_arm_get_perf_encoding), .get_event_first = pfm_arm_get_event_first, .get_event_next = pfm_arm_get_event_next, .event_is_valid = pfm_arm_event_is_valid, .validate_table = pfm_arm_validate_table, .get_event_info = pfm_arm_get_event_info, .get_event_attr_info = pfm_arm_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_arm_perf_validate_pattrs), .get_event_nattrs = pfm_arm_get_event_nattrs, }; /* Cortex A15 support */ pfmlib_pmu_t arm_cortex_a15_support={ .desc = "ARM Cortex A15", .name = "arm_ac15", .pmu = PFM_PMU_ARM_CORTEX_A15, .pme_count = LIBPFM_ARRAY_SIZE(arm_cortex_a15_pe), .type = PFM_PMU_TYPE_CORE, .pe = arm_cortex_a15_pe, .pmu_detect = pfm_arm_detect_cortex_a15, .max_encoding = 1, .num_cntrs = 6, .supported_plm = ARMV7_A15_PLM, .get_event_encoding[PFM_OS_NONE] = pfm_arm_get_encoding, PFMLIB_ENCODE_PERF(pfm_arm_get_perf_encoding), .get_event_first = pfm_arm_get_event_first, .get_event_next = pfm_arm_get_event_next, .event_is_valid = pfm_arm_event_is_valid, .validate_table = pfm_arm_validate_table, .get_event_info = pfm_arm_get_event_info, .get_event_attr_info = pfm_arm_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_arm_perf_validate_pattrs), .get_event_nattrs = pfm_arm_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_itanium_priv.h0000600003276200002170000000714712247131124021252 0ustar ralphundrgrad/* * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux/ia64. 
*/ #ifndef __PFMLIB_ITANIUM_PRIV_H__ #define __PFMLIB_ITANIUM_PRIV_H__ /* * Itanium encoding structure * (code must be first 8 bits) */ typedef struct { unsigned long pme_code:8; /* major event code */ unsigned long pme_ear:1; /* is EAR event */ unsigned long pme_dear:1; /* 1=Data 0=Instr */ unsigned long pme_tlb:1; /* 1=TLB 0=Cache */ unsigned long pme_btb:1; /* 1=BTB */ unsigned long pme_ig1:4; /* ignored */ unsigned long pme_umask:16; /* unit mask*/ unsigned long pme_ig:32; /* ignored */ } pme_ita_entry_code_t; #define PME_UMASK_NONE 0x0 typedef union { unsigned long pme_vcode; pme_ita_entry_code_t pme_ita_code; /* must not be larger than vcode */ } pme_ita_code_t; typedef union { unsigned long qual; /* generic qualifier */ struct { unsigned long pme_iar:1; /* instruction address range supported */ unsigned long pme_opm:1; /* opcode match supported */ unsigned long pme_dar:1; /* data address range supported */ unsigned long pme_reserved:61; /* not used */ } pme_qual; } pme_ita_qualifiers_t; typedef struct { char *pme_name; pme_ita_code_t pme_entry_code; unsigned long pme_counters; /* supported counters */ unsigned int pme_maxincr; pme_ita_qualifiers_t pme_qualifiers; char *pme_desc; } pme_ita_entry_t; /* * We embed the umask value into the event code. Because it really is * like a subevent. 
* pme_code: * - lower 16 bits: major event code * - upper 16 bits: unit mask */ #define pme_code pme_entry_code.pme_ita_code.pme_code #define pme_ear pme_entry_code.pme_ita_code.pme_ear #define pme_dear pme_entry_code.pme_ita_code.pme_dear #define pme_tlb pme_entry_code.pme_ita_code.pme_tlb #define pme_btb pme_entry_code.pme_ita_code.pme_btb #define pme_umask pme_entry_code.pme_ita_code.pme_umask #define pme_used pme_qualifiers.pme_qual_struct.pme_used #define event_is_ear(e) ((e)->pme_ear == 1) #define event_is_iear(e) ((e)->pme_ear == 1 && (e)->pme_dear==0) #define event_is_dear(e) ((e)->pme_ear == 1 && (e)->pme_dear==1) #define event_is_tlb_ear(e) ((e)->pme_ear == 1 && (e)->pme_tlb==1) #define event_is_btb(e) ((e)->pme_btb) #define event_opcm_ok(e) ((e)->pme_qualifiers.pme_qual.pme_opm==1) #define event_iarr_ok(e) ((e)->pme_qualifiers.pme_qual.pme_iar==1) #define event_darr_ok(e) ((e)->pme_qualifiers.pme_qual.pme_dar==1) #endif /* __PFMLIB_ITANIUM_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_k7.c0000600003276200002170000000463312247131124020050 0ustar ralphundrgrad/* * pfmlib_amd64_k7.c : AMD64 K7 * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_amd64_priv.h" #include "events/amd64_events_k7.h" static int pfm_amd64_k7_detect(void *this) { int ret; ret = pfm_amd64_detect(this); if (ret != PFM_SUCCESS) return ret; ret = pfm_amd64_cfg.revision; return ret == PFM_PMU_AMD64_K7 ? PFM_SUCCESS : PFM_ERR_NOTSUPP; } pfmlib_pmu_t amd64_k7_support={ .desc = "AMD64 K7", .name = "amd64_k7", .pmu = PFM_PMU_AMD64_K7, .pmu_rev = AMD64_K7, .pme_count = LIBPFM_ARRAY_SIZE(amd64_k7_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = AMD64_K7_PLM, .num_cntrs = 4, .max_encoding = 1, .pe = amd64_k7_pe, .atdesc = amd64_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .pmu_detect = pfm_amd64_k7_detect, .get_event_encoding[PFM_OS_NONE] = pfm_amd64_get_encoding, PFMLIB_ENCODE_PERF(pfm_amd64_get_perf_encoding), .get_event_first = pfm_amd64_get_event_first, .get_event_next = pfm_amd64_get_event_next, .event_is_valid = pfm_amd64_event_is_valid, .validate_table = pfm_amd64_validate_table, .get_event_info = pfm_amd64_get_event_info, .get_event_attr_info = pfm_amd64_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_amd64_perf_validate_pattrs), .get_event_nattrs = pfm_amd64_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_ivb.c0000600003276200002170000001042412247131124020502 0ustar ralphundrgrad/* * pfmlib_intel_ivb.c : Intel Ivy Bridge core PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, 
and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "events/intel_ivb_events.h" static int pfm_ivb_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 58: /* IvyBridge (Core i3/i5/i7 3xxx) */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_ivbep_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 62: /* IvyTown */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_ivb_init(void *this) { pfm_intel_x86_cfg.arch_version = 3; return PFM_SUCCESS; } pfmlib_pmu_t intel_ivb_support={ .desc = "Intel Ivy Bridge", .name = "ivb", .pmu = PFM_PMU_INTEL_IVB, .pme_count = LIBPFM_ARRAY_SIZE(intel_ivb_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 8, /* consider with HT off by default */ .num_fixed_cntrs = 3, .max_encoding = 2, /* offcore_response */ .pe = intel_ivb_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | 
INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_ivb_detect, .pmu_init = pfm_ivb_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; pfmlib_pmu_t intel_ivb_ep_support={ .desc = "Intel Ivy Bridge EP", .name = "ivb_ep", .pmu = PFM_PMU_INTEL_IVB_EP, .pme_count = LIBPFM_ARRAY_SIZE(intel_ivb_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 8, /* consider with HT off by default */ .num_fixed_cntrs = 3, .max_encoding = 2, /* offcore_response */ .pe = intel_ivb_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_ivbep_detect, .pmu_init = pfm_ivb_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; papi-5.3.0/src/libpfm4/lib/Makefile0000600003276200002170000001717512247131123016503 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. 
# Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk # # Common files # SRCS=pfmlib_common.c ifeq ($(SYS),Linux) SRCS += pfmlib_perf_event_pmu.c pfmlib_perf_event.c endif CFLAGS+=-D_REENTRANT -I. 
# # list all library support modules # ifeq ($(CONFIG_PFMLIB_ARCH_IA64),y) INCARCH = $(INC_IA64) #SRCS += pfmlib_gen_ia64.c pfmlib_itanium.c pfmlib_itanium2.c pfmlib_montecito.c CFLAGS += -DCONFIG_PFMLIB_ARCH_IA64 endif ifeq ($(CONFIG_PFMLIB_ARCH_X86),y) ifeq ($(SYS),Linux) SRCS += pfmlib_intel_x86_perf_event.c pfmlib_amd64_perf_event.c \ pfmlib_intel_netburst_perf_event.c \ pfmlib_intel_snbep_unc_perf_event.c endif INCARCH = $(INC_X86) SRCS += pfmlib_amd64.c pfmlib_intel_core.c pfmlib_intel_x86.c \ pfmlib_intel_x86_arch.c pfmlib_intel_atom.c \ pfmlib_intel_nhm_unc.c pfmlib_intel_nhm.c \ pfmlib_intel_wsm.c \ pfmlib_intel_snb.c pfmlib_intel_snb_unc.c \ pfmlib_intel_ivb.c pfmlib_intel_ivb_unc.c \ pfmlib_intel_hsw.c \ pfmlib_intel_snbep_unc.c \ pfmlib_intel_snbep_unc_cbo.c \ pfmlib_intel_snbep_unc_ha.c \ pfmlib_intel_snbep_unc_imc.c \ pfmlib_intel_snbep_unc_pcu.c \ pfmlib_intel_snbep_unc_qpi.c \ pfmlib_intel_snbep_unc_ubo.c \ pfmlib_intel_snbep_unc_r2pcie.c \ pfmlib_intel_snbep_unc_r3qpi.c \ pfmlib_intel_knc.c \ pfmlib_intel_netburst.c \ pfmlib_amd64_k7.c pfmlib_amd64_k8.c pfmlib_amd64_fam10h.c \ pfmlib_amd64_fam11h.c pfmlib_amd64_fam12h.c \ pfmlib_amd64_fam14h.c pfmlib_amd64_fam15h.c CFLAGS += -DCONFIG_PFMLIB_ARCH_X86 ifeq ($(CONFIG_PFMLIB_ARCH_I386),y) SRCS += pfmlib_intel_coreduo.c pfmlib_intel_p6.c CFLAGS += -DCONFIG_PFMLIB_ARCH_I386 endif ifeq ($(CONFIG_PFMLIB_ARCH_X86_64),y) CFLAGS += -DCONFIG_PFMLIB_ARCH_X86_64 endif endif ifeq ($(CONFIG_PFMLIB_ARCH_POWERPC),y) ifeq ($(SYS),Linux) SRCS += pfmlib_powerpc_perf_event.c endif INCARCH = $(INC_POWERPC) SRCS += pfmlib_powerpc.c pfmlib_power4.c pfmlib_ppc970.c pfmlib_power5.c pfmlib_power6.c pfmlib_power7.c pfmlib_torrent.c pfmlib_power8.c CFLAGS += -DCONFIG_PFMLIB_ARCH_POWERPC endif ifeq ($(CONFIG_PFMLIB_ARCH_S390X),y) ifeq ($(SYS),Linux) SRCS += pfmlib_s390x_perf_event.c endif INCARCH = $(INC_S390X) SRCS += pfmlib_s390x_cpumf.c CFLAGS += -DCONFIG_PFMLIB_ARCH_S390X endif ifeq ($(CONFIG_PFMLIB_ARCH_SPARC),y) ifeq 
($(SYS),Linux) SRCS += pfmlib_sparc_perf_event.c endif INCARCH = $(INC_SPARC) SRCS += pfmlib_sparc.c pfmlib_sparc_ultra12.c pfmlib_sparc_ultra3.c pfmlib_sparc_ultra4.c pfmlib_sparc_niagara.c CFLAGS += -DCONFIG_PFMLIB_ARCH_SPARC endif ifeq ($(CONFIG_PFMLIB_ARCH_ARM),y) ifeq ($(SYS),Linux) SRCS += pfmlib_arm_perf_event.c endif INCARCH = $(INC_ARM) SRCS += pfmlib_arm.c pfmlib_arm_armv7_pmuv1.c pfmlib_arm_armv6.c CFLAGS += -DCONFIG_PFMLIB_ARCH_ARM endif ifeq ($(CONFIG_PFMLIB_ARCH_MIPS),y) ifeq ($(SYS),Linux) SRCS += pfmlib_mips_perf_event.c endif INCARCH = $(INC_MIPS) SRCS += pfmlib_mips.c pfmlib_mips_74k.c CFLAGS += -DCONFIG_PFMLIB_ARCH_MIPS endif ifeq ($(CONFIG_PFMLIB_ARCH_CRAYXT),y) CFLAGS += -DCONFIG_PFMLIB_ARCH_CRAYXT endif ifeq ($(CONFIG_PFMLIB_CELL),y) INCARCH = $(INC_CELL) #SRCS += pfmlib_cell.c CFLAGS += -DCONFIG_PFMLIB_CELL endif ifeq ($(SYS),Linux) SLDFLAGS=-shared -Wl,-soname -Wl,$(VLIBPFM) SLIBPFM=libpfm.so.$(VERSION).$(REVISION).$(AGE) VLIBPFM=libpfm.so.$(VERSION) SOLIBEXT=so endif CFLAGS+=-I. 
ALIBPFM=libpfm.a TARGETS=$(ALIBPFM) ifeq ($(CONFIG_PFMLIB_SHARED),y) TARGETS += $(SLIBPFM) endif OBJS=$(SRCS:.c=.o) SOBJS=$(OBJS:.o=.lo) INC_COMMON= $(PFMINCDIR)/perfmon/pfmlib.h pfmlib_priv.h ifeq ($(SYS),Linux) INC_COMMON += $(PFMINCDIR)/perfmon/perf_event.h events/perf_events.h endif INC_IA64=pfmlib_ia64_priv.h \ events/itanium_events.h \ events/itanium2_events.h \ events/montecito_events.h INC_X86= pfmlib_intel_x86_priv.h \ pfmlib_amd64_priv.h \ events/amd64_events_k7.h \ events/amd64_events_k8.h \ events/amd64_events_fam10h.h \ events/amd64_events_fam11h.h \ events/amd64_events_fam12h.h \ events/amd64_events_fam14h.h \ events/amd64_events_fam15h.h \ events/intel_p6_events.h \ events/intel_netburst_events.h \ events/intel_x86_arch_events.h \ events/intel_coreduo_events.h \ events/intel_core_events.h \ events/intel_atom_events.h \ events/intel_nhm_events.h \ events/intel_nhm_unc_events.h \ events/intel_wsm_events.h \ events/intel_wsm_unc_events.h \ events/intel_snb_events.h \ events/intel_snbep_events.h \ events/intel_snb_unc_events.h \ events/intel_ivb_events.h \ events/intel_hsw_events.h \ pfmlib_intel_snbep_unc_priv.h \ events/intel_snbep_unc_cbo_events.h \ events/intel_snbep_unc_ha_events.h \ events/intel_snbep_unc_imc_events.h \ events/intel_snbep_unc_pcu_events.h \ events/intel_snbep_unc_qpi_events.h \ events/intel_snbep_unc_ubo_events.h \ events/intel_snbep_unc_r2pcie_events.h \ events/intel_snbep_unc_r3qpi_events.h \ events/intel_knc_events.h INC_MIPS=events/mips_74k_events.h INC_POWERPC=events/ppc970_events.h \ events/ppc970mp_events.h \ events/power4_events.h \ events/power5_events.h \ events/power5+_events.h \ events/power6_events.h \ events/power7_events.h \ events/power8_events.h \ events/torrent_events.h INC_S390X=pfmlib_s390x_priv.h \ events/s390x_cpumf_events.h INC_SPARC=events/sparc_ultra12_events.h \ events/sparc_ultra3_events.h \ events/sparc_ultra3plus_events.h \ events/sparc_ultra3i_events.h \ 
events/sparc_ultra4plus_events.h \ events/sparc_niagara1_events.h \ events/sparc_niagara2_events.h INC_CELL=events/cell_events.h INC_ARM=pfmlib_arm_priv.h \ events/arm_cortex_a8_events.h \ events/arm_cortex_a9_events.h \ events/arm_cortex_a15_events.h INCDEP=$(INC_COMMON) $(INCARCH) all: $(TARGETS) $(OBJS) $(SOBJS): $(TOPDIR)/config.mk $(TOPDIR)/rules.mk Makefile $(INCDEP) libpfm.a: $(OBJS) $(RM) $@ $(AR) cq $@ $(OBJS) $(SLIBPFM): $(SOBJS) $(CC) $(CFLAGS) $(SLDFLAGS) -o $@ $(SOBJS) $(LN) $@ $(VLIBPFM) $(LN) $@ libpfm.$(SOLIBEXT) clean: $(RM) -f *.o *.lo *.a *.so* *~ *.$(SOLIBEXT) distclean: clean depend: $(MKDEP) $(CFLAGS) $(SRCS) install: $(TARGETS) install: @echo building: $(TARGETS) -mkdir -p $(DESTDIR)$(LIBDIR) $(INSTALL) -m 644 $(ALIBPFM) $(DESTDIR)$(LIBDIR) ifeq ($(CONFIG_PFMLIB_SHARED),y) $(INSTALL) $(SLIBPFM) $(DESTDIR)$(LIBDIR) cd $(DESTDIR)$(LIBDIR); $(LN) $(SLIBPFM) $(VLIBPFM) cd $(DESTDIR)$(LIBDIR); $(LN) $(SLIBPFM) libpfm.$(SOLIBEXT) $(LDCONFIG) endif tags: $(CTAGS) -o $(TOPDIR)/tags --tag-relative=yes $(SRCS) $(INCDEP) papi-5.3.0/src/libpfm4/lib/pfmlib_sparc_perf_event.c0000600003276200002170000000472212247131124022060 0ustar ralphundrgrad/* * pfmlib_sparc_perf_event.c : perf_event SPARC functions * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_sparc_priv.h" /* architecture private */ #include "pfmlib_perf_event_priv.h" int pfm_sparc_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { struct perf_event_attr *attr = e->os_data; int ret; ret = pfm_sparc_get_encoding(this, e); if (ret != PFM_SUCCESS) return ret; attr->type = PERF_TYPE_RAW; attr->config = (e->codes[0] << 16) | e->codes[1]; return PFM_SUCCESS; } void pfm_sparc_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { int i, compact; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; /* * with perf_events, u and k are handled at the OS level * via attr.exclude_* fields */ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PMU) { if (e->pattrs[i].idx == SPARC_ATTR_U || e->pattrs[i].idx == SPARC_ATTR_K || e->pattrs[i].idx == SPARC_ATTR_H) compact = 1; } if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { /* No precise mode on SPARC */ if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } papi-5.3.0/src/libpfm4/lib/pfmlib_torrent.c0000600003276200002170000001545312247131124020233 0ustar ralphundrgrad/* * pfmlib_torrent.c : IBM Torrent support * * Copyright (C) IBM Corporation, 2010. All rights reserved. 
* Contributed by Corey Ashford (cjashfor@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" #include "events/torrent_events.h" const pfmlib_attr_desc_t torrent_modifiers[] = { PFM_ATTR_I("type", "Counter type: 0 = 2x64-bit counters w/32-bit prescale, 1 = 4x32-bit counters w/16-bit prescale, 2 = 2x32-bit counters w/no prescale, 3 = 4x16-bit counters w/no prescale"), PFM_ATTR_I("sel", "Sample period / Cmd Increment select: 0 = 256 cycles/ +16, 1 = 512 cycles / +8, 2 = 1024 cycles / +4, 3 = 2048 cycles / +2"), PFM_ATTR_I("lo_cmp", "Low threshold compare: 0..31"), PFM_ATTR_I("hi_cmp", "High threshold compare: 0..31"), PFM_ATTR_NULL }; static inline int pfm_torrent_attr2mod(void *this, int pidx, int attr_idx) { const pme_torrent_entry_t *pe = this_pe(this); size_t x; int n; n = attr_idx; pfmlib_for_each_bit(x, pe[pidx].pme_modmsk) { if (n == 0) break; n--; } return x; } /** * torrent_pmu_detect * * Determine if this machine has a Torrent chip * **/ static int pfm_torrent_detect(void* this) { struct dirent *de; DIR *dir; int ret = PFM_ERR_NOTSUPP; /* If /proc/device-tree/hfi-iohub@ exists, * this machine has an accessible Torrent chip */ dir = opendir("/proc/device-tree"); if (!dir) return PFM_ERR_NOTSUPP; while ((de = readdir(dir)) != NULL) { if (!strncmp(de->d_name, "hfi-iohub@", 10)) { ret = PFM_SUCCESS; break; } } closedir(dir); return ret; } static int pfm_torrent_get_event_info(void *this, int pidx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; const pme_torrent_entry_t *pe = this_pe(this); info->name = pe[pidx].pme_name; info->desc = pe[pidx].pme_desc ? 
pe[pidx].pme_desc : ""; info->code = pe[pidx].pme_code; info->equiv = NULL; info->idx = pidx; /* private index */ info->pmu = pmu->pmu; info->dtype = PFM_DTYPE_UINT64; info->is_precise = 0; /* unit masks + modifiers */ info->nattrs = pfmlib_popcnt((unsigned long)pe[pidx].pme_modmsk); return PFM_SUCCESS; } static int pfm_torrent_get_event_attr_info(void *this, int idx, int attr_idx, pfm_event_attr_info_t *info) { int m; m = pfm_torrent_attr2mod(this, idx, attr_idx); info->name = modx(torrent_modifiers, m, name); info->desc = modx(torrent_modifiers, m, desc); info->code = m; info->type = modx(torrent_modifiers, m, type); info->equiv = NULL; info->is_dfl = 0; info->is_precise = 0; info->idx = m; info->dfl_val64 = 0; info->ctrl = PFM_ATTR_CTRL_PMU; return PFM_SUCCESS; } static int pfm_torrent_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const pme_torrent_entry_t *pe = this_pe(this); int i, ret = PFM_ERR_INVAL; for (i = 0; i < pmu->pme_count; i++) { if (!pe[i].pme_name) { fprintf(fp, "pmu: %s event%d: :: no name\n", pmu->name, i); goto error; } if (pe[i].pme_code == 0) { fprintf(fp, "pmu: %s event%d: %s :: event code is 0\n", pmu->name, i, pe[i].pme_name); goto error; } } ret = PFM_SUCCESS; error: return ret; } static int pfm_torrent_get_encoding(void *this, pfmlib_event_desc_t *e) { const pme_torrent_entry_t *pe = this_pe(this); uint32_t torrent_pmu; int i, mod; e->fstr[0] = '\0'; /* initialize the fully-qualified event string */ e->count = 1; e->codes[0] = (uint64_t)pe[e->event].pme_code; for (i = 0; i < e->nattrs; i++) { mod = pfm_torrent_attr2mod(this, e->event, e->attrs[i].id); torrent_pmu = pe[e->event].pme_code & (TORRENT_SPACE | TORRENT_PMU_MASK); switch (torrent_pmu) { case TORRENT_PBUS_MCD: switch (mod) { case TORRENT_ATTR_MCD_TYPE: if (e->attrs[i].ival <= 3) { e->codes[0] |= e->attrs[i].ival << TORRENT_ATTR_MCD_TYPE_SHIFT; } else { DPRINT("value of attribute \'type\' - %" PRIu64 " - is not in the range 0..3.\n", e->attrs[i].ival); return 
PFM_ERR_ATTR_VAL; } break; default: DPRINT("unknown attribute for TORRENT_POWERBUS_MCD - %d\n", mod); return PFM_ERR_ATTR; } break; case TORRENT_PBUS_UTIL: switch (mod) { case TORRENT_ATTR_UTIL_SEL: if (e->attrs[i].ival <= 3) { e->codes[0] |= e->attrs[i].ival << TORRENT_ATTR_UTIL_SEL_SHIFT; } else { DPRINT("value of attribute \'sel\' - %" PRIu64 " - is not in the range 0..3.\n", e->attrs[i].ival); return PFM_ERR_ATTR_VAL; } break; case TORRENT_ATTR_UTIL_LO_CMP: case TORRENT_ATTR_UTIL_HI_CMP: if (e->attrs[i].ival <= 31) { e->codes[0] |= e->attrs[i].ival << TORRENT_ATTR_UTIL_CMP_SHIFT; } else { if (mod == TORRENT_ATTR_UTIL_LO_CMP) DPRINT("value of attribute \'lo_cmp\' - %" PRIu64 " - is not in the range 0..31.\n", e->attrs[i].ival); else DPRINT("value of attribute \'hi_cmp\' - %" PRIu64 " - is not in the range 0..31.\n", e->attrs[i].ival); return PFM_ERR_ATTR_VAL; } } break; default: DPRINT("attributes are unsupported for this Torrent PMU - code = %" PRIx32 "\n", torrent_pmu); return PFM_ERR_ATTR; } } return PFM_SUCCESS; } pfmlib_pmu_t torrent_support = { .pmu = PFM_PMU_TORRENT, .name = "power_torrent", .desc = "IBM Power Torrent PMU", .pme_count = PME_TORRENT_EVENT_COUNT, .pe = torrent_pe, .max_encoding = 1, .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .pmu_detect = pfm_torrent_detect, .get_event_encoding[PFM_OS_NONE] = pfm_torrent_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .validate_table = pfm_torrent_validate_table, .get_event_info = pfm_torrent_get_event_info, .get_event_attr_info = pfm_torrent_get_event_attr_info, }; papi-5.3.0/src/libpfm4/lib/pfmlib_itanium2_priv.h0000600003276200002170000001217112247131124021325 0ustar ralphundrgrad/* * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux/ia64. */ #ifndef __PFMLIB_ITANIUM2_PRIV_H__ #define __PFMLIB_ITANIUM2_PRIV_H__ /* * Event type definitions * * The virtual events are not really defined in the specs but are an artifact used * to quickly and easily setup EAR and/or BTB. The event type encodes the exact feature * which must be configured in combination with a counting monitor. * For instance, DATA_EAR_CACHE_LAT4 is a virtual D-EAR cache event. If the user * requests this event, this will configure a counting monitor to count DATA_EAR_EVENTS * and PMC11 will be configured for cache mode. The latency is encoded in the umask, here * it would correspond to 4 cycles. 
* */ #define PFMLIB_ITA2_EVENT_NORMAL 0x0 /* standard counter */ #define PFMLIB_ITA2_EVENT_BTB 0x1 /* virtual event used with BTB configuration */ #define PFMLIB_ITA2_EVENT_IEAR_TLB 0x2 /* virtual event used for I-EAR TLB configuration */ #define PFMLIB_ITA2_EVENT_IEAR_CACHE 0x3 /* virtual event used for I-EAR cache configuration */ #define PFMLIB_ITA2_EVENT_DEAR_TLB 0x4 /* virtual event used for D-EAR TLB configuration */ #define PFMLIB_ITA2_EVENT_DEAR_CACHE 0x5 /* virtual event used for D-EAR cache configuration */ #define PFMLIB_ITA2_EVENT_DEAR_ALAT 0x6 /* virtual event used for D-EAR ALAT configuration */ #define event_is_ear(e) ((e)->pme_type >= PFMLIB_ITA2_EVENT_IEAR_TLB &&(e)->pme_type <= PFMLIB_ITA2_EVENT_DEAR_ALAT) #define event_is_iear(e) ((e)->pme_type == PFMLIB_ITA2_EVENT_IEAR_TLB || (e)->pme_type == PFMLIB_ITA2_EVENT_IEAR_CACHE) #define event_is_dear(e) ((e)->pme_type >= PFMLIB_ITA2_EVENT_DEAR_TLB && (e)->pme_type <= PFMLIB_ITA2_EVENT_DEAR_ALAT) #define event_is_ear_cache(e) ((e)->pme_type == PFMLIB_ITA2_EVENT_DEAR_CACHE || (e)->pme_type == PFMLIB_ITA2_EVENT_IEAR_CACHE) #define event_is_ear_tlb(e) ((e)->pme_type == PFMLIB_ITA2_EVENT_IEAR_TLB || (e)->pme_type == PFMLIB_ITA2_EVENT_DEAR_TLB) #define event_is_ear_alat(e) ((e)->pme_type == PFMLIB_ITA2_EVENT_DEAR_ALAT) #define event_is_btb(e) ((e)->pme_type == PFMLIB_ITA2_EVENT_BTB) /* * Itanium encoding structure * (code must be first 8 bits) */ typedef struct { unsigned long pme_code:8; /* major event code */ unsigned long pme_type:3; /* see definitions above */ unsigned long pme_ig1:5; /* ignored */ unsigned long pme_umask:16; /* unit mask*/ unsigned long pme_ig:32; /* ignored */ } pme_ita2_entry_code_t; typedef union { unsigned long pme_vcode; pme_ita2_entry_code_t pme_ita2_code; /* must not be larger than vcode */ } pme_ita2_code_t; typedef union { unsigned long qual; /* generic qualifier */ struct { unsigned long pme_iar:1; /* instruction address range supported */ unsigned long pme_opm:1; /* opcode 
match supported */ unsigned long pme_dar:1; /* data address range supported */ unsigned long pme_res1:13; /* reserved */ unsigned long pme_group:4; /* event group */ unsigned long pme_set:4; /* event feature set*/ unsigned long pme_res2:40; /* reserved */ } pme_qual; } pme_ita2_qualifiers_t; typedef struct { char *pme_name; pme_ita2_code_t pme_entry_code; unsigned long pme_counters; /* supported counters */ unsigned int pme_maxincr; pme_ita2_qualifiers_t pme_qualifiers; char *pme_desc; /* text description of the event */ } pme_ita2_entry_t; /* * We embed the umask value into the event code. Because it really is * like a subevent. * pme_code: * - lower 16 bits: major event code * - upper 16 bits: unit mask */ #define pme_code pme_entry_code.pme_ita2_code.pme_code #define pme_umask pme_entry_code.pme_ita2_code.pme_umask #define pme_used pme_qualifiers.pme_qual_struct.pme_used #define pme_type pme_entry_code.pme_ita2_code.pme_type #define event_opcm_ok(e) ((e)->pme_qualifiers.pme_qual.pme_opm==1) #define event_iarr_ok(e) ((e)->pme_qualifiers.pme_qual.pme_iar==1) #define event_darr_ok(e) ((e)->pme_qualifiers.pme_qual.pme_dar==1) #endif /* __PFMLIB_ITANIUM2_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_intel_p6.c0000600003276200002170000001556712247131124020264 0ustar ralphundrgrad/* * pfmlib_i386_p6.c : support for the P6 processor family (family=6) * incl. Pentium II, Pentium III, Pentium Pro, Pentium M * * Copyright (c) 2005-2007 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_intel_x86_priv.h" /* architecture private */ #include "events/intel_p6_events.h" /* generic P6 (PIII) */ #include "events/intel_pii_events.h" /* Pentium II */ #include "events/intel_ppro_events.h" /* Pentium Pro */ #include "events/intel_pm_events.h" /* Pentium M */ static int pfm_p6_detect_pii(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 3: /* Pentium II */ case 5: /* Pentium II Deschutes */ case 6: /* Pentium II Mendocino */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_p6_detect_ppro(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 1: /* Pentium Pro */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_p6_detect_piii(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 7: /* Pentium III Katmai */ case 8: /* Pentium III Coppermine */ case 10:/* Pentium III Cascades */ case 11:/* Pentium III Tualatin */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_p6_detect_pm(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 9: /* Pentium M */ case 13:/* Pentium M */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } /* Pentium II support */ pfmlib_pmu_t intel_pii_support={ .desc = "Intel Pentium II", .name = "pii", .pmu = PFM_PMU_INTEL_PII, .pme_count = LIBPFM_ARRAY_SIZE(intel_pii_pe), .pe = intel_pii_pe, .atdesc = intel_x86_mods, .flags = 
PFMLIB_PMU_FL_RAW_UMASK, .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .pmu_detect = pfm_p6_detect_pii, .num_cntrs = 2, .max_encoding = 1, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; pfmlib_pmu_t intel_p6_support={ .desc = "Intel P6 Processor Family", .name = "p6", .pmu = PFM_PMU_I386_P6, .pme_count = LIBPFM_ARRAY_SIZE(intel_p6_pe), .pe = intel_p6_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .pmu_detect = pfm_p6_detect_piii, .num_cntrs = 2, .max_encoding = 1, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; pfmlib_pmu_t intel_ppro_support={ .desc = "Intel Pentium Pro", .name = "ppro", .pmu = PFM_PMU_INTEL_PPRO, .pme_count = LIBPFM_ARRAY_SIZE(intel_ppro_pe), .pe = intel_ppro_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .pmu_detect = pfm_p6_detect_ppro, .num_cntrs = 2, .max_encoding = 1, .get_event_encoding[PFM_OS_NONE] = 
pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; /* Pentium M support */ pfmlib_pmu_t intel_pm_support={ .desc = "Intel Pentium M", .name = "pm", .pmu = PFM_PMU_I386_PM, .pe = intel_pm_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .supported_plm = INTEL_X86_PLM, .pmu_detect = pfm_p6_detect_pm, .pme_count = LIBPFM_ARRAY_SIZE(intel_pm_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 2, .max_encoding = 1, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_mips_priv.h0000600003276200002170000001021112247131124020536 0ustar ralphundrgrad/* * Copyright (c) 2011 Samara Technology Group, Inc * Contributed by Philip Mucci * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished 
to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #ifndef __PFMLIB_MIPS_PRIV_H__ #define __PFMLIB_MIPS_PRIV_H__ /* * This file contains the definitions used for MIPS processors */ /* * event description */ typedef struct { const char *name; /* event name */ const char *desc; /* event description */ unsigned int mask; /* which counters event lives on */ unsigned int code; /* event code */ } mips_entry_t; #if __BYTE_ORDER == __LITTLE_ENDIAN typedef union { uint64_t val; /* complete register value */ struct { unsigned long sel_exl:1; /* int level */ unsigned long sel_os:1; /* system level */ unsigned long sel_sup:1; /* supervisor level */ unsigned long sel_usr:1; /* user level */ unsigned long sel_int:1; /* enable intr */ unsigned long sel_event_mask:7; /* event mask */ unsigned long sel_res1:20; /* reserved */ unsigned long sel_res2:32; /* reserved */ } perfsel64; } pfm_mips_sel_reg_t; #elif __BYTE_ORDER == __BIG_ENDIAN typedef union { uint64_t val; /* complete register value */ struct { unsigned long sel_res2:32; /* reserved */ unsigned long sel_res1:20; /* reserved */ unsigned long sel_event_mask:7; /* event mask */ unsigned long sel_int:1; /* enable intr */ unsigned long sel_usr:1; /* user level */ unsigned long sel_sup:1; /* supervisor level */ unsigned long 
sel_os:1; /* system level */ unsigned long sel_exl:1; /* int level */ } perfsel64; } pfm_mips_sel_reg_t; #else #error "cannot determine endianness" #endif typedef struct { char model[1024]; int implementer; int architecture; int part; } pfm_mips_config_t; extern pfm_mips_config_t pfm_mips_cfg; #define MIPS_ATTR_K 0 /* system level */ #define MIPS_ATTR_U 1 /* user level */ #define MIPS_ATTR_S 2 /* supervisor level */ #define MIPS_ATTR_E 3 /* exception level */ #define MIPS_NUM_ATTRS 4 #define _MIPS_ATTR_K (1 << MIPS_ATTR_K) #define _MIPS_ATTR_U (1 << MIPS_ATTR_U) #define _MIPS_ATTR_S (1 << MIPS_ATTR_S) #define _MIPS_ATTR_E (1 << MIPS_ATTR_E) #define MIPS_PLM_ALL ( _MIPS_ATTR_K |\ _MIPS_ATTR_U |\ _MIPS_ATTR_S |\ _MIPS_ATTR_E) extern int pfm_mips_detect(void *this); extern int pfm_mips_get_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_mips_get_event_first(void *this); extern int pfm_mips_get_event_next(void *this, int idx); extern int pfm_mips_event_is_valid(void *this, int pidx); extern int pfm_mips_validate_table(void *this, FILE *fp); extern int pfm_mips_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info); extern int pfm_mips_get_event_info(void *this, int idx, pfm_event_info_t *info); extern unsigned int pfm_mips_get_event_nattrs(void *this, int pidx); extern void pfm_mips_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); extern int pfm_mips_get_perf_encoding(void *this, pfmlib_event_desc_t *e); #endif /* __PFMLIB_MIPS_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_priv.h0000600003276200002170000001566112247131124022753 0ustar ralphundrgrad/* * pfmlib_intel_snbep_unc_priv.c : Intel SandyBridge-EP common definitions * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including 
without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef __PFMLIB_INTEL_SNBEP_UNC_PRIV_H__ #define __PFMLIB_INTEL_SNBEP_UNC_PRIV_H__ /* * Intel x86 specific pmu flags (pmu->flags 16 MSB) */ #define INTEL_PMU_FL_UNC_OCC 0x10000 /* PMU has occupancy counter filters */ #define SNBEP_UNC_ATTR_E 0 #define SNBEP_UNC_ATTR_I 1 #define SNBEP_UNC_ATTR_T8 2 #define SNBEP_UNC_ATTR_T5 3 #define SNBEP_UNC_ATTR_TF 4 #define SNBEP_UNC_ATTR_CF 5 #define SNBEP_UNC_ATTR_NF 6 #define SNBEP_UNC_ATTR_FF 7 #define SNBEP_UNC_ATTR_A 8 #define _SNBEP_UNC_ATTR_I (1 << SNBEP_UNC_ATTR_I) #define _SNBEP_UNC_ATTR_E (1 << SNBEP_UNC_ATTR_E) #define _SNBEP_UNC_ATTR_T8 (1 << SNBEP_UNC_ATTR_T8) #define _SNBEP_UNC_ATTR_T5 (1 << SNBEP_UNC_ATTR_T5) #define _SNBEP_UNC_ATTR_TF (1 << SNBEP_UNC_ATTR_TF) #define _SNBEP_UNC_ATTR_CF (1 << SNBEP_UNC_ATTR_CF) #define _SNBEP_UNC_ATTR_NF (1 << SNBEP_UNC_ATTR_NF) #define _SNBEP_UNC_ATTR_FF (1 << SNBEP_UNC_ATTR_FF) #define _SNBEP_UNC_ATTR_A (1 << SNBEP_UNC_ATTR_A) #define SNBEP_UNC_R3QPI_ATTRS \ (_SNBEP_UNC_ATTR_I|_SNBEP_UNC_ATTR_E|_SNBEP_UNC_ATTR_T8) #define SNBEP_UNC_R2PCIE_ATTRS \ (_SNBEP_UNC_ATTR_I|_SNBEP_UNC_ATTR_E|_SNBEP_UNC_ATTR_T8) #define SNBEP_UNC_QPI_ATTRS \ 
(_SNBEP_UNC_ATTR_I|_SNBEP_UNC_ATTR_E|_SNBEP_UNC_ATTR_T8) #define SNBEP_UNC_UBO_ATTRS \ (_SNBEP_UNC_ATTR_I|_SNBEP_UNC_ATTR_E|_SNBEP_UNC_ATTR_T8) #define SNBEP_UNC_PCU_ATTRS \ (_SNBEP_UNC_ATTR_I|_SNBEP_UNC_ATTR_E|_SNBEP_UNC_ATTR_T5) #define SNBEP_UNC_PCU_BAND_ATTRS \ (SNBEP_UNC_PCU_ATTRS | _SNBEP_UNC_ATTR_FF) #define SNBEP_UNC_IMC_ATTRS \ (_SNBEP_UNC_ATTR_I|_SNBEP_UNC_ATTR_E|_SNBEP_UNC_ATTR_T8) #define SNBEP_UNC_CBO_ATTRS \ (_SNBEP_UNC_ATTR_I |\ _SNBEP_UNC_ATTR_E |\ _SNBEP_UNC_ATTR_T8 |\ _SNBEP_UNC_ATTR_CF |\ _SNBEP_UNC_ATTR_TF) #define SNBEP_UNC_CBO_NID_ATTRS \ (SNBEP_UNC_CBO_ATTRS|_SNBEP_UNC_ATTR_NF) #define SNBEP_UNC_HA_ATTRS \ (_SNBEP_UNC_ATTR_I|_SNBEP_UNC_ATTR_E|_SNBEP_UNC_ATTR_T8) #define SNBEP_UNC_HA_OPC_ATTRS \ (SNBEP_UNC_HA_ATTRS|_SNBEP_UNC_ATTR_A) typedef union { uint64_t val; struct { unsigned long unc_event:8; /* event code */ unsigned long unc_umask:8; /* unit mask */ unsigned long unc_res1:1; /* reserved */ unsigned long unc_rst:1; /* reset */ unsigned long unc_edge:1; /* edge detect */ unsigned long unc_res2:3; /* reserved */ unsigned long unc_en:1; /* enable */ unsigned long unc_inv:1; /* invert counter mask */ unsigned long unc_thres:8; /* counter mask */ unsigned long unc_res3:32; /* reserved */ } com; /* covers common fields for cbox, ha, imc, ubox, r2pcie, r3qpi */ struct { unsigned long unc_event:8; /* event code */ unsigned long unc_umask:8; /* unit mask */ unsigned long unc_res1:1; /* reserved */ unsigned long unc_rst:1; /* reset */ unsigned long unc_edge:1; /* edge detect */ unsigned long unc_tid:1; /* tid filter enable */ unsigned long unc_res2:2; /* reserved */ unsigned long unc_en:1; /* enable */ unsigned long unc_inv:1; /* invert counter mask */ unsigned long unc_thres:8; /* counter mask */ unsigned long unc_res3:32; /* reserved */ } cbo; /* covers c-box */ struct { unsigned long unc_event:8; /* event code */ unsigned long unc_res1:6; /* reserved */ unsigned long unc_occ:2; /* occ select */ unsigned long unc_res2:1; /* reserved */ unsigned 
long unc_rst:1; /* reset */ unsigned long unc_edge:1; /* edge detect */ unsigned long unc_res3:1; /* reserved */ unsigned long unc_res4:2; /* reserved */ unsigned long unc_en:1; /* enable */ unsigned long unc_inv:1; /* invert counter mask */ unsigned long unc_thres:5; /* threshold */ unsigned long unc_res5:1; /* reserved */ unsigned long unc_occ_inv:1; /* occupancy invert */ unsigned long unc_occ_edge:1; /* occupancy edge detect */ unsigned long unc_res6:32; /* reserved */ } pcu; /* covers pcu */ struct { unsigned long unc_event:8; /* event code */ unsigned long unc_umask:8; /* unit mask */ unsigned long unc_res1:1; /* reserved */ unsigned long unc_rst:1; /* reset */ unsigned long unc_edge:1; /* edge detect */ unsigned long unc_res2:1; /* reserved */ unsigned long unc_res3:1; /* reserved */ unsigned long unc_event_ext:1; /* event code extension */ unsigned long unc_en:1; /* enable */ unsigned long unc_inv:1; /* invert counter mask */ unsigned long unc_thres:8; /* threshold */ unsigned long unc_res4:32; /* reserved */ } qpi; /* covers qpi */ struct { unsigned long tid:1; unsigned long cid:3; unsigned long res0:1; unsigned long res1:3; unsigned long res2:2; unsigned long nid:8; unsigned long state:5; unsigned long opc:9; unsigned long res3:1; unsigned long res4:32; } cbo_filt; /* cbox filter */ struct { unsigned long filt0:8; /* band0 freq filter */ unsigned long filt1:8; /* band1 freq filter */ unsigned long filt2:8; /* band2 freq filter */ unsigned long filt3:8; /* band3 freq filter */ unsigned long res1:32; /* reserved */ } pcu_filt; struct { unsigned long res1:6; unsigned long lo_addr:26; /* lo order 26b */ unsigned long hi_addr:14; /* hi order 14b */ unsigned long res2:18; /* reserved */ } ha_addr; struct { unsigned long opc:6; /* opcode match */ unsigned long res1:26; /* reserved */ unsigned long res2:32; /* reserved */ } ha_opc; } pfm_snbep_unc_reg_t; extern void pfm_intel_snbep_unc_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); extern int 
pfm_intel_snbep_unc_get_encoding(void *this, pfmlib_event_desc_t *e); extern const pfmlib_attr_desc_t snbep_unc_mods[]; extern int pfm_intel_snbep_unc_detect(void *this); extern int pfm_intel_snbep_unc_get_perf_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_intel_snbep_unc_can_auto_encode(void *this, int pidx, int uidx); extern int pfm_intel_snbep_unc_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info); #endif /* __PFMLIB_INTEL_SNBEP_UNC_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_s390x_cpumf.c0000600003276200002170000001571312247131124020615 0ustar ralphundrgrad/* * PMU support for the CPU-measurement counter facility * * Copyright IBM Corp. 2012 * Contributed by Hendrik Brueckner * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include /* private library and arch headers */ #include "pfmlib_priv.h" #include "pfmlib_s390x_priv.h" #include "pfmlib_perf_event_priv.h" #include "events/s390x_cpumf_events.h" #define CPUMF_DEVICE_DIR "/sys/bus/event_source/devices/cpum_cf" #define SYS_INFO "/proc/sysinfo" /* CPU-measurement counter list (pmu events) */ static pme_cpumf_ctr_t *cpumcf_pe = NULL; /* Detect the CPU-measurement facility */ static int pfm_cpumcf_detect(void *this) { if (access(CPUMF_DEVICE_DIR, R_OK)) return PFM_ERR_NOTSUPP; return PFM_SUCCESS; } /* Parses the machine type that identifies an IBM mainframe. * This information comes from /proc/sysinfo. */ static long get_machine_type(void) { long machine_type; size_t buflen, len; char *buffer, *tmp; FILE *fp; machine_type = 0; fp = fopen(SYS_INFO, "r"); if (fp == NULL) goto out; buffer = NULL; while (pfmlib_getl(&buffer, &buflen, fp) != -1) { /* skip empty lines */ if (*buffer == '\n') continue; /* look for 'Type:' entry */ if (!strncmp("Type:", buffer, 5)) { tmp = buffer + 5; /* set ptr after ':' */ /* skip leading blanks */ while (isspace(*tmp)) tmp++; /* strip trailing blanks: test the last character, not the terminator */ len = strlen(tmp); while (len > 0 && isspace((unsigned char)tmp[len - 1])) len--; tmp[len] = '\0'; machine_type = strtol(tmp, NULL, 10); break; } } fclose(fp); free(buffer); out: return machine_type; } /* Initialize the PMU representation for CPUMF. 
* * Set up the PMU events array based on * - generic (basic, problem-state, and crypto-activity) counter sets * - the extended counter depending on the machine type */ static int pfm_cpumcf_init(void *this) { pfmlib_pmu_t *pmu = this; const pme_cpumf_ctr_t *ext_set; size_t generic_count, ext_set_count; /* check and assign a machine-specific extended counter set */ switch (get_machine_type()) { case 2097: /* IBM System z10 EC */ case 2098: /* IBM System z10 BC */ ext_set = cpumf_ctr_set_ext_z10; ext_set_count = LIBPFM_ARRAY_SIZE(cpumf_ctr_set_ext_z10); break; case 2817: /* IBM zEnterprise 196 */ case 2818: /* IBM zEnterprise 114 */ ext_set = cpumf_ctr_set_ext_z196; ext_set_count = LIBPFM_ARRAY_SIZE(cpumf_ctr_set_ext_z196); break; default: /* No extended counter set for this machine type or there * was an error retrieving the machine type */ ext_set = NULL; ext_set_count = 0; break; } generic_count = LIBPFM_ARRAY_SIZE(cpumf_generic_ctr); cpumcf_pe = calloc(sizeof(*cpumcf_pe), generic_count + ext_set_count); if (cpumcf_pe == NULL) return PFM_ERR_NOMEM; memcpy(cpumcf_pe, cpumf_generic_ctr, sizeof(*cpumcf_pe) * generic_count); if (ext_set_count) memcpy((void *) (cpumcf_pe + generic_count), ext_set, sizeof(*cpumcf_pe) * ext_set_count); pmu->pe = cpumcf_pe; pmu->pme_count = generic_count + ext_set_count; return PFM_SUCCESS; } static void pfm_cpumcf_exit(void *this) { pfmlib_pmu_t *pmu = this; pmu->pme_count = 0; pmu->pe = NULL; free(cpumcf_pe); } static int pfm_cpumcf_get_encoding(void *this, pfmlib_event_desc_t *e) { const pme_cpumf_ctr_t *pe = this_pe(this); e->count = 1; /* number of encoded entries in e->codes */ e->codes[0] = pe[e->event].ctrnum; evt_strcat(e->fstr, "%s", pe[e->event].name); return PFM_SUCCESS; } static int pfm_cpumcf_get_event_first(void *this) { return 0; } static int pfm_cpumcf_get_event_next(void *this, int idx) { pfmlib_pmu_t *pmu = this; if (idx >= (pmu->pme_count - 1)) return -1; return idx + 1; } static int pfm_cpumcf_event_is_valid(void 
*this, int idx) { pfmlib_pmu_t *pmu = this; return (idx >= 0 && idx < pmu->pme_count); } static int pfm_cpumcf_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const pme_cpumf_ctr_t *pe = this_pe(this); int i, rc; rc = PFM_ERR_INVAL; if (pmu->pme_count > CPUMF_COUNTER_MAX) { fprintf(fp, "pmu: %s: pme number exceeded maximum\n", pmu->name); goto failed; } for (i = 0; i < pmu->pme_count; i++) { if (!pe[i].name) { fprintf(fp, "pmu: %s event: %i: No name\n", pmu->name, i); goto failed; } if (!pe[i].desc) { fprintf(fp, "pmu: %s event: %i: No description\n", pmu->name, i); goto failed; } } rc = PFM_SUCCESS; failed: return rc; } static int pfm_cpumcf_get_event_info(void *this, int idx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; const pme_cpumf_ctr_t *pe = this_pe(this); info->name = pe[idx].name; info->desc = pe[idx].desc; info->code = pe[idx].ctrnum; info->equiv = NULL; info->idx = idx; info->pmu = pmu->pmu; info->is_precise = 0; info->nattrs = 0; /* attributes are not supported */ return PFM_SUCCESS; } static int pfm_cpumcf_get_event_attr_info(void *this, int idx, int umask_idx, pfm_event_attr_info_t *info) { /* Attributes are not supported */ return PFM_ERR_ATTR; } pfmlib_pmu_t s390x_cpum_cf_support = { .desc = "CPU-measurement counter facility", .name = "cpum_cf", .pmu = PFM_PMU_S390X_CPUM_CF, .type = PFM_PMU_TYPE_CORE, .flags = PFMLIB_PMU_FL_ARCH_DFL, .num_cntrs = 0, /* no general-purpose counters */ .num_fixed_cntrs = CPUMF_COUNTER_MAX, /* fixed counters only */ .max_encoding = 1, .pe = cpumf_generic_ctr, .pme_count = LIBPFM_ARRAY_SIZE(cpumf_generic_ctr), .pmu_detect = pfm_cpumcf_detect, .pmu_init = pfm_cpumcf_init, .pmu_terminate = pfm_cpumcf_exit, .get_event_encoding[PFM_OS_NONE] = pfm_cpumcf_get_encoding, PFMLIB_ENCODE_PERF(pfm_s390x_get_perf_encoding), .get_event_first = pfm_cpumcf_get_event_first, .get_event_next = pfm_cpumcf_get_event_next, .event_is_valid = pfm_cpumcf_event_is_valid, .validate_table = pfm_cpumcf_validate_table, 
.get_event_info = pfm_cpumcf_get_event_info, .get_event_attr_info = pfm_cpumcf_get_event_attr_info, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_hsw.c0000600003276200002170000000550012247131124020522 0ustar ralphundrgrad/* * pfmlib_intel_hsw.c : Intel Haswell core PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "events/intel_hsw_events.h" static int pfm_hsw_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 60: /* Haswell */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_hsw_init(void *this) { pfm_intel_x86_cfg.arch_version = 4; return PFM_SUCCESS; } pfmlib_pmu_t intel_hsw_support={ .desc = "Intel Haswell", .name = "hsw", .pmu = PFM_PMU_INTEL_HSW, .pme_count = LIBPFM_ARRAY_SIZE(intel_hsw_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 8, /* consider with HT off by default */ .num_fixed_cntrs = 3, .max_encoding = 2, /* offcore_response */ .pe = intel_hsw_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_hsw_detect, .pmu_init = pfm_hsw_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_x86_perf_event.c0000600003276200002170000001667112247131124022576 0ustar ralphundrgrad/* pfmlib_intel_x86_perf.c : perf_event Intel X86 functions * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * 
in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_perf_event_priv.h" static int find_pmu_type_by_name(const char *name) { char filename[PATH_MAX]; FILE *fp; int ret, type; if (!name) return PFM_ERR_NOTSUPP; sprintf(filename, "/sys/bus/event_source/devices/%s/type", name); fp = fopen(filename, "r"); if (!fp) return PFM_ERR_NOTSUPP; ret = fscanf(fp, "%d", &type); if (ret != 1) type = PFM_ERR_NOTSUPP; fclose(fp); return type; } static int has_ldlat(void *this, pfmlib_event_desc_t *e) { pfm_event_attr_info_t *a; int i; for (i = 0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type != PFM_ATTR_UMASK) continue; if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_LDLAT)) return 1; } return 0; } int pfm_intel_x86_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; struct perf_event_attr *attr = e->os_data; int ret; if (!pmu->get_event_encoding[PFM_OS_NONE]) return PFM_ERR_NOTSUPP; /* * first, we need to do the generic 
encoding */ ret = pmu->get_event_encoding[PFM_OS_NONE](this, e); if (ret != PFM_SUCCESS) return ret; if (e->count > 2) { DPRINT("%s: unsupported count=%d\n", pmu->name, e->count); return PFM_ERR_NOTSUPP; } /* default PMU type */ attr->type = PERF_TYPE_RAW; /* * if PMU specifies a perf PMU name, then grab the type * from sysfs as it is most likely dynamically assigned. * This allows this function to be used by some uncore PMUs */ if (pmu->perf_name) { int type = find_pmu_type_by_name(pmu->perf_name); if (type == PFM_ERR_NOTSUPP) { DPRINT("perf PMU %s, not supported by OS\n", pmu->perf_name); } else { DPRINT("PMU %s perf type=%d\n", pmu->name, type); attr->type = type; } } attr->config = e->codes[0]; if (e->count > 1) { /* * Nehalem/Westmere/Sandy Bridge OFFCORE_RESPONSE events * take two MSRs. lower level returns two codes: * - codes[0] goes to regular counter config * - codes[1] goes into extra MSR */ if (intel_x86_eflag(this, e->event, INTEL_X86_NHM_OFFCORE)) { if (e->count != 2) { DPRINT("perf_encoding: offcore=1 count=%d\n", e->count); return PFM_ERR_INVAL; } attr->config1 = e->codes[1]; } /* * Load Latency threshold (NHM/WSM/SNB) * - codes[0] goes to regular counter config * - codes[1] LD_LAT MSR value (LSB 16 bits) */ if (has_ldlat(this, e)) { if (e->count != 2) { DPRINT("perf_encoding: ldlat count=%d\n", e->count); return PFM_ERR_INVAL; } attr->config1 = e->codes[1]; } } return PFM_SUCCESS; } int pfm_intel_nhm_unc_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; struct perf_event_attr *attr = e->os_data; pfm_intel_x86_reg_t reg; int ret; if (!pmu->get_event_encoding[PFM_OS_NONE]) return PFM_ERR_NOTSUPP; ret = pmu->get_event_encoding[PFM_OS_NONE](this, e); if (ret != PFM_SUCCESS) return ret; ret = find_pmu_type_by_name(pmu->perf_name); if (ret < 0) return ret; attr->type = ret; reg.val = e->codes[0]; /* * encoder treats all events as using the generic * counters. * perf_events override the enable and int bits, so * drop them here. 
* * also makes fixed counter special encoding 0xff * work. kernel checking for perfect match. */ reg.nhm_unc.usel_en = 0; reg.nhm_unc.usel_int = 0; attr->config = reg.val; /* * uncore measures at all priv levels * * user cannot set per-event priv levels because * attributes are simply not there * * dfl_plm is ignored in this case */ attr->exclude_hv = 0; attr->exclude_kernel = 0; attr->exclude_user = 0; return PFM_SUCCESS; } int pfm_intel_x86_requesting_pebs(pfmlib_event_desc_t *e) { pfm_event_attr_info_t *a; int i; for (i = 0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PERF_EVENT) continue; if (a->idx == PERF_ATTR_PR && e->attrs[i].ival) return 1; } return 0; } static int intel_x86_event_has_pebs(void *this, pfmlib_event_desc_t *e) { pfm_event_attr_info_t *a; int i; /* first check at the event level */ if (intel_x86_eflag(e->pmu, e->event, INTEL_X86_PEBS)) return 1; /* check umasks */ for(i=0; i < e->npattrs; i++) { a = e->pattrs+i; if (a->ctrl != PFM_ATTR_CTRL_PMU || a->type != PFM_ATTR_UMASK) continue; if (intel_x86_uflag(e->pmu, e->event, a->idx, INTEL_X86_PEBS)) return 1; } return 0; } /* * remove attrs which conflict with (or duplicate) the OS layer */ void pfm_intel_x86_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; int i, compact; int has_pebs = intel_x86_event_has_pebs(this, e); int no_smpl = pmu->flags & PFMLIB_PMU_FL_NO_SMPL; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; /* * with perf_events, u and k are handled at the OS level * via exclude_user, exclude_kernel. 
*/ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PMU) { if (e->pattrs[i].idx == INTEL_X86_ATTR_U || e->pattrs[i].idx == INTEL_X86_ATTR_K) compact = 1; } if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { /* Precise mode, subject to PEBS */ if (e->pattrs[i].idx == PERF_ATTR_PR && !has_pebs) compact = 1; /* * No hypervisor on Intel */ if (e->pattrs[i].idx == PERF_ATTR_H) compact = 1; if (no_smpl && ( e->pattrs[i].idx == PERF_ATTR_FR || e->pattrs[i].idx == PERF_ATTR_PR || e->pattrs[i].idx == PERF_ATTR_PE)) compact = 1; /* * no priv level support */ if (pmu->supported_plm == 0 && ( e->pattrs[i].idx == PERF_ATTR_U || e->pattrs[i].idx == PERF_ATTR_K || e->pattrs[i].idx == PERF_ATTR_MG || e->pattrs[i].idx == PERF_ATTR_MH)) compact = 1; } if (compact) { /* e->npattrs modified by call */ pfmlib_compact_pattrs(e, i); /* compensate for i++ */ i--; } } } int pfm_intel_x86_perf_detect(void *this) { pfmlib_pmu_t *pmu = this; char file[64]; snprintf(file,sizeof(file), "/sys/devices/%s", pmu->perf_name); return access(file, R_OK|X_OK) ? PFM_ERR_NOTSUPP : PFM_SUCCESS; } papi-5.3.0/src/libpfm4/lib/pfmlib_priv.h0000600003276200002170000003264012247131124017520 0ustar ralphundrgrad/* * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #ifndef __PFMLIB_PRIV_H__ #define __PFMLIB_PRIV_H__ #include #include #define PFM_PLM_ALL (PFM_PLM0|PFM_PLM1|PFM_PLM2|PFM_PLM3|PFM_PLMH) #define PFMLIB_ATTR_DELIM ':' /* event attribute delimiter */ #define PFMLIB_PMU_DELIM "::" /* pmu to event delimiter */ #define PFMLIB_EVENT_DELIM ',' /* event to event delimiter */ #define PFM_ATTR_I(y, d) { .name = (y), .type = PFM_ATTR_MOD_INTEGER, .desc = (d) } #define PFM_ATTR_B(y, d) { .name = (y), .type = PFM_ATTR_MOD_BOOL, .desc = (d) } #define PFM_ATTR_SKIP { .name = "" } /* entry not populated (skipped) */ #define PFM_ATTR_NULL { .name = NULL } #define PFMLIB_EVT_MAX_NAME_LEN 256 /* * event identifier encoding: * bit 00-20 : event table specific index (2097152 possibilities) * bit 21-30 : PMU identifier (1024 possibilities) * bit 31 : reserved (to distinguish from a negative error code) */ #define PFMLIB_PMU_SHIFT 21 #define PFMLIB_PMU_MASK 0x3ff /* must fit PFM_PMU_MAX */ #define PFMLIB_PMU_PIDX_MASK ((1<< PFMLIB_PMU_SHIFT)-1) typedef struct { const char *name; /* name */ const char *desc; /* description */ pfm_attr_t type; /* used to validate value (if any) */ } pfmlib_attr_desc_t; /* * attribute description passed to model-specific layer */ typedef struct { int id; /* attribute index */ union { uint64_t ival; /* integer value (incl. 
bool) */ char *sval; /* string */ }; } pfmlib_attr_t; /* * must be big enough to hold all possible priv level attributes */ #define PFMLIB_MAX_ATTRS 64 /* max attributes per event desc */ #define PFMLIB_MAX_ENCODING 4 /* max encoding length */ /* * we add one entry to hold any raw umask users may specify * the last entry in pattrs[] hold that raw umask info */ #define PFMLIB_MAX_PATTRS (PFMLIB_MAX_ATTRS+1) struct pfmlib_pmu; typedef struct { struct pfmlib_pmu *pmu; /* pmu */ int dfl_plm; /* default priv level mask */ int event; /* pidx */ int npattrs; /* number of attrs in pattrs[] */ int nattrs; /* number of attrs in attrs[] */ pfm_os_t osid; /* OS API requested */ int count; /* number of entries in codes[] */ pfmlib_attr_t attrs[PFMLIB_MAX_ATTRS]; /* list of requested attributes */ pfm_event_attr_info_t *pattrs; /* list of possible attributes */ char fstr[PFMLIB_EVT_MAX_NAME_LEN]; /* fully qualified event string */ uint64_t codes[PFMLIB_MAX_ENCODING]; /* event encoding */ void *os_data; } pfmlib_event_desc_t; #define modx(atdesc, a, z) (atdesc[(a)].z) #define attr(e, k) ((e)->pattrs + (e)->attrs[k].id) typedef struct pfmlib_pmu { const char *desc; /* PMU description */ const char *name; /* pmu short name */ const char *perf_name; /* perf_event pmu name (optional) */ pfm_pmu_t pmu; /* PMU model */ int pme_count; /* number of events */ int max_encoding; /* max number of uint64_t to encode an event */ int flags; /* 16 LSB: common, 16 MSB: arch spec*/ int pmu_rev; /* PMU model specific revision */ int num_cntrs; /* number of generic counters */ int num_fixed_cntrs; /* number of fixed counters */ int supported_plm; /* supported priv levels */ pfm_pmu_type_t type; /* PMU type */ const void *pe; /* pointer to event table */ const pfmlib_attr_desc_t *atdesc; /* pointer to attrs table */ int (*pmu_detect)(void *this); int (*pmu_init)(void *this); /* optional */ void (*pmu_terminate)(void *this); /* optional */ int (*get_event_first)(void *this); int (*get_event_next)(void 
*this, int pidx); int (*get_event_info)(void *this, int pidx, pfm_event_info_t *info); unsigned int (*get_event_nattrs)(void *this, int pidx); int (*event_is_valid)(void *this, int pidx); int (*can_auto_encode)(void *this, int pidx, int uidx); int (*get_event_attr_info)(void *this, int pidx, int umask_idx, pfm_event_attr_info_t *info); int (*get_event_encoding[PFM_OS_MAX])(void *this, pfmlib_event_desc_t *e); void (*validate_pattrs[PFM_OS_MAX])(void *this, pfmlib_event_desc_t *e); int (*os_detect[PFM_OS_MAX])(void *this); int (*validate_table)(void *this, FILE *fp); int (*get_num_events)(void *this); /* optional */ void (*display_reg)(void *this, pfmlib_event_desc_t *e, void *val); /* optional */ } pfmlib_pmu_t; typedef struct { const char *name; const pfmlib_attr_desc_t *atdesc; pfm_os_t id; int flags; int (*detect)(void *this); int (*get_os_attr_info)(void *this, pfmlib_event_desc_t *e); int (*get_os_nattrs)(void *this, pfmlib_event_desc_t *e); int (*encode)(void *this, const char *str, int dfl_plm, void *args); } pfmlib_os_t; #define PFMLIB_OS_FL_ACTIVATED 0x1 /* OS layer detected */ /* * pfmlib_pmu_t common flags (LSB 16 bits) */ #define PFMLIB_PMU_FL_INIT 0x1 /* PMU initialized correctly */ #define PFMLIB_PMU_FL_ACTIVE 0x2 /* PMU is initialized + detected on host */ #define PFMLIB_PMU_FL_RAW_UMASK 0x4 /* PMU supports PFM_ATTR_RAW_UMASKS */ #define PFMLIB_PMU_FL_ARCH_DFL 0x8 /* PMU is arch default */ #define PFMLIB_PMU_FL_NO_SMPL 0x10 /* PMU does not support sampling */ typedef struct { int initdone; int verbose; int debug; int inactive; char *forced_pmu; FILE *fp; /* verbose and debug file descriptor, default stderr or PFMLIB_DEBUG_STDOUT */ } pfmlib_config_t; #define PFMLIB_INITIALIZED() (pfm_cfg.initdone) extern pfmlib_config_t pfm_cfg; extern void __pfm_vbprintf(const char *fmt,...); extern void __pfm_dbprintf(const char *fmt,...); extern void pfmlib_strconcat(char *str, size_t max, const char *fmt, ...); extern int pfmlib_getl(char **buffer, size_t *len, 
FILE *fp); extern void pfmlib_compact_pattrs(pfmlib_event_desc_t *e, int i); #define evt_strcat(str, fmt, a...) pfmlib_strconcat(str, PFMLIB_EVT_MAX_NAME_LEN, fmt, a) extern int pfmlib_parse_event(const char *event, pfmlib_event_desc_t *d); extern int pfmlib_build_fstr(pfmlib_event_desc_t *e, char **fstr); extern void pfmlib_sort_attr(pfmlib_event_desc_t *e); extern pfmlib_pmu_t * pfmlib_get_pmu_by_type(pfm_pmu_type_t t); extern void pfmlib_release_event(pfmlib_event_desc_t *e); extern size_t pfmlib_check_struct(void *st, size_t usz, size_t refsz, size_t sz); #ifdef CONFIG_PFMLIB_DEBUG #define DPRINT(fmt, a...) \ do { \ __pfm_dbprintf("%s (%s.%d): " fmt, __FILE__, __func__, __LINE__, ## a); \ } while (0) #else #define DPRINT(fmt, a...) #endif extern pfmlib_pmu_t montecito_support; extern pfmlib_pmu_t itanium2_support; extern pfmlib_pmu_t itanium_support; extern pfmlib_pmu_t generic_ia64_support; extern pfmlib_pmu_t amd64_k7_support; extern pfmlib_pmu_t amd64_k8_revb_support; extern pfmlib_pmu_t amd64_k8_revc_support; extern pfmlib_pmu_t amd64_k8_revd_support; extern pfmlib_pmu_t amd64_k8_reve_support; extern pfmlib_pmu_t amd64_k8_revf_support; extern pfmlib_pmu_t amd64_k8_revg_support; extern pfmlib_pmu_t amd64_fam10h_barcelona_support; extern pfmlib_pmu_t amd64_fam10h_shanghai_support; extern pfmlib_pmu_t amd64_fam10h_istanbul_support; extern pfmlib_pmu_t amd64_fam11h_turion_support; extern pfmlib_pmu_t amd64_fam12h_llano_support; extern pfmlib_pmu_t amd64_fam14h_bobcat_support; extern pfmlib_pmu_t amd64_fam15h_interlagos_support; extern pfmlib_pmu_t intel_p6_support; extern pfmlib_pmu_t intel_ppro_support; extern pfmlib_pmu_t intel_pii_support; extern pfmlib_pmu_t intel_pm_support; extern pfmlib_pmu_t sicortex_support; extern pfmlib_pmu_t netburst_support; extern pfmlib_pmu_t netburst_p_support; extern pfmlib_pmu_t intel_coreduo_support; extern pfmlib_pmu_t intel_core_support; extern pfmlib_pmu_t intel_x86_arch_support; extern pfmlib_pmu_t intel_atom_support; 
extern pfmlib_pmu_t intel_nhm_support; extern pfmlib_pmu_t intel_nhm_ex_support; extern pfmlib_pmu_t intel_nhm_unc_support; extern pfmlib_pmu_t intel_snb_support; extern pfmlib_pmu_t intel_snb_unc_cbo0_support; extern pfmlib_pmu_t intel_snb_unc_cbo1_support; extern pfmlib_pmu_t intel_snb_unc_cbo2_support; extern pfmlib_pmu_t intel_snb_unc_cbo3_support; extern pfmlib_pmu_t intel_snb_ep_support; extern pfmlib_pmu_t intel_ivb_support; extern pfmlib_pmu_t intel_ivb_unc_cbo0_support; extern pfmlib_pmu_t intel_ivb_unc_cbo1_support; extern pfmlib_pmu_t intel_ivb_unc_cbo2_support; extern pfmlib_pmu_t intel_ivb_unc_cbo3_support; extern pfmlib_pmu_t intel_ivb_ep_support; extern pfmlib_pmu_t intel_hsw_support; extern pfmlib_pmu_t intel_snbep_unc_cb0_support; extern pfmlib_pmu_t intel_snbep_unc_cb1_support; extern pfmlib_pmu_t intel_snbep_unc_cb2_support; extern pfmlib_pmu_t intel_snbep_unc_cb3_support; extern pfmlib_pmu_t intel_snbep_unc_cb4_support; extern pfmlib_pmu_t intel_snbep_unc_cb5_support; extern pfmlib_pmu_t intel_snbep_unc_cb6_support; extern pfmlib_pmu_t intel_snbep_unc_cb7_support; extern pfmlib_pmu_t intel_snbep_unc_ha_support; extern pfmlib_pmu_t intel_snbep_unc_imc0_support; extern pfmlib_pmu_t intel_snbep_unc_imc1_support; extern pfmlib_pmu_t intel_snbep_unc_imc2_support; extern pfmlib_pmu_t intel_snbep_unc_imc3_support; extern pfmlib_pmu_t intel_snbep_unc_pcu_support; extern pfmlib_pmu_t intel_snbep_unc_qpi0_support; extern pfmlib_pmu_t intel_snbep_unc_qpi1_support; extern pfmlib_pmu_t intel_snbep_unc_ubo_support; extern pfmlib_pmu_t intel_snbep_unc_r2pcie_support; extern pfmlib_pmu_t intel_snbep_unc_r3qpi0_support; extern pfmlib_pmu_t intel_snbep_unc_r3qpi1_support; extern pfmlib_pmu_t intel_knc_support; extern pfmlib_pmu_t power4_support; extern pfmlib_pmu_t ppc970_support; extern pfmlib_pmu_t ppc970mp_support; extern pfmlib_pmu_t power5_support; extern pfmlib_pmu_t power5p_support; extern pfmlib_pmu_t power6_support; extern pfmlib_pmu_t power7_support; 
extern pfmlib_pmu_t power8_support; extern pfmlib_pmu_t torrent_support; extern pfmlib_pmu_t sparc_support; extern pfmlib_pmu_t sparc_ultra12_support; extern pfmlib_pmu_t sparc_ultra3_support; extern pfmlib_pmu_t sparc_ultra3i_support; extern pfmlib_pmu_t sparc_ultra3plus_support; extern pfmlib_pmu_t sparc_ultra4plus_support; extern pfmlib_pmu_t sparc_niagara1_support; extern pfmlib_pmu_t sparc_niagara2_support; extern pfmlib_pmu_t cell_support; extern pfmlib_pmu_t perf_event_support; extern pfmlib_pmu_t intel_wsm_sp_support; extern pfmlib_pmu_t intel_wsm_dp_support; extern pfmlib_pmu_t intel_wsm_unc_support; extern pfmlib_pmu_t arm_cortex_a8_support; extern pfmlib_pmu_t arm_cortex_a9_support; extern pfmlib_pmu_t arm_cortex_a15_support; extern pfmlib_pmu_t arm_1176_support; extern pfmlib_pmu_t mips_74k_support; extern pfmlib_pmu_t s390x_cpum_cf_support; extern pfmlib_os_t *pfmlib_os; extern pfmlib_os_t pfmlib_os_perf; extern pfmlib_os_t pfmlib_os_perf_ext; extern char *pfmlib_forced_pmu; #define this_pe(t) (((pfmlib_pmu_t *)t)->pe) #define this_atdesc(t) (((pfmlib_pmu_t *)t)->atdesc) #define LIBPFM_ARRAY_SIZE(a) (sizeof(a) / sizeof(typeof(*(a)))) /* * population count (number of bits set) */ static inline int pfmlib_popcnt(unsigned long v) { int sum = 0; for(; v ; v >>=1) { if (v & 0x1) sum++; } return sum; } /* * find next bit set * * the mask is built with 1UL so that the shift is done in unsigned long; * a plain int 1 would overflow once i reaches the width of int */ static inline size_t pfmlib_fnb(unsigned long value, size_t nbits, int p) { unsigned long m; size_t i; for(i=p; i < nbits; i++) { m = 1UL << i; if (value & m) return i; } return i; } /* * PMU + internal idx -> external opaque idx */ static inline int pfmlib_pidx2idx(pfmlib_pmu_t *pmu, int pidx) { int idx; idx = pmu->pmu << PFMLIB_PMU_SHIFT; idx |= pidx; return idx; } #define pfmlib_for_each_bit(x, m) \ for((x) = pfmlib_fnb((m), (sizeof(m)<<3), 0); (x) < (sizeof(m)<<3); (x) = pfmlib_fnb((m), (sizeof(m)<<3), (x)+1)) #ifdef __linux__ #define PFMLIB_VALID_PERF_PATTRS(f) \ .validate_pattrs[PFM_OS_PERF_EVENT] = f, \ 
.validate_pattrs[PFM_OS_PERF_EVENT_EXT] = f #define PFMLIB_ENCODE_PERF(f) \ .get_event_encoding[PFM_OS_PERF_EVENT] = f, \ .get_event_encoding[PFM_OS_PERF_EVENT_EXT] = f #define PFMLIB_OS_DETECT(f) \ .os_detect[PFM_OS_PERF_EVENT] = f, \ .os_detect[PFM_OS_PERF_EVENT_EXT] = f #else #define PFMLIB_VALID_PERF_PATTRS(f) \ .validate_pattrs[PFM_OS_PERF_EVENT] = NULL, \ .validate_pattrs[PFM_OS_PERF_EVENT_EXT] = NULL #define PFMLIB_ENCODE_PERF(f) \ .get_event_encoding[PFM_OS_PERF_EVENT] = NULL, \ .get_event_encoding[PFM_OS_PERF_EVENT_EXT] = NULL #define PFMLIB_OS_DETECT(f) \ .os_detect[PFM_OS_PERF_EVENT] = NULL, \ .os_detect[PFM_OS_PERF_EVENT_EXT] = NULL #endif static inline int is_empty_attr(const pfmlib_attr_desc_t *a) { return !a || !a->name || strlen(a->name) == 0 ? 1 : 0; } #endif /* __PFMLIB_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_powerpc.c0000600003276200002170000000612112247131124020205 0ustar ralphundrgrad/* * Copyright (C) IBM Corporation, 2009. All rights reserved. * Contributed by Corey Ashford (cjashfor@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * * pfmlib_gen_powerpc.c * * Support for libpfm4 for the PowerPC 970, 970MP, Power4,4+,5,5+,6,7 processors. */ #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" int pfm_gen_powerpc_get_event_info(void *this, int pidx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; const pme_power_entry_t *pe = this_pe(this); /* * pmu and idx filled out by caller */ info->name = pe[pidx].pme_name; info->desc = pe[pidx].pme_long_desc; info->code = pe[pidx].pme_code; info->equiv = NULL; info->idx = pidx; /* private index */ info->pmu = pmu->pmu; info->is_precise = 0; info->nattrs = 0; return PFM_SUCCESS; } int pfm_gen_powerpc_get_event_attr_info(void *this, int pidx, int umask_idx, pfm_event_attr_info_t *info) { /* No attributes are supported */ return PFM_ERR_ATTR; } int pfm_gen_powerpc_get_encoding(void *this, pfmlib_event_desc_t *e) { const pme_power_entry_t *pe = this_pe(this); e->count = 1; e->codes[0] = (uint64_t)pe[e->event].pme_code; evt_strcat(e->fstr, "%s", pe[e->event].pme_name); return PFM_SUCCESS; } int pfm_gen_powerpc_get_event_first(void *this) { return 0; } int pfm_gen_powerpc_get_event_next(void *this, int idx) { pfmlib_pmu_t *p = this; if (idx >= (p->pme_count-1)) return -1; return idx+1; } int pfm_gen_powerpc_event_is_valid(void *this, int pidx) { pfmlib_pmu_t *p = this; return pidx >= 0 && pidx < p->pme_count; } int pfm_gen_powerpc_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const pme_power_entry_t *pe = this_pe(this); int i; int ret = PFM_ERR_INVAL; for(i=0; i < pmu->pme_count; i++) { if (!pe[i].pme_name) { fprintf(fp, "pmu: %s event%d: :: no name\n", pmu->name, i); goto error; } if (!pe[i].pme_long_desc) { fprintf(fp, "pmu: %s 
event%d: %s :: no description\n", pmu->name, i, pe[i].pme_name); goto error; } } ret = PFM_SUCCESS; error: return ret; } papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_qpi.c0000600003276200002170000000617112247131124022553 0ustar ralphundrgrad/* * pfmlib_intel_snbep_qpi.c : Intel SandyBridge-EP QPI uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_qpi_events.h" static void display_qpi(void *this, pfmlib_event_desc_t *e, void *val) { const intel_x86_entry_t *pe = this_pe(this); pfm_snbep_unc_reg_t *reg = val; __pfm_vbprintf("[UNC_QPI=0x%"PRIx64" event=0x%x sel_ext=%d umask=0x%x en=%d " "inv=%d edge=%d thres=%d] %s\n", reg->val, reg->qpi.unc_event, reg->qpi.unc_event_ext, reg->qpi.unc_umask, reg->qpi.unc_en, reg->qpi.unc_inv, reg->qpi.unc_edge, reg->qpi.unc_thres, pe[e->event].name); } #define DEFINE_QPI_BOX(n) \ pfmlib_pmu_t intel_snbep_unc_qpi##n##_support = {\ .desc = "Intel Sandy Bridge-EP QPI"#n" uncore",\ .name = "snbep_unc_qpi"#n,\ .perf_name = "uncore_qpi_"#n,\ .pmu = PFM_PMU_INTEL_SNBEP_UNC_QPI##n,\ .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_q_pe),\ .type = PFM_PMU_TYPE_UNCORE,\ .num_cntrs = 4,\ .num_fixed_cntrs = 0,\ .max_encoding = 3,\ .pe = intel_snbep_unc_q_pe,\ .atdesc = snbep_unc_mods,\ .flags = PFMLIB_PMU_FL_RAW_UMASK\ | PFMLIB_PMU_FL_NO_SMPL,\ .pmu_detect = pfm_intel_snbep_unc_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding),\ .get_event_first = pfm_intel_x86_get_event_first,\ .get_event_next = pfm_intel_x86_get_event_next,\ .event_is_valid = pfm_intel_x86_event_is_valid,\ .validate_table = pfm_intel_x86_validate_table,\ .get_event_info = pfm_intel_x86_get_event_info,\ .get_event_attr_info = pfm_intel_x86_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs),\ .get_event_nattrs = pfm_intel_x86_get_event_nattrs,\ .display_reg = display_qpi,\ } DEFINE_QPI_BOX(0); DEFINE_QPI_BOX(1); papi-5.3.0/src/libpfm4/lib/pfmlib_intel_knc.c0000600003276200002170000000510512247131124020475 0ustar ralphundrgrad/* * pfmlib_intel_knc.c : Intel Knights Corner (Xeon Phi) * * Copyright (c) 
2012, Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_intel_x86_priv.h" /* architecture private */ #include "events/intel_knc_events.h" static int pfm_knc_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 11) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 1: /* Knights Corner */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } pfmlib_pmu_t intel_knc_support={ .desc = "Intel Knights Corner", .name = "knc", .pmu = PFM_PMU_INTEL_KNC, .pme_count = LIBPFM_ARRAY_SIZE(intel_knc_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 2, .max_encoding = 1, .pe = intel_knc_pe, .atdesc = intel_x86_mods, .supported_plm = INTEL_X86_PLM, .pmu_detect = pfm_knc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_powerpc_perf_event.c0000600003276200002170000000517712247131124022434 0ustar ralphundrgrad/* * pfmlib_powerpc_perf_event.c : perf_event IBM Power/Torrent functions * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do 
so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_power_priv.h" /* architecture private */ #include "pfmlib_perf_event_priv.h" int pfm_gen_powerpc_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; struct perf_event_attr *attr = e->os_data; int ret; if (!pmu->get_event_encoding[PFM_OS_NONE]) return PFM_ERR_NOTSUPP; /* * encoding routine changes based on PMU model */ ret = pmu->get_event_encoding[PFM_OS_NONE](this, e); if (ret != PFM_SUCCESS) return ret; attr->type = PERF_TYPE_RAW; attr->config = e->codes[0]; return PFM_SUCCESS; } void pfm_gen_powerpc_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { int i, compact; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; /* * remove PMU-provided attributes which are either * not accessible under perf_events or fully controlled * by perf_events, e.g., priv levels filters */ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PMU) { } /* * remove perf_event generic attributes not supported * by PPC */ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { /* no precise sampling */ if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } 
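/*
 * Illustration only, not part of libpfm4: a minimal sketch of the
 * remove-while-iterating pattern that the *_perf_validate_pattrs()
 * routines above rely on. The names demo_attr, demo_compact() and
 * demo_validate() are hypothetical stand-ins. The sketch assumes that
 * pfmlib_compact_pattrs() shifts the later entries down by one slot and
 * decrements the attribute count, which is what the "e->npattrs modified
 * by call" and "compensate for i++" comments in the x86 variant suggest.
 */

```c
#include <assert.h>
#include <string.h>

/* hypothetical stand-in for pfm_event_attr_info_t: only what the loop inspects */
struct demo_attr {
	const char *name;
	int is_umask;	/* stands in for type == PFM_ATTR_UMASK */
	int drop;	/* stands in for the per-attribute conflict checks */
};

/* hypothetical stand-in for pfmlib_compact_pattrs(): remove entry i in place */
static void demo_compact(struct demo_attr *a, int *n, int i)
{
	assert(i >= 0 && i < *n);
	/* shift the tail left by one slot, overwriting entry i */
	memmove(&a[i], &a[i + 1], (size_t)(*n - i - 1) * sizeof(a[0]));
	(*n)--;
}

/* returns the number of attributes left after filtering */
static int demo_validate(struct demo_attr *a, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		/* umasks never conflict */
		if (a[i].is_umask)
			continue;

		if (a[i].drop) {
			demo_compact(a, &n, i);
			/*
			 * compensate for i++: the entry that was shifted
			 * into slot i has not been examined yet
			 */
			i--;
		}
	}
	return n;
}
```

/*
 * Without the i-- step, two adjacent removable attributes (for instance
 * u followed by k) would leave the second one in place, because it would
 * be shifted into a slot the loop has already moved past.
 */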
papi-5.3.0/src/libpfm4/lib/pfmlib_intel_netburst_perf_event.c0000600003276200002170000000560412247131124024011 0ustar ralphundrgrad/* pfmlib_intel_netburst_perf_event.c : perf_event Intel Netburst functions * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file implements the common code for all Intel X86 processors. 
*/ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_netburst_priv.h" #include "pfmlib_perf_event_priv.h" int pfm_netburst_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { const netburst_entry_t *pe = this_pe(this); struct perf_event_attr *attr = e->os_data; int perf_code = pe[e->event].perf_code; uint64_t escr; int ret; ret = pfm_netburst_get_encoding(this, e); if (ret != PFM_SUCCESS) return ret; attr->type = PERF_TYPE_RAW; /* * codes[0] = ESCR * codes[1] = CCCR * * cleanup event_select, and install perf specific code */ escr = e->codes[0] & ~(0x3full << 25); escr |= perf_code << 25; attr->config = (escr << 32) | e->codes[1]; return PFM_SUCCESS; } void pfm_netburst_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { int i, compact; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; /* * with perf_events, u and k are handled at the OS level * via exclude_user, exclude_kernel. 
*/ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PMU) { if (e->pattrs[i].idx == NETBURST_ATTR_U || e->pattrs[i].idx == NETBURST_ATTR_K) compact = 1; } if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { /* no PEBS support (for now) */ if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; /* * No hypervisor on Intel */ if (e->pattrs[i].idx == PERF_ATTR_H) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_perf_event.c0000600003276200002170000000753112247131124024120 0ustar ralphundrgrad/* pfmlib_intel_snbep_unc_perf.c : perf_events SNB-EP uncore support * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "pfmlib_perf_event_priv.h" static int find_pmu_type_by_name(const char *name) { char filename[PATH_MAX]; FILE *fp; int ret, type; if (!name) return PFM_ERR_NOTSUPP; /* bounded write: name comes from the PMU table, but do not rely on its length */ snprintf(filename, sizeof(filename), "/sys/bus/event_source/devices/%s/type", name); fp = fopen(filename, "r"); if (!fp) return PFM_ERR_NOTSUPP; ret = fscanf(fp, "%d", &type); if (ret != 1) type = PFM_ERR_NOTSUPP; fclose(fp); return type; } int pfm_intel_snbep_unc_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; struct perf_event_attr *attr = e->os_data; pfm_intel_x86_reg_t reg; int ret; if (!pmu->get_event_encoding[PFM_OS_NONE]) return PFM_ERR_NOTSUPP; ret = pmu->get_event_encoding[PFM_OS_NONE](this, e); if (ret != PFM_SUCCESS) return ret; ret = find_pmu_type_by_name(pmu->perf_name); if (ret < 0) return ret; attr->type = ret; reg.val = e->codes[0]; attr->config = reg.val; /* * various filters */ if (e->count == 2) attr->config1 = e->codes[1]; if (e->count == 3) attr->config2 = e->codes[2]; /* * uncore measures at all priv levels * * user cannot set per-event priv levels because * attributes are simply not there * * dfl_plm is ignored in this case */ attr->exclude_hv = 0; attr->exclude_kernel = 0; attr->exclude_user = 0; return PFM_SUCCESS; } void pfm_intel_snbep_unc_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; int no_smpl = pmu->flags & PFMLIB_PMU_FL_NO_SMPL; int i, compact; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { /* No precise sampling mode for uncore */ if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; /* * No hypervisor for uncore */ if (e->pattrs[i].idx == PERF_ATTR_H) compact = 1; if (no_smpl && ( e->pattrs[i].idx == 
PERF_ATTR_FR || e->pattrs[i].idx == PERF_ATTR_PR || e->pattrs[i].idx == PERF_ATTR_PE)) compact = 1; /* * uncore has no priv level support */ if (pmu->supported_plm == 0 && ( e->pattrs[i].idx == PERF_ATTR_U || e->pattrs[i].idx == PERF_ATTR_K || e->pattrs[i].idx == PERF_ATTR_MG || e->pattrs[i].idx == PERF_ATTR_MH)) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } papi-5.3.0/src/libpfm4/lib/pfmlib_power4.c0000600003276200002170000000436012247131124017751 0ustar ralphundrgrad/* * pfmlib_power4.c : IBM Power4 support * * Copyright (C) IBM Corporation, 2009. All rights reserved. * Contributed by Corey Ashford (cjashfor@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" #include "events/power4_events.h" static int pfm_power4_detect(void* this) { if (__is_processor(PV_POWER4) || __is_processor(PV_POWER4p)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } pfmlib_pmu_t power4_support={ .desc = "POWER4", .name = "power4", .pmu = PFM_PMU_POWER4, .pme_count = LIBPFM_ARRAY_SIZE(power4_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 8, .max_encoding = 1, .pe = power4_pe, .pmu_detect = pfm_power4_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; papi-5.3.0/src/libpfm4/lib/pfmlib_montecito_priv.h0000600003276200002170000001274012247131124021600 0ustar ralphundrgrad/* * Copyright (c) 2005-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #ifndef __PFMLIB_MONTECITO_PRIV_H__ #define __PFMLIB_MONTECITO_PRIV_H__ /* * Event type definitions * * The virtual events are not really defined in the specs but are an artifact used * to quickly and easily setup EAR and/or BTB. The event type encodes the exact feature * which must be configured in combination with a counting monitor. * For instance, DATA_EAR_CACHE_LAT4 is a virtual D-EAR cache event. If the user * requests this event, this will configure a counting monitor to count DATA_EAR_EVENTS * and PMC11 will be configured for cache mode. The latency is encoded in the umask, here * it would correspond to 4 cycles. 
* */ #define PFMLIB_MONT_EVENT_NORMAL 0x0 /* standard counter */ #define PFMLIB_MONT_EVENT_ETB 0x1 /* virtual event used with ETB configuration */ #define PFMLIB_MONT_EVENT_IEAR_TLB 0x2 /* virtual event used for I-EAR TLB configuration */ #define PFMLIB_MONT_EVENT_IEAR_CACHE 0x3 /* virtual event used for I-EAR cache configuration */ #define PFMLIB_MONT_EVENT_DEAR_TLB 0x4 /* virtual event used for D-EAR TLB configuration */ #define PFMLIB_MONT_EVENT_DEAR_CACHE 0x5 /* virtual event used for D-EAR cache configuration */ #define PFMLIB_MONT_EVENT_DEAR_ALAT 0x6 /* virtual event used for D-EAR ALAT configuration */ #define event_is_ear(e) ((e)->pme_type >= PFMLIB_MONT_EVENT_IEAR_TLB &&(e)->pme_type <= PFMLIB_MONT_EVENT_DEAR_ALAT) #define event_is_iear(e) ((e)->pme_type == PFMLIB_MONT_EVENT_IEAR_TLB || (e)->pme_type == PFMLIB_MONT_EVENT_IEAR_CACHE) #define event_is_dear(e) ((e)->pme_type >= PFMLIB_MONT_EVENT_DEAR_TLB && (e)->pme_type <= PFMLIB_MONT_EVENT_DEAR_ALAT) #define event_is_ear_cache(e) ((e)->pme_type == PFMLIB_MONT_EVENT_DEAR_CACHE || (e)->pme_type == PFMLIB_MONT_EVENT_IEAR_CACHE) #define event_is_ear_tlb(e) ((e)->pme_type == PFMLIB_MONT_EVENT_IEAR_TLB || (e)->pme_type == PFMLIB_MONT_EVENT_DEAR_TLB) #define event_is_ear_alat(e) ((e)->pme_type == PFMLIB_MONT_EVENT_DEAR_ALAT) #define event_is_etb(e) ((e)->pme_type == PFMLIB_MONT_EVENT_ETB) /* * Itanium encoding structure * (code must be first 8 bits) */ typedef struct { unsigned long pme_code:8; /* major event code */ unsigned long pme_type:3; /* see definitions above */ unsigned long pme_caf:2; /* Active, Floating, Causal, Self-Floating */ unsigned long pme_ig1:3; /* ignored */ unsigned long pme_umask:16; /* unit mask*/ unsigned long pme_ig:32; /* ignored */ } pme_mont_entry_code_t; typedef union { unsigned long pme_vcode; pme_mont_entry_code_t pme_mont_code; /* must not be larger than vcode */ } pme_mont_code_t; typedef union { unsigned long qual; /* generic qualifier */ struct { unsigned long pme_iar:1; /* 
instruction address range supported */ unsigned long pme_opm:1; /* opcode match supported */ unsigned long pme_dar:1; /* data address range supported */ unsigned long pme_all:1; /* supports all_thrd=1 */ unsigned long pme_mesi:1; /* event supports MESI */ unsigned long pme_res1:11; /* reserved */ unsigned long pme_group:3; /* event group */ unsigned long pme_set:4; /* event set*/ unsigned long pme_res2:41; /* reserved */ } pme_qual; } pme_mont_qualifiers_t; typedef struct { char *pme_name; pme_mont_code_t pme_entry_code; unsigned long pme_counters; /* supported counters */ unsigned int pme_maxincr; pme_mont_qualifiers_t pme_qualifiers; char *pme_desc; /* text description of the event */ } pme_mont_entry_t; /* * We embed the umask value into the event code. Because it really is * like a subevent. * pme_code: * - lower 16 bits: major event code * - upper 16 bits: unit mask */ #define pme_code pme_entry_code.pme_mont_code.pme_code #define pme_umask pme_entry_code.pme_mont_code.pme_umask #define pme_used pme_qualifiers.pme_qual_struct.pme_used #define pme_type pme_entry_code.pme_mont_code.pme_type #define pme_caf pme_entry_code.pme_mont_code.pme_caf #define event_opcm_ok(e) ((e)->pme_qualifiers.pme_qual.pme_opm==1) #define event_iarr_ok(e) ((e)->pme_qualifiers.pme_qual.pme_iar==1) #define event_darr_ok(e) ((e)->pme_qualifiers.pme_qual.pme_dar==1) #define event_all_ok(e) ((e)->pme_qualifiers.pme_qual.pme_all==1) #define event_mesi_ok(e) ((e)->pme_qualifiers.pme_qual.pme_mesi==1) #endif /* __PFMLIB_MONTECITO_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_cbo.c0000600003276200002170000000707412247131124022530 0ustar ralphundrgrad/* * pfmlib_intel_snb_unc_cbo.c : Intel SandyBridge-EP C-Box uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without 
restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_cbo_events.h" static void display_cbo(void *this, pfmlib_event_desc_t *e, void *val) { const intel_x86_entry_t *pe = this_pe(this); pfm_snbep_unc_reg_t *reg = val; pfm_snbep_unc_reg_t f; __pfm_vbprintf("[UNC_CBO=0x%"PRIx64" event=0x%x umask=0x%x en=%d " "inv=%d edge=%d thres=%d tid_en=%d] %s\n", reg->val, reg->cbo.unc_event, reg->cbo.unc_umask, reg->cbo.unc_en, reg->cbo.unc_inv, reg->cbo.unc_edge, reg->cbo.unc_thres, reg->cbo.unc_tid, pe[e->event].name); if (e->count == 1) return; f.val = e->codes[1]; __pfm_vbprintf("[UNC_CBOX_FILTER=0x%"PRIx64" tid=%d core=0x%x nid=0x%x" " state=0x%x opc=0x%x]\n", f.val, f.cbo_filt.tid, f.cbo_filt.cid, f.cbo_filt.nid, f.cbo_filt.state, f.cbo_filt.opc); } #define DEFINE_C_BOX(n) \ pfmlib_pmu_t intel_snbep_unc_cb##n##_support = {\ .desc = "Intel Sandy Bridge-EP C-Box "#n" uncore",\ .name = "snbep_unc_cbo"#n,\ .perf_name = "uncore_cbox_"#n,\ .pmu = PFM_PMU_INTEL_SNBEP_UNC_CB##n,\ 
.pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_c_pe),\ .type = PFM_PMU_TYPE_UNCORE,\ .num_cntrs = 4,\ .num_fixed_cntrs = 0,\ .max_encoding = 2,\ .pe = intel_snbep_unc_c_pe,\ .atdesc = snbep_unc_mods,\ .flags = PFMLIB_PMU_FL_RAW_UMASK\ | PFMLIB_PMU_FL_NO_SMPL,\ .pmu_detect = pfm_intel_snbep_unc_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding),\ .get_event_first = pfm_intel_x86_get_event_first,\ .get_event_next = pfm_intel_x86_get_event_next,\ .event_is_valid = pfm_intel_x86_event_is_valid,\ .validate_table = pfm_intel_x86_validate_table,\ .get_event_info = pfm_intel_x86_get_event_info,\ .get_event_attr_info = pfm_intel_snbep_unc_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs),\ .get_event_nattrs = pfm_intel_x86_get_event_nattrs,\ .can_auto_encode = pfm_intel_x86_can_auto_encode, \ .display_reg = display_cbo,\ } DEFINE_C_BOX(0); DEFINE_C_BOX(1); DEFINE_C_BOX(2); DEFINE_C_BOX(3); DEFINE_C_BOX(4); DEFINE_C_BOX(5); DEFINE_C_BOX(6); DEFINE_C_BOX(7); papi-5.3.0/src/libpfm4/lib/pfmlib_mips.c0000600003276200002170000002142712247131124017504 0ustar ralphundrgrad/* * pfmlib_mips.c : support for MIPS chips * * Copyright (c) 2011 Samara Technology Group, Inc * Contributed by Philip Mucci * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_mips_priv.h" pfm_mips_config_t pfm_mips_cfg; static const pfmlib_attr_desc_t mips_mods[]={ PFM_ATTR_B("k", "monitor at system level"), PFM_ATTR_B("u", "monitor at user level"), PFM_ATTR_B("s", "monitor at supervisor level"), PFM_ATTR_B("e", "monitor at exception level "), PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */ }; #ifdef CONFIG_PFMLIB_OS_LINUX /* * helper function to retrieve one value from /proc/cpuinfo * for internal libpfm use only * attr: the attribute (line) to look for * ret_buf: a buffer to store the value of the attribute (as a string) * maxlen : number of bytes of capacity in ret_buf * * ret_buf is null terminated. 
* * Return: * 0 : attribute found, ret_buf populated * -1: attribute not found */ static int pfmlib_getcpuinfo_attr(const char *attr, char *ret_buf, size_t maxlen) { FILE *fp = NULL; int ret = -1; size_t attr_len, buf_len = 0; char *p, *value = NULL; char *buffer = NULL; if (attr == NULL || ret_buf == NULL || maxlen < 1) return -1; attr_len = strlen(attr); fp = fopen("/proc/cpuinfo", "r"); if (fp == NULL) return -1; while(pfmlib_getl(&buffer, &buf_len, fp) != -1){ /* skip blank lines */ if (*buffer == '\n') continue; p = strchr(buffer, ':'); if (p == NULL) goto error; /* * p+2: +1 = space, +2= first character * strlen()-1 gets rid of \n */ *p = '\0'; /* only keep the value of the line that matches attr */ if (strncmp(attr, buffer, attr_len)) continue; value = p+2; value[strlen(value)-1] = '\0'; break; } /* no matching line: report attribute not found */ if (value == NULL) goto error; strncpy(ret_buf, value, maxlen-1); ret_buf[maxlen-1] = '\0'; ret = 0; error: free(buffer); fclose(fp); return ret; } #else static int pfmlib_getcpuinfo_attr(const char *attr, char *ret_buf, size_t maxlen) { DPRINT("/proc/cpuinfo ignored\n"); return -1; } #endif static void pfm_mips_display_reg(pfm_mips_sel_reg_t reg, uint64_t cntrs, char *fstr) { __pfm_vbprintf("[0x%"PRIx64" mask=0x%x usr=%d sys=%d sup=%d int=%d cntrs=0x%"PRIx64"] %s\n", reg.val, reg.perfsel64.sel_event_mask, reg.perfsel64.sel_usr, reg.perfsel64.sel_os, reg.perfsel64.sel_sup, reg.perfsel64.sel_exl, cntrs, fstr); } int pfm_mips_detect(void *this) { int ret; char buffer[1024]; DPRINT("mips_detect\n"); ret = pfmlib_getcpuinfo_attr("cpu model", buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; if (strstr(buffer,"MIPS") == NULL) return PFM_ERR_NOTSUPP; strncpy(pfm_mips_cfg.model,buffer,strlen(buffer)); /* ret = pfmlib_getcpuinfo_attr("CPU implementer", buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; pfm_mips_cfg.implementer = strtol(buffer, NULL, 16); ret = pfmlib_getcpuinfo_attr("CPU part", buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; pfm_mips_cfg.part = strtol(buffer, NULL, 16); ret = pfmlib_getcpuinfo_attr("CPU architecture", 
buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; pfm_mips_cfg.architecture = strtol(buffer, NULL, 16); */ return PFM_SUCCESS; } int pfm_mips_get_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; const mips_entry_t *pe = this_pe(this); pfm_event_attr_info_t *a; pfm_mips_sel_reg_t reg; uint64_t ival, cntmask = 0; int plmmsk = 0, code; int k, id; reg.val = 0; code = pe[e->event].code; /* truncates bit 7 (counter info) */ reg.perfsel64.sel_event_mask = code; for (k = 0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; ival = e->attrs[k].ival; switch(a->idx) { case MIPS_ATTR_K: /* os */ reg.perfsel64.sel_os = !!ival; plmmsk |= _MIPS_ATTR_K; break; case MIPS_ATTR_U: /* user */ reg.perfsel64.sel_usr = !!ival; plmmsk |= _MIPS_ATTR_U; break; case MIPS_ATTR_S: /* supervisor */ reg.perfsel64.sel_sup = !!ival; plmmsk |= _MIPS_ATTR_S; break; case MIPS_ATTR_E: /* int */ reg.perfsel64.sel_exl = !!ival; plmmsk |= _MIPS_ATTR_E; } } /* * handle case where no priv level mask was passed. 
* then we use the dfl_plm */ if (!(plmmsk & MIPS_PLM_ALL)) { if (e->dfl_plm & PFM_PLM0) reg.perfsel64.sel_os = 1; if (e->dfl_plm & PFM_PLM1) reg.perfsel64.sel_sup = 1; if (e->dfl_plm & PFM_PLM2) reg.perfsel64.sel_exl = 1; if (e->dfl_plm & PFM_PLM3) reg.perfsel64.sel_usr = 1; } evt_strcat(e->fstr, "%s", pe[e->event].name); for (k = 0; k < e->npattrs; k++) { if (e->pattrs[k].ctrl != PFM_ATTR_CTRL_PMU) continue; id = e->pattrs[k].idx; switch(id) { case MIPS_ATTR_K: evt_strcat(e->fstr, ":%s=%lu", mips_mods[id].name, reg.perfsel64.sel_os); break; case MIPS_ATTR_U: evt_strcat(e->fstr, ":%s=%lu", mips_mods[id].name, reg.perfsel64.sel_usr); break; case MIPS_ATTR_S: evt_strcat(e->fstr, ":%s=%lu", mips_mods[id].name, reg.perfsel64.sel_sup); break; case MIPS_ATTR_E: evt_strcat(e->fstr, ":%s=%lu", mips_mods[id].name, reg.perfsel64.sel_exl); break; } } e->codes[0] = reg.val; /* cycles and instructions support all counters */ if (code == 0 || code == 1) { cntmask = (1ULL << pmu->num_cntrs) -1; } else { /* other events work on either even or odd counters only, selected by bit 7 of the code */ for (k = !!(code & 0x80) ; k < pmu->num_cntrs; k+=2) { cntmask |= 1ULL << k; } } e->codes[1] = cntmask; e->count = 2; pfm_mips_display_reg(reg, cntmask, e->fstr); return PFM_SUCCESS; } int pfm_mips_get_event_first(void *this) { return 0; } int pfm_mips_get_event_next(void *this, int idx) { pfmlib_pmu_t *p = this; if (idx >= (p->pme_count-1)) return -1; return idx+1; } int pfm_mips_event_is_valid(void *this, int pidx) { pfmlib_pmu_t *p = this; return pidx >= 0 && pidx < p->pme_count; } int pfm_mips_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const mips_entry_t *pe = this_pe(this); int i, j, error = 0; for(i=0; i < pmu->pme_count; i++) { if (!pe[i].name) { fprintf(fp, "pmu: %s event%d: :: no name (prev event was %s)\n", pmu->name, i, i > 0 ? 
pe[i-1].name : "??"); error++; } if (!pe[i].desc) { fprintf(fp, "pmu: %s event%d: %s :: no description\n", pmu->name, i, pe[i].name); error++; } for (j=i+1; j < pmu->pme_count; j++) { if (pe[i].code == pe[j].code) { fprintf(fp, "pmu: %s events %s and %s have the same code 0x%x\n", pmu->name, pe[i].name, pe[j].name, pe[i].code); error++; } } } if (!pmu->supported_plm) { fprintf(fp, "pmu: %s supported_plm=0, is that right?\n", pmu->name); error++; } return error ? PFM_ERR_INVAL : PFM_SUCCESS; } unsigned int pfm_mips_get_event_nattrs(void *this, int pidx) { /* assume all pmus have the same number of attributes */ return MIPS_NUM_ATTRS; } int pfm_mips_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info) { /* no umasks, so all attrs are modifiers */ info->name = mips_mods[attr_idx].name; info->desc = mips_mods[attr_idx].desc; info->type = mips_mods[attr_idx].type; info->equiv= NULL; info->idx = attr_idx; /* private index */ info->code = attr_idx; info->is_dfl = 0; info->is_precise = 0; info->ctrl = PFM_ATTR_CTRL_PMU; return PFM_SUCCESS; } int pfm_mips_get_event_info(void *this, int idx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; const mips_entry_t *pe = this_pe(this); info->name = pe[idx].name; info->desc = pe[idx].desc; info->code = pe[idx].code; info->equiv = NULL; info->idx = idx; /* private index */ info->pmu = pmu->pmu; info->is_precise = 0; /* every event exposes the same MIPS_NUM_ATTRS modifiers */ info->nattrs = pfm_mips_get_event_nattrs(this, idx); return PFM_SUCCESS; } papi-5.3.0/src/libpfm4/lib/pfmlib_sicortex_priv.h0000600003276200002170000001111312247131124021430 0ustar ralphundrgrad/* * Contributed by Philip Mucci based on code from * Copyright (c) 2004-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux/ia64. */ #ifndef __PFMLIB_SICORTEX_PRIV_H__ #define __PFMLIB_SICORTEX_PRIV_H__ #include "pfmlib_gen_mips64_priv.h" #define PFMLIB_SICORTEX_MAX_UMASK 5 typedef struct { char *pme_uname; /* unit mask name */ char *pme_udesc; /* event/umask description */ unsigned int pme_ucode; /* unit mask code */ } pme_sicortex_umask_t; typedef struct { char *pme_name; char *pme_desc; /* text description of the event */ unsigned int pme_code; /* event mask, holds room for four events, low 8 bits cntr0, ... 
high 8 bits cntr3 */ unsigned int pme_counters; /* Which counter event lives on */ unsigned int pme_numasks; /* number of umasks */ pme_sicortex_umask_t pme_umasks[PFMLIB_SICORTEX_MAX_UMASK]; /* umask desc */ } pme_sicortex_entry_t; /* * SiCortex specific */ typedef union { uint64_t val; /* complete register value */ struct { unsigned long sel_exl:1; /* int level */ unsigned long sel_os:1; /* system level */ unsigned long sel_sup:1; /* supervisor level */ unsigned long sel_usr:1; /* user level */ unsigned long sel_int:1; /* enable intr */ unsigned long sel_event_mask:6; /* event mask */ unsigned long sel_res1:23; /* reserved */ unsigned long sel_res2:32; /* reserved */ } perfsel; } pfm_sicortex_sel_reg_t; #define PMU_SICORTEX_SCB_NUM_COUNTERS 256 typedef union { uint64_t val; struct { unsigned long Interval:4; unsigned long IntBit:5; unsigned long NoInc:1; unsigned long AddrAssert:1; unsigned long MagicEvent:2; unsigned long Reserved:19; } sicortex_ScbPerfCtl_reg; struct { unsigned long HistGte:20; unsigned long Reserved:12; } sicortex_ScbPerfHist_reg; struct { unsigned long Bucket:8; unsigned long Reserved:24; } sicortex_ScbPerfBuckNum_reg; struct { unsigned long ena:1; unsigned long Reserved:31; } sicortex_ScbPerfEna_reg; struct { unsigned long event:15; unsigned long hist:1; unsigned long ifOther:2; unsigned long Reserved:15; } sicortex_ScbPerfBucket_reg; } pmc_sicortex_scb_reg_t; typedef union { uint64_t val; struct { unsigned long Reserved:2; uint64_t VPCL:38; unsigned long VPCH:2; } sicortex_CpuPerfVPC_reg; struct { unsigned long Reserved:5; unsigned long PEA:31; unsigned long Reserved2:12; unsigned long ASID:8; unsigned long L2STOP:4; unsigned long L2STATE:3; unsigned long L2HIT:1; } sicortex_CpuPerfPEA_reg; } pmd_sicortex_cpu_reg_t; #define PFMLIB_SICORTEX_INPUT_SCB_NONE (unsigned long)0x0 #define PFMLIB_SICORTEX_INPUT_SCB_INTERVAL (unsigned long)0x1 #define PFMLIB_SICORTEX_INPUT_SCB_NOINC (unsigned long)0x2 #define PFMLIB_SICORTEX_INPUT_SCB_HISTGTE 
(unsigned long)0x4 #define PFMLIB_SICORTEX_INPUT_SCB_BUCKET (unsigned long)0x8 static pme_sicortex_umask_t sicortex_scb_umasks[PFMLIB_SICORTEX_MAX_UMASK] = { { "IFOTHER_NONE","Both buckets count independently",0x00 }, { "IFOTHER_AND","Increment where this event counts and the opposite bucket counts",0x02 }, { "IFOTHER_ANDNOT","Increment where this event counts and the opposite bucket does not",0x04 }, { "HIST_NONE","Count cycles where the event is asserted",0x0 }, { "HIST_EDGE","Histogram on edges of the specified event",0x1 } }; #endif /* __PFMLIB_SICORTEX_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_arm_perf_event.c0000600003276200002170000000523112247131124021523 0ustar ralphundrgrad/* * pfmlib_arm_perf_event.c : perf_event ARM functions * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_arm_priv.h" #include "pfmlib_perf_event_priv.h" int pfm_arm_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; struct perf_event_attr *attr = e->os_data; int ret; if (!pmu->get_event_encoding[PFM_OS_NONE]) return PFM_ERR_NOTSUPP; /* * use generic raw encoding function first */ ret = pmu->get_event_encoding[PFM_OS_NONE](this, e); if (ret != PFM_SUCCESS) return ret; if (e->count > 1) { DPRINT("%s: unsupported count=%d\n", pmu->name, e->count); return PFM_ERR_NOTSUPP; } attr->type = PERF_TYPE_RAW; attr->config = e->codes[0]; return PFM_SUCCESS; } void pfm_arm_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { int i, compact; for (i = 0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; /* * with perf_events, u and k, hv are handled at the OS * level via attr.exclude_* fields */ if (arm_has_plm(this, e) && e->pattrs[i].ctrl == PFM_ATTR_CTRL_PMU) { if ( e->pattrs[i].idx == ARM_ATTR_U || e->pattrs[i].idx == ARM_ATTR_K || e->pattrs[i].idx == ARM_ATTR_HV) compact = 1; } if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snb.c0000600003276200002170000001052212247131124020503 0ustar ralphundrgrad/* * pfmlib_intel_snb.c : Intel Sandy Bridge core PMU * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * 
subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "events/intel_snb_events.h" #include "events/intel_snbep_events.h" static int pfm_snb_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 42: /* Sandy Bridge (Core i7 26xx, 25xx) */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_snb_ep_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 45: /* Sandy Bridge EP */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_snb_init(void *this) { pfm_intel_x86_cfg.arch_version = 3; return PFM_SUCCESS; } pfmlib_pmu_t intel_snb_support={ .desc = "Intel Sandy Bridge", .name = "snb", .pmu = PFM_PMU_INTEL_SNB, .pme_count = LIBPFM_ARRAY_SIZE(intel_snb_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 8, /* consider with HT off by default */ .num_fixed_cntrs = 3, .max_encoding = 2, /* offcore_response */ .pe = intel_snb_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_snb_detect, 
.pmu_init = pfm_snb_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; pfmlib_pmu_t intel_snb_ep_support={ .desc = "Intel Sandy Bridge EP", .name = "snb_ep", .pmu = PFM_PMU_INTEL_SNB_EP, .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 8, /* consider with HT off by default */ .num_fixed_cntrs = 3, .max_encoding = 2, /* offcore_response */ .pe = intel_snbep_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_snb_ep_detect, .pmu_init = pfm_snb_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; papi-5.3.0/src/libpfm4/lib/pfmlib_ia64_priv.h0000600003276200002170000010033512247131124020340 0ustar ralphundrgrad/* * Copyright (c) 2003-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux/ia64. 
*/ #ifndef __PFMLIB_PRIV_IA64_H__ #define __PFMLIB_PRIV_IA64_H__ /* * architected PMC register structure */ typedef union { unsigned long pmc_val; /* generic PMC register */ struct { unsigned long pmc_plm:4; /* privilege level mask */ unsigned long pmc_ev:1; /* external visibility */ unsigned long pmc_oi:1; /* overflow interrupt */ unsigned long pmc_pm:1; /* privileged monitor */ unsigned long pmc_ig1:1; /* reserved */ unsigned long pmc_es:8; /* event select */ unsigned long pmc_ig2:48; /* reserved */ } pmc_gen_count_reg; /* This is the Itanium-specific PMC layout for counter config */ struct { unsigned long pmc_plm:4; /* privilege level mask */ unsigned long pmc_ev:1; /* external visibility */ unsigned long pmc_oi:1; /* overflow interrupt */ unsigned long pmc_pm:1; /* privileged monitor */ unsigned long pmc_ig1:1; /* reserved */ unsigned long pmc_es:7; /* event select */ unsigned long pmc_ig2:1; /* reserved */ unsigned long pmc_umask:4; /* unit mask */ unsigned long pmc_thres:3; /* threshold */ unsigned long pmc_ig3:1; /* reserved (missing from table on p6-17) */ unsigned long pmc_ism:2; /* instruction set mask */ unsigned long pmc_ig4:38; /* reserved */ } pmc_ita_count_reg; /* Opcode matcher */ struct { unsigned long ignored1:3; unsigned long mask:27; /* mask encoding bits {40:27}{12:0} */ unsigned long ignored2:3; unsigned long match:27; /* match encoding bits {40:27}{12:0} */ unsigned long b:1; /* B-syllable */ unsigned long f:1; /* F-syllable */ unsigned long i:1; /* I-syllable */ unsigned long m:1; /* M-syllable */ } pmc8_9_ita_reg; /* Instruction Event Address Registers */ struct { unsigned long iear_plm:4; /* privilege level mask */ unsigned long iear_ig1:2; /* reserved */ unsigned long iear_pm:1; /* privileged monitor */ unsigned long iear_tlb:1; /* cache/tlb mode */ unsigned long iear_ig2:8; /* reserved */ unsigned long iear_umask:4; /* unit mask */ unsigned long iear_ig3:4; /* reserved */ unsigned long iear_ism:2; /* instruction set */ unsigned long 
iear_ig4:38; /* reserved */ } pmc10_ita_reg; /* Data Event Address Registers */ struct { unsigned long dear_plm:4; /* privilege level mask */ unsigned long dear_ig1:2; /* reserved */ unsigned long dear_pm:1; /* privileged monitor */ unsigned long dear_tlb:1; /* cache/tlb mode */ unsigned long dear_ig2:8; /* reserved */ unsigned long dear_umask:4; /* unit mask */ unsigned long dear_ig3:4; /* reserved */ unsigned long dear_ism:2; /* instruction set */ unsigned long dear_ig4:2; /* reserved */ unsigned long dear_pt:1; /* pass tags */ unsigned long dear_ig5:35; /* reserved */ } pmc11_ita_reg; /* Branch Trace Buffer registers */ struct { unsigned long btbc_plm:4; /* privilege level */ unsigned long btbc_ig1:2; unsigned long btbc_pm:1; /* privileged monitor */ unsigned long btbc_tar:1; /* target address register */ unsigned long btbc_tm:2; /* taken mask */ unsigned long btbc_ptm:2; /* predicted taken address mask */ unsigned long btbc_ppm:2; /* predicted predicate mask */ unsigned long btbc_bpt:1; /* branch prediction table */ unsigned long btbc_bac:1; /* branch address calculator */ unsigned long btbc_ig2:48; } pmc12_ita_reg; struct { unsigned long irange_ta:1; /* tag all bit */ unsigned long irange_ig:63; } pmc13_ita_reg; /* This is the Itanium2-specific PMC layout for counter config */ struct { unsigned long pmc_plm:4; /* privilege level mask */ unsigned long pmc_ev:1; /* external visibility */ unsigned long pmc_oi:1; /* overflow interrupt */ unsigned long pmc_pm:1; /* privileged monitor */ unsigned long pmc_ig1:1; /* reserved */ unsigned long pmc_es:8; /* event select */ unsigned long pmc_umask:4; /* unit mask */ unsigned long pmc_thres:3; /* threshold */ unsigned long pmc_enable:1; /* pmc4 only: power enable bit */ unsigned long pmc_ism:2; /* instruction set mask */ unsigned long pmc_ig2:38; /* reserved */ } pmc_ita2_counter_reg; /* opcode matchers */ struct { unsigned long opcm_ig_ad:1; /* ignore instruction address range checking */ unsigned long opcm_inv:1; /* 
invert range check */ unsigned long opcm_bit2:1; /* must be 1 */ unsigned long opcm_mask:27; /* mask encoding bits {41:27}{12:0} */ unsigned long opcm_ig1:3; /* reserved */ unsigned long opcm_match:27; /* match encoding bits {41:27}{12:0} */ unsigned long opcm_b:1; /* B-syllable */ unsigned long opcm_f:1; /* F-syllable */ unsigned long opcm_i:1; /* I-syllable */ unsigned long opcm_m:1; /* M-syllable */ } pmc8_9_ita2_reg; /* * instruction event address register configuration * * The register has two layouts depending on the value of the ct field. * In cache mode(ct=1x): * - ct is 1 bit, umask is 8 bits * In TLB mode (ct=00): * - ct is 2 bits, umask is 7 bits * ct=11 => cache mode using a latency filter with eighth bit set * ct=01 => nothing monitored * * The ct=01 value is the only reason why we cannot fix the layout * to ct 1 bit and umask 8 bits. Even though in TLB mode, only 6 bits * are effectively used for the umask, if the user inadvertently sets * a umask with the most significant bit set, it would be equivalent * to no monitoring. 
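 *
 * A minimal sketch of the two views described above, assuming the bit
 * positions implied by the field order of the structs that follow, with
 * the first field in the least significant bits (umask starting at bit 5,
 * the 2-bit ct view covering bits 12-13). The names iear_mode,
 * iear_decode_mode, iear_encode_tlb and iear_encode_cache are hypothetical
 * and are not part of libpfm; real hardware positions should be checked
 * against the processor manual.
 *

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical names for illustration only; none of these exist in libpfm.
enum iear_mode { IEAR_MODE_TLB, IEAR_MODE_NONE, IEAR_MODE_CACHE };

#define IEAR_UMASK_SHIFT 5   // assumed: umask follows plm (4 bits) and pm (1 bit)
#define IEAR_CT_SHIFT    12  // assumed: the 2-bit ct view covers bits 12..13
#define IEAR_CACHE_BIT   13  // assumed: the 1-bit ct view is bit 13

// Classify a raw register value using the 2-bit view of ct.
static enum iear_mode iear_decode_mode(uint64_t pmc)
{
    unsigned ct = (unsigned)((pmc >> IEAR_CT_SHIFT) & 0x3);

    if (ct & 0x2)
        return IEAR_MODE_CACHE;  // ct=1x: the low ct bit is really umask bit 7
    return ct == 0 ? IEAR_MODE_TLB : IEAR_MODE_NONE;
}

// TLB mode: 7-bit umask, ct stays 00.
static uint64_t iear_encode_tlb(unsigned umask)
{
    assert(umask <= 0x7f);  // an eighth bit would turn ct into 01
    return (uint64_t)umask << IEAR_UMASK_SHIFT;
}

// Cache mode: 8-bit umask plus the cache bit.
static uint64_t iear_encode_cache(unsigned umask)
{
    assert(umask <= 0xff);
    return ((uint64_t)umask << IEAR_UMASK_SHIFT)
         | ((uint64_t)1 << IEAR_CACHE_BIT);
}
```

 *
 * Under these assumptions, a TLB-mode umask with its eighth bit set lands
 * on the low ct bit and decodes as "nothing monitored", which is the
 * pitfall the comment warns about.
 *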
*/ struct { unsigned long iear_plm:4; /* privilege level mask */ unsigned long iear_pm:1; /* privileged monitor */ unsigned long iear_umask:8; /* event unit mask: 7 bits in TLB mode, 8 bits in cache mode */ unsigned long iear_ct:1; /* cache tlb bit13: 0 for TLB mode, 1 for cache mode */ unsigned long iear_ism:2; /* instruction set */ unsigned long iear_ig4:48; /* reserved */ } pmc10_ita2_cache_reg; struct { unsigned long iear_plm:4; /* privilege level mask */ unsigned long iear_pm:1; /* privileged monitor */ unsigned long iear_umask:7; /* event unit mask: 7 bits in TLB mode, 8 bits in cache mode */ unsigned long iear_ct:2; /* cache tlb bit13: 0 for TLB mode, 1 for cache mode */ unsigned long iear_ism:2; /* instruction set */ unsigned long iear_ig4:48; /* reserved */ } pmc10_ita2_tlb_reg; /* data event address register configuration */ struct { unsigned long dear_plm:4; /* privilege level mask */ unsigned long dear_ig1:2; /* reserved */ unsigned long dear_pm:1; /* privileged monitor */ unsigned long dear_mode:2; /* mode */ unsigned long dear_ig2:7; /* reserved */ unsigned long dear_umask:4; /* unit mask */ unsigned long dear_ig3:4; /* reserved */ unsigned long dear_ism:2; /* instruction set */ unsigned long dear_ig4:38; /* reserved */ } pmc11_ita2_reg; /* branch trace buffer configuration register */ struct { unsigned long btbc_plm:4; /* privilege level */ unsigned long btbc_ig1:2; unsigned long btbc_pm:1; /* privileged monitor */ unsigned long btbc_ds:1; /* data selector */ unsigned long btbc_tm:2; /* taken mask */ unsigned long btbc_ptm:2; /* predicted taken address mask */ unsigned long btbc_ppm:2; /* predicted predicate mask */ unsigned long btbc_brt:2; /* branch type mask */ unsigned long btbc_ig2:48; } pmc12_ita2_reg; /* data address range configuration register */ struct { unsigned long darc_ig1:3; unsigned long darc_cfg_dbrp0:2; /* constraint on dbr0 */ unsigned long darc_ig2:6; unsigned long darc_cfg_dbrp1:2; /* constraint on dbr1 */ unsigned long 
darc_ig3:6; unsigned long darc_cfg_dbrp2:2; /* constraint on dbr2 */ unsigned long darc_ig4:6; unsigned long darc_cfg_dbrp3:2; /* constraint on dbr3 */ unsigned long darc_ig5:16; unsigned long darc_ena_dbrp0:1; /* enable constraint dbr0 */ unsigned long darc_ena_dbrp1:1; /* enable constraint dbr1 */ unsigned long darc_ena_dbrp2:1; /* enable constraint dbr2 */ unsigned long darc_ena_dbrp3:1; /* enable constraint dbr3 */ unsigned long darc_ig6:15; } pmc13_ita2_reg; /* instruction address range configuration register */ struct { unsigned long iarc_ig1:1; unsigned long iarc_ibrp0:1; /* constrained by ibr0 */ unsigned long iarc_ig2:2; unsigned long iarc_ibrp1:1; /* constrained by ibr1 */ unsigned long iarc_ig3:2; unsigned long iarc_ibrp2:1; /* constrained by ibr2 */ unsigned long iarc_ig4:2; unsigned long iarc_ibrp3:1; /* constrained by ibr3 */ unsigned long iarc_ig5:2; unsigned long iarc_fine:1; /* fine mode */ unsigned long iarc_ig6:50; } pmc14_ita2_reg; /* opcode matcher configuration register */ struct { unsigned long opcmc_ibrp0_pmc8:1; unsigned long opcmc_ibrp1_pmc9:1; unsigned long opcmc_ibrp2_pmc8:1; unsigned long opcmc_ibrp3_pmc9:1; unsigned long opcmc_ig1:60; } pmc15_ita2_reg; /* This is the Montecito-specific PMC layout for counters PMC4-PMC15 */ struct { unsigned long pmc_plm:4; /* privilege level mask */ unsigned long pmc_ev:1; /* external visibility */ unsigned long pmc_oi:1; /* overflow interrupt */ unsigned long pmc_pm:1; /* privileged monitor */ unsigned long pmc_ig1:1; /* ignored */ unsigned long pmc_es:8; /* event select */ unsigned long pmc_umask:4; /* unit mask */ unsigned long pmc_thres:3; /* threshold */ unsigned long pmc_ig2:1; /* ignored */ unsigned long pmc_ism:2; /* instruction set: must be 2 */ unsigned long pmc_all:1; /* 0=only self, 1=both threads */ unsigned long pmc_i:1; /* Invalidate */ unsigned long pmc_s:1; /* Shared */ unsigned long pmc_e:1; /* Exclusive */ unsigned long pmc_m:1; /* Modified */ unsigned long pmc_res3:33; /* reserved 
*/ } pmc_mont_counter_reg; /* opcode matchers mask registers */ struct { unsigned long opcm_mask:41; /* opcode mask */ unsigned long opcm_ig1:7; /* ignored */ unsigned long opcm_b:1; /* B-syllable */ unsigned long opcm_f:1; /* F-syllable */ unsigned long opcm_i:1; /* I-syllable */ unsigned long opcm_m:1; /* M-syllable */ unsigned long opcm_ig2:4; /* ignored */ unsigned long opcm_inv:1; /* inverse range for ibrp0 */ unsigned long opcm_ig_ad:1; /* ignore address range restrictions */ unsigned long opcm_ig3:6; /* ignored */ } pmc32_34_mont_reg; /* opcode matchers match registers */ struct { unsigned long opcm_match:41; /* opcode match */ unsigned long opcm_ig1:23; /* ignored */ } pmc33_35_mont_reg; /* opcode matcher config register */ struct { unsigned long opcm_ch0_ig_opcm:1; /* chan0 opcode constraint */ unsigned long opcm_ch1_ig_opcm:1; /* chan1 opcode constraint */ unsigned long opcm_ch2_ig_opcm:1; /* chan2 opcode constraint */ unsigned long opcm_ch3_ig_opcm:1; /* chan3 opcode constraint */ unsigned long opcm_res:28; /* reserved */ unsigned long opcm_ig:32; /* ignored */ } pmc36_mont_reg; /* * instruction event address register configuration (I-EAR) * * The register has two layouts depending on the value of the ct field. * In cache mode(ct=1x): * - ct is 1 bit, umask is 8 bits * In TLB mode (ct=0x): * - ct is 2 bits, umask is 7 bits * ct=11 => cache mode using a latency filter with eighth bit set * ct=01 => nothing monitored * * The ct=01 value is the only reason why we cannot fix the layout * to ct 1 bit and umask 8 bits. Even though in TLB mode, only 6 bits * are effectively used for the umask, if the user inadvertently sets * a umask with the most significant bit set, it would be equivalent * to no monitoring. 
*/ struct { unsigned long iear_plm:4; /* privilege level mask */ unsigned long iear_pm:1; /* privileged monitor */ unsigned long iear_umask:8; /* event unit mask */ unsigned long iear_ct:1; /* =1 for i-cache */ unsigned long iear_res:2; /* reserved */ unsigned long iear_ig:48; /* ignored */ } pmc37_mont_cache_reg; struct { unsigned long iear_plm:4; /* privilege level mask */ unsigned long iear_pm:1; /* privileged monitor */ unsigned long iear_umask:7; /* event unit mask */ unsigned long iear_ct:2; /* 00=i-tlb, 01=nothing 1x=illegal */ unsigned long iear_res:50; /* reserved */ } pmc37_mont_tlb_reg; /* data event address register configuration (D-EAR) */ struct { unsigned long dear_plm:4; /* privilege level mask */ unsigned long dear_ig1:2; /* ignored */ unsigned long dear_pm:1; /* privileged monitor */ unsigned long dear_mode:2; /* mode */ unsigned long dear_ig2:7; /* ignored */ unsigned long dear_umask:4; /* unit mask */ unsigned long dear_ig3:4; /* ignored */ unsigned long dear_ism:2; /* instruction set: must be 2 */ unsigned long dear_ig4:38; /* ignored */ } pmc40_mont_reg; /* IP event address register (IP-EAR) */ struct { unsigned long ipear_plm:4; /* privilege level mask */ unsigned long ipear_ig1:2; /* ignored */ unsigned long ipear_pm:1; /* privileged monitor */ unsigned long ipear_ig2:1; /* ignored */ unsigned long ipear_mode:3; /* mode */ unsigned long ipear_delay:8; /* delay */ unsigned long ipear_ig3:45; /* reserved */ } pmc42_mont_reg; /* execution trace buffer configuration register (ETB) */ struct { unsigned long etbc_plm:4; /* privilege level */ unsigned long etbc_res1:2; /* reserved */ unsigned long etbc_pm:1; /* privileged monitor */ unsigned long etbc_ds:1; /* data selector */ unsigned long etbc_tm:2; /* taken mask */ unsigned long etbc_ptm:2; /* predicted taken address mask */ unsigned long etbc_ppm:2; /* predicted predicate mask */ unsigned long etbc_brt:2; /* branch type mask */ unsigned long etbc_ig:48; /* ignored */ } pmc39_mont_reg; /* data 
address range configuration register */ struct { unsigned long darc_res1:3; /* reserved */ unsigned long darc_cfg_dtag0:2; /* constraints on dbrp0 */ unsigned long darc_res2:6; /* reserved */ unsigned long darc_cfg_dtag1:2; /* constraints on dbrp1 */ unsigned long darc_res3:6; /* reserved */ unsigned long darc_cfg_dtag2:2; /* constraints on dbrp2 */ unsigned long darc_res4:6; /* reserved */ unsigned long darc_cfg_dtag3:2; /* constraints on dbrp3 */ unsigned long darc_res5:16; /* reserved */ unsigned long darc_ena_dbrp0:1; /* enable constraints dbrp0 */ unsigned long darc_ena_dbrp1:1; /* enable constraints dbrp1 */ unsigned long darc_ena_dbrp2:1; /* enable constraints dbrp2 */ unsigned long darc_ena_dbrp3:1; /* enable constraints dbrp3 */ unsigned long darc_res6:15; } pmc41_mont_reg; /* instruction address range configuration register */ struct { unsigned long iarc_res1:1; /* reserved */ unsigned long iarc_ig_ibrp0:1; /* constrained by ibrp0 */ unsigned long iarc_res2:2; /* reserved */ unsigned long iarc_ig_ibrp1:1; /* constrained by ibrp1 */ unsigned long iarc_res3:2; /* reserved */ unsigned long iarc_ig_ibrp2:1; /* constrained by ibrp2 */ unsigned long iarc_res4:2; /* reserved */ unsigned long iarc_ig_ibrp3:1; /* constrained by ibrp3 */ unsigned long iarc_res5:2; /* reserved */ unsigned long iarc_fine:1; /* fine mode */ unsigned long iarc_ig6:50; /* reserved */ } pmc38_mont_reg; } pfm_gen_ia64_pmc_reg_t; typedef union { unsigned long pmd_val; /* generic counter value */ /* counting pmd register */ struct { unsigned long pmd_count:32; /* 32-bit hardware counter */ unsigned long pmd_sxt32:32; /* sign extension of bit 31 */ } pmd_ita_counter_reg; struct { unsigned long iear_v:1; /* valid bit */ unsigned long iear_tlb:1; /* tlb miss bit */ unsigned long iear_ig1:3; /* reserved */ unsigned long iear_icla:59; /* instruction cache line address {60:51} sxt {50}*/ } pmd0_ita_reg; struct { unsigned long iear_lat:12; /* latency */ unsigned long iear_ig1:52; /* reserved */ } 
pmd1_ita_reg; struct { unsigned long dear_daddr; /* data address */ } pmd2_ita_reg; struct { unsigned long dear_latency:12; /* latency */ unsigned long dear_ig1:50; /* reserved */ unsigned long dear_level:2; /* level */ } pmd3_ita_reg; struct { unsigned long btb_b:1; /* branch bit */ unsigned long btb_mp:1; /* mispredict bit */ unsigned long btb_slot:2; /* which slot, 3=not taken branch */ unsigned long btb_addr:60; /* b=1, bundle address, b=0 target address */ } pmd8_15_ita_reg; struct { unsigned long btbi_bbi:3; /* branch buffer index */ unsigned long btbi_full:1; /* full bit (sticky) */ unsigned long btbi_ignored:60; } pmd16_ita_reg; struct { unsigned long dear_vl:1; /* valid bit */ unsigned long dear_ig1:1; /* reserved */ unsigned long dear_slot:2; /* slot number */ unsigned long dear_iaddr:60; /* instruction address */ } pmd17_ita_reg; /* counting pmd register */ struct { unsigned long pmd_count:47; /* 47-bit hardware counter */ unsigned long pmd_sxt47:17; /* sign extension of bit 46 */ } pmd_ita2_counter_reg; /* instruction event address register: data address register */ struct { unsigned long iear_stat:2; /* status bit */ unsigned long iear_ig1:3; unsigned long iear_iaddr:59; /* instruction cache line address {60:51} sxt {50}*/ } pmd0_ita2_reg; /* instruction event address register: data address register */ struct { unsigned long iear_latency:12; /* latency */ unsigned long iear_overflow:1; /* latency overflow */ unsigned long iear_ig1:51; /* reserved */ } pmd1_ita2_reg; /* data event address register: data address register */ struct { unsigned long dear_daddr; /* data address */ } pmd2_ita2_reg; /* data event address register: data address register */ struct { unsigned long dear_latency:13; /* latency */ unsigned long dear_overflow:1; /* overflow */ unsigned long dear_stat:2; /* status */ unsigned long dear_ig1:48; /* ignored */ } pmd3_ita2_reg; /* branch trace buffer data register when pmc12.ds == 0 */ struct { unsigned long btb_b:1; /* branch bit */ 
unsigned long btb_mp:1; /* mispredict bit */ unsigned long btb_slot:2; /* which slot, 3=not taken branch */ unsigned long btb_addr:60; /* bundle address(b=1), target address(b=0) */ } pmd8_15_ita2_reg; /* branch trace buffer data register when pmc12.ds == 1 */ struct { unsigned long btb_b:1; /* branch bit */ unsigned long btb_mp:1; /* mispredict bit */ unsigned long btb_slot:2; /* which slot, 3=not taken branch */ unsigned long btb_loaddr:37; /* b=1, bundle address, b=0 target address */ unsigned long btb_pred:20; /* low 20bits of L1IBR */ unsigned long btb_hiaddr:3; /* hi 3bits of bundle address(b=1) or target address (b=0)*/ } pmd8_15_ds_ita2_reg; /* branch trace buffer index register */ struct { unsigned long btbi_bbi:3; /* next entry index */ unsigned long btbi_full:1; /* full bit (sticky) */ unsigned long btbi_pmd8ext_b1:1; /* pmd8 ext */ unsigned long btbi_pmd8ext_bruflush:1; /* pmd8 ext */ unsigned long btbi_pmd8ext_ig:2; /* pmd8 ext */ unsigned long btbi_pmd9ext_b1:1; /* pmd9 ext */ unsigned long btbi_pmd9ext_bruflush:1; /* pmd9 ext */ unsigned long btbi_pmd9ext_ig:2; /* pmd9 ext */ unsigned long btbi_pmd10ext_b1:1; /* pmd10 ext */ unsigned long btbi_pmd10ext_bruflush:1; /* pmd10 ext */ unsigned long btbi_pmd10ext_ig:2; /* pmd10 ext */ unsigned long btbi_pmd11ext_b1:1; /* pmd11 ext */ unsigned long btbi_pmd11ext_bruflush:1; /* pmd11 ext */ unsigned long btbi_pmd11ext_ig:2; /* pmd11 ext */ unsigned long btbi_pmd12ext_b1:1; /* pmd12 ext */ unsigned long btbi_pmd12ext_bruflush:1; /* pmd12 ext */ unsigned long btbi_pmd12ext_ig:2; /* pmd12 ext */ unsigned long btbi_pmd13ext_b1:1; /* pmd13 ext */ unsigned long btbi_pmd13ext_bruflush:1; /* pmd13 ext */ unsigned long btbi_pmd13ext_ig:2; /* pmd13 ext */ unsigned long btbi_pmd14ext_b1:1; /* pmd14 ext */ unsigned long btbi_pmd14ext_bruflush:1; /* pmd14 ext */ unsigned long btbi_pmd14ext_ig:2; /* pmd14 ext */ unsigned long btbi_pmd15ext_b1:1; /* pmd15 ext */ unsigned long btbi_pmd15ext_bruflush:1; /* pmd15 ext */ 
unsigned long btbi_pmd15ext_ig:2; /* pmd15 ext */ unsigned long btbi_ignored:28; } pmd16_ita2_reg; /* data event address register: data address register */ struct { unsigned long dear_slot:2; /* slot */ unsigned long dear_bn:1; /* bundle bit (if 1 add 16 to address) */ unsigned long dear_vl:1; /* valid */ unsigned long dear_iaddr:60; /* instruction address (2-bundle window)*/ } pmd17_ita2_reg; struct { unsigned long pmd_count:47; /* 47-bit hardware counter */ unsigned long pmd_sxt47:17; /* sign extension of bit 46 */ } pmd_mont_counter_reg; /* data event address register */ struct { unsigned long dear_daddr; /* data address */ } pmd32_mont_reg; /* data event address register (D-EAR) */ struct { unsigned long dear_latency:13; /* latency */ unsigned long dear_ov:1; /* latency overflow */ unsigned long dear_stat:2; /* status */ unsigned long dear_ig:48; /* ignored */ } pmd33_mont_reg; /* instruction event address register (I-EAR) */ struct { unsigned long iear_stat:2; /* status bit */ unsigned long iear_ig:3; /* ignored */ unsigned long iear_iaddr:59; /* instruction cache line address {60:51} sxt {50}*/ } pmd34_mont_reg; /* instruction event address register (I-EAR) */ struct { unsigned long iear_latency:12; /* latency */ unsigned long iear_ov:1; /* latency overflow */ unsigned long iear_ig:51; /* ignored */ } pmd35_mont_reg; /* data event address register (D-EAR) */ struct { unsigned long dear_slot:2; /* slot */ unsigned long dear_bn:1; /* bundle bit (if 1 add 16 to iaddr) */ unsigned long dear_vl:1; /* valid */ unsigned long dear_iaddr:60; /* instruction address (2-bundle window)*/ } pmd36_mont_reg; /* execution trace buffer index register (ETB) */ struct { unsigned long etbi_ebi:4; /* next entry index */ unsigned long etbi_ig1:1; /* ignored */ unsigned long etbi_full:1; /* ETB overflowed at least once */ unsigned long etbi_ig2:58; /* ignored */ } pmd38_mont_reg; /* execution trace buffer extension register (ETB) */ struct { unsigned long etb_pmd48ext_b1:1; /* pmd48 
ext */ unsigned long etb_pmd48ext_bruflush:1; /* pmd48 ext */ unsigned long etb_pmd48ext_res:2; /* reserved */ unsigned long etb_pmd56ext_b1:1; /* pmd56 ext */ unsigned long etb_pmd56ext_bruflush:1; /* pmd56 ext */ unsigned long etb_pmd56ext_res:2; /* reserved */ unsigned long etb_pmd49ext_b1:1; /* pmd49 ext */ unsigned long etb_pmd49ext_bruflush:1; /* pmd49 ext */ unsigned long etb_pmd49ext_res:2; /* reserved */ unsigned long etb_pmd57ext_b1:1; /* pmd57 ext */ unsigned long etb_pmd57ext_bruflush:1; /* pmd57 ext */ unsigned long etb_pmd57ext_res:2; /* reserved */ unsigned long etb_pmd50ext_b1:1; /* pmd50 ext */ unsigned long etb_pmd50ext_bruflush:1; /* pmd50 ext */ unsigned long etb_pmd50ext_res:2; /* reserved */ unsigned long etb_pmd58ext_b1:1; /* pmd58 ext */ unsigned long etb_pmd58ext_bruflush:1; /* pmd58 ext */ unsigned long etb_pmd58ext_res:2; /* reserved */ unsigned long etb_pmd51ext_b1:1; /* pmd51 ext */ unsigned long etb_pmd51ext_bruflush:1; /* pmd51 ext */ unsigned long etb_pmd51ext_res:2; /* reserved */ unsigned long etb_pmd59ext_b1:1; /* pmd59 ext */ unsigned long etb_pmd59ext_bruflush:1; /* pmd59 ext */ unsigned long etb_pmd59ext_res:2; /* reserved */ unsigned long etb_pmd52ext_b1:1; /* pmd52 ext */ unsigned long etb_pmd52ext_bruflush:1; /* pmd52 ext */ unsigned long etb_pmd52ext_res:2; /* reserved */ unsigned long etb_pmd60ext_b1:1; /* pmd60 ext */ unsigned long etb_pmd60ext_bruflush:1; /* pmd60 ext */ unsigned long etb_pmd60ext_res:2; /* reserved */ unsigned long etb_pmd53ext_b1:1; /* pmd53 ext */ unsigned long etb_pmd53ext_bruflush:1; /* pmd53 ext */ unsigned long etb_pmd53ext_res:2; /* reserved */ unsigned long etb_pmd61ext_b1:1; /* pmd61 ext */ unsigned long etb_pmd61ext_bruflush:1; /* pmd61 ext */ unsigned long etb_pmd61ext_res:2; /* reserved */ unsigned long etb_pmd54ext_b1:1; /* pmd54 ext */ unsigned long etb_pmd54ext_bruflush:1; /* pmd54 ext */ unsigned long etb_pmd54ext_res:2; /* reserved */ unsigned long etb_pmd62ext_b1:1; /* pmd62 ext */ 
unsigned long etb_pmd62ext_bruflush:1; /* pmd62 ext */ unsigned long etb_pmd62ext_res:2; /* reserved */ unsigned long etb_pmd55ext_b1:1; /* pmd55 ext */ unsigned long etb_pmd55ext_bruflush:1; /* pmd55 ext */ unsigned long etb_pmd55ext_res:2; /* reserved */ unsigned long etb_pmd63ext_b1:1; /* pmd63 ext */ unsigned long etb_pmd63ext_bruflush:1; /* pmd63 ext */ unsigned long etb_pmd63ext_res:2; /* reserved */ } pmd39_mont_reg; /* * execution trace buffer extension register when used with IP-EAR * * to be used in conjunction with pmd48_63_ipear_reg (see below) */ struct { unsigned long ipear_pmd48ext_cycles:2; /* pmd48 upper 2 bits of cycles */ unsigned long ipear_pmd48ext_f:1; /* pmd48 flush bit */ unsigned long ipear_pmd48ext_ef:1; /* pmd48 early freeze */ unsigned long ipear_pmd56ext_cycles:2; /* pmd56 upper 2 bits of cycles */ unsigned long ipear_pmd56ext_f:1; /* pmd56 flush bit */ unsigned long ipear_pmd56ext_ef:1; /* pmd56 early freeze */ unsigned long ipear_pmd49ext_cycles:2; /* pmd49 upper 2 bits of cycles */ unsigned long ipear_pmd49ext_f:1; /* pmd49 flush bit */ unsigned long ipear_pmd49ext_ef:1; /* pmd49 early freeze */ unsigned long ipear_pmd57ext_cycles:2; /* pmd57 upper 2 bits of cycles */ unsigned long ipear_pmd57ext_f:1; /* pmd57 flush bit */ unsigned long ipear_pmd57ext_ef:1; /* pmd57 early freeze */ unsigned long ipear_pmd50ext_cycles:2; /* pmd50 upper 2 bits of cycles */ unsigned long ipear_pmd50ext_f:1; /* pmd50 flush bit */ unsigned long ipear_pmd50ext_ef:1; /* pmd50 early freeze */ unsigned long ipear_pmd58ext_cycles:2; /* pmd58 upper 2 bits of cycles */ unsigned long ipear_pmd58ext_f:1; /* pmd58 flush bit */ unsigned long ipear_pmd58ext_ef:1; /* pmd58 early freeze */ unsigned long ipear_pmd51ext_cycles:2; /* pmd51 upper 2 bits of cycles */ unsigned long ipear_pmd51ext_f:1; /* pmd51 flush bit */ unsigned long ipear_pmd51ext_ef:1; /* pmd51 early freeze */ unsigned long ipear_pmd59ext_cycles:2; /* pmd59 upper 2 bits of cycles */ unsigned long 
ipear_pmd59ext_f:1; /* pmd59 flush bit */ unsigned long ipear_pmd59ext_ef:1; /* pmd59 early freeze */ unsigned long ipear_pmd52ext_cycles:2; /* pmd52 upper 2 bits of cycles */ unsigned long ipear_pmd52ext_f:1; /* pmd52 flush bit */ unsigned long ipear_pmd52ext_ef:1; /* pmd52 early freeze */ unsigned long ipear_pmd60ext_cycles:2; /* pmd60 upper 2 bits of cycles */ unsigned long ipear_pmd60ext_f:1; /* pmd60 flush bit */ unsigned long ipear_pmd60ext_ef:1; /* pmd60 early freeze */ unsigned long ipear_pmd53ext_cycles:2; /* pmd53 upper 2 bits of cycles */ unsigned long ipear_pmd53ext_f:1; /* pmd53 flush bit */ unsigned long ipear_pmd53ext_ef:1; /* pmd53 early freeze */ unsigned long ipear_pmd61ext_cycles:2; /* pmd61 upper 2 bits of cycles */ unsigned long ipear_pmd61ext_f:1; /* pmd61 flush bit */ unsigned long ipear_pmd61ext_ef:1; /* pmd61 early freeze */ unsigned long ipear_pmd54ext_cycles:2; /* pmd54 upper 2 bits of cycles */ unsigned long ipear_pmd54ext_f:1; /* pmd54 flush bit */ unsigned long ipear_pmd54ext_ef:1; /* pmd54 early freeze */ unsigned long ipear_pmd62ext_cycles:2; /* pmd62 upper 2 bits of cycles */ unsigned long ipear_pmd62ext_f:1; /* pmd62 flush bit */ unsigned long ipear_pmd62ext_ef:1; /* pmd62 early freeze */ unsigned long ipear_pmd55ext_cycles:2; /* pmd55 upper 2 bits of cycles */ unsigned long ipear_pmd55ext_f:1; /* pmd55 flush bit */ unsigned long ipear_pmd55ext_ef:1; /* pmd55 early freeze */ unsigned long ipear_pmd63ext_cycles:2; /* pmd63 upper 2 bits of cycles */ unsigned long ipear_pmd63ext_f:1; /* pmd63 flush bit */ unsigned long ipear_pmd63ext_ef:1; /* pmd63 early freeze */ } pmd39_ipear_mont_reg; /* * execution trace buffer data register (ETB) * * when pmc39.ds == 0: pmd48-63 contains branch targets * when pmc39.ds == 1: pmd48-63 content is undefined */ struct { unsigned long etb_s:1; /* source bit */ unsigned long etb_mp:1; /* mispredict bit */ unsigned long etb_slot:2; /* which slot, 3=not taken branch */ unsigned long etb_addr:60; /* bundle 
address(s=1), target address(s=0) */ } pmd48_63_etb_mont_reg; /* * execution trace buffer when used with IP-EAR with PMD48-63.ef=0 * * The cycles field straddles pmdXX and corresponding extension in * pmd39 (pmd39_ipear_mont_reg). For instance, cycles for pmd48: * * cycles= pmd39_ipear_mont_reg.ipear_pmd48ext_cycles << 4 * | pmd48_63_ipear_mont_reg.ipear_cycles */ struct { unsigned long ipear_addr:60; /* retired IP[63:4] */ unsigned long ipear_cycles:4; /* lower 4 bits of cycles */ } pmd48_63_ipear_mont_reg; /* * execution trace buffer when used with IP-EAR with PMD48-63.ef=1 * * The cycles field straddles pmdXX and corresponding extension in * pmd39 (pmd39_ipear_mont_reg). For instance, cycles for pmd48: * * cycles= pmd39_ipear_mont_reg.ipear_pmd48ext_cycles << 4 * | pmd48_63_ipear_ef_mont_reg.ipear_cycles */ struct { unsigned long ipear_delay:8; /* delay count */ unsigned long ipear_addr:52; /* retired IP[61:12] */ unsigned long ipear_cycles:4; /* lower 4 bits of cycles */ } pmd48_63_ipear_ef_mont_reg; } pfm_gen_ia64_pmd_reg_t; #define PFMLIB_ITA2_FL_EVT_NO_QUALCHECK 0x1 /* don't check qualifier constraints */ #define PFMLIB_ITA2_RR_INV 0x1 /* inverse instruction ranges (iranges only) */ #define PFMLIB_ITA2_RR_NO_FINE_MODE 0x2 /* force non fine mode for instruction ranges */ #define PFMLIB_ITA2_EVT_NO_GRP 0 /* event does not belong to a group */ #define PFMLIB_ITA2_EVT_L1_CACHE_GRP 1 /* event belongs to L1 Cache group */ #define PFMLIB_ITA2_EVT_L2_CACHE_GRP 2 /* event belongs to L2 Cache group */ #define PFMLIB_ITA2_EVT_NO_SET -1 /* event does not belong to a set */ /* * counter specific flags */ #define PFMLIB_MONT_FL_EVT_NO_QUALCHECK 0x1 /* don't check qualifier constraints */ #define PFMLIB_MONT_FL_EVT_ALL_THRD 0x2 /* event measured for both threads */ #define PFMLIB_MONT_FL_EVT_ACTIVE_ONLY 0x4 /* measure the event only when the thread is active */ #define PFMLIB_MONT_FL_EVT_ALWAYS 0x8 /* measure the event at all times (active or inactive) */ #define 
PFMLIB_MONT_RR_INV 0x1 /* inverse instruction ranges (iranges only) */ #define PFMLIB_MONT_RR_NO_FINE_MODE 0x2 /* force non fine mode for instruction ranges */ #define PFMLIB_MONT_IRR_DEMAND_FETCH 0x4 /* demand fetch only for dual events */ #define PFMLIB_MONT_IRR_PREFETCH_MATCH 0x8 /* regular prefetches for dual events */ #define PFMLIB_MONT_EVT_NO_GRP 0 /* event does not belong to a group */ #define PFMLIB_MONT_EVT_L1D_CACHE_GRP 1 /* event belongs to L1D Cache group */ #define PFMLIB_MONT_EVT_L2D_CACHE_GRP 2 /* event belongs to L2D Cache group */ #define PFMLIB_MONT_EVT_NO_SET -1 /* event does not belong to a set */ #define PFMLIB_MONT_EVT_ACTIVE 0 /* event measures only when thread is active */ #define PFMLIB_MONT_EVT_FLOATING 1 #define PFMLIB_MONT_EVT_CAUSAL 2 #define PFMLIB_MONT_EVT_SELF_FLOATING 3 /* floating with .self, causal otherwise */ typedef struct { unsigned long db_mask:56; unsigned long db_plm:4; unsigned long db_ig:2; unsigned long db_w:1; unsigned long db_rx:1; } br_mask_reg_t; typedef union { unsigned long val; br_mask_reg_t db; } dbreg_t; static inline int pfm_ia64_get_cpu_family(void) { return (int)((ia64_get_cpuid(3) >> 24) & 0xff); } static inline int pfm_ia64_get_cpu_model(void) { return (int)((ia64_get_cpuid(3) >> 16) & 0xff); } /* * find last bit set */ static inline int pfm_ia64_fls (unsigned long x) { double d = x; long exp; exp = ia64_getf(d); return exp - 0xffff; } #endif /* __PFMLIB_PRIV_IA64_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_intel_atom.c0000600003276200002170000000633612247131124020671 0ustar ralphundrgrad/* * pfmlib_intel_atom.c : Intel Atom PMU * * Copyright (c) 2008 Google, Inc * Contributed by Stephane Eranian * * Based on work: * Copyright (c) 2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * * This file implements support for Intel Atom PMU as specified in the following document: * "IA-32 Intel Architecture Software Developer's Manual - Volume 3B: System * Programming Guide" * * Intel Atom = architectural v3 + PEBS */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "events/intel_atom_events.h" static int pfm_intel_atom_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; /* * Atom : family 6, models 28, 38, 39, 53, 54 */ if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch(pfm_intel_x86_cfg.model) { case 28: /* Pineview/Silverthorne */ case 38: /* Lincroft */ case 39: /* Penwell */ case 53: /* Cloverview */ case 54: /* Cedarview */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_intel_atom_init(void *this) { pfm_intel_x86_cfg.arch_version = 3; return PFM_SUCCESS; } pfmlib_pmu_t intel_atom_support={ .desc = "Intel Atom", .name = "atom", .pmu = PFM_PMU_INTEL_ATOM, .pme_count = LIBPFM_ARRAY_SIZE(intel_atom_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 2, .num_fixed_cntrs = 3, .max_encoding = 1, .pe = intel_atom_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .supported_plm = INTEL_X86_PLM, .pmu_detect = pfm_intel_atom_detect, .pmu_init = pfm_intel_atom_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_core.c0000600003276200002170000000570012247131124020653 0ustar ralphundrgrad/* * pfmlib_intel_core.c : Intel 
Core PMU * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * Core PMU = architectural perfmon v2 + PEBS */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "events/intel_core_events.h" static int pfm_core_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch(pfm_intel_x86_cfg.model) { case 15: /* Merom */ case 23: /* Penryn */ case 29: /* Dunnington */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_core_init(void *this) { pfm_intel_x86_cfg.arch_version = 2; return PFM_SUCCESS; } pfmlib_pmu_t intel_core_support={ .desc = "Intel Core", .name = "core", .pmu = PFM_PMU_INTEL_CORE, .pme_count = LIBPFM_ARRAY_SIZE(intel_core_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 2, .num_fixed_cntrs = 3, .max_encoding = 1, .supported_plm = INTEL_X86_PLM, .pe = intel_core_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_core_detect, .pmu_init = pfm_core_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_fam11h.c0000600003276200002170000000527512247131124020607 0ustar ralphundrgrad/* * pfmlib_amd64_fam11h.c : AMD64 Family 11h * * Copyright (c) 2012 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without 
restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_amd64_priv.h" #include "events/amd64_events_fam11h.h" #define DEFINE_FAM11H_REV(d, n, r, pmuid) \ static int \ pfm_amd64_fam11h_##n##_detect(void *this) \ { \ int ret; \ ret = pfm_amd64_detect(this); \ if (ret != PFM_SUCCESS) \ return ret; \ ret = pfm_amd64_cfg.revision; \ return ret == pmuid ? 
PFM_SUCCESS : PFM_ERR_NOTSUPP; \ } \ pfmlib_pmu_t amd64_fam11h_##n##_support={ \ .desc = "AMD64 Fam11h "#d, \ .name = "amd64_fam11h_"#n, \ .pmu = pmuid, \ .pmu_rev = r, \ .pme_count = LIBPFM_ARRAY_SIZE(amd64_fam11h_pe),\ .type = PFM_PMU_TYPE_CORE, \ .supported_plm = AMD64_FAM10H_PLM, \ .num_cntrs = 4, \ .max_encoding = 1, \ .pe = amd64_fam11h_pe, \ .atdesc = amd64_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK, \ \ .pmu_detect = pfm_amd64_fam11h_##n##_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_amd64_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_amd64_get_perf_encoding), \ .get_event_first = pfm_amd64_get_event_first, \ .get_event_next = pfm_amd64_get_event_next, \ .event_is_valid = pfm_amd64_event_is_valid, \ .validate_table = pfm_amd64_validate_table, \ .get_event_info = pfm_amd64_get_event_info, \ .get_event_attr_info = pfm_amd64_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_amd64_perf_validate_pattrs),\ .get_event_nattrs = pfm_amd64_get_event_nattrs, \ } DEFINE_FAM11H_REV(Turion, turion, 0, PFM_PMU_AMD64_FAM11H_TURION); papi-5.3.0/src/libpfm4/lib/pfmlib_power_priv.h0000600003276200002170000000663412247131124020740 0ustar ralphundrgrad/****************************/ /* THIS IS OPEN SOURCE CODE */ /****************************/ #ifndef __PFMLIB_POWER_PRIV_H__ #define __PFMLIB_POWER_PRIV_H__ /* * File: pfmlib_power_priv.h * CVS: * Author: Corey Ashford * cjashfor@us.ibm.com * Mods: * * * (C) Copyright IBM Corporation, 2009. All Rights Reserved. 
* Contributed by Corey Ashford * */ typedef struct { char *pme_name; unsigned pme_code; char *pme_short_desc; char *pme_long_desc; } pme_power_entry_t; typedef struct { char *pme_name; unsigned pme_code; char *pme_desc; uint64_t pme_modmsk; } pme_torrent_entry_t; /* Attribute "type" for PowerBus MCD events */ #define TORRENT_ATTR_MCD_TYPE 0 /* Attribute "sel" for PowerBus bus utilization events */ #define TORRENT_ATTR_UTIL_SEL 1 /* Attribute "lo_cmp" for PowerBus utilization events */ #define TORRENT_ATTR_UTIL_LO_CMP 2 /* Attribute "hi_cmp" for PowerBus utilization events */ #define TORRENT_ATTR_UTIL_HI_CMP 3 #define _TORRENT_ATTR_MCD_TYPE (1 << TORRENT_ATTR_MCD_TYPE) #define _TORRENT_ATTR_MCD (_TORRENT_ATTR_MCD_TYPE) #define _TORRENT_ATTR_UTIL_SEL (1 << TORRENT_ATTR_UTIL_SEL) #define _TORRENT_ATTR_UTIL_LO_CMP (1 << TORRENT_ATTR_UTIL_LO_CMP) #define _TORRENT_ATTR_UTIL_HI_CMP (1 << TORRENT_ATTR_UTIL_HI_CMP) #define _TORRENT_ATTR_UTIL_LO (_TORRENT_ATTR_UTIL_SEL | \ _TORRENT_ATTR_UTIL_LO_CMP) #define _TORRENT_ATTR_UTIL_HI (_TORRENT_ATTR_UTIL_SEL | \ _TORRENT_ATTR_UTIL_HI_CMP) /* * These definitions were taken from the reg.h file which, until Linux * 2.6.18, resided in /usr/include/asm-ppc64. Most of the unneeded * definitions have been removed, but there are still a few in this file * that are currently unused by libpfm. 
*/ #ifndef _POWER_REG_H #define _POWER_REG_H #define __stringify_1(x) #x #define __stringify(x) __stringify_1(x) #ifdef __powerpc__ #define mfspr(rn) ({unsigned long rval; \ asm volatile("mfspr %0," __stringify(rn) \ : "=r" (rval)); rval;}) #else #define mfspr(rn) (0) #endif /* Special Purpose Registers (SPRNs)*/ #define SPRN_PVR 0x11F /* Processor Version Register */ /* Processor Version Register (PVR) field extraction */ #define PVR_VER(pvr) (((pvr) >> 16) & 0xFFFF) /* Version field */ #define PVR_REV(pvr) (((pvr) >> 0) & 0xFFFF) /* Revision field */ #define __is_processor(pv) (PVR_VER(mfspr(SPRN_PVR)) == (pv)) /* 64-bit processors */ #define PV_POWER4 0x0035 #define PV_POWER4p 0x0038 #define PV_970 0x0039 #define PV_POWER5 0x003A #define PV_POWER5p 0x003B #define PV_970FX 0x003C #define PV_POWER6 0x003E #define PV_POWER7 0x003F #define PV_POWER7p 0x004a #define PV_970MP 0x0044 #define PV_970GX 0x0045 #define PV_POWER8 0x004b extern int pfm_gen_powerpc_get_event_info(void *this, int pidx, pfm_event_info_t *info); extern int pfm_gen_powerpc_get_event_attr_info(void *this, int pidx, int umask_idx, pfm_event_attr_info_t *info); extern int pfm_gen_powerpc_get_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_gen_powerpc_get_event_first(void *this); extern int pfm_gen_powerpc_get_event_next(void *this, int idx); extern int pfm_gen_powerpc_event_is_valid(void *this, int pidx); extern int pfm_gen_powerpc_validate_table(void *this, FILE *fp); extern void pfm_gen_powerpc_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); extern int pfm_gen_powerpc_get_perf_encoding(void *this, pfmlib_event_desc_t *e); #endif /* _POWER_REG_H */ #endif papi-5.3.0/src/libpfm4/lib/pfmlib_sparc_ultra12.c0000600003276200002170000000431712247131124021215 0ustar ralphundrgrad/* * pfmlib_sparc_ultra12.c : SPARC Ultra I, II * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this 
software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_sparc_priv.h" #include "events/sparc_ultra12_events.h" pfmlib_pmu_t sparc_ultra12_support={ .desc = "Ultra Sparc I/II", .name = "ultra12", .pmu = PFM_PMU_SPARC_ULTRA12, .pme_count = LIBPFM_ARRAY_SIZE(ultra12_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = SPARC_PLM, .max_encoding = 2, .num_cntrs = 2, .pe = ultra12_pe, .atdesc = NULL, .flags = 0, .pmu_detect = pfm_sparc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_sparc_get_encoding, PFMLIB_ENCODE_PERF(pfm_sparc_get_perf_encoding), .get_event_first = pfm_sparc_get_event_first, .get_event_next = pfm_sparc_get_event_next, .event_is_valid = pfm_sparc_event_is_valid, .validate_table = pfm_sparc_validate_table, .get_event_info = pfm_sparc_get_event_info, .get_event_attr_info = pfm_sparc_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_sparc_perf_validate_pattrs), .get_event_nattrs = pfm_sparc_get_event_nattrs, }; 
papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_imc.c0000600003276200002170000000534412247131124022533 0ustar ralphundrgrad/* * pfmlib_intel_snbep_unc_imc.c : Intel SandyBridge-EP Integrated Memory Controller (IMC) uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_imc_events.h" #define DEFINE_IMC_BOX(n) \ pfmlib_pmu_t intel_snbep_unc_imc##n##_support = { \ .desc = "Intel Sandy Bridge-EP IMC"#n" uncore", \ .name = "snbep_unc_imc"#n, \ .perf_name = "uncore_imc_"#n, \ .pmu = PFM_PMU_INTEL_SNBEP_UNC_IMC##n, \ .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_m_pe), \ .type = PFM_PMU_TYPE_UNCORE, \ .num_cntrs = 4, \ .num_fixed_cntrs = 1, \ .max_encoding = 1, \ .pe = intel_snbep_unc_m_pe, \ .atdesc = snbep_unc_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK\ | PFMLIB_PMU_FL_NO_SMPL,\ .pmu_detect = pfm_intel_snbep_unc_detect, \ .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding, \ PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding), \ .get_event_first = pfm_intel_x86_get_event_first, \ .get_event_next = pfm_intel_x86_get_event_next, \ .event_is_valid = pfm_intel_x86_event_is_valid, \ .validate_table = pfm_intel_x86_validate_table, \ .get_event_info = pfm_intel_x86_get_event_info, \ .get_event_attr_info = pfm_intel_x86_get_event_attr_info, \ PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs), \ .get_event_nattrs = pfm_intel_x86_get_event_nattrs, \ }; DEFINE_IMC_BOX(0); DEFINE_IMC_BOX(1); DEFINE_IMC_BOX(2); DEFINE_IMC_BOX(3); papi-5.3.0/src/libpfm4/lib/pfmlib_sparc_priv.h0000600003276200002170000000326712247131124020713 0ustar ralphundrgrad#ifndef __PFMLIB_SPARC_PRIV_H__ #define __PFMLIB_SPARC_PRIV_H__ typedef struct { char *uname; /* mask name */ char *udesc; /* mask description */ int ubit; /* umask bit position */ } sparc_mask_t; #define EVENT_MASK_BITS 8 typedef struct { char *name; /* event name */ char *desc; /* event description */ char ctrl; /* S0 or S1 */ char __pad; int code; /* S0/S1 encoding */ int numasks; /* number of entries in masks */ sparc_mask_t umasks[EVENT_MASK_BITS]; } sparc_entry_t; 
typedef union { unsigned int val; struct { unsigned int ctrl_s0 : 1; unsigned int ctrl_s1 : 1; unsigned int reserved1 : 14; unsigned int code : 8; unsigned int umask : 8; } config; } pfm_sparc_reg_t; #define PME_CTRL_S0 1 #define PME_CTRL_S1 2 #define SPARC_ATTR_K 0 #define SPARC_ATTR_U 1 #define SPARC_ATTR_H 2 #define SPARC_PLM (PFM_PLM0|PFM_PLM3) #define NIAGARA2_PLM (SPARC_PLM|PFM_PLMH) extern int pfm_sparc_detect(void *this); extern int pfm_sparc_get_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_sparc_get_event_first(void *this); extern int pfm_sparc_get_event_next(void *this, int idx); extern int pfm_sparc_event_is_valid(void *this, int pidx); extern int pfm_sparc_validate_table(void *this, FILE *fp); extern int pfm_sparc_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info); extern int pfm_sparc_get_event_info(void *this, int idx, pfm_event_info_t *info); extern unsigned int pfm_sparc_get_event_nattrs(void *this, int pidx); extern void pfm_sparc_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); extern int pfm_sparc_get_perf_encoding(void *this, pfmlib_event_desc_t *e); #endif /* __PFMLIB_SPARC_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_k8.c0000600003276200002170000000600112247131124020040 0ustar ralphundrgrad/* * pfmlib_amd64_k8.c : AMD64 K8 * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_amd64_priv.h" #include "events/amd64_events_k8.h" #define DEFINE_K8_REV(d, n, r, pmuid) \ static int \ pfm_amd64_k8_##n##_detect(void *this) \ { \ int ret; \ ret = pfm_amd64_detect(this); \ if (ret != PFM_SUCCESS) \ return ret; \ ret = pfm_amd64_cfg.revision; \ return ret == pmuid ? PFM_SUCCESS : PFM_ERR_NOTSUPP; \ }; \ pfmlib_pmu_t amd64_k8_##n##_support={ \ .desc = "AMD64 K8 "#d, \ .name = "amd64_k8_"#n, \ .pmu = pmuid, \ .pmu_rev = r, \ .pme_count = LIBPFM_ARRAY_SIZE(amd64_k8_pe),\ .type = PFM_PMU_TYPE_CORE, \ .supported_plm = AMD64_K7_PLM, \ .num_cntrs = 4, \ .max_encoding = 1, \ .pe = amd64_k8_pe, \ .atdesc = amd64_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK, \ \ .pmu_detect = pfm_amd64_k8_##n##_detect, \ .get_event_encoding[PFM_OS_NONE] = pfm_amd64_get_encoding, \ PFMLIB_ENCODE_PERF(pfm_amd64_get_perf_encoding), \ .get_event_first = pfm_amd64_get_event_first, \ .get_event_next = pfm_amd64_get_event_next, \ .event_is_valid = pfm_amd64_event_is_valid, \ .validate_table = pfm_amd64_validate_table, \ .get_event_info = pfm_amd64_get_event_info, \ .get_event_attr_info = pfm_amd64_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_amd64_perf_validate_pattrs),\ .get_event_nattrs = pfm_amd64_get_event_nattrs, \ .get_num_events = pfm_amd64_get_num_events, \ } DEFINE_K8_REV(RevB, revb, AMD64_K8_REV_B, PFM_PMU_AMD64_K8_REVB); DEFINE_K8_REV(RevC, revc, AMD64_K8_REV_C, PFM_PMU_AMD64_K8_REVC); DEFINE_K8_REV(RevD, revd, AMD64_K8_REV_D, 
PFM_PMU_AMD64_K8_REVD); DEFINE_K8_REV(RevE, reve, AMD64_K8_REV_E, PFM_PMU_AMD64_K8_REVE); DEFINE_K8_REV(RevF, revf, AMD64_K8_REV_F, PFM_PMU_AMD64_K8_REVF); DEFINE_K8_REV(RevG, revg, AMD64_K8_REV_G, PFM_PMU_AMD64_K8_REVG); papi-5.3.0/src/libpfm4/lib/pfmlib_intel_wsm.c0000600003276200002170000001041312247131124020526 0ustar ralphundrgrad/* * pfmlib_intel_wsm.c : Intel Westmere core PMU * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "events/intel_wsm_events.h" static int pfm_wsm_sp_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 37: /* Clarkdale */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_wsm_dp_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 44: /* Westmere-EP, Gulftown */ case 47: /* Westmere E7 */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_wsm_init(void *this) { pfm_intel_x86_cfg.arch_version = 3; return PFM_SUCCESS; } pfmlib_pmu_t intel_wsm_sp_support={ .desc = "Intel Westmere (single-socket)", .name = "wsm", .pmu = PFM_PMU_INTEL_WSM, .pme_count = LIBPFM_ARRAY_SIZE(intel_wsm_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 4, .num_fixed_cntrs = 3, .max_encoding = 2, /* because of OFFCORE_RESPONSE */ .pe = intel_wsm_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_wsm_sp_detect, .pmu_init = pfm_wsm_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; pfmlib_pmu_t intel_wsm_dp_support={ .desc = "Intel Westmere DP", .name = 
"wsm_dp", .pmu = PFM_PMU_INTEL_WSM_DP, .pme_count = LIBPFM_ARRAY_SIZE(intel_wsm_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 4, .num_fixed_cntrs = 3, .max_encoding = 2, /* because of OFFCORE_RESPONSE */ .pe = intel_wsm_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_wsm_dp_detect, .pmu_init = pfm_wsm_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; papi-5.3.0/src/libpfm4/lib/pfmlib_power7.c0000600003276200002170000000440612247131124017755 0ustar ralphundrgrad/* * pfmlib_power7.c : IBM POWER7 support * * Copyright (C) IBM Corporation, 2009. All rights reserved. * Contributed by Corey Ashford (cjashfor@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" #include "events/power7_events.h" static int pfm_power7_detect(void* this) { if (__is_processor(PV_POWER7) || __is_processor(PV_POWER7p)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } pfmlib_pmu_t power7_support={ .desc = "POWER7", .name = "power7", .pmu = PFM_PMU_POWER7, .pme_count = LIBPFM_ARRAY_SIZE(power7_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 4, .num_fixed_cntrs = 2, .max_encoding = 1, .pe = power7_pe, .pmu_detect = pfm_power7_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_perf_event.c0000600003276200002170000000562712247131124021670 0ustar ralphundrgrad/* * pfmlib_amd64_perf_event.c : perf_event AMD64 functions * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * 
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_amd64_priv.h" /* architecture private */ #include "pfmlib_perf_event_priv.h" int pfm_amd64_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; struct perf_event_attr *attr = e->os_data; int ret; if (!pmu->get_event_encoding[PFM_OS_NONE]) return PFM_ERR_NOTSUPP; /* * use generic raw encoding function first */ ret = pmu->get_event_encoding[PFM_OS_NONE](this, e); if (ret != PFM_SUCCESS) return ret; if (e->count > 1) { /* format takes exactly one argument: the unsupported count */ DPRINT("unsupported count=%d\n", e->count); return PFM_ERR_NOTSUPP; } /* all events treated as raw for now */ attr->type = PERF_TYPE_RAW; attr->config = e->codes[0]; return PFM_SUCCESS; } void pfm_amd64_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; int i, compact; for (i=0; i < e->npattrs; i++) { compact = 0; /* umasks never conflict */ if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; /* * with perf_events, u and k are handled at the OS level * via attr.exclude_* fields */ if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PMU) { if (e->pattrs[i].idx == AMD64_ATTR_U || 
e->pattrs[i].idx == AMD64_ATTR_K || e->pattrs[i].idx == AMD64_ATTR_H) compact = 1; } if (e->pattrs[i].ctrl == PFM_ATTR_CTRL_PERF_EVENT) { /* No precise mode on AMD */ if (e->pattrs[i].idx == PERF_ATTR_PR) compact = 1; /* older processors do not support hypervisor priv level */ if (!IS_FAMILY_10H(pmu) && e->pattrs[i].idx == PERF_ATTR_H) compact = 1; } if (compact) { pfmlib_compact_pattrs(e, i); i--; } } } papi-5.3.0/src/libpfm4/lib/pfmlib_cell_priv.h0000600003276200002170000000565712247131124020527 0ustar ralphundrgrad/* * Copyright (c) 2007 TOSHIBA CORPORATION based on code from * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef __PFMLIB_CELL_PRIV_H__ #define __PFMLIB_CELL_PRIV_H__ #define PFM_CELL_PME_FREQ_PPU_MFC 0 #define PFM_CELL_PME_FREQ_SPU 1 #define PFM_CELL_PME_FREQ_HALF 2 typedef struct { char *pme_name; /* event name */ char *pme_desc; /* event description */ unsigned long long pme_code; /* event code */ unsigned int pme_type; /* count type */ unsigned int pme_freq; /* debug_bus_control's frequency value */ unsigned int pme_enable_word; } pme_cell_entry_t; /* PMC register */ #define REG_PM0_CONTROL 0x0000 #define REG_PM1_CONTROL 0x0001 #define REG_PM2_CONTROL 0x0002 #define REG_PM3_CONTROL 0x0003 #define REG_PM4_CONTROL 0x0004 #define REG_PM5_CONTROL 0x0005 #define REG_PM6_CONTROL 0x0006 #define REG_PM7_CONTROL 0x0007 #define REG_PM0_EVENT 0x0008 #define REG_PM1_EVENT 0x0009 #define REG_PM2_EVENT 0x000A #define REG_PM3_EVENT 0x000B #define REG_PM4_EVENT 0x000C #define REG_PM5_EVENT 0x000D #define REG_PM6_EVENT 0x000E #define REG_PM7_EVENT 0x000F #define REG_GROUP_CONTROL 0x0010 #define REG_DEBUG_BUS_CONTROL 0x0011 #define REG_TRACE_ADDRESS 0x0012 #define REG_EXT_TRACE_TIMER 0x0013 #define REG_PM_STATUS 0x0014 #define REG_PM_CONTROL 0x0015 #define REG_PM_INTERVAL 0x0016 #define REG_PM_START_STOP 0x0017 #define NONE_SIGNAL 0x0000 #define SIGNAL_SPU 41 #define SIGNAL_SPU_TRIGGER 42 #define SIGNAL_SPU_EVENT 43 #define COUNT_TYPE_BOTH_TYPE 1 #define COUNT_TYPE_CUMULATIVE_LEN 2 #define COUNT_TYPE_OCCURRENCE 3 #define COUNT_TYPE_MULTI_CYCLE 4 #define COUNT_TYPE_SINGLE_CYCLE 5 #define WORD_0_ONLY 1 /* 0001 */ #define WORD_2_ONLY 4 /* 0100 */ #define WORD_0_AND_1 3 /* 0011 */ #define WORD_0_AND_2 5 /* 0101 */ #define WORD_NONE 0 #endif /* __PFMLIB_CELL_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_common.c0000600003276200002170000011046112247131124020021 0ustar ralphundrgrad/* * pfmlib_common.c: set of functions common to all PMU models * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2001-2006 Hewlett-Packard Development 
Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #include "pfmlib_priv.h" static pfmlib_pmu_t *pfmlib_pmus[]= { #ifdef CONFIG_PFMLIB_ARCH_IA64 #if 0 &montecito_support, &itanium2_support, &itanium_support, &generic_ia64_support, /* must always be last for IA-64 */ #endif #endif #ifdef CONFIG_PFMLIB_ARCH_I386 /* 32-bit only processors */ &intel_pii_support, &intel_ppro_support, &intel_p6_support, &intel_pm_support, &intel_coreduo_support, #endif #ifdef CONFIG_PFMLIB_ARCH_X86 /* 32 and 64 bit processors */ &netburst_support, &netburst_p_support, &amd64_k7_support, &amd64_k8_revb_support, &amd64_k8_revc_support, &amd64_k8_revd_support, &amd64_k8_reve_support, &amd64_k8_revf_support, &amd64_k8_revg_support, &amd64_fam10h_barcelona_support, &amd64_fam10h_shanghai_support, &amd64_fam10h_istanbul_support, &amd64_fam11h_turion_support, &amd64_fam12h_llano_support, &amd64_fam14h_bobcat_support, &amd64_fam15h_interlagos_support, &intel_core_support, &intel_atom_support, &intel_nhm_support, &intel_nhm_ex_support, &intel_nhm_unc_support, &intel_wsm_sp_support, &intel_wsm_dp_support, &intel_wsm_unc_support, &intel_snb_support, &intel_snb_unc_cbo0_support, &intel_snb_unc_cbo1_support, &intel_snb_unc_cbo2_support, &intel_snb_unc_cbo3_support, &intel_snb_ep_support, &intel_ivb_support, &intel_ivb_unc_cbo0_support, &intel_ivb_unc_cbo1_support, &intel_ivb_unc_cbo2_support, &intel_ivb_unc_cbo3_support, &intel_ivb_ep_support, &intel_hsw_support, &intel_snbep_unc_cb0_support, &intel_snbep_unc_cb1_support, &intel_snbep_unc_cb2_support, &intel_snbep_unc_cb3_support, &intel_snbep_unc_cb4_support, &intel_snbep_unc_cb5_support, &intel_snbep_unc_cb6_support, &intel_snbep_unc_cb7_support, &intel_snbep_unc_ha_support, &intel_snbep_unc_imc0_support, &intel_snbep_unc_imc1_support, &intel_snbep_unc_imc2_support, &intel_snbep_unc_imc3_support, &intel_snbep_unc_pcu_support, &intel_snbep_unc_qpi0_support, &intel_snbep_unc_qpi1_support, &intel_snbep_unc_ubo_support, 
&intel_snbep_unc_r2pcie_support, &intel_snbep_unc_r3qpi0_support, &intel_snbep_unc_r3qpi1_support, &intel_knc_support, &intel_x86_arch_support, /* must always be last for x86 */ #endif #ifdef CONFIG_PFMLIB_ARCH_MIPS &mips_74k_support, #endif #ifdef CONFIG_PFMLIB_ARCH_SICORTEX &sicortex_support, #endif #ifdef CONFIG_PFMLIB_ARCH_POWERPC &power4_support, &ppc970_support, &ppc970mp_support, &power5_support, &power5p_support, &power6_support, &power7_support, &power8_support, &torrent_support, #endif #ifdef CONFIG_PFMLIB_ARCH_SPARC &sparc_ultra12_support, &sparc_ultra3_support, &sparc_ultra3i_support, &sparc_ultra3plus_support, &sparc_ultra4plus_support, &sparc_niagara1_support, &sparc_niagara2_support, #endif #ifdef CONFIG_PFMLIB_CELL &cell_support, #endif #ifdef CONFIG_PFMLIB_ARCH_ARM &arm_cortex_a8_support, &arm_cortex_a9_support, &arm_cortex_a15_support, &arm_1176_support, #endif #ifdef CONFIG_PFMLIB_ARCH_S390X &s390x_cpum_cf_support, #endif #ifdef __linux__ &perf_event_support, #endif }; #define PFMLIB_NUM_PMUS (int)(sizeof(pfmlib_pmus)/sizeof(pfmlib_pmu_t *)) static pfmlib_os_t pfmlib_os_none; pfmlib_os_t *pfmlib_os = &pfmlib_os_none; static pfmlib_os_t *pfmlib_oses[]={ &pfmlib_os_none, #ifdef __linux__ &pfmlib_os_perf, &pfmlib_os_perf_ext, #endif }; #define PFMLIB_NUM_OSES (int)(sizeof(pfmlib_oses)/sizeof(pfmlib_os_t *)) /* * Mapping table from PMU index to pfmlib_pmu_t * table is populated from pfmlib_pmus[] when the library * is initialized. * * Some entries can be NULL if PMU is not implemented on host * architecture or if the initialization failed. 
 */
static pfmlib_pmu_t *pfmlib_pmus_map[PFM_PMU_MAX];

#define pfmlib_for_each_pmu_event(p, e) \
	for(e=(p)->get_event_first((p)); e != -1; e = (p)->get_event_next((p), e))

#define for_each_pmu_event_attr(u, i) \
	for((u)=0; (u) < (i)->nattrs; (u) = (u)+1)

#define pfmlib_for_each_pmu(x) \
	for((x)= 0 ; (x) < PFMLIB_NUM_PMUS; (x)++)

#define pfmlib_for_each_os(x) \
	for((x)= 0 ; (x) < PFMLIB_NUM_OSES; (x)++)

pfmlib_config_t pfm_cfg;

void
__pfm_dbprintf(const char *fmt, ...)
{
	va_list ap;

	if (pfm_cfg.debug == 0)
		return;

	va_start(ap, fmt);
	vfprintf(pfm_cfg.fp, fmt, ap);
	va_end(ap);
}

void
__pfm_vbprintf(const char *fmt, ...)
{
	va_list ap;

	if (pfm_cfg.verbose == 0)
		return;

	va_start(ap, fmt);
	vfprintf(pfm_cfg.fp, fmt, ap);
	va_end(ap);
}

/*
 * pfmlib_getl: our own equivalent to GNU getline() extension.
 * This avoids a dependency on having a C library with
 * support for getline().
 */
int
pfmlib_getl(char **buffer, size_t *len, FILE *fp)
{
#define GETL_DFL_LEN 32
	char *b;
	int c;
	size_t maxsz, maxi, d, i = 0;

	if (!len || !fp || !buffer)
		return -1;

	b = *buffer;

	if (!b)
		*len = 0;

	maxsz = *len;
	maxi = maxsz - 2;

	while ((c = fgetc(fp)) != EOF) {
		if (maxsz == 0 || i == maxi) {
			if (maxsz == 0)
				maxsz = GETL_DFL_LEN;
			else
				maxsz <<= 1;

			if (*buffer)
				d = &b[i] - *buffer;
			else
				d = 0;

			*buffer = realloc(*buffer, maxsz);
			if (!*buffer)
				return -1;

			b = *buffer + d;
			maxi = maxsz - d - 2;
			i = 0;
			*len = maxsz;
		}
		b[i++] = c;
		if (c == '\n')
			break;
	}
	b[i] = '\0';
	return c != EOF ? 0 : -1;
}

/*
 * append fmt+args to str such that the string is no
 * more than max characters incl. null termination
 */
void
pfmlib_strconcat(char *str, size_t max, const char *fmt, ...)
{ va_list ap; size_t len, todo; len = strlen(str); todo = max - strlen(str); va_start(ap, fmt); vsnprintf(str+len, todo, fmt, ap); va_end(ap); } /* * compact all pattrs starting from index i */ void pfmlib_compact_pattrs(pfmlib_event_desc_t *e, int i) { int j; for (j = i+1; j < e->npattrs; j++) e->pattrs[j - 1] = e->pattrs[j]; e->npattrs--; } static void pfmlib_compact_attrs(pfmlib_event_desc_t *e, int i) { int j; for (j = i+1; j < e->nattrs; j++) e->attrs[j - 1] = e->attrs[j]; e->nattrs--; } /* * 0 : different attribute * 1 : exactly same attribute (duplicate can be removed) * -1 : same attribute but value differ, this is an error */ static inline int pfmlib_same_attr(pfmlib_event_desc_t *d, int i, int j) { pfm_event_attr_info_t *a1, *a2; pfmlib_attr_t *b1, *b2; a1 = attr(d, i); a2 = attr(d, j); b1 = d->attrs+i; b2 = d->attrs+j; if (a1->idx == a2->idx && a1->type == a2->type && a1->ctrl == a2->ctrl) { if (b1->ival == b2->ival) return 1; return -1; } return 0; } static inline int pfmlib_pmu_active(pfmlib_pmu_t *pmu) { return !!(pmu->flags & PFMLIB_PMU_FL_ACTIVE); } static inline int pfmlib_pmu_initialized(pfmlib_pmu_t *pmu) { return !!(pmu->flags & PFMLIB_PMU_FL_INIT); } static inline pfm_pmu_t idx2pmu(int idx) { return (pfm_pmu_t)(idx >> PFMLIB_PMU_SHIFT) & PFMLIB_PMU_MASK; } static inline pfmlib_pmu_t * pmu2pmuidx(pfm_pmu_t pmu) { /* pfm_pmu_t is unsigned int enum, so * just need to check for upper bound */ if (pmu >= PFM_PMU_MAX) return NULL; return pfmlib_pmus_map[pmu]; } /* * external opaque idx -> PMU + internal idx */ static pfmlib_pmu_t * pfmlib_idx2pidx(int idx, int *pidx) { pfmlib_pmu_t *pmu; pfm_pmu_t pmu_id; if (PFMLIB_INITIALIZED() == 0) return NULL; if (idx < 0) return NULL; pmu_id = idx2pmu(idx); pmu = pmu2pmuidx(pmu_id); if (!pmu) return NULL; *pidx = idx & PFMLIB_PMU_PIDX_MASK; if (!pmu->event_is_valid(pmu, *pidx)) return NULL; return pmu; } static pfmlib_os_t * pfmlib_find_os(pfm_os_t id) { int o; pfmlib_os_t *os; pfmlib_for_each_os(o) { os = 
pfmlib_oses[o];
		if (os->id == id && (os->flags & PFMLIB_OS_FL_ACTIVATED))
			return os;
	}
	return NULL;
}

size_t
pfmlib_check_struct(void *st, size_t usz, size_t refsz, size_t sz)
{
	size_t rsz = sz;

	/*
	 * if user size is zero, then use ABI0 size
	 */
	if (usz == 0)
		usz = refsz;

	/*
	 * cannot be smaller than ABI0 size
	 */
	if (usz < refsz) {
		DPRINT("pfmlib_check_struct: user size too small %zu\n", usz);
		return 0;
	}

	/*
	 * if bigger than current ABI, then check that none
	 * of the extra bits are set. This is to avoid mistake
	 * by caller assuming the library set those bits.
	 */
	if (usz > sz) {
		char *addr = (char *)st + sz;
		char *end = (char *)st + usz;
		while (addr != end) {
			if (*addr++) {
				DPRINT("pfmlib_check_struct: invalid extra bits\n");
				return 0;
			}
		}
	}
	return rsz;
}

/*
 * check environment variables for:
 *  LIBPFM_VERBOSE         : verbose output level (single digit, non-zero enables)
 *  LIBPFM_DEBUG           : debug output level (single digit, non-zero enables)
 *  LIBPFM_DEBUG_STDOUT    : if set, print to stdout instead of stderr
 *  LIBPFM_FORCE_PMU       : name of the PMU to force, bypassing detection
 *  LIBPFM_ENCODE_INACTIVE : if set, allow events of inactive PMUs to be parsed
 */
static void
pfmlib_init_env(void)
{
	char *str;

	pfm_cfg.fp = stderr;

	str = getenv("LIBPFM_VERBOSE");
	if (str && isdigit((int)*str))
		pfm_cfg.verbose = *str - '0';

	str = getenv("LIBPFM_DEBUG");
	if (str && isdigit((int)*str))
		pfm_cfg.debug = *str - '0';

	str = getenv("LIBPFM_DEBUG_STDOUT");
	if (str)
		pfm_cfg.fp = stdout;

	pfm_cfg.forced_pmu = getenv("LIBPFM_FORCE_PMU");

	str = getenv("LIBPFM_ENCODE_INACTIVE");
	if (str)
		pfm_cfg.inactive = 1;
}

static int
pfmlib_pmu_sanity_checks(pfmlib_pmu_t *p)
{
	/*
	 * check event can be encoded
	 */
	if (p->pme_count >= (1<< PFMLIB_PMU_SHIFT)) {
		DPRINT("too many events for %s\n", p->desc);
		return PFM_ERR_NOTSUPP;
	}

	if (p->max_encoding > PFMLIB_MAX_ENCODING) {
		DPRINT("max encoding too high (%d > %d) for %s\n",
			p->max_encoding, PFMLIB_MAX_ENCODING, p->desc);
		return PFM_ERR_NOTSUPP;
	}

	return PFM_SUCCESS;
}

int
pfmlib_build_fstr(pfmlib_event_desc_t *e, char **fstr)
{
	/* nothing to do */
	if (!fstr)
		return PFM_SUCCESS;

	*fstr = malloc(strlen(e->fstr) + 2 + strlen(e->pmu->name) + 1);
	if (*fstr)
		sprintf(*fstr, "%s::%s", e->pmu->name, e->fstr);

	/* test the allocation result, not the (already checked) fstr pointer */
	return *fstr ?
PFM_SUCCESS : PFM_ERR_NOMEM; } static int pfmlib_pmu_activate(pfmlib_pmu_t *p) { int ret; if (p->pmu_init) { ret = p->pmu_init(p); if (ret != PFM_SUCCESS) return ret; } p->flags |= PFMLIB_PMU_FL_ACTIVE; DPRINT("activated %s\n", p->desc); return PFM_SUCCESS; } static inline int pfmlib_match_forced_pmu(const char *name) { const char *p; size_t l; /* skip any lower level specifier */ p = strchr(pfm_cfg.forced_pmu, ','); if (p) l = p - pfm_cfg.forced_pmu; else l = strlen(pfm_cfg.forced_pmu); return !strncasecmp(name, pfm_cfg.forced_pmu, l); } static int pfmlib_init_pmus(void) { pfmlib_pmu_t *p; int i, ret; int nsuccess = 0; /* * activate all detected PMUs * when forced, only the designated PMU * is setup and activated */ pfmlib_for_each_pmu(i) { p = pfmlib_pmus[i]; DPRINT("trying %s\n", p->desc); ret = PFM_SUCCESS; if (!pfm_cfg.forced_pmu) ret = p->pmu_detect(p); else if (!pfmlib_match_forced_pmu(p->name)) ret = PFM_ERR_NOTSUPP; /* * basic checks * failure causes PMU to not be available */ if (pfmlib_pmu_sanity_checks(p) != PFM_SUCCESS) continue; p->flags |= PFMLIB_PMU_FL_INIT; /* * populate mapping table */ pfmlib_pmus_map[p->pmu] = p; if (ret != PFM_SUCCESS) continue; /* * check if exported by OS if needed */ if (p->os_detect[pfmlib_os->id]) { ret = p->os_detect[pfmlib_os->id](p); if (ret != PFM_SUCCESS) { DPRINT("%s PMU not exported by OS\n", p->name); continue; } } ret = pfmlib_pmu_activate(p); if (ret == PFM_SUCCESS) nsuccess++; if (pfm_cfg.forced_pmu) { __pfm_vbprintf("PMU forced to %s (%s) : %s\n", p->name, p->desc, ret == PFM_SUCCESS ? 
"success" : "failure"); return ret; } } DPRINT("%d PMU detected out of %d supported\n", nsuccess, PFMLIB_NUM_PMUS); return PFM_SUCCESS; } static void pfmlib_init_os(void) { int o; pfmlib_os_t *os; pfmlib_for_each_os(o) { os = pfmlib_oses[o]; if (!os->detect) continue; if (os->detect(os) != PFM_SUCCESS) continue; if (os != &pfmlib_os_none && pfmlib_os == &pfmlib_os_none) pfmlib_os = os; DPRINT("OS layer %s activated\n", os->name); os->flags = PFMLIB_OS_FL_ACTIVATED; } DPRINT("default OS layer: %s\n", pfmlib_os->name); } int pfm_initialize(void) { int ret; /* * not atomic */ if (pfm_cfg.initdone) return PFM_SUCCESS; /* * generic sanity checks */ if (PFM_PMU_MAX & (~PFMLIB_PMU_MASK)) { DPRINT("PFM_PMU_MAX exceeds PFMLIB_PMU_MASK\n"); return PFM_ERR_NOTSUPP; } pfmlib_init_env(); /* must be done before pfmlib_init_pmus() */ pfmlib_init_os(); ret = pfmlib_init_pmus(); if (ret != PFM_SUCCESS) return ret; pfm_cfg.initdone = 1; return ret; } void pfm_terminate(void) { pfmlib_pmu_t *pmu; int i; if (PFMLIB_INITIALIZED() == 0) return; pfmlib_for_each_pmu(i) { pmu = pfmlib_pmus[i]; if (!pfmlib_pmu_active(pmu)) continue; if (pmu->pmu_terminate) pmu->pmu_terminate(pmu); } pfm_cfg.initdone = 0; } int pfm_find_event(const char *str) { pfmlib_event_desc_t e; int ret; if (PFMLIB_INITIALIZED() == 0) return PFM_ERR_NOINIT; if (!str) return PFM_ERR_INVAL; memset(&e, 0, sizeof(e)); ret = pfmlib_parse_event(str, &e); if (ret == PFM_SUCCESS) return pfmlib_pidx2idx(e.pmu, e.event); return ret; } static int pfmlib_sanitize_event(pfmlib_event_desc_t *d) { int i, j, ret; /* * fail if duplicate attributes are found */ for(i=0; i < d->nattrs; i++) { for(j=i+1; j < d->nattrs; j++) { ret = pfmlib_same_attr(d, i, j); if (ret == 1) pfmlib_compact_attrs(d, j); else if (ret == -1) return PFM_ERR_ATTR_SET; } } return PFM_SUCCESS; } static int pfmlib_parse_event_attr(char *str, pfmlib_event_desc_t *d) { pfm_event_attr_info_t *ainfo; char *s, *p, *q, *endptr; char yes[2] = "y"; pfm_attr_t type; int aidx 
= 0, has_val, has_raw_um = 0, has_um = 0;
	int ret = PFM_ERR_INVAL;

	s = str;

	while(s) {
		p = strchr(s, PFMLIB_ATTR_DELIM);
		if (p)
			*p++ = '\0';

		q = strchr(s, '=');
		if (q)
			*q++ = '\0';

		has_val = !!q;

		/*
		 * check for raw umasks in hexadecimal only
		 */
		if (*s == '0' && tolower(*(s+1)) == 'x') {
			char *endptr = NULL;

			/* can only have one raw umask */
			if (has_raw_um || has_um) {
				DPRINT("cannot mix raw umask with umask\n");
				return PFM_ERR_ATTR;
			}
			if (!(d->pmu->flags & PFMLIB_PMU_FL_RAW_UMASK)) {
				DPRINT("PMU %s does not support RAW umasks\n", d->pmu->name);
				return PFM_ERR_ATTR;
			}

			/* we have reserved an entry at the end of pattrs */
			aidx = d->npattrs;
			ainfo = d->pattrs + aidx;

			ainfo->name = "RAW_UMASK";
			ainfo->type = PFM_ATTR_RAW_UMASK;
			ainfo->ctrl = PFM_ATTR_CTRL_PMU;
			ainfo->idx = strtoul(s, &endptr, 0);
			ainfo->equiv= NULL;
			if (*endptr) {
				DPRINT("raw umask (%s) is not a number\n", s);
				return PFM_ERR_ATTR;
			}

			has_raw_um = 1;

			goto found_attr;
		}

		for(aidx = 0; aidx < d->npattrs; aidx++) {
			if (!strcasecmp(d->pattrs[aidx].name, s)) {
				ainfo = d->pattrs + aidx;
				/* disambiguate modifier and umask
				 * with the same name : snb::L2_LINES_IN:I:I=1
				 */
				if (has_val && ainfo->type == PFM_ATTR_UMASK)
					continue;
				goto found_attr;
			}
		}
		return PFM_ERR_ATTR;
found_attr:
		type = ainfo->type;

		if (type == PFM_ATTR_UMASK) {
			has_um = 1;
			if (has_raw_um) {
				DPRINT("cannot mix raw umask with umask\n");
				return PFM_ERR_ATTR;
			}
		}

		if (ainfo->equiv) {
			char *z;

			/* cannot have equiv for attributes with value */
			if (has_val)
				return PFM_ERR_ATTR_VAL;

			/* copy because it is const */
			z = strdup(ainfo->equiv);
			if (!z)
				return PFM_ERR_NOMEM;

			ret = pfmlib_parse_event_attr(z, d);

			free(z);

			if (ret != PFM_SUCCESS)
				return ret;
			s = p;
			continue;
		}
		/*
		 * we tolerate missing value for boolean attributes.
* Presence of the attribute is equivalent to * attr=1, i.e., attribute is set */ if (type != PFM_ATTR_UMASK && type != PFM_ATTR_RAW_UMASK && !has_val) { if (type != PFM_ATTR_MOD_BOOL) return PFM_ERR_ATTR_VAL; has_val = 1; s = yes; /* no const */ goto handle_bool; } d->attrs[d->nattrs].ival = 0; if ((type == PFM_ATTR_UMASK || type == PFM_ATTR_RAW_UMASK) && has_val) return PFM_ERR_ATTR_VAL; if (has_val) { s = q; handle_bool: ret = PFM_ERR_ATTR_VAL; if (!strlen(s)) goto error; if (d->nattrs == PFMLIB_MAX_ATTRS) { DPRINT("too many attributes\n"); ret = PFM_ERR_TOOMANY; goto error; } endptr = NULL; switch(type) { case PFM_ATTR_MOD_BOOL: if (strlen(s) > 1) goto error; if (tolower((int)*s) == 'y' || tolower((int)*s) == 't' || *s == '1') d->attrs[d->nattrs].ival = 1; else if (tolower((int)*s) == 'n' || tolower((int)*s) == 'f' || *s == '0') d->attrs[d->nattrs].ival = 0; else goto error; break; case PFM_ATTR_MOD_INTEGER: d->attrs[d->nattrs].ival = strtoull(s, &endptr, 0); if (*endptr != '\0') goto error; break; default: goto error; } } d->attrs[d->nattrs].id = aidx; d->nattrs++; s = p; } ret = PFM_SUCCESS; error: return ret; } static int pfmlib_build_event_pattrs(pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu; pfmlib_os_t *os; int i, ret, pmu_nattrs = 0, os_nattrs = 0; int npattrs; /* * cannot satisfy request for an OS that was not activated */ os = pfmlib_find_os(e->osid); if (!os) return PFM_ERR_NOTSUPP; pmu = e->pmu; /* get actual PMU number of attributes for the event */ if (pmu->get_event_nattrs) pmu_nattrs = pmu->get_event_nattrs(pmu, e->event); if (os && os->get_os_nattrs) os_nattrs += os->get_os_nattrs(os, e); npattrs = pmu_nattrs + os_nattrs; /* * add extra entry for raw umask, if supported */ if (pmu->flags & PFMLIB_PMU_FL_RAW_UMASK) npattrs++; if (npattrs) { e->pattrs = malloc(npattrs * sizeof(*e->pattrs)); if (!e->pattrs) return PFM_ERR_NOMEM; } /* collect all actual PMU attrs */ for(i = 0; i < pmu_nattrs; i++) { ret = pmu->get_event_attr_info(pmu, e->event, i, 
e->pattrs+i); if (ret != PFM_SUCCESS) goto error; } e->npattrs = pmu_nattrs; if (os_nattrs) { if (e->osid == os->id && os->get_os_attr_info) { os->get_os_attr_info(os, e); /* * check for conflicts between HW and OS attributes */ if (pmu->validate_pattrs[e->osid]) pmu->validate_pattrs[e->osid](pmu, e); } } for (i = 0; i < e->npattrs; i++) DPRINT("%d %d %d %d %d %s\n", e->event, i, e->pattrs[i].type, e->pattrs[i].ctrl, e->pattrs[i].idx, e->pattrs[i].name); return PFM_SUCCESS; error: free(e->pattrs); e->pattrs = NULL; return ret; } void pfmlib_release_event(pfmlib_event_desc_t *e) { free(e->pattrs); e->pattrs = NULL; } static int pfmlib_parse_equiv_event(const char *event, pfmlib_event_desc_t *d) { pfmlib_pmu_t *pmu = d->pmu; pfm_event_info_t einfo; char *str, *s, *p; int i; int ret; /* * create copy because string is const */ s = str = strdup(event); if (!str) return PFM_ERR_NOMEM; p = strchr(s, PFMLIB_ATTR_DELIM); if (p) *p++ = '\0'; pfmlib_for_each_pmu_event(pmu, i) { ret = pmu->get_event_info(pmu, i, &einfo); if (ret != PFM_SUCCESS) goto error; if (!strcasecmp(einfo.name, s)) goto found; } free(str); return PFM_ERR_NOTFOUND; found: d->pmu = pmu; d->event = i; /* private index */ /* * build_event_pattrs and parse_event_attr * cannot be factorized with pfmlib_parse_event() * because equivalent event may add its own attributes */ ret = pfmlib_build_event_pattrs(d); if (ret != PFM_SUCCESS) goto error; ret = pfmlib_parse_event_attr(p, d); if (ret == PFM_SUCCESS) ret = pfmlib_sanitize_event(d); error: free(str); if (ret != PFM_SUCCESS) pfmlib_release_event(d); return ret; } int pfmlib_parse_event(const char *event, pfmlib_event_desc_t *d) { pfm_event_info_t einfo; char *str, *s, *p; pfmlib_pmu_t *pmu; const char *pname = NULL; int i, j, ret; /* * create copy because string is const */ s = str = strdup(event); if (!str) return PFM_ERR_NOMEM; /* * ignore everything passed after a comma * (simplify dealing with const event list) * * safe to do before pname, because now * 
PMU names cannot contain commas.
	 */
	p = strchr(s, PFMLIB_EVENT_DELIM);
	if (p)
		*p = '\0';

	/* check for optional PMU name */
	p = strstr(s, PFMLIB_PMU_DELIM);
	if (p) {
		*p = '\0';
		pname = s;
		s = p + strlen(PFMLIB_PMU_DELIM);
	}
	p = strchr(s, PFMLIB_ATTR_DELIM);
	if (p)
		*p++ = '\0';
	/*
	 * for each pmu
	 */
	pfmlib_for_each_pmu(j) {
		pmu = pfmlib_pmus[j];
		/*
		 * if no explicit PMU name is given, then
		 * only look for active PMU models
		 */
		if (!pname && !pfmlib_pmu_active(pmu))
			continue;
		/*
		 * check for requested PMU name
		 */
		if (pname && strcasecmp(pname, pmu->name))
			continue;
		/*
		 * only allow event on inactive PMU if enabled via
		 * environment variable
		 */
		if (pname && !pfmlib_pmu_active(pmu) && !pfm_cfg.inactive)
			continue;
		/*
		 * for each event
		 */
		pfmlib_for_each_pmu_event(pmu, i) {
			ret = pmu->get_event_info(pmu, i, &einfo);
			if (ret != PFM_SUCCESS)
				goto error;
			if (!strcasecmp(einfo.name, s))
				goto found;
		}
	}
	free(str);
	return PFM_ERR_NOTFOUND;
found:
	d->pmu = pmu;
	/*
	 * handle equivalence
	 */
	if (einfo.equiv) {
		ret = pfmlib_parse_equiv_event(einfo.equiv, d);
		if (ret != PFM_SUCCESS)
			goto error;
	} else {
		d->event = i; /* private index */

		ret = pfmlib_build_event_pattrs(d);
		if (ret != PFM_SUCCESS)
			goto error;
	}
	/*
	 * parse attributes from original event
	 */
	ret = pfmlib_parse_event_attr(p, d);
	if (ret == PFM_SUCCESS)
		ret = pfmlib_sanitize_event(d);

	for (i = 0; i < d->nattrs; i++) {
		pfm_event_attr_info_t *a = attr(d, i);
		if (a->type != PFM_ATTR_RAW_UMASK)
			DPRINT("%d %d %d %s\n", d->event, i, a->idx, d->pattrs[d->attrs[i].id].name);
		else
			DPRINT("%d %d RAW_UMASK (0x%x)\n", d->event, i, a->idx);
	}
error:
	free(str);
	if (ret != PFM_SUCCESS)
		pfmlib_release_event(d);
	return ret;
}

/* sorry, only English supported at this point!
*/ static const char *pfmlib_err_list[]= { "success", "not supported", "invalid parameters", "pfmlib not initialized", "event not found", "invalid combination of model specific features", "invalid or missing unit mask", "out of memory", "invalid event attribute", "invalid event attribute value", "attribute value already set", "too many parameters", "parameter is too small", }; static int pfmlib_err_count = (int)sizeof(pfmlib_err_list)/sizeof(char *); const char * pfm_strerror(int code) { code = -code; if (code <0 || code >= pfmlib_err_count) return "unknown error code"; return pfmlib_err_list[code]; } int pfm_get_version(void) { return LIBPFM_VERSION; } int pfm_get_event_next(int idx) { pfmlib_pmu_t *pmu; int pidx; pmu = pfmlib_idx2pidx(idx, &pidx); if (!pmu) return -1; pidx = pmu->get_event_next(pmu, pidx); return pidx == -1 ? -1 : pfmlib_pidx2idx(pmu, pidx); } int pfm_get_os_event_encoding(const char *str, int dfl_plm, pfm_os_t uos, void *args) { pfmlib_os_t *os; if (PFMLIB_INITIALIZED() == 0) return PFM_ERR_NOINIT; if (!(args && str)) return PFM_ERR_INVAL; if (dfl_plm & ~(PFM_PLM_ALL)) return PFM_ERR_INVAL; os = pfmlib_find_os(uos); if (!os) return PFM_ERR_NOTSUPP; return os->encode(os, str, dfl_plm, args); } /* * old API maintained for backward compatibility with existing apps * prefer pfm_get_os_event_encoding() */ int pfm_get_event_encoding(const char *str, int dfl_plm, char **fstr, int *idx, uint64_t **codes, int *count) { pfm_pmu_encode_arg_t arg; int ret; if (!(str && codes && count)) return PFM_ERR_INVAL; if ((*codes && !*count) || (!*codes && *count)) return PFM_ERR_INVAL; memset(&arg, 0, sizeof(arg)); arg.fstr = fstr; arg.codes = *codes; arg.count = *count; arg.size = sizeof(arg); /* * request RAW PMU encoding */ ret = pfm_get_os_event_encoding(str, dfl_plm, PFM_OS_NONE, &arg); if (ret != PFM_SUCCESS) return ret; /* handle the case where the array was allocated */ *codes = arg.codes; *count = arg.count; if (idx) *idx = arg.idx; return PFM_SUCCESS; } 
static int pfmlib_check_event_pattrs(pfmlib_pmu_t *pmu, int pidx, pfm_os_t osid, FILE *fp) { pfmlib_event_desc_t e; int i, j, ret; memset(&e, 0, sizeof(e)); e.event = pidx; e.osid = osid; e.pmu = pmu; ret = pfmlib_build_event_pattrs(&e); if (ret != PFM_SUCCESS) { fprintf(fp, "invalid pattrs for event %d\n", pidx); return ret; } ret = PFM_ERR_ATTR; for (i = 0; i < e.npattrs; i++) { for (j = i+1; j < e.npattrs; j++) { if (!strcmp(e.pattrs[i].name, e.pattrs[j].name)) { fprintf(fp, "event %d duplicate pattrs %s\n", pidx, e.pattrs[i].name); goto error; } } } ret = PFM_SUCCESS; error: /* * release resources allocated for event */ pfmlib_release_event(&e); return ret; } static int pfmlib_validate_encoding(char *buf, int plm) { uint64_t *codes = NULL; int count = 0, ret; ret = pfm_get_event_encoding(buf, plm, NULL, NULL, &codes, &count); if (ret != PFM_SUCCESS) { int i; DPRINT("%s ", buf); for(i=0; i < count; i++) __pfm_dbprintf(" %#"PRIx64, codes[i]); __pfm_dbprintf("\n"); } if (codes) free(codes); return ret; } static int pfmlib_pmu_validate_encoding(pfmlib_pmu_t *pmu, FILE *fp) { pfm_event_info_t einfo; pfm_event_attr_info_t ainfo; char *buf; size_t maxlen = 0, len; int i, u, n = 0, um; int ret, retval = PFM_SUCCESS; pfmlib_for_each_pmu_event(pmu, i) { ret = pmu->get_event_info(pmu, i, &einfo); if (ret != PFM_SUCCESS) return ret; ret = pfmlib_check_event_pattrs(pmu, i, PFM_OS_NONE, fp); if (ret != PFM_SUCCESS) return ret; len = strlen(einfo.name); if (len > maxlen) maxlen = len; for_each_pmu_event_attr(u, &einfo) { ret = pmu->get_event_attr_info(pmu, i, u, &ainfo); if (ret != PFM_SUCCESS) return ret; if (ainfo.type != PFM_ATTR_UMASK) continue; len = strlen(einfo.name) + strlen(ainfo.name); if (len > maxlen) maxlen = len; } } /* 2 = ::, 1=:, 1=eol */ maxlen += strlen(pmu->name) + 2 + 1 + 1; buf = malloc(maxlen); if (!buf) return PFM_ERR_NOMEM; pfmlib_for_each_pmu_event(pmu, i) { ret = pmu->get_event_info(pmu, i, &einfo); if (ret != PFM_SUCCESS) { retval = ret; continue; 
} um = 0; for_each_pmu_event_attr(u, &einfo) { ret = pmu->get_event_attr_info(pmu, i, u, &ainfo); if (ret != PFM_SUCCESS) { retval = ret; continue; } if (ainfo.type != PFM_ATTR_UMASK) continue; /* * XXX: some events may require more than one umasks to encode */ sprintf(buf, "%s::%s:%s", pmu->name, einfo.name, ainfo.name); ret = pfmlib_validate_encoding(buf, PFM_PLM3|PFM_PLM0); if (ret != PFM_SUCCESS) { if (pmu->can_auto_encode) { if (!pmu->can_auto_encode(pmu, i, u)) continue; } /* * some PMU may not support raw encoding */ if (ret != PFM_ERR_NOTSUPP) { fprintf(fp, "cannot encode event %s : %s\n", buf, pfm_strerror(ret)); retval = ret; } continue; } um++; } if (um == 0) { sprintf(buf, "%s::%s", pmu->name, einfo.name); ret = pfmlib_validate_encoding(buf, PFM_PLM3|PFM_PLM0); if (ret != PFM_SUCCESS) { if (pmu->can_auto_encode) { if (!pmu->can_auto_encode(pmu, i, u)) continue; } if (ret != PFM_ERR_NOTSUPP) { fprintf(fp, "cannot encode event %s : %s\n", buf, pfm_strerror(ret)); retval = ret; } continue; } } n++; } free(buf); return retval; } int pfm_pmu_validate(pfm_pmu_t pmu_id, FILE *fp) { pfmlib_pmu_t *pmu, *pmx; int nos = 0; int i, ret; if (fp == NULL) return PFM_ERR_INVAL; pmu = pmu2pmuidx(pmu_id); if (!pmu) return PFM_ERR_INVAL; if (!pfmlib_pmu_initialized(pmu)) { fprintf(fp, "pmu: %s :: initialization failed\n", pmu->name); return PFM_ERR_INVAL; } if (!pmu->name) { fprintf(fp, "pmu id: %d :: no name\n", pmu->pmu); return PFM_ERR_INVAL; } if (!pmu->desc) { fprintf(fp, "pmu: %s :: no description\n", pmu->name); return PFM_ERR_INVAL; } if (pmu->pmu >= PFM_PMU_MAX) { fprintf(fp, "pmu: %s :: invalid PMU id\n", pmu->name); return PFM_ERR_INVAL; } if (pmu->max_encoding >= PFMLIB_MAX_ENCODING) { fprintf(fp, "pmu: %s :: max encoding too high\n", pmu->name); return PFM_ERR_INVAL; } if (pfmlib_pmu_active(pmu) && !pmu->pme_count) { fprintf(fp, "pmu: %s :: no events\n", pmu->name); return PFM_ERR_INVAL; } if (!pmu->pmu_detect) { fprintf(fp, "pmu: %s :: missing pmu_detect 
callback\n", pmu->name); return PFM_ERR_INVAL; } if (!pmu->get_event_first) { fprintf(fp, "pmu: %s :: missing get_event_first callback\n", pmu->name); return PFM_ERR_INVAL; } if (!pmu->get_event_next) { fprintf(fp, "pmu: %s :: missing get_event_next callback\n", pmu->name); return PFM_ERR_INVAL; } if (!pmu->get_event_info) { fprintf(fp, "pmu: %s :: missing get_event_info callback\n", pmu->name); return PFM_ERR_INVAL; } if (!pmu->get_event_attr_info) { fprintf(fp, "pmu: %s :: missing get_event_attr_info callback\n", pmu->name); return PFM_ERR_INVAL; } for (i = PFM_OS_NONE; i < PFM_OS_MAX; i++) { if (pmu->get_event_encoding[i]) nos++; } if (!nos) { fprintf(fp, "pmu: %s :: no os event encoding callback\n", pmu->name); return PFM_ERR_INVAL; } if (!pmu->max_encoding) { fprintf(fp, "pmu: %s :: max_encoding is zero\n", pmu->name); return PFM_ERR_INVAL; } /* look for duplicate names, id */ pfmlib_for_each_pmu(i) { pmx = pfmlib_pmus[i]; if (!pfmlib_pmu_active(pmx)) continue; if (pmx == pmu) continue; if (!strcasecmp(pmx->name, pmu->name)) { fprintf(fp, "pmu: %s :: duplicate name\n", pmu->name); return PFM_ERR_INVAL; } if (pmx->pmu == pmu->pmu) { fprintf(fp, "pmu: %s :: duplicate id\n", pmu->name); return PFM_ERR_INVAL; } } if (pmu->validate_table) { ret = pmu->validate_table(pmu, fp); if (ret != PFM_SUCCESS) return ret; } return pfmlib_pmu_validate_encoding(pmu, fp); } int pfm_get_event_info(int idx, pfm_os_t os, pfm_event_info_t *uinfo) { pfm_event_info_t info; pfmlib_event_desc_t e; pfmlib_pmu_t *pmu; size_t sz = sizeof(info); int pidx, ret; if (!PFMLIB_INITIALIZED()) return PFM_ERR_NOINIT; if (os >= PFM_OS_MAX) return PFM_ERR_INVAL; pmu = pfmlib_idx2pidx(idx, &pidx); if (!pmu) return PFM_ERR_INVAL; if (!uinfo) return PFM_ERR_INVAL; sz = pfmlib_check_struct(uinfo, uinfo->size, PFM_EVENT_INFO_ABI0, sz); if (!sz) return PFM_ERR_INVAL; memset(&info, 0, sizeof(info)); info.size = sz; /* default data type is uint64 */ info.dtype = PFM_DTYPE_UINT64; /* reset flags */ 
info.is_precise = 0; ret = pmu->get_event_info(pmu, pidx, &info); if (ret != PFM_SUCCESS) return ret; info.pmu = pmu->pmu; info.idx = idx; memset(&e, 0, sizeof(e)); e.event = pidx; e.osid = os; e.pmu = pmu; ret = pfmlib_build_event_pattrs(&e); if (ret == PFM_SUCCESS) { info.nattrs = e.npattrs; memcpy(uinfo, &info, sz); } pfmlib_release_event(&e); return ret; } int pfm_get_event_attr_info(int idx, int attr_idx, pfm_os_t os, pfm_event_attr_info_t *uinfo) { pfm_event_attr_info_t info; pfmlib_event_desc_t e; pfmlib_pmu_t *pmu; size_t sz = sizeof(info); int pidx, ret; if (!PFMLIB_INITIALIZED()) return PFM_ERR_NOINIT; if (attr_idx < 0) return PFM_ERR_INVAL; if (os >= PFM_OS_MAX) return PFM_ERR_INVAL; pmu = pfmlib_idx2pidx(idx, &pidx); if (!pmu) return PFM_ERR_INVAL; if (!uinfo) return PFM_ERR_INVAL; sz = pfmlib_check_struct(uinfo, uinfo->size, PFM_ATTR_INFO_ABI0, sz); if (!sz) return PFM_ERR_INVAL; memset(&e, 0, sizeof(e)); e.event = pidx; e.osid = os; e.pmu = pmu; ret = pfmlib_build_event_pattrs(&e); if (ret != PFM_SUCCESS) return ret; ret = PFM_ERR_INVAL; if (attr_idx >= e.npattrs) goto error; /* * copy event_attr_info */ info = e.pattrs[attr_idx]; /* * rewrite size to reflect what we are returning */ info.size = sz; /* * info.idx = private, namespace specific index, * should not be visible externally, so override * with public index */ info.idx = attr_idx; memcpy(uinfo, &info, sz); ret = PFM_SUCCESS; error: pfmlib_release_event(&e); return ret; } int pfm_get_pmu_info(pfm_pmu_t pmuid, pfm_pmu_info_t *uinfo) { pfm_pmu_info_t info; pfmlib_pmu_t *pmu; size_t sz = sizeof(info); int pidx; if (!PFMLIB_INITIALIZED()) return PFM_ERR_NOINIT; if (pmuid >= PFM_PMU_MAX) return PFM_ERR_INVAL; if (!uinfo) return PFM_ERR_INVAL; sz = pfmlib_check_struct(uinfo, uinfo->size, PFM_PMU_INFO_ABI0, sz); if (!sz) return PFM_ERR_INVAL; pmu = pfmlib_pmus_map[pmuid]; if (!pmu) return PFM_ERR_NOTSUPP; info.name = pmu->name; info.desc = pmu->desc; info.pmu = pmuid; info.size = sz; 
info.max_encoding = pmu->max_encoding; info.num_cntrs = pmu->num_cntrs; info.num_fixed_cntrs = pmu->num_fixed_cntrs; pidx = pmu->get_event_first(pmu); if (pidx == -1) info.first_event = -1; else info.first_event = pfmlib_pidx2idx(pmu, pidx); /* * XXX: pme_count only valid when PMU is detected */ info.is_present = pfmlib_pmu_active(pmu); info.is_dfl = !!(pmu->flags & PFMLIB_PMU_FL_ARCH_DFL); info.type = pmu->type; if (pmu->get_num_events) info.nevents = pmu->get_num_events(pmu); else info.nevents = pmu->pme_count; memcpy(uinfo, &info, sz); return PFM_SUCCESS; } pfmlib_pmu_t * pfmlib_get_pmu_by_type(pfm_pmu_type_t t) { pfmlib_pmu_t *pmu; int i; pfmlib_for_each_pmu(i) { pmu = pfmlib_pmus[i]; if (!pfmlib_pmu_active(pmu)) continue; /* first match */ if (pmu->type != t) continue; return pmu; } return NULL; } static int pfmlib_compare_attr_id(const void *a, const void *b) { const pfmlib_attr_t *t1 = a; const pfmlib_attr_t *t2 = b; if (t1->id < t2->id) return -1; return t1->id == t2->id ? 0 : 1; } void pfmlib_sort_attr(pfmlib_event_desc_t *e) { qsort(e->attrs, e->nattrs, sizeof(pfmlib_attr_t), pfmlib_compare_attr_id); } static int pfmlib_raw_pmu_encode(void *this, const char *str, int dfl_plm, void *data) { pfm_pmu_encode_arg_t arg; pfm_pmu_encode_arg_t *uarg = data; pfmlib_pmu_t *pmu; pfmlib_event_desc_t e; size_t sz = sizeof(arg); int ret, i; sz = pfmlib_check_struct(uarg, uarg->size, PFM_RAW_ENCODE_ABI0, sz); if (!sz) return PFM_ERR_INVAL; memset(&arg, 0, sizeof(arg)); /* * get input data */ memcpy(&arg, uarg, sz); memset(&e, 0, sizeof(e)); e.osid = PFM_OS_NONE; e.dfl_plm = dfl_plm; ret = pfmlib_parse_event(str, &e); if (ret != PFM_SUCCESS) return ret; pmu = e.pmu; if (!pmu->get_event_encoding[PFM_OS_NONE]) { DPRINT("PMU %s does not support PFM_OS_NONE\n", pmu->name); return PFM_ERR_NOTSUPP; } ret = pmu->get_event_encoding[PFM_OS_NONE](pmu, &e); if (ret != PFM_SUCCESS) goto error; /* * return opaque event identifier */ arg.idx = pfmlib_pidx2idx(e.pmu, e.event); if 
(arg.codes == NULL) { ret = PFM_ERR_NOMEM; arg.codes = malloc(sizeof(uint64_t) * e.count); if (!arg.codes) goto error_fstr; } else if (arg.count < e.count) { ret = PFM_ERR_TOOSMALL; goto error_fstr; } arg.count = e.count; for (i = 0; i < e.count; i++) arg.codes[i] = e.codes[i]; if (arg.fstr) { ret = pfmlib_build_fstr(&e, arg.fstr); if (ret != PFM_SUCCESS) goto error; } ret = PFM_SUCCESS; /* copy out results */ memcpy(uarg, &arg, sz); error_fstr: if (ret != PFM_SUCCESS) free(arg.fstr); error: /* * release resources allocated for event */ pfmlib_release_event(&e); return ret; } static int pfmlib_raw_pmu_detect(void *this) { return PFM_SUCCESS; } static pfmlib_os_t pfmlib_os_none= { .name = "No OS (raw PMU)", .id = PFM_OS_NONE, .flags = PFMLIB_OS_FL_ACTIVATED, .encode = pfmlib_raw_pmu_encode, .detect = pfmlib_raw_pmu_detect, }; papi-5.3.0/src/libpfm4/lib/pfmlib_amd64.c0000600003276200002170000005221412247131124017445 0ustar ralphundrgrad/* * pfmlib_amd64.c : support for the AMD64 architected PMU * (for both 64 and 32 bit modes) * * Copyright (c) 2009 Google, Inc * Contributed by Stephane Eranian * * Based on: * Copyright (c) 2005-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_amd64_priv.h" /* architecture private */ const pfmlib_attr_desc_t amd64_mods[]={ PFM_ATTR_B("k", "monitor at priv level 0"), /* monitor priv level 0 */ PFM_ATTR_B("u", "monitor at priv level 1, 2, 3"), /* monitor priv level 1, 2, 3 */ PFM_ATTR_B("e", "edge level"), /* edge */ PFM_ATTR_B("i", "invert"), /* invert */ PFM_ATTR_I("c", "counter-mask in range [0-255]"), /* counter-mask */ PFM_ATTR_B("h", "monitor in hypervisor"), /* monitor in hypervisor*/ PFM_ATTR_B("g", "measure in guest"), /* monitor in guest */ PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */ }; pfmlib_pmu_t amd64_support; pfm_amd64_config_t pfm_amd64_cfg; static int amd64_num_mods(void *this, int idx) { const amd64_entry_t *pe = this_pe(this); unsigned int mask; mask = pe[idx].modmsk; return pfmlib_popcnt(mask); } static inline int amd64_eflag(void *this, int idx, int flag) { const amd64_entry_t *pe = this_pe(this); return !!(pe[idx].flags & flag); } static inline int amd64_uflag(void *this, int idx, int attr, int flag) { const amd64_entry_t *pe = this_pe(this); return !!(pe[idx].umasks[attr].uflags & flag); } static inline int amd64_event_ibsfetch(void *this, int idx) { return amd64_eflag(this, idx, AMD64_FL_IBSFE); } static inline int amd64_event_ibsop(void *this, int idx) { return amd64_eflag(this, idx, AMD64_FL_IBSOP); } static inline int amd64_from_rev(unsigned int flags) { return ((flags) >> 8) 
& 0xff; } static inline int amd64_till_rev(unsigned int flags) { int till = (((flags)>>16) & 0xff); if (!till) return 0xff; return till; } static void amd64_get_revision(pfm_amd64_config_t *cfg) { pfm_pmu_t rev = PFM_PMU_NONE; if (cfg->family == 6) { cfg->revision = PFM_PMU_AMD64_K7; return; } if (cfg->family == 15) { switch (cfg->model >> 4) { case 0: /* rev B parts are special-cased, everything else is rev C */ if (cfg->model == 5 && cfg->stepping < 2) rev = PFM_PMU_AMD64_K8_REVB; else if (cfg->model == 4 && cfg->stepping == 0) rev = PFM_PMU_AMD64_K8_REVB; else rev = PFM_PMU_AMD64_K8_REVC; break; case 1: rev = PFM_PMU_AMD64_K8_REVD; break; case 2: case 3: rev = PFM_PMU_AMD64_K8_REVE; break; case 4: case 5: case 0xc: rev = PFM_PMU_AMD64_K8_REVF; break; case 6: case 7: case 8: rev = PFM_PMU_AMD64_K8_REVG; break; default: rev = PFM_PMU_AMD64_K8_REVB; } } else if (cfg->family == 16) { /* family 10h */ switch (cfg->model) { case 4: case 5: case 6: rev = PFM_PMU_AMD64_FAM10H_SHANGHAI; break; case 8: case 9: rev = PFM_PMU_AMD64_FAM10H_ISTANBUL; break; default: rev = PFM_PMU_AMD64_FAM10H_BARCELONA; } } else if (cfg->family == 17) { /* family 11h */ switch (cfg->model) { default: rev = PFM_PMU_AMD64_FAM11H_TURION; } } else if (cfg->family == 18) { /* family 12h */ switch (cfg->model) { default: rev = PFM_PMU_AMD64_FAM12H_LLANO; } } else if (cfg->family == 20) { /* family 14h */ switch (cfg->model) { default: rev = PFM_PMU_AMD64_FAM14H_BOBCAT; } } else if (cfg->family == 21) { /* family 15h */ rev = PFM_PMU_AMD64_FAM15H_INTERLAGOS; } cfg->revision = rev; } /* * .byte 0x53 == push ebx. it's universal for 32 and 64 bit * .byte 0x5b == pop ebx. * Some gcc's (4.1.2 on Core2) object to pairing push/pop and ebx in 64 bit mode. * Using the opcode directly avoids this problem. 
*/ static inline void cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c, unsigned int *d) { __asm__ __volatile__ (".byte 0x53\n\tcpuid\n\tmovl %%ebx, %%esi\n\t.byte 0x5b" : "=a" (*a), "=S" (*b), "=c" (*c), "=d" (*d) : "a" (op)); } static int amd64_event_valid(void *this, int i) { const amd64_entry_t *pe = this_pe(this); pfmlib_pmu_t *pmu = this; int flags; flags = pe[i].flags; if (pmu->pmu_rev < amd64_from_rev(flags)) return 0; if (pmu->pmu_rev > amd64_till_rev(flags)) return 0; /* no restrictions or matches restrictions */ return 1; } static int amd64_umask_valid(void *this, int i, int attr) { pfmlib_pmu_t *pmu = this; const amd64_entry_t *pe = this_pe(this); int flags; flags = pe[i].umasks[attr].uflags; if (pmu->pmu_rev < amd64_from_rev(flags)) return 0; if (pmu->pmu_rev > amd64_till_rev(flags)) return 0; /* no restrictions or matches restrictions */ return 1; } static unsigned int amd64_num_umasks(void *this, int pidx) { const amd64_entry_t *pe = this_pe(this); unsigned int i, n = 0; /* unit masks + modifiers */ for (i = 0; i < pe[pidx].numasks; i++) if (amd64_umask_valid(this, pidx, i)) n++; return n; } static int amd64_get_umask(void *this, int pidx, int attr_idx) { const amd64_entry_t *pe = this_pe(this); unsigned int i; int n; for (i=0, n = 0; i < pe[pidx].numasks; i++) { if (!amd64_umask_valid(this, pidx, i)) continue; if (n++ == attr_idx) return i; } return -1; } static inline int amd64_attr2mod(void *this, int pidx, int attr_idx) { const amd64_entry_t *pe = this_pe(this); size_t x; int n; n = attr_idx - amd64_num_umasks(this, pidx); pfmlib_for_each_bit(x, pe[pidx].modmsk) { if (n == 0) break; n--; } return x; } void amd64_display_reg(void *this, pfmlib_event_desc_t *e, pfm_amd64_reg_t reg) { pfmlib_pmu_t *pmu = this; if (IS_FAMILY_10H(pmu) || IS_FAMILY_15H(pmu)) __pfm_vbprintf("[0x%"PRIx64" event_sel=0x%x umask=0x%x os=%d usr=%d en=%d int=%d inv=%d edge=%d cnt_mask=%d guest=%d host=%d] %s\n", reg.val, reg.sel_event_mask | 
(reg.sel_event_mask2 << 8), reg.sel_unit_mask, reg.sel_os, reg.sel_usr, reg.sel_en, reg.sel_int, reg.sel_inv, reg.sel_edge, reg.sel_cnt_mask, reg.sel_guest, reg.sel_host, e->fstr); else __pfm_vbprintf("[0x%"PRIx64" event_sel=0x%x umask=0x%x os=%d usr=%d en=%d int=%d inv=%d edge=%d cnt_mask=%d] %s\n", reg.val, reg.sel_event_mask, reg.sel_unit_mask, reg.sel_os, reg.sel_usr, reg.sel_en, reg.sel_int, reg.sel_inv, reg.sel_edge, reg.sel_cnt_mask, e->fstr); } int pfm_amd64_detect(void *this) { unsigned int a, b, c, d; char buffer[128]; if (pfm_amd64_cfg.family) return PFM_SUCCESS; cpuid(0, &a, &b, &c, &d); strncpy(&buffer[0], (char *)(&b), 4); strncpy(&buffer[4], (char *)(&d), 4); strncpy(&buffer[8], (char *)(&c), 4); buffer[12] = '\0'; if (strcmp(buffer, "AuthenticAMD")) return PFM_ERR_NOTSUPP; cpuid(1, &a, &b, &c, &d); pfm_amd64_cfg.family = (a >> 8) & 0x0000000f; // bits 11 - 8 pfm_amd64_cfg.model = (a >> 4) & 0x0000000f; // Bits 7 - 4 if (pfm_amd64_cfg.family == 0xf) { pfm_amd64_cfg.family += (a >> 20) & 0x000000ff; // Extended family pfm_amd64_cfg.model |= (a >> 12) & 0x000000f0; // Extended model } pfm_amd64_cfg.stepping= a & 0x0000000f; // bits 3 - 0 amd64_get_revision(&pfm_amd64_cfg); if (pfm_amd64_cfg.revision == PFM_PMU_NONE) return PFM_ERR_NOTSUPP; return PFM_SUCCESS; } static int amd64_add_defaults(void *this, pfmlib_event_desc_t *e, unsigned int msk, uint64_t *umask) { const amd64_entry_t *ent, *pe = this_pe(this); unsigned int i; int j, k, added, omit, numasks_grp; int idx; k = e->nattrs; ent = pe+e->event; for(i=0; msk; msk >>=1, i++) { if (!(msk & 0x1)) continue; added = omit = numasks_grp = 0; for (j = 0; j < e->npattrs; j++) { if (e->pattrs[j].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[j].type != PFM_ATTR_UMASK) continue; idx = e->pattrs[j].idx; if (ent->umasks[idx].grpid != i) continue; /* number of umasks in this group */ numasks_grp++; if (amd64_uflag(this, e->event, idx, AMD64_FL_DFL)) { DPRINT("added default for %s j=%d idx=%d\n", 
ent->umasks[idx].uname, j, idx); *umask |= ent->umasks[idx].ucode; e->attrs[k].id = j; /* pattrs index */ e->attrs[k].ival = 0; k++; added++; } if (amd64_uflag(this, e->event, idx, AMD64_FL_OMIT)) omit++; } /* * fail if no default was found AND at least one umask cannot be omitted * in the group */ if (!added && omit != numasks_grp) { DPRINT("no default found for event %s unit mask group %d\n", ent->name, i); return PFM_ERR_UMASK; } } e->nattrs = k; return PFM_SUCCESS; } int pfm_amd64_get_encoding(void *this, pfmlib_event_desc_t *e) { const amd64_entry_t *pe = this_pe(this); pfm_amd64_reg_t reg; pfm_event_attr_info_t *a; uint64_t umask = 0; unsigned int plmmsk = 0; int k, ret, grpid; unsigned int grpmsk, ugrpmsk = 0; int grpcounts[AMD64_MAX_GRP]; int ncombo[AMD64_MAX_GRP]; memset(grpcounts, 0, sizeof(grpcounts)); memset(ncombo, 0, sizeof(ncombo)); e->fstr[0] = '\0'; reg.val = 0; /* assume reserved bits are zeroed */ grpmsk = (1 << pe[e->event].ngrp)-1; if (amd64_event_ibsfetch(this, e->event)) reg.ibsfetch.en = 1; else if (amd64_event_ibsop(this, e->event)) reg.ibsop.en = 1; else { reg.sel_event_mask = pe[e->event].code; reg.sel_event_mask2 = pe[e->event].code >> 8; reg.sel_en = 1; /* force enable */ reg.sel_int = 1; /* force APIC */ } for(k=0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { grpid = pe[e->event].umasks[a->idx].grpid; ++grpcounts[grpid]; /* * upper layer has removed duplicates * so if we come here more than once, it is for two * distinct umasks */ if (amd64_uflag(this, e->event, a->idx, AMD64_FL_NCOMBO)) ncombo[grpid] = 1; /* * if more than one umask in this group but one is marked * with ncombo, then fail. 
It is okay to combine umask within * a group as long as none is tagged with NCOMBO */ if (grpcounts[grpid] > 1 && ncombo[grpid]) { DPRINT("event does not support unit mask combination within a group\n"); return PFM_ERR_FEATCOMB; } umask |= pe[e->event].umasks[a->idx].ucode; ugrpmsk |= 1 << pe[e->event].umasks[a->idx].grpid; } else if (a->type == PFM_ATTR_RAW_UMASK) { /* there can only be one RAW_UMASK per event */ /* sanity checks */ if (a->idx & ~0xff) { DPRINT("raw umask is invalid\n"); return PFM_ERR_ATTR; } /* override umask */ umask = a->idx & 0xff; ugrpmsk = grpmsk; } else { /* modifiers */ uint64_t ival = e->attrs[k].ival; switch(a->idx) { //amd64_attr2mod(this, e->osid, e->event, a->idx)) { case AMD64_ATTR_I: /* invert */ reg.sel_inv = !!ival; break; case AMD64_ATTR_E: /* edge */ reg.sel_edge = !!ival; break; case AMD64_ATTR_C: /* counter-mask */ if (ival > 255) return PFM_ERR_ATTR_VAL; reg.sel_cnt_mask = ival; break; case AMD64_ATTR_U: /* USR */ reg.sel_usr = !!ival; plmmsk |= _AMD64_ATTR_U; break; case AMD64_ATTR_K: /* OS */ reg.sel_os = !!ival; plmmsk |= _AMD64_ATTR_K; break; case AMD64_ATTR_G: /* GUEST */ reg.sel_guest = !!ival; plmmsk |= _AMD64_ATTR_G; break; case AMD64_ATTR_H: /* HOST */ reg.sel_host = !!ival; plmmsk |= _AMD64_ATTR_H; break; } } } /* * handle case where no priv level mask was passed. 
* then we use the dfl_plm */ if (!(plmmsk & (_AMD64_ATTR_K|_AMD64_ATTR_U|_AMD64_ATTR_H))) { if (e->dfl_plm & PFM_PLM0) reg.sel_os = 1; if (e->dfl_plm & PFM_PLM3) reg.sel_usr = 1; if ((IS_FAMILY_10H(this) || IS_FAMILY_15H(this)) && e->dfl_plm & PFM_PLMH) reg.sel_host = 1; } /* * check that there is at least one unit mask in each unit * mask group */ if (ugrpmsk != grpmsk) { ugrpmsk ^= grpmsk; ret = amd64_add_defaults(this, e, ugrpmsk, &umask); if (ret != PFM_SUCCESS) return ret; } reg.sel_unit_mask = umask; e->codes[0] = reg.val; e->count = 1; /* * reorder all the attributes such that the fstr appears always * the same regardless of how the attributes were submitted. */ evt_strcat(e->fstr, "%s", pe[e->event].name); pfmlib_sort_attr(e); for (k = 0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) evt_strcat(e->fstr, ":%s", pe[e->event].umasks[a->idx].uname); else if (a->type == PFM_ATTR_RAW_UMASK) evt_strcat(e->fstr, ":0x%x", a->idx); } for (k = 0; k < e->npattrs; k++) { int idx; if (e->pattrs[k].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[k].type == PFM_ATTR_UMASK) continue; idx = e->pattrs[k].idx; switch(idx) { case AMD64_ATTR_K: evt_strcat(e->fstr, ":%s=%lu", amd64_mods[idx].name, reg.sel_os); break; case AMD64_ATTR_U: evt_strcat(e->fstr, ":%s=%lu", amd64_mods[idx].name, reg.sel_usr); break; case AMD64_ATTR_E: evt_strcat(e->fstr, ":%s=%lu", amd64_mods[idx].name, reg.sel_edge); break; case AMD64_ATTR_I: evt_strcat(e->fstr, ":%s=%lu", amd64_mods[idx].name, reg.sel_inv); break; case AMD64_ATTR_C: evt_strcat(e->fstr, ":%s=%lu", amd64_mods[idx].name, reg.sel_cnt_mask); break; case AMD64_ATTR_H: evt_strcat(e->fstr, ":%s=%lu", amd64_mods[idx].name, reg.sel_host); break; case AMD64_ATTR_G: evt_strcat(e->fstr, ":%s=%lu", amd64_mods[idx].name, reg.sel_guest); break; } } amd64_display_reg(this, e, reg); return PFM_SUCCESS; } int pfm_amd64_get_event_first(void *this) { pfmlib_pmu_t *pmu = this; int idx; 
for(idx=0; idx < pmu->pme_count; idx++) if (amd64_event_valid(this, idx)) return idx; return -1; } int pfm_amd64_get_event_next(void *this, int idx) { pfmlib_pmu_t *pmu = this; /* basic validity checks on idx done by caller */ if (idx >= (pmu->pme_count-1)) return -1; /* validate event for this host PMU */ if (!amd64_event_valid(this, idx)) return -1; for(++idx; idx < pmu->pme_count; idx++) { if (amd64_event_valid(this, idx)) return idx; } return -1; } int pfm_amd64_event_is_valid(void *this, int pidx) { pfmlib_pmu_t *pmu = this; if (pidx < 0 || pidx >= pmu->pme_count) return 0; /* valid revision */ return amd64_event_valid(this, pidx); } int pfm_amd64_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info) { const amd64_entry_t *pe = this_pe(this); int numasks, idx; numasks = amd64_num_umasks(this, pidx); if (attr_idx < numasks) { idx = amd64_get_umask(this, pidx, attr_idx); if (idx == -1) return PFM_ERR_ATTR; info->name = pe[pidx].umasks[idx].uname; info->desc = pe[pidx].umasks[idx].udesc; info->code = pe[pidx].umasks[idx].ucode; info->type = PFM_ATTR_UMASK; info->is_dfl = amd64_uflag(this, pidx, idx, AMD64_FL_DFL); } else { idx = amd64_attr2mod(this, pidx, attr_idx); info->name = amd64_mods[idx].name; info->desc = amd64_mods[idx].desc; info->type = amd64_mods[idx].type; info->code = idx; info->is_dfl = 0; } info->is_precise = 0; info->equiv = NULL; info->ctrl = PFM_ATTR_CTRL_PMU; info->idx = idx; /* namespace specific index */ info->dfl_val64 = 0; return PFM_SUCCESS; } int pfm_amd64_get_event_info(void *this, int idx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; const amd64_entry_t *pe = this_pe(this); info->name = pe[idx].name; info->desc = pe[idx].desc; info->equiv = NULL; info->code = pe[idx].code; info->idx = idx; info->pmu = pmu->pmu; info->is_precise = 0; info->nattrs = amd64_num_umasks(this, idx); info->nattrs += amd64_num_mods(this, idx); return PFM_SUCCESS; } int pfm_amd64_validate_table(void *this, FILE *fp) { 
pfmlib_pmu_t *pmu = this; const amd64_entry_t *pe = this_pe(this); const char *name = pmu->name; unsigned int j, k; int i, ndfl; int error = 0; if (!pmu->atdesc) { fprintf(fp, "pmu: %s missing attr_desc\n", pmu->name); error++; } if (!pmu->supported_plm && pmu->type == PFM_PMU_TYPE_CORE) { fprintf(fp, "pmu: %s supported_plm not set\n", pmu->name); error++; } for(i=0; i < pmu->pme_count; i++) { if (!pe[i].name) { fprintf(fp, "pmu: %s event%d: :: no name (prev event was %s)\n", pmu->name, i, i > 1 ? pe[i-1].name : "??"); error++; } if (!pe[i].desc) { fprintf(fp, "pmu: %s event%d: %s :: no description\n", name, i, pe[i].name); error++; } if (pe[i].numasks && pe[i].umasks == NULL) { fprintf(fp, "pmu: %s event%d: %s :: numasks but no umasks\n", pmu->name, i, pe[i].name); error++; } if (pe[i].numasks == 0 && pe[i].umasks) { fprintf(fp, "pmu: %s event%d: %s :: numasks=0 but umasks defined\n", pmu->name, i, pe[i].name); error++; } if (pe[i].numasks && pe[i].ngrp == 0) { fprintf(fp, "pmu: %s event%d: %s :: ngrp cannot be zero\n", name, i, pe[i].name); error++; } if (pe[i].numasks == 0 && pe[i].ngrp) { fprintf(fp, "pmu: %s event%d: %s :: ngrp must be zero\n", name, i, pe[i].name); error++; } if (pe[i].ngrp >= AMD64_MAX_GRP) { fprintf(fp, "pmu: %s event%d: %s :: ngrp too big (max=%d)\n", name, i, pe[i].name, AMD64_MAX_GRP); error++; } for(ndfl = 0, j= 0; j < pe[i].numasks; j++) { if (!pe[i].umasks[j].uname) { fprintf(fp, "pmu: %s event%d: %s umask%d :: no name\n", pmu->name, i, pe[i].name, j); error++; } if (!pe[i].umasks[j].udesc) { fprintf(fp, "pmu: %s event%d:%s umask%d: %s :: no description\n", name, i, pe[i].name, j, pe[i].umasks[j].uname); error++; } if (pe[i].ngrp && pe[i].umasks[j].grpid >= pe[i].ngrp) { fprintf(fp, "pmu: %s event%d: %s umask%d: %s :: invalid grpid %d (must be < %d)\n", name, i, pe[i].name, j, pe[i].umasks[j].uname, pe[i].umasks[j].grpid, pe[i].ngrp); error++; } if (pe[i].umasks[j].uflags & AMD64_FL_DFL) { for(k=0; k < j; k++) if 
((pe[i].umasks[k].uflags == pe[i].umasks[j].uflags) && (pe[i].umasks[k].grpid == pe[i].umasks[j].grpid)) ndfl++; if (pe[i].numasks == 1) ndfl = 1; } } if (pe[i].numasks > 1 && ndfl) { fprintf(fp, "pmu: %s event%d: %s :: more than one default unit mask with same code\n", name, i, pe[i].name); error++; } /* if only one umask, then ought to be default */ if (pe[i].numasks == 1 && ndfl != 1) { fprintf(fp, "pmu: %s event%d: %s, only one umask but no default\n", pmu->name, i, pe[i].name); error++; } if (pe[i].flags & AMD64_FL_NCOMBO) { fprintf(fp, "pmu: %s event%d: %s :: NCOMBO is unit mask only flag\n", name, i, pe[i].name); error++; } for(j=0; j < pe[i].numasks; j++) { if (pe[i].umasks[j].uflags & AMD64_FL_NCOMBO) continue; for(k=j+1; k < pe[i].numasks; k++) { if (pe[i].umasks[k].uflags & AMD64_FL_NCOMBO) continue; if ((pe[i].umasks[j].ucode & pe[i].umasks[k].ucode)) { fprintf(fp, "pmu: %s event%d: %s :: umask %s and %s have overlapping code bits\n", name, i, pe[i].name, pe[i].umasks[j].uname, pe[i].umasks[k].uname); error++; } } } } return error ? PFM_ERR_INVAL : PFM_SUCCESS; } unsigned int pfm_amd64_get_event_nattrs(void *this, int pidx) { unsigned int nattrs; nattrs = amd64_num_umasks(this, pidx); nattrs += amd64_num_mods(this, pidx); return nattrs; } int pfm_amd64_get_num_events(void *this) { pfmlib_pmu_t *pmu = this; int i, num = 0; /* * count actual number of events for specific PMU. * Table may contain more events for the family than * what a specific model actually supports. */ for (i = 0; i < pmu->pme_count; i++) if (amd64_event_valid(this, i)) num++; return num; } papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_priv.h0000600003276200002170000001730712247131124020516 0ustar ralphundrgrad/* * Copyright (c) 2004-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. 
*/ #ifndef __PFMLIB_AMD64_PRIV_H__ #define __PFMLIB_AMD64_PRIV_H__ #define AMD64_MAX_GRP 4 /* must be < 32 (int) */ typedef struct { const char *uname; /* unit mask name */ const char *udesc; /* event/umask description */ unsigned int ucode; /* unit mask code */ unsigned int uflags; /* unit mask flags */ unsigned int grpid; /* unit mask group id */ } amd64_umask_t; typedef struct { const char *name; /* event name */ const char *desc; /* event description */ const amd64_umask_t *umasks;/* list of umasks */ unsigned int code; /* event code */ unsigned int numasks;/* number of umasks */ unsigned int flags; /* flags */ unsigned int modmsk; /* modifiers bitmask */ unsigned int ngrp; /* number of unit masks groups */ } amd64_entry_t; /* * we keep an internal revision type to avoid * dealing with arbitrarily large pfm_pmu_t * which would not fit into the 8 bits reserved * in amd64_entry_t.flags or amd64_umask_t.flags */ #define AMD64_FAM10H AMD64_FAM10H_REV_B typedef enum { AMD64_CPU_UN = 0, AMD64_K7, AMD64_K8_REV_B, AMD64_K8_REV_C, AMD64_K8_REV_D, AMD64_K8_REV_E, AMD64_K8_REV_F, AMD64_K8_REV_G, AMD64_FAM10H_REV_B, AMD64_FAM10H_REV_C, AMD64_FAM10H_REV_D, AMD64_FAM14H_REV_B, } amd64_rev_t; typedef struct { pfm_pmu_t revision; int family; /* 0 means nothing detected yet */ int model; int stepping; } pfm_amd64_config_t; extern pfm_amd64_config_t pfm_amd64_cfg; /* * flags values (bottom 8 bits only) * bits 00-07: flags * bits 08-15: from revision * bits 16-23: till revision */ #define AMD64_FROM_REV(rev) ((rev)<<8) #define AMD64_TILL_REV(rev) ((rev)<<16) #define AMD64_NOT_SUPP 0x1ff00 #define AMD64_FL_NCOMBO 0x01 /* unit mask cannot be combined */ #define AMD64_FL_IBSFE 0x02 /* IBS fetch */ #define AMD64_FL_IBSOP 0x04 /* IBS op */ #define AMD64_FL_DFL 0x08 /* unit mask is default choice */ #define AMD64_FL_OMIT 0x10 /* umask can be omitted */ #define AMD64_FL_TILL_K8_REV_C AMD64_TILL_REV(AMD64_K8_REV_C) #define AMD64_FL_K8_REV_D AMD64_FROM_REV(AMD64_K8_REV_D) #define 
AMD64_FL_K8_REV_E AMD64_FROM_REV(AMD64_K8_REV_E) #define AMD64_FL_TILL_K8_REV_E AMD64_TILL_REV(AMD64_K8_REV_E) #define AMD64_FL_K8_REV_F AMD64_FROM_REV(AMD64_K8_REV_F) #define AMD64_FL_TILL_FAM10H_REV_B AMD64_TILL_REV(AMD64_FAM10H_REV_B) #define AMD64_FL_FAM10H_REV_C AMD64_FROM_REV(AMD64_FAM10H_REV_C) #define AMD64_FL_TILL_FAM10H_REV_C AMD64_TILL_REV(AMD64_FAM10H_REV_C) #define AMD64_FL_FAM10H_REV_D AMD64_FROM_REV(AMD64_FAM10H_REV_D) #define AMD64_ATTR_K 0 #define AMD64_ATTR_U 1 #define AMD64_ATTR_E 2 #define AMD64_ATTR_I 3 #define AMD64_ATTR_C 4 #define AMD64_ATTR_H 5 #define AMD64_ATTR_G 6 #define _AMD64_ATTR_U (1 << AMD64_ATTR_U) #define _AMD64_ATTR_K (1 << AMD64_ATTR_K) #define _AMD64_ATTR_I (1 << AMD64_ATTR_I) #define _AMD64_ATTR_E (1 << AMD64_ATTR_E) #define _AMD64_ATTR_C (1 << AMD64_ATTR_C) #define _AMD64_ATTR_H (1 << AMD64_ATTR_H) #define _AMD64_ATTR_G (1 << AMD64_ATTR_G) #define AMD64_BASIC_ATTRS \ (_AMD64_ATTR_I|_AMD64_ATTR_E|_AMD64_ATTR_C|_AMD64_ATTR_U|_AMD64_ATTR_K) #define AMD64_K8_ATTRS (AMD64_BASIC_ATTRS) #define AMD64_FAM10H_ATTRS (AMD64_BASIC_ATTRS|_AMD64_ATTR_H|_AMD64_ATTR_G) #define AMD64_FAM12H_ATTRS (AMD64_BASIC_ATTRS|_AMD64_ATTR_H|_AMD64_ATTR_G) #define AMD64_FAM14H_ATTRS (AMD64_BASIC_ATTRS|_AMD64_ATTR_H|_AMD64_ATTR_G) #define AMD64_FAM15H_ATTRS (AMD64_BASIC_ATTRS|_AMD64_ATTR_H|_AMD64_ATTR_G) #define AMD64_FAM10H_PLM (PFM_PLM0|PFM_PLM3|PFM_PLMH) #define AMD64_K7_PLM (PFM_PLM0|PFM_PLM3) /* * AMD64 MSR definitions */ typedef union { uint64_t val; /* complete register value */ struct { uint64_t sel_event_mask:8; /* event mask */ uint64_t sel_unit_mask:8; /* unit mask */ uint64_t sel_usr:1; /* user level */ uint64_t sel_os:1; /* system level */ uint64_t sel_edge:1; /* edge detec */ uint64_t sel_pc:1; /* pin control */ uint64_t sel_int:1; /* enable APIC intr */ uint64_t sel_res1:1; /* reserved */ uint64_t sel_en:1; /* enable */ uint64_t sel_inv:1; /* invert counter mask */ uint64_t sel_cnt_mask:8; /* counter mask */ uint64_t sel_event_mask2:4; /* 
10h only: event mask [11:8] */ uint64_t sel_res2:4; /* reserved */ uint64_t sel_guest:1; /* 10h only: guest only counter */ uint64_t sel_host:1; /* 10h only: host only counter */ uint64_t sel_res3:22; /* reserved */ } perfsel; struct { uint64_t maxcnt:16; uint64_t cnt:16; uint64_t lat:16; uint64_t en:1; uint64_t val:1; uint64_t comp:1; uint64_t icmiss:1; uint64_t phyaddrvalid:1; uint64_t l1tlbpgsz:2; uint64_t l1tlbmiss:1; uint64_t l2tlbmiss:1; uint64_t randen:1; uint64_t reserved:6; } ibsfetch; struct { uint64_t maxcnt:16; uint64_t reserved1:1; uint64_t en:1; uint64_t val:1; uint64_t reserved2:45; } ibsop; } pfm_amd64_reg_t; /* MSR 0xc001000-0xc001003 */ /* let's define some handy shortcuts! */ #define sel_event_mask perfsel.sel_event_mask #define sel_unit_mask perfsel.sel_unit_mask #define sel_usr perfsel.sel_usr #define sel_os perfsel.sel_os #define sel_edge perfsel.sel_edge #define sel_pc perfsel.sel_pc #define sel_int perfsel.sel_int #define sel_en perfsel.sel_en #define sel_inv perfsel.sel_inv #define sel_cnt_mask perfsel.sel_cnt_mask #define sel_event_mask2 perfsel.sel_event_mask2 #define sel_guest perfsel.sel_guest #define sel_host perfsel.sel_host #define IS_FAMILY_10H(p) (((pfmlib_pmu_t *)(p))->pmu_rev >= AMD64_FAM10H) #define IS_FAMILY_15H(p) (((pfmlib_pmu_t *)(p))->pmu == PFM_PMU_AMD64_FAM15H_INTERLAGOS) extern int pfm_amd64_get_encoding(void *this, pfmlib_event_desc_t *e); extern int pfm_amd64_get_event_first(void *this); extern int pfm_amd64_get_event_next(void *this, int idx); extern int pfm_amd64_event_is_valid(void *this, int idx); extern int pfm_amd64_get_event_attr_info(void *this, int idx, int attr_idx, pfm_event_attr_info_t *info); extern int pfm_amd64_get_event_info(void *this, int idx, pfm_event_info_t *info); extern int pfm_amd64_validate_table(void *this, FILE *fp); extern int pfm_amd64_detect(void *this); extern const pfmlib_attr_desc_t amd64_mods[]; extern unsigned int pfm_amd64_get_event_nattrs(void *this, int pidx); extern int 
pfm_amd64_get_num_events(void *this); extern int pfm_amd64_get_perf_encoding(void *this, pfmlib_event_desc_t *e); extern void pfm_amd64_perf_validate_pattrs(void *this, pfmlib_event_desc_t *e); #endif /* __PFMLIB_AMD64_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_arm.c0000600003276200002170000001704712247131124017316 0ustar ralphundrgrad/* * pfmlib_arm.c : support for ARM chips * * Copyright (c) 2010 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_arm_priv.h" const pfmlib_attr_desc_t arm_mods[]={ PFM_ATTR_B("k", "monitor at kernel level"), PFM_ATTR_B("u", "monitor at user level"), PFM_ATTR_B("hv", "monitor in hypervisor"), PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */ }; pfm_arm_config_t pfm_arm_cfg; #ifdef CONFIG_PFMLIB_OS_LINUX /* * helper function to retrieve one value from /proc/cpuinfo * for internal libpfm use only * attr: the attribute (line) to look for * ret_buf: a buffer to store the value of the attribute (as a string) * maxlen : number of bytes of capacity in ret_buf * * ret_buf is null terminated. * * Return: * 0 : attribute found, ret_buf populated * -1: attribute not found */ static int pfmlib_getcpuinfo_attr(const char *attr, char *ret_buf, size_t maxlen) { FILE *fp = NULL; int ret = -1; size_t attr_len, buf_len = 0; char *p, *value = NULL; char *buffer = NULL; if (attr == NULL || ret_buf == NULL || maxlen < 1) return -1; attr_len = strlen(attr); fp = fopen("/proc/cpuinfo", "r"); if (fp == NULL) return -1; while(pfmlib_getl(&buffer, &buf_len, fp) != -1){ /* skip blank lines */ if (*buffer == '\n') continue; p = strchr(buffer, ':'); if (p == NULL) goto error; /* * p+2: +1 = space, +2 = first character * strlen()-1 gets rid of \n */ *p = '\0'; value = p+2; value[strlen(value)-1] = '\0'; if (!strncmp(attr, buffer, attr_len)) { /* attribute found: copy its value out, otherwise ret stays -1 */ strncpy(ret_buf, value, maxlen-1); ret_buf[maxlen-1] = '\0'; ret = 0; break; } } error: free(buffer); fclose(fp); return ret; } #else static int pfmlib_getcpuinfo_attr(const char *attr, char *ret_buf, size_t maxlen) { return -1; } #endif static int arm_num_mods(void *this, int idx) { const arm_entry_t *pe = this_pe(this); unsigned int mask; mask = pe[idx].modmsk; return pfmlib_popcnt(mask); } static inline int arm_attr2mod(void *this, int pidx, int attr_idx) { const arm_entry_t *pe = this_pe(this); size_t x; int n; n = 
attr_idx; pfmlib_for_each_bit(x, pe[pidx].modmsk) { if (n == 0) break; n--; } return x; } static void pfm_arm_display_reg(void *this, pfmlib_event_desc_t *e, pfm_arm_reg_t reg) { __pfm_vbprintf("[0x%x] %s\n", reg.val, e->fstr); } int pfm_arm_detect(void *this) { int ret; char buffer[128]; ret = pfmlib_getcpuinfo_attr("CPU implementer", buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; pfm_arm_cfg.implementer = strtol(buffer, NULL, 16); ret = pfmlib_getcpuinfo_attr("CPU part", buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; pfm_arm_cfg.part = strtol(buffer, NULL, 16); ret = pfmlib_getcpuinfo_attr("CPU architecture", buffer, sizeof(buffer)); if (ret == -1) return PFM_ERR_NOTSUPP; pfm_arm_cfg.architecture = strtol(buffer, NULL, 16); return PFM_SUCCESS; } int pfm_arm_get_encoding(void *this, pfmlib_event_desc_t *e) { const arm_entry_t *pe = this_pe(this); pfm_event_attr_info_t *a; pfm_arm_reg_t reg; unsigned int plm = 0; int i, idx, has_plm = 0; reg.val = pe[e->event].code; for (i = 0; i < e->nattrs; i++) { a = attr(e, i); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type > PFM_ATTR_UMASK) { uint64_t ival = e->attrs[i].ival; switch(a->idx) { case ARM_ATTR_U: /* USR */ if (ival) plm |= PFM_PLM3; has_plm = 1; break; case ARM_ATTR_K: /* OS */ if (ival) plm |= PFM_PLM0; has_plm = 1; break; case ARM_ATTR_HV: /* HYPERVISOR */ if (ival) plm |= PFM_PLMH; has_plm = 1; break; default: return PFM_ERR_ATTR; } } } if (arm_has_plm(this, e)) { if (!has_plm) plm = e->dfl_plm; reg.evtsel.excl_pl1 = !(plm & PFM_PLM0); reg.evtsel.excl_usr = !(plm & PFM_PLM3); reg.evtsel.excl_hyp = !(plm & PFM_PLMH); } evt_strcat(e->fstr, "%s", pe[e->event].name); e->codes[0] = reg.val; e->count = 1; for (i = 0; i < e->npattrs; i++) { if (e->pattrs[i].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[i].type == PFM_ATTR_UMASK) continue; idx = e->pattrs[i].idx; switch(idx) { case ARM_ATTR_K: evt_strcat(e->fstr, ":%s=%lu", arm_mods[idx].name, !reg.evtsel.excl_pl1); 
break; case ARM_ATTR_U: evt_strcat(e->fstr, ":%s=%lu", arm_mods[idx].name, !reg.evtsel.excl_usr); break; case ARM_ATTR_HV: evt_strcat(e->fstr, ":%s=%lu", arm_mods[idx].name, !reg.evtsel.excl_hyp); break; } } pfm_arm_display_reg(this, e, reg); return PFM_SUCCESS; } int pfm_arm_get_event_first(void *this) { return 0; } int pfm_arm_get_event_next(void *this, int idx) { pfmlib_pmu_t *p = this; if (idx >= (p->pme_count-1)) return -1; return idx+1; } int pfm_arm_event_is_valid(void *this, int pidx) { pfmlib_pmu_t *p = this; return pidx >= 0 && pidx < p->pme_count; } int pfm_arm_validate_table(void *this, FILE *fp) { pfmlib_pmu_t *pmu = this; const arm_entry_t *pe = this_pe(this); int i, error = 0; for(i=0; i < pmu->pme_count; i++) { if (!pe[i].name) { /* a previous event exists for every index above 0 */ fprintf(fp, "pmu: %s event%d: :: no name (prev event was %s)\n", pmu->name, i, i > 0 ? pe[i-1].name : "??"); error++; } if (!pe[i].desc) { fprintf(fp, "pmu: %s event%d: %s :: no description\n", pmu->name, i, pe[i].name); error++; } } return error ? 
PFM_ERR_INVAL : PFM_SUCCESS; } int pfm_arm_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t *info) { int idx; idx = arm_attr2mod(this, pidx, attr_idx); info->name = arm_mods[idx].name; info->desc = arm_mods[idx].desc; info->type = arm_mods[idx].type; info->code = idx; info->is_dfl = 0; info->equiv = NULL; info->ctrl = PFM_ATTR_CTRL_PMU; info->idx = idx; /* namespace specific index */ info->dfl_val64 = 0; info->is_precise = 0; return PFM_SUCCESS; } unsigned int pfm_arm_get_event_nattrs(void *this, int pidx) { return arm_num_mods(this, pidx); } int pfm_arm_get_event_info(void *this, int idx, pfm_event_info_t *info) { pfmlib_pmu_t *pmu = this; const arm_entry_t *pe = this_pe(this); info->name = pe[idx].name; info->desc = pe[idx].desc; info->code = pe[idx].code; info->equiv = NULL; info->idx = idx; /* private index */ info->pmu = pmu->pmu; info->is_precise = 0; /* no attributes defined for ARM yet */ info->nattrs = 0; return PFM_SUCCESS; } papi-5.3.0/src/libpfm4/lib/pfmlib_intel_nhm_unc.c0000600003276200002170000002456512247131124021364 0ustar ralphundrgrad/* * pfmlib_intel_nhm_unc.c : Intel Nehalem/Westmere uncore PMU * * Copyright (c) 2008 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #define NHM_UNC_ATTR_E 0 #define NHM_UNC_ATTR_I 1 #define NHM_UNC_ATTR_C 2 #define NHM_UNC_ATTR_O 3 #define _NHM_UNC_ATTR_I (1 << NHM_UNC_ATTR_I) #define _NHM_UNC_ATTR_E (1 << NHM_UNC_ATTR_E) #define _NHM_UNC_ATTR_C (1 << NHM_UNC_ATTR_C) #define _NHM_UNC_ATTR_O (1 << NHM_UNC_ATTR_O) #define NHM_UNC_ATTRS \ (_NHM_UNC_ATTR_I|_NHM_UNC_ATTR_E|_NHM_UNC_ATTR_C|_NHM_UNC_ATTR_O) #define NHM_UNC_MOD_OCC_BIT 17 #define NHM_UNC_MOD_EDGE_BIT 18 #define NHM_UNC_MOD_INV_BIT 23 #define NHM_UNC_MOD_CMASK_BIT 24 #define NHM_UNC_MOD_OCC (1 << NHM_UNC_MOD_OCC_BIT) #define NHM_UNC_MOD_EDGE (1 << NHM_UNC_MOD_EDGE_BIT) #define NHM_UNC_MOD_INV (1 << NHM_UNC_MOD_INV_BIT) /* Intel Nehalem/Westmere uncore event table */ #include "events/intel_nhm_unc_events.h" #include "events/intel_wsm_unc_events.h" static const pfmlib_attr_desc_t nhm_unc_mods[]={ PFM_ATTR_B("e", "edge level"), /* edge */ PFM_ATTR_B("i", "invert"), /* invert */ PFM_ATTR_I("c", "counter-mask in range [0-255]"), /* counter-mask */ PFM_ATTR_B("o", "queue occupancy"), /* queue occupancy */ PFM_ATTR_NULL }; static int pfm_nhm_unc_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch(pfm_intel_x86_cfg.model) { case 26: /* Nehalem */ case 30: case 31: break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int 
pfm_wsm_unc_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 37: /* Westmere */ case 44: break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_nhm_unc_get_encoding(void *this, pfmlib_event_desc_t *e) { pfm_intel_x86_reg_t reg; pfm_event_attr_info_t *a; const intel_x86_entry_t *pe = this_pe(this); unsigned int grpmsk, ugrpmsk = 0; int umodmsk = 0, modmsk_r = 0; uint64_t val; uint64_t umask; unsigned int modhw = 0; int k, ret, grpid, last_grpid = -1; int grpcounts[INTEL_X86_NUM_GRP]; int ncombo[INTEL_X86_NUM_GRP]; char umask_str[PFMLIB_EVT_MAX_NAME_LEN]; memset(grpcounts, 0, sizeof(grpcounts)); memset(ncombo, 0, sizeof(ncombo)); pe = this_pe(this); umask_str[0] = e->fstr[0] = '\0'; reg.val = 0; val = pe[e->event].code; grpmsk = (1 << pe[e->event].ngrp)-1; reg.val |= val; /* preset some filters from code */ /* take into account hardcoded umask */ umask = (val >> 8) & 0xff; modmsk_r = pe[e->event].modmsk_req; for(k=0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { grpid = pe[e->event].umasks[a->idx].grpid; /* * for certain events, groups are meant to be * exclusive, i.e., only unit masks of one group * can be used */ if (last_grpid != -1 && grpid != last_grpid && intel_x86_eflag(this, e->event, INTEL_X86_GRP_EXCL)) { DPRINT("exclusive unit mask group error\n"); return PFM_ERR_FEATCOMB; } /* * upper layer has removed duplicates * so if we come here more than once, it is for two * distinct umasks * * NCOMBO=no combination of unit masks within the same * umask group */ ++grpcounts[grpid]; if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_NCOMBO)) ncombo[grpid] = 1; if (grpcounts[grpid] > 1 && ncombo[grpid]) { DPRINT("event does not support unit mask combination within a group\n"); return PFM_ERR_FEATCOMB; } evt_strcat(umask_str, 
":%s", pe[e->event].umasks[a->idx].uname); last_grpid = grpid; modhw |= pe[e->event].umasks[a->idx].modhw; umask |= pe[e->event].umasks[a->idx].ucode >> 8; ugrpmsk |= 1 << pe[e->event].umasks[a->idx].grpid; reg.val |= umask << 8; modmsk_r |= pe[e->event].umasks[a->idx].umodmsk_req; } else if (a->type == PFM_ATTR_RAW_UMASK) { /* there can only be one RAW_UMASK per event */ /* sanity check */ if (a->idx & ~0xff) { DPRINT("raw umask is 8-bit wide\n"); return PFM_ERR_ATTR; } /* override umask */ umask = a->idx & 0xff; ugrpmsk = grpmsk; } else { uint64_t ival = e->attrs[k].ival; switch(a->idx) { case NHM_UNC_ATTR_I: /* invert */ reg.nhm_unc.usel_inv = !!ival; umodmsk |= _NHM_UNC_ATTR_I; break; case NHM_UNC_ATTR_E: /* edge */ reg.nhm_unc.usel_edge = !!ival; umodmsk |= _NHM_UNC_ATTR_E; break; case NHM_UNC_ATTR_C: /* counter-mask */ /* already forced, cannot overwrite */ if (ival > 255) return PFM_ERR_INVAL; reg.nhm_unc.usel_cnt_mask = ival; umodmsk |= _NHM_UNC_ATTR_C; break; case NHM_UNC_ATTR_O: /* occupancy */ reg.nhm_unc.usel_occ = !!ival; umodmsk |= _NHM_UNC_ATTR_O; break; } } } if ((modhw & _NHM_UNC_ATTR_I) && reg.nhm_unc.usel_inv) return PFM_ERR_ATTR_SET; if ((modhw & _NHM_UNC_ATTR_E) && reg.nhm_unc.usel_edge) return PFM_ERR_ATTR_SET; if ((modhw & _NHM_UNC_ATTR_C) && reg.nhm_unc.usel_cnt_mask) return PFM_ERR_ATTR_SET; if ((modhw & _NHM_UNC_ATTR_O) && reg.nhm_unc.usel_occ) return PFM_ERR_ATTR_SET; /* * check that there is at least of unit mask in each unit * mask group */ if ((ugrpmsk != grpmsk && !intel_x86_eflag(this, e->event, INTEL_X86_GRP_EXCL)) || ugrpmsk == 0) { ugrpmsk ^= grpmsk; ret = pfm_intel_x86_add_defaults(this, e, ugrpmsk, &umask, -1); if (ret != PFM_SUCCESS) return ret; } if (modmsk_r && (umodmsk ^ modmsk_r)) { DPRINT("required modifiers missing: 0x%x\n", modmsk_r); return PFM_ERR_ATTR; } evt_strcat(e->fstr, "%s", pe[e->event].name); pfmlib_sort_attr(e); for(k=0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if 
(a->type == PFM_ATTR_UMASK) evt_strcat(e->fstr, ":%s", pe[e->event].umasks[a->idx].uname); else if (a->type == PFM_ATTR_RAW_UMASK) evt_strcat(e->fstr, ":0x%x", a->idx); } reg.val |= umask << 8; reg.nhm_unc.usel_en = 1; /* force enable bit to 1 */ reg.nhm_unc.usel_int = 1; /* force APIC int to 1 */ e->codes[0] = reg.val; e->count = 1; for (k = 0; k < e->npattrs; k++) { int idx; if (e->pattrs[k].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[k].type == PFM_ATTR_UMASK) continue; idx = e->pattrs[k].idx; switch(idx) { case NHM_UNC_ATTR_E: evt_strcat(e->fstr, ":%s=%lu", nhm_unc_mods[idx].name, reg.nhm_unc.usel_edge); break; case NHM_UNC_ATTR_I: evt_strcat(e->fstr, ":%s=%lu", nhm_unc_mods[idx].name, reg.nhm_unc.usel_inv); break; case NHM_UNC_ATTR_C: evt_strcat(e->fstr, ":%s=%lu", nhm_unc_mods[idx].name, reg.nhm_unc.usel_cnt_mask); break; case NHM_UNC_ATTR_O: evt_strcat(e->fstr, ":%s=%lu", nhm_unc_mods[idx].name, reg.nhm_unc.usel_occ); break; } } __pfm_vbprintf("[UNC_PERFEVTSEL=0x%"PRIx64" event=0x%x umask=0x%x en=%d int=%d inv=%d edge=%d occ=%d cnt_msk=%d] %s\n", reg.val, reg.nhm_unc.usel_event, reg.nhm_unc.usel_umask, reg.nhm_unc.usel_en, reg.nhm_unc.usel_int, reg.nhm_unc.usel_inv, reg.nhm_unc.usel_edge, reg.nhm_unc.usel_occ, reg.nhm_unc.usel_cnt_mask, pe[e->event].name); return PFM_SUCCESS; } pfmlib_pmu_t intel_nhm_unc_support={ .desc = "Intel Nehalem uncore", .name = "nhm_unc", .perf_name = "uncore", .pmu = PFM_PMU_INTEL_NHM_UNC, .pme_count = LIBPFM_ARRAY_SIZE(intel_nhm_unc_pe), .type = PFM_PMU_TYPE_UNCORE, .num_cntrs = 8, .num_fixed_cntrs = 1, .max_encoding = 1, .pe = intel_nhm_unc_pe, .atdesc = nhm_unc_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .pmu_detect = pfm_nhm_unc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_nhm_unc_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_nhm_unc_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = 
pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; pfmlib_pmu_t intel_wsm_unc_support={ .desc = "Intel Westmere uncore", .name = "wsm_unc", .perf_name = "uncore", .pmu = PFM_PMU_INTEL_WSM_UNC, .pme_count = LIBPFM_ARRAY_SIZE(intel_wsm_unc_pe), .type = PFM_PMU_TYPE_UNCORE, .num_cntrs = 8, .num_fixed_cntrs = 1, .max_encoding = 1, .pe = intel_wsm_unc_pe, .atdesc = nhm_unc_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK, .pmu_detect = pfm_wsm_unc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_nhm_unc_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_nhm_unc_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_fam14h.c0000600003276200002170000000530612247131124020605 0ustar ralphundrgrad/* * pfmlib_amd64_fam14h.c : AMD64 Family 14h * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_amd64_priv.h" #include "events/amd64_events_fam14h.h" #define DEFINE_FAM14H_REV(d, n, r, pmuid) \ static int \ pfm_amd64_fam14h_##n##_detect(void *this) \ { \ int ret; \ ret = pfm_amd64_detect(this); \ if (ret != PFM_SUCCESS) \ return ret; \ ret = pfm_amd64_cfg.revision; \ return ret == pmuid ? PFM_SUCCESS : PFM_ERR_NOTSUPP; \ } \ pfmlib_pmu_t amd64_fam14h_##n##_support={ \ .desc = "AMD64 Fam14h "#d, \ .name = "amd64_fam14h_"#n, \ .pmu = pmuid, \ .pmu_rev = r, \ .pme_count = LIBPFM_ARRAY_SIZE(amd64_fam14h_pe),\ .type = PFM_PMU_TYPE_CORE, \ .supported_plm = AMD64_FAM10H_PLM, \ .num_cntrs = 4, \ .max_encoding = 1, \ .pe = amd64_fam14h_pe, \ .atdesc = amd64_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK, \ \ .pmu_detect = pfm_amd64_fam14h_##n##_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_amd64_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_amd64_get_perf_encoding), \ .get_event_first = pfm_amd64_get_event_first, \ .get_event_next = pfm_amd64_get_event_next, \ .event_is_valid = pfm_amd64_event_is_valid, \ .validate_table = pfm_amd64_validate_table, \ .get_event_info = pfm_amd64_get_event_info, \ .get_event_attr_info = pfm_amd64_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_amd64_perf_validate_pattrs),\ .get_event_nattrs = pfm_amd64_get_event_nattrs, \ } DEFINE_FAM14H_REV(Bobcat, bobcat, AMD64_FAM14H_REV_B, PFM_PMU_AMD64_FAM14H_BOBCAT); papi-5.3.0/src/libpfm4/lib/pfmlib_s390x_perf_event.c0000600003276200002170000000351412247131124021634 
0ustar ralphundrgrad/* * perf_event for Linux on IBM System z * * Copyright IBM Corp. 2012 * Contributed by Hendrik Brueckner * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include /* private library and arch headers */ #include "pfmlib_priv.h" #include "pfmlib_s390x_priv.h" #include "pfmlib_perf_event_priv.h" int pfm_s390x_get_perf_encoding(void *this, pfmlib_event_desc_t *e) { pfmlib_pmu_t *pmu = this; struct perf_event_attr *attr = e->os_data; int rc; if (!pmu->get_event_encoding[PFM_OS_NONE]) return PFM_ERR_NOTSUPP; /* set up raw pmu event encoding */ rc = pmu->get_event_encoding[PFM_OS_NONE](this, e); if (rc == PFM_SUCCESS) { /* currently use raw events only */ attr->type = PERF_TYPE_RAW; attr->config = e->codes[0]; } return rc; } papi-5.3.0/src/libpfm4/lib/pfmlib_power8.c0000600003276200002170000000434112247131124017754 0ustar ralphundrgrad/* * pfmlib_power8.c : IBM Power8 support * * Copyright (C) IBM Corporation, 2013. All rights reserved. 
* Contributed by Carl Love (carll@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" #include "events/power8_events.h" static int pfm_power8_detect(void* this) { if (__is_processor(PV_POWER8)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } pfmlib_pmu_t power8_support={ .desc = "POWER8", .name = "power8", .pmu = PFM_PMU_POWER8, .pme_count = LIBPFM_ARRAY_SIZE(power8_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 4, .num_fixed_cntrs = 2, .max_encoding = 1, .pe = power8_pe, .pmu_detect = pfm_power8_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_ubo.c0000600003276200002170000000504112247131124022542 0ustar ralphundrgrad/* * pfmlib_intel_snbep_unc_ubo.c : Intel SandyBridge-EP U-Box uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_ubo_events.h" pfmlib_pmu_t intel_snbep_unc_ubo_support = { .desc = "Intel Sandy Bridge-EP U-Box uncore", .name = "snbep_unc_ubo", .perf_name = "uncore_ubox", .pmu = PFM_PMU_INTEL_SNBEP_UNC_UBOX, .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_u_pe), .type = PFM_PMU_TYPE_UNCORE, .num_cntrs = 2, .num_fixed_cntrs = 1, .max_encoding = 1, .pe = intel_snbep_unc_u_pe, .atdesc = snbep_unc_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | PFMLIB_PMU_FL_NO_SMPL, .pmu_detect = pfm_intel_snbep_unc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_sparc_niagara.c0000600003276200002170000000615312247131124021325 0ustar ralphundrgrad/* * pfmlib_sparc_niagara.c : SPARC Niagara I, II * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of 
charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * Core PMU = architectural perfmon v2 + PEBS */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_sparc_priv.h" #include "events/sparc_niagara1_events.h" #include "events/sparc_niagara2_events.h" pfmlib_pmu_t sparc_niagara1_support={ .desc = "Sparc Niagara I", .name = "niagara1", .pmu = PFM_PMU_SPARC_NIAGARA1, .pme_count = LIBPFM_ARRAY_SIZE(niagara1_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = SPARC_PLM, .max_encoding = 2, .num_cntrs = 2, .pe = niagara1_pe, .atdesc = NULL, .flags = 0, .pmu_detect = pfm_sparc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_sparc_get_encoding, PFMLIB_ENCODE_PERF(pfm_sparc_get_perf_encoding), .get_event_first = pfm_sparc_get_event_first, .get_event_next = pfm_sparc_get_event_next, .event_is_valid = pfm_sparc_event_is_valid, .validate_table = pfm_sparc_validate_table, .get_event_info = pfm_sparc_get_event_info, .get_event_attr_info = pfm_sparc_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_sparc_perf_validate_pattrs), .get_event_nattrs = pfm_sparc_get_event_nattrs, }; pfmlib_pmu_t sparc_niagara2_support={ .desc = "Sparc Niagara II", .name = "niagara2", .pmu = PFM_PMU_SPARC_NIAGARA2, .pme_count = LIBPFM_ARRAY_SIZE(niagara2_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = NIAGARA2_PLM, .num_cntrs = 2, .max_encoding = 2, .pe = niagara2_pe, .atdesc = NULL, .flags = 0, .pmu_detect = pfm_sparc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_sparc_get_encoding, PFMLIB_ENCODE_PERF(pfm_sparc_get_perf_encoding), .get_event_first = pfm_sparc_get_event_first, .get_event_next = pfm_sparc_get_event_next, .event_is_valid = pfm_sparc_event_is_valid, .validate_table = pfm_sparc_validate_table, .get_event_info = pfm_sparc_get_event_info, .get_event_attr_info = pfm_sparc_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_sparc_perf_validate_pattrs), .get_event_nattrs = pfm_sparc_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snb_unc.c0000600003276200002170000000600212247131124021346 0ustar ralphundrgrad/* * 
pfmlib_intel_snb_unc.c : Intel SandyBridge C-Box uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #define INTEL_SNB_UNC_ATTRS \ (_INTEL_X86_ATTR_I|_INTEL_X86_ATTR_E|_INTEL_X86_ATTR_C) #include "events/intel_snb_unc_events.h" static int pfm_snb_unc_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch (pfm_intel_x86_cfg.model) { case 42: /* Sandy Bridge (Core i7 26xx, 25xx) */ break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } #define SNB_UNC_CBOX(n, p) \ pfmlib_pmu_t intel_snb_unc_cbo##n##_support={ \ .desc = "Intel Sandy Bridge C-box"#n" uncore", \ .name = "snb_unc_cbo"#n, \ .perf_name = "uncore_cbox_"#n, \ .pmu = PFM_PMU_INTEL_SNB_UNC_CB##n, \ .pme_count = LIBPFM_ARRAY_SIZE(intel_snb_unc_##p##_pe), \ .type = PFM_PMU_TYPE_UNCORE, \ .num_cntrs = 2, \ .num_fixed_cntrs = 1, \ .max_encoding = 1,\ .pe = intel_snb_unc_##p##_pe, \ .atdesc = intel_x86_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK\ | PFMLIB_PMU_FL_NO_SMPL,\ .pmu_detect = pfm_snb_unc_detect, \ .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, \ PFMLIB_ENCODE_PERF(pfm_intel_nhm_unc_get_perf_encoding), \ PFMLIB_OS_DETECT(pfm_intel_x86_perf_detect), \ .get_event_first = pfm_intel_x86_get_event_first, \ .get_event_next = pfm_intel_x86_get_event_next, \ .event_is_valid = pfm_intel_x86_event_is_valid, \ .validate_table = pfm_intel_x86_validate_table, \ .get_event_info = pfm_intel_x86_get_event_info, \ .get_event_attr_info = pfm_intel_x86_get_event_attr_info, \ PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs),\ .get_event_nattrs = pfm_intel_x86_get_event_nattrs,\ } SNB_UNC_CBOX(0, cbo0); SNB_UNC_CBOX(1, cbo); SNB_UNC_CBOX(2, cbo); SNB_UNC_CBOX(3, cbo); papi-5.3.0/src/libpfm4/lib/pfmlib_sparc_ultra3.c0000600003276200002170000000776212247131124021144 0ustar ralphundrgrad/* * pfmlib_sparc_ultra3.c : SPARC Ultra III, IIIi, III+ * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * 
Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* * Core PMU = architectural perfmon v2 + PEBS */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_sparc_priv.h" #include "events/sparc_ultra3_events.h" #include "events/sparc_ultra3i_events.h" #include "events/sparc_ultra3plus_events.h" pfmlib_pmu_t sparc_ultra3_support={ .desc = "Ultra Sparc III", .name = "ultra3", .pmu = PFM_PMU_SPARC_ULTRA3, .pme_count = LIBPFM_ARRAY_SIZE(ultra3_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = SPARC_PLM, .max_encoding = 2, .num_cntrs = 2, .pe = ultra3_pe, .atdesc = NULL, .flags = 0, .pmu_detect = pfm_sparc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_sparc_get_encoding, PFMLIB_ENCODE_PERF(pfm_sparc_get_perf_encoding), .get_event_first = pfm_sparc_get_event_first, .get_event_next = pfm_sparc_get_event_next, .event_is_valid = pfm_sparc_event_is_valid, .validate_table = pfm_sparc_validate_table, .get_event_info = pfm_sparc_get_event_info, .get_event_attr_info = pfm_sparc_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_sparc_perf_validate_pattrs), .get_event_nattrs = pfm_sparc_get_event_nattrs, }; pfmlib_pmu_t sparc_ultra3i_support={ .desc = "Ultra Sparc IIIi", .name = "ultra3i", .pmu = PFM_PMU_SPARC_ULTRA3I, .pme_count = LIBPFM_ARRAY_SIZE(ultra3i_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = SPARC_PLM, .num_cntrs = 2, .max_encoding = 2, .pe = ultra3i_pe, .atdesc = NULL, .flags = 0, .pmu_detect = pfm_sparc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_sparc_get_encoding, PFMLIB_ENCODE_PERF(pfm_sparc_get_perf_encoding), .get_event_first = pfm_sparc_get_event_first, .get_event_next = pfm_sparc_get_event_next, .event_is_valid = pfm_sparc_event_is_valid, .validate_table = pfm_sparc_validate_table, .get_event_info = pfm_sparc_get_event_info, .get_event_attr_info = pfm_sparc_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_sparc_perf_validate_pattrs), .get_event_nattrs = pfm_sparc_get_event_nattrs, }; pfmlib_pmu_t sparc_ultra3plus_support={ .desc = "Ultra Sparc III+", .name = "ultra3p", .pmu = 
PFM_PMU_SPARC_ULTRA3PLUS, .pme_count = LIBPFM_ARRAY_SIZE(ultra3plus_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = SPARC_PLM, .max_encoding = 2, .num_cntrs = 2, .pe = ultra3plus_pe, .atdesc = NULL, .flags = 0, .pmu_detect = pfm_sparc_detect, .get_event_encoding[PFM_OS_NONE] = pfm_sparc_get_encoding, PFMLIB_ENCODE_PERF(pfm_sparc_get_perf_encoding), .get_event_first = pfm_sparc_get_event_first, .get_event_next = pfm_sparc_get_event_next, .event_is_valid = pfm_sparc_event_is_valid, .validate_table = pfm_sparc_validate_table, .get_event_info = pfm_sparc_get_event_info, .get_event_attr_info = pfm_sparc_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_sparc_perf_validate_pattrs), .get_event_nattrs = pfm_sparc_get_event_nattrs, }; papi-5.3.0/src/libpfm4/lib/pfmlib_itanium.c0000600003276200002170000010161312247131124020176 0ustar ralphundrgrad/* * pfmlib_itanium.c : support for Itanium-family PMU * * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
#include <sys/types.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

/* public headers */
#include <perfmon/pfmlib_itanium.h>

/* private headers */
#include "pfmlib_priv.h"		/* library private */
#include "pfmlib_priv_ia64.h"		/* architecture private */
#include "pfmlib_itanium_priv.h"	/* PMU private */
#include "itanium_events.h"		/* PMU private */

#define is_ear(i)	event_is_ear(itanium_pe+(i))
#define is_ear_tlb(i)	event_is_tlb_ear(itanium_pe+(i))
#define is_iear(i)	event_is_iear(itanium_pe+(i))
#define is_dear(i)	event_is_dear(itanium_pe+(i))
#define is_btb(i)	event_is_btb(itanium_pe+(i))
#define has_opcm(i)	event_opcm_ok(itanium_pe+(i))
#define has_iarr(i)	event_iarr_ok(itanium_pe+(i))
#define has_darr(i)	event_darr_ok(itanium_pe+(i))

#define evt_use_opcm(e)		((e)->pfp_ita_pmc8.opcm_used != 0 || (e)->pfp_ita_pmc9.opcm_used !=0)
#define evt_use_irange(e)	((e)->pfp_ita_irange.rr_used)
#define evt_use_drange(e)	((e)->pfp_ita_drange.rr_used)

#define evt_umask(e)		itanium_pe[(e)].pme_umask

/* let's define some handy shortcuts! */
#define pmc_plm		pmc_ita_count_reg.pmc_plm
#define pmc_ev		pmc_ita_count_reg.pmc_ev
#define pmc_oi		pmc_ita_count_reg.pmc_oi
#define pmc_pm		pmc_ita_count_reg.pmc_pm
#define pmc_es		pmc_ita_count_reg.pmc_es
#define pmc_umask	pmc_ita_count_reg.pmc_umask
#define pmc_thres	pmc_ita_count_reg.pmc_thres
#define pmc_ism		pmc_ita_count_reg.pmc_ism

/*
 * Description of the PMC register mappings used by
 * this module (as reported in pfmlib_reg_t.reg_num):
 *
 * 0 -> PMC0
 * 1 -> PMC1
 * n -> PMCn
 *
 * The following are in the model specific rr_br[]:
 * IBR0 -> 0
 * IBR1 -> 1
 * ...
 * IBR7 -> 7
 * DBR0 -> 0
 * DBR1 -> 1
 * ...
 * DBR7 -> 7
 *
 * We do not use a mapping table, instead we make up the
 * values on the fly given the base.
*/ #define PFMLIB_ITA_PMC_BASE 0 static int pfm_ita_detect(void) { int ret = PFMLIB_ERR_NOTSUPP; /* * we support all chips (there is only one!) in the Itanium family */ if (pfm_ia64_get_cpu_family() == 0x07) ret = PFMLIB_SUCCESS; return ret; } /* * Part of the following code will eventually go into a perfmon library */ static int valid_assign(unsigned int *as, pfmlib_regmask_t *r_pmcs, unsigned int cnt) { unsigned int i; for(i=0; i < cnt; i++) { if (as[i]==0) return PFMLIB_ERR_NOASSIGN; /* * take care of restricted PMC registers */ if (pfm_regmask_isset(r_pmcs, as[i])) return PFMLIB_ERR_NOASSIGN; } return PFMLIB_SUCCESS; } /* * Automatically dispatch events to corresponding counters following constraints. */ static int pfm_ita_dispatch_counters(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in, pfmlib_output_param_t *outp) { #define has_counter(e,b) (itanium_pe[e].pme_counters & (1 << (b)) ? (b) : 0) pfmlib_ita_input_param_t *param = mod_in; pfm_ita_pmc_reg_t reg; pfmlib_event_t *e; pfmlib_reg_t *pc, *pd; pfmlib_regmask_t *r_pmcs; unsigned int i,j,k,l, m; unsigned int max_l0, max_l1, max_l2, max_l3; unsigned int assign[PMU_ITA_NUM_COUNTERS]; unsigned int cnt; e = inp->pfp_events; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; cnt = inp->pfp_event_count; r_pmcs = &inp->pfp_unavail_pmcs; if (PFMLIB_DEBUG()) { for (m=0; m < cnt; m++) { DPRINT("ev[%d]=%s counters=0x%lx\n", m, itanium_pe[e[m].event].pme_name, itanium_pe[e[m].event].pme_counters); } } if (cnt > PMU_ITA_NUM_COUNTERS) return PFMLIB_ERR_TOOMANY; max_l0 = PMU_ITA_FIRST_COUNTER + PMU_ITA_NUM_COUNTERS; max_l1 = PMU_ITA_FIRST_COUNTER + PMU_ITA_NUM_COUNTERS*(cnt>1); max_l2 = PMU_ITA_FIRST_COUNTER + PMU_ITA_NUM_COUNTERS*(cnt>2); max_l3 = PMU_ITA_FIRST_COUNTER + PMU_ITA_NUM_COUNTERS*(cnt>3); DPRINT("max_l0=%u max_l1=%u max_l2=%u max_l3=%u\n", max_l0, max_l1, max_l2, max_l3); /* * This code needs fixing. It is not very pretty and * won't handle more than 4 counters if more become * available ! 
* For now, worst case in the loop nest: 4! (factorial) */ for (i=PMU_ITA_FIRST_COUNTER; i < max_l0; i++) { assign[0]= has_counter(e[0].event,i); if (max_l1 == PMU_ITA_FIRST_COUNTER && valid_assign(assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; for (j=PMU_ITA_FIRST_COUNTER; j < max_l1; j++) { if (j == i) continue; assign[1] = has_counter(e[1].event,j); if (max_l2 == PMU_ITA_FIRST_COUNTER && valid_assign(assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; for (k=PMU_ITA_FIRST_COUNTER; k < max_l2; k++) { if(k == i || k == j) continue; assign[2] = has_counter(e[2].event,k); if (max_l3 == PMU_ITA_FIRST_COUNTER && valid_assign(assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; for (l=PMU_ITA_FIRST_COUNTER; l < max_l3; l++) { if(l == i || l == j || l == k) continue; assign[3] = has_counter(e[3].event,l); if (valid_assign(assign, r_pmcs, cnt) == PFMLIB_SUCCESS) goto done; } } } } /* we cannot satisfy the constraints */ return PFMLIB_ERR_NOASSIGN; done: for (j=0; j < cnt ; j++ ) { reg.pmc_val = 0; /* clear all */ /* if plm is 0, then assume not specified per-event and use default */ reg.pmc_plm = e[j].plm ? e[j].plm : inp->pfp_dfl_plm; reg.pmc_oi = 1; /* overflow interrupt */ reg.pmc_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc_thres = param ? param->pfp_ita_counters[j].thres: 0; reg.pmc_ism = param ? param->pfp_ita_counters[j].ism : PFMLIB_ITA_ISM_BOTH; reg.pmc_umask = is_ear(e[j].event) ? 
0x0 : evt_umask(e[j].event);
		reg.pmc_es = itanium_pe[e[j].event].pme_code;

		pc[j].reg_num = assign[j];
		pc[j].reg_value = reg.pmc_val;
		pc[j].reg_addr = assign[j];
		pc[j].reg_alt_addr= assign[j];

		pd[j].reg_num = assign[j];
		pd[j].reg_addr = assign[j];
		pd[j].reg_alt_addr = assign[j];

		__pfm_vbprintf("[PMC%u(pmc%u)=0x%06lx thres=%d es=0x%02x plm=%d umask=0x%x pm=%d ism=0x%x oi=%d] %s\n",
				assign[j],
				assign[j],
				reg.pmc_val,
				reg.pmc_thres,
				reg.pmc_es,reg.pmc_plm,
				reg.pmc_umask, reg.pmc_pm,
				reg.pmc_ism,
				reg.pmc_oi,
				itanium_pe[e[j].event].pme_name);
		__pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[j].reg_num, pd[j].reg_num);
	}
	/* number of PMC registers programmed */
	outp->pfp_pmc_count = cnt;
	outp->pfp_pmd_count = cnt;

	return PFMLIB_SUCCESS;
}

static int
pfm_dispatch_iear(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in, pfmlib_output_param_t *outp)
{
	pfm_ita_pmc_reg_t reg;
	pfmlib_ita_input_param_t *param = mod_in;
	pfmlib_ita_input_param_t fake_param;
	pfmlib_reg_t *pc, *pd;
	unsigned int pos1, pos2;
	int iear_idx = -1;
	unsigned int i, count;

	pc = outp->pfp_pmcs;
	pd = outp->pfp_pmds;
	pos1 = outp->pfp_pmc_count;
	pos2 = outp->pfp_pmd_count;
	count = inp->pfp_event_count;

	for (i=0; i < count; i++) {
		if (is_iear(inp->pfp_events[i].event)) iear_idx = i;
	}

	if (param == NULL || mod_in->pfp_ita_iear.ear_used == 0) {

		/*
		 * case 3: no I-EAR event, no (or nothing) in param->pfp_ita_iear.ear_used
		 */
		if (iear_idx == -1) return PFMLIB_SUCCESS;

		memset(&fake_param, 0, sizeof(fake_param));
		param = &fake_param;

		pfm_ita_get_ear_mode(inp->pfp_events[iear_idx].event, &param->pfp_ita_iear.ear_mode);
		param->pfp_ita_iear.ear_umask = evt_umask(inp->pfp_events[iear_idx].event);
		param->pfp_ita_iear.ear_ism = PFMLIB_ITA_ISM_BOTH; /* force both instruction sets */

		DPRINT("I-EAR event with no info\n");
	}

	/* sanity check on the mode */
	if (param->pfp_ita_iear.ear_mode < 0 || param->pfp_ita_iear.ear_mode > 2) return PFMLIB_ERR_INVAL;

	/*
	 * case 2: ear_used=1, event is defined, we use the param info as it is more
precise
	 * case 4: ear_used=1, no event (free running I-EAR), use param info
	 */
	reg.pmc_val = 0;

	/* if plm is 0, then assume not specified per-event and use default */
	reg.pmc10_ita_reg.iear_plm = param->pfp_ita_iear.ear_plm ? param->pfp_ita_iear.ear_plm : inp->pfp_dfl_plm;
	reg.pmc10_ita_reg.iear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0;
	reg.pmc10_ita_reg.iear_tlb = param->pfp_ita_iear.ear_mode;
	reg.pmc10_ita_reg.iear_umask = param->pfp_ita_iear.ear_umask;
	reg.pmc10_ita_reg.iear_ism = param->pfp_ita_iear.ear_ism;

	if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 10))
		return PFMLIB_ERR_NOASSIGN;

	pc[pos1].reg_num = 10; /* PMC10 is I-EAR config register */
	pc[pos1].reg_value = reg.pmc_val;
	pc[pos1].reg_addr = 10;
	pc[pos1].reg_alt_addr= 10;
	pos1++;

	pd[pos2].reg_num = 0;
	pd[pos2].reg_addr = 0;
	pd[pos2].reg_alt_addr = 0;
	pos2++;
	pd[pos2].reg_num = 1;
	pd[pos2].reg_addr = 1;
	pd[pos2].reg_alt_addr = 1;
	pos2++;

	__pfm_vbprintf("[PMC10(pmc10)=0x%lx tlb=%s plm=%d pm=%d ism=0x%x umask=0x%x]\n",
			reg.pmc_val,
			reg.pmc10_ita_reg.iear_tlb ? "Yes" : "No",
			reg.pmc10_ita_reg.iear_plm,
			reg.pmc10_ita_reg.iear_pm,
			reg.pmc10_ita_reg.iear_ism,
			reg.pmc10_ita_reg.iear_umask);
	__pfm_vbprintf("[PMD0(pmd0)]\n[PMD1(pmd1)]\n");

	/* update final number of entries used */
	outp->pfp_pmc_count = pos1;
	outp->pfp_pmd_count = pos2;

	return PFMLIB_SUCCESS;
}

static int
pfm_dispatch_dear(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in, pfmlib_output_param_t *outp)
{
	pfm_ita_pmc_reg_t reg;
	pfmlib_ita_input_param_t *param = mod_in;
	pfmlib_ita_input_param_t fake_param;
	pfmlib_reg_t *pc, *pd;
	unsigned int pos1, pos2;
	int dear_idx = -1;
	unsigned int i, count;

	pc = outp->pfp_pmcs;
	pd = outp->pfp_pmds;
	pos1 = outp->pfp_pmc_count;
	pos2 = outp->pfp_pmd_count;
	count = inp->pfp_event_count;

	for (i=0; i < count; i++) {
		if (is_dear(inp->pfp_events[i].event)) dear_idx = i;
	}

	if (param == NULL || param->pfp_ita_dear.ear_used == 0) {

		/*
		 * case 3: no D-EAR event, no (or nothing) in param->pfp_ita_dear.ear_used
		 */
		if (dear_idx == -1) return PFMLIB_SUCCESS;

		memset(&fake_param, 0, sizeof(fake_param));
		param = &fake_param;

		pfm_ita_get_ear_mode(inp->pfp_events[dear_idx].event, &param->pfp_ita_dear.ear_mode);
		param->pfp_ita_dear.ear_umask = evt_umask(inp->pfp_events[dear_idx].event);
		param->pfp_ita_dear.ear_ism = PFMLIB_ITA_ISM_BOTH; /* force both instruction sets */

		DPRINT("D-EAR event with no info\n");
	}

	/* sanity check on the mode */
	if (param->pfp_ita_dear.ear_mode > 2) return PFMLIB_ERR_INVAL;

	/*
	 * case 2: ear_used=1, event is defined, we use the param info as it is more precise
	 * case 4: ear_used=1, no event (free running D-EAR), use param info
	 */
	reg.pmc_val = 0;

	/* if plm is 0, then assume not specified per-event and use default */
	reg.pmc11_ita_reg.dear_plm = param->pfp_ita_dear.ear_plm ? param->pfp_ita_dear.ear_plm : inp->pfp_dfl_plm;
	reg.pmc11_ita_reg.dear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ?
1 : 0;
	reg.pmc11_ita_reg.dear_tlb = param->pfp_ita_dear.ear_mode;
	reg.pmc11_ita_reg.dear_ism = param->pfp_ita_dear.ear_ism;
	reg.pmc11_ita_reg.dear_umask = param->pfp_ita_dear.ear_umask;
	reg.pmc11_ita_reg.dear_pt = param->pfp_ita_drange.rr_used ? 0: 1;

	if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 11))
		return PFMLIB_ERR_NOASSIGN;

	pc[pos1].reg_num = 11; /* PMC11 is D-EAR config register */
	pc[pos1].reg_value = reg.pmc_val;
	pc[pos1].reg_addr = 11;
	pc[pos1].reg_alt_addr = 11; /* set like the other PMC entries in this file */
	pos1++;

	pd[pos2].reg_num = 2;
	pd[pos2].reg_addr = 2;
	pd[pos2].reg_alt_addr = 2;
	pos2++;
	pd[pos2].reg_num = 3;
	pd[pos2].reg_addr = 3;
	pd[pos2].reg_alt_addr = 3;
	pos2++;
	pd[pos2].reg_num = 17;
	pd[pos2].reg_addr = 17;
	pd[pos2].reg_alt_addr = 17;
	pos2++;

	__pfm_vbprintf("[PMC11(pmc11)=0x%lx tlb=%s plm=%d pm=%d ism=0x%x umask=0x%x pt=%d]\n",
			reg.pmc_val,
			reg.pmc11_ita_reg.dear_tlb ? "Yes" : "No",
			reg.pmc11_ita_reg.dear_plm,
			reg.pmc11_ita_reg.dear_pm,
			reg.pmc11_ita_reg.dear_ism,
			reg.pmc11_ita_reg.dear_umask,
			reg.pmc11_ita_reg.dear_pt);
	__pfm_vbprintf("[PMD2(pmd2)]\n[PMD3(pmd3)]\n[PMD17(pmd17)]\n");

	/* update final number of entries used */
	outp->pfp_pmc_count = pos1;
	outp->pfp_pmd_count = pos2;

	return PFMLIB_SUCCESS;
}

static int
pfm_dispatch_opcm(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in, pfmlib_output_param_t *outp)
{
	pfmlib_ita_input_param_t *param = mod_in;
	pfm_ita_pmc_reg_t reg;
	pfmlib_reg_t *pc = outp->pfp_pmcs;
	int pos = outp->pfp_pmc_count;

	if (param == NULL) return PFMLIB_SUCCESS;

	if (param->pfp_ita_pmc8.opcm_used) {

		reg.pmc_val = param->pfp_ita_pmc8.pmc_val;

		if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 8))
			return PFMLIB_ERR_NOASSIGN;

		pc[pos].reg_num = 8;
		pc[pos].reg_value = reg.pmc_val;
		pc[pos].reg_addr = 8;
		pc[pos].reg_alt_addr = 8;
		pos++;

		__pfm_vbprintf("[PMC8(pmc8)=0x%lx m=%d i=%d f=%d b=%d match=0x%x mask=0x%x]\n",
				reg.pmc_val,
				reg.pmc8_9_ita_reg.m,
				reg.pmc8_9_ita_reg.i,
				reg.pmc8_9_ita_reg.f,
				reg.pmc8_9_ita_reg.b,
				reg.pmc8_9_ita_reg.match,
				reg.pmc8_9_ita_reg.mask);
	}

	if (param->pfp_ita_pmc9.opcm_used)
{ reg.pmc_val = param->pfp_ita_pmc9.pmc_val; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 9)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 9; pc[pos].reg_value = reg.pmc_val; pc[pos].reg_addr = 9; pc[pos].reg_alt_addr = 9; pos++; __pfm_vbprintf("[PMC9(pmc9)=0x%lx m=%d i=%d f=%d b=%d match=0x%x mask=0x%x]\n", reg.pmc_val, reg.pmc8_9_ita_reg.m, reg.pmc8_9_ita_reg.i, reg.pmc8_9_ita_reg.f, reg.pmc8_9_ita_reg.b, reg.pmc8_9_ita_reg.match, reg.pmc8_9_ita_reg.mask); } outp->pfp_pmc_count = pos; return PFMLIB_SUCCESS; } static int pfm_dispatch_btb(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfm_ita_pmc_reg_t reg; pfmlib_ita_input_param_t *param = mod_in; pfmlib_ita_input_param_t fake_param; pfmlib_reg_t *pc, *pd; int found_btb=0; unsigned int i, count; unsigned int pos1, pos2; reg.pmc_val = 0; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; count = inp->pfp_event_count; for (i=0; i < count; i++) { if (is_btb(inp->pfp_events[i].event)) found_btb = 1; } if (param == NULL || param->pfp_ita_btb.btb_used == 0) { /* * case 3: no BTB event, no param */ if (found_btb == 0) return PFMLIB_SUCCESS; /* * case 1: BTB event, no param, capture all branches */ memset(&fake_param, 0, sizeof(fake_param)); param = &fake_param; param->pfp_ita_btb.btb_tar = 0x1; /* capture TAR */ param->pfp_ita_btb.btb_tm = 0x3; /* all branches */ param->pfp_ita_btb.btb_ptm = 0x3; /* all branches */ param->pfp_ita_btb.btb_ppm = 0x3; /* all branches */ param->pfp_ita_btb.btb_tac = 0x1; /* capture TAC */ param->pfp_ita_btb.btb_bac = 0x1; /* capture BAC */ DPRINT("BTB event with no info\n"); } /* * case 2: BTB event, param * case 4: no BTB event, param (free running mode) */ /* if plm is 0, then assume not specified per-event and use default */ reg.pmc12_ita_reg.btbc_plm = param->pfp_ita_btb.btb_plm ? param->pfp_ita_btb.btb_plm : inp->pfp_dfl_plm; reg.pmc12_ita_reg.btbc_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 
1 : 0;
	reg.pmc12_ita_reg.btbc_tar = param->pfp_ita_btb.btb_tar & 0x1;
	reg.pmc12_ita_reg.btbc_tm = param->pfp_ita_btb.btb_tm & 0x3;
	reg.pmc12_ita_reg.btbc_ptm = param->pfp_ita_btb.btb_ptm & 0x3;
	reg.pmc12_ita_reg.btbc_ppm = param->pfp_ita_btb.btb_ppm & 0x3;
	reg.pmc12_ita_reg.btbc_bpt = param->pfp_ita_btb.btb_tac & 0x1;
	reg.pmc12_ita_reg.btbc_bac = param->pfp_ita_btb.btb_bac & 0x1;

	if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 12))
		return PFMLIB_ERR_NOASSIGN;

	pc[pos1].reg_num = 12;
	pc[pos1].reg_value = reg.pmc_val;
	/* was a second assignment to reg_value, which clobbered the PMC12 value */
	pc[pos1].reg_addr = 12;
	pc[pos1].reg_alt_addr = 12;
	pos1++;

	__pfm_vbprintf("[PMC12(pmc12)=0x%lx plm=%d pm=%d tar=%d tm=%d ptm=%d ppm=%d bpt=%d bac=%d]\n",
			reg.pmc_val,
			reg.pmc12_ita_reg.btbc_plm,
			reg.pmc12_ita_reg.btbc_pm,
			reg.pmc12_ita_reg.btbc_tar,
			reg.pmc12_ita_reg.btbc_tm,
			reg.pmc12_ita_reg.btbc_ptm,
			reg.pmc12_ita_reg.btbc_ppm,
			reg.pmc12_ita_reg.btbc_bpt,
			reg.pmc12_ita_reg.btbc_bac);

	/*
	 * PMD16 is included in list of used PMD
	 */
	for(i=8; i < 17; i++, pos2++) {
		pd[pos2].reg_num = i;
		pd[pos2].reg_addr = i;
		pd[pos2].reg_alt_addr = i;
		__pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[pos2].reg_num, pd[pos2].reg_num);
	}

	/* update final number of entries used */
	outp->pfp_pmc_count = pos1;
	outp->pfp_pmd_count = pos2;

	return PFMLIB_SUCCESS;
}

/*
 * mode = 0 -> check code (enforce bundle alignment)
 * mode = 1 -> check data
 */
static int
check_intervals(pfmlib_ita_input_rr_t *irr, int mode, int *n_intervals)
{
	int i;
	pfmlib_ita_input_rr_desc_t *lim = irr->rr_limits;

	for(i=0; i < 4; i++) {
		/* end marker */
		if (lim[i].rr_start == 0 && lim[i].rr_end == 0) break;

		/* invalid entry */
		if (lim[i].rr_start >= lim[i].rr_end) return PFMLIB_ERR_IRRINVAL;

		if (mode == 0 && (lim[i].rr_start & 0xf || lim[i].rr_end & 0xf))
			return PFMLIB_ERR_IRRALIGN;
	}
	*n_intervals = i;
	return PFMLIB_SUCCESS;
}

static void
do_normal_rr(unsigned long start, unsigned long end, pfmlib_reg_t *br, int nbr, int dir, int *idx, int *reg_idx, int plm)
{
	unsigned long size, l_addr, c;
	unsigned long l_offs = 0, r_offs = 0;
	unsigned long
l_size, r_size; dbreg_t db; int p2; if (nbr < 1 || end <= start) return; size = end - start; DPRINT("start=0x%016lx end=0x%016lx size=0x%lx bytes (%lu bundles) nbr=%d dir=%d\n", start, end, size, size >> 4, nbr, dir); p2 = pfm_ia64_fls(size); c = ALIGN_DOWN(end, p2); DPRINT("largest power of two possible: 2^%d=0x%lx, crossing=0x%016lx\n", p2, 1UL << p2, c); if ((c - (1UL<= start) { l_addr = c - (1UL << p2); } else { p2--; if ((c + (1UL<>l_offs: 0x%lx\n", l_offs); } } else if (dir == 1 && r_size != 0 && nbr == 1) { p2++; l_addr = start; if (PFMLIB_DEBUG()) { r_offs = l_addr+(1UL<>r_offs: 0x%lx\n", r_offs); } } l_size = l_addr - start; r_size = end - l_addr-(1UL<>largest chunk: 2^%d @0x%016lx-0x%016lx\n", p2, l_addr, l_addr+(1UL<>before: 0x%016lx-0x%016lx\n", start, l_addr); if (r_size && !r_offs) DPRINT(">>after : 0x%016lx-0x%016lx\n", l_addr+(1UL<>1; if (nbr & 0x1) { /* * our simple heuristic is: * we assign the largest number of registers to the largest * of the two chunks */ if (l_size > r_size) { l_nbr++; } else { r_nbr++; } } do_normal_rr(start, l_addr, br, l_nbr, 0, idx, reg_idx, plm); do_normal_rr(l_addr+(1UL<rr_start, in_rr->rr_end, n_pairs); __pfm_vbprintf("start offset: -0x%lx end_offset: +0x%lx\n", out_rr->rr_soff, out_rr->rr_eoff); for (j=0; j < n_pairs; j++, base_idx += 2) { d.val = dbr[base_idx+1].reg_value; r_end = dbr[base_idx].reg_value+((~(d.db.db_mask)) & ~(0xffUL << 56)); __pfm_vbprintf("brp%u: db%u: 0x%016lx db%u: plm=0x%x mask=0x%016lx end=0x%016lx\n", dbr[base_idx].reg_num>>1, dbr[base_idx].reg_num, dbr[base_idx].reg_value, dbr[base_idx+1].reg_num, d.db.db_plm, (unsigned long) d.db.db_mask, r_end); } } static int compute_normal_rr(pfmlib_ita_input_rr_t *irr, int dfl_plm, int n, int *base_idx, pfmlib_ita_output_rr_t *orr) { pfmlib_ita_input_rr_desc_t *in_rr; pfmlib_ita_output_rr_desc_t *out_rr; unsigned long r_end; pfmlib_reg_t *br; dbreg_t d; int i, j, br_index, reg_idx, prev_index; in_rr = irr->rr_limits; out_rr = orr->rr_infos; br = 
orr->rr_br;
	reg_idx = *base_idx;
	br_index = 0;

	for (i=0; i < n; i++, in_rr++, out_rr++) {
		/*
		 * running out of registers
		 */
		if (br_index == 8) break;

		prev_index = br_index;

		do_normal_rr(	in_rr->rr_start,
				in_rr->rr_end,
				br,
				4 - (reg_idx>>1), /* how many pairs available */
				0,
				&br_index,
				&reg_idx, in_rr->rr_plm ? in_rr->rr_plm : dfl_plm);

		DPRINT("br_index=%d reg_idx=%d\n", br_index, reg_idx);

		/*
		 * compute offsets
		 */
		out_rr->rr_soff = out_rr->rr_eoff = 0;

		for(j=prev_index; j < br_index; j+=2) {
			d.val = br[j+1].reg_value;
			r_end = br[j].reg_value+((~(d.db.db_mask)+1) & ~(0xffUL << 56));

			if (br[j].reg_value <= in_rr->rr_start)
				out_rr->rr_soff = in_rr->rr_start - br[j].reg_value;

			if (r_end >= in_rr->rr_end)
				out_rr->rr_eoff = r_end - in_rr->rr_end;
		}

		if (PFMLIB_VERBOSE()) print_one_range(in_rr, out_rr, br, prev_index, (br_index-prev_index)>>1);
	}

	/* do not have enough registers to cover all the ranges */
	if (br_index == 8 && i < n) return PFMLIB_ERR_TOOMANY;

	orr->rr_nbr_used = br_index;

	return PFMLIB_SUCCESS;
}

static int
pfm_dispatch_irange(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_ita_output_param_t *mod_out)
{
	pfm_ita_pmc_reg_t reg;
	pfmlib_ita_input_param_t *param = mod_in;
	pfmlib_reg_t *pc = outp->pfp_pmcs;
	pfmlib_ita_input_rr_t *irr;
	pfmlib_ita_output_rr_t *orr;
	int pos = outp->pfp_pmc_count;
	int ret, base_idx = 0;
	int n_intervals;

	if (param == NULL || param->pfp_ita_irange.rr_used == 0) return PFMLIB_SUCCESS;

	if (mod_out == NULL) return PFMLIB_ERR_INVAL;

	irr = &param->pfp_ita_irange;
	orr = &mod_out->pfp_ita_irange;

	ret = check_intervals(irr, 0, &n_intervals);
	if (ret != PFMLIB_SUCCESS) return ret;

	if (n_intervals < 1) return PFMLIB_ERR_IRRINVAL;

	DPRINT("n_intervals=%d\n", n_intervals);

	ret = compute_normal_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr);
	if (ret != PFMLIB_SUCCESS) {
		return ret == PFMLIB_ERR_TOOMANY ?
PFMLIB_ERR_IRRTOOMANY : ret;
	}
	reg.pmc_val = 0;

	reg.pmc13_ita_reg.irange_ta = 0x0;

	if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 13))
		return PFMLIB_ERR_NOASSIGN;

	pc[pos].reg_num = 13;
	pc[pos].reg_value = reg.pmc_val;
	pc[pos].reg_addr = 13;
	pc[pos].reg_alt_addr= 13;
	pos++;

	__pfm_vbprintf("[PMC13(pmc13)=0x%lx ta=%d]\n", reg.pmc_val, reg.pmc13_ita_reg.irange_ta);

	outp->pfp_pmc_count = pos;

	return PFMLIB_SUCCESS;
}

static int
pfm_dispatch_drange(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_ita_output_param_t *mod_out)
{
	pfmlib_ita_input_param_t *param = mod_in;
	pfmlib_event_t *e = inp->pfp_events;
	pfmlib_reg_t *pc = outp->pfp_pmcs;
	pfmlib_ita_input_rr_t *irr;
	pfmlib_ita_output_rr_t *orr;
	pfm_ita_pmc_reg_t reg;
	unsigned int i, count;
	int pos = outp->pfp_pmc_count;
	int ret, base_idx = 0;
	int n_intervals;

	if (param == NULL || param->pfp_ita_drange.rr_used == 0) return PFMLIB_SUCCESS;

	if (mod_out == NULL) return PFMLIB_ERR_INVAL;

	irr = &param->pfp_ita_drange;
	orr = &mod_out->pfp_ita_drange;

	ret = check_intervals(irr, 1 , &n_intervals);
	if (ret != PFMLIB_SUCCESS) return ret;

	if (n_intervals < 1) return PFMLIB_ERR_DRRINVAL;

	DPRINT("n_intervals=%d\n", n_intervals);

	ret = compute_normal_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr);
	if (ret != PFMLIB_SUCCESS) {
		return ret == PFMLIB_ERR_TOOMANY ?
PFMLIB_ERR_DRRTOOMANY : ret;
	}
	count = inp->pfp_event_count;
	for (i=0; i < count; i++) {
		if (is_dear(e[i].event)) return PFMLIB_SUCCESS; /* will be done there */
	}

	reg.pmc_val = 0UL;

	/*
	 * here we have no other choice but to use the default priv level as there is no
	 * specific D-EAR event provided
	 */
	reg.pmc11_ita_reg.dear_plm = inp->pfp_dfl_plm;

	if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 11))
		return PFMLIB_ERR_NOASSIGN;

	pc[pos].reg_num = 11;
	pc[pos].reg_value = reg.pmc_val;
	pc[pos].reg_addr = 11;
	pc[pos].reg_alt_addr= 11;
	pos++;

	__pfm_vbprintf("[PMC11(pmc11)=0x%lx tlb=%s plm=%d pm=%d ism=0x%x umask=0x%x pt=%d]\n",
			reg.pmc_val,
			reg.pmc11_ita_reg.dear_tlb ? "Yes" : "No",
			reg.pmc11_ita_reg.dear_plm,
			reg.pmc11_ita_reg.dear_pm,
			reg.pmc11_ita_reg.dear_ism,
			reg.pmc11_ita_reg.dear_umask,
			reg.pmc11_ita_reg.dear_pt);

	outp->pfp_pmc_count = pos;

	return PFMLIB_SUCCESS;
}

static int
check_qualifier_constraints(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in)
{
	pfmlib_event_t *e = inp->pfp_events;
	unsigned int i, count;

	count = inp->pfp_event_count;
	for(i=0; i < count; i++) {
		/*
		 * skip the check for any counter which requested it. Use at your own risk.
		 * Not all counters have necessarily been validated for use with
		 * qualifiers. Typically the event is counted as if no constraint
		 * existed.
		 */
		if (mod_in->pfp_ita_counters[i].flags & PFMLIB_ITA_FL_EVT_NO_QUALCHECK) continue;

		if (evt_use_irange(mod_in) && has_iarr(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB;
		if (evt_use_drange(mod_in) && has_darr(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB;
		if (evt_use_opcm(mod_in) && has_opcm(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB;
	}
	return PFMLIB_SUCCESS;
}

static int
check_range_plm(pfmlib_input_param_t *inp, pfmlib_ita_input_param_t *mod_in)
{
	unsigned int i, count;

	if (mod_in->pfp_ita_drange.rr_used == 0 && mod_in->pfp_ita_irange.rr_used == 0) return PFMLIB_SUCCESS;

	/*
	 * range restriction applies to all events, therefore we must have a consistent
	 * set of plm and they must match the pfp_dfl_plm which is used to setup the debug
	 * registers
	 */
	count = inp->pfp_event_count;
	for(i=0; i < count; i++) {
		if (inp->pfp_events[i].plm && inp->pfp_events[i].plm != inp->pfp_dfl_plm) return PFMLIB_ERR_FEATCOMB;
	}
	return PFMLIB_SUCCESS;
}

static int
pfm_ita_dispatch_events(pfmlib_input_param_t *inp, void *model_in, pfmlib_output_param_t *outp, void *model_out)
{
	int ret;
	pfmlib_ita_input_param_t *mod_in = (pfmlib_ita_input_param_t *)model_in;
	pfmlib_ita_output_param_t *mod_out = (pfmlib_ita_output_param_t *)model_out;

	/*
	 * nothing will come out of this combination
	 */
	if (mod_out && mod_in == NULL) return PFMLIB_ERR_INVAL;

	/* check opcode match, range restriction qualifiers */
	if (mod_in && check_qualifier_constraints(inp, mod_in) != PFMLIB_SUCCESS) return PFMLIB_ERR_FEATCOMB;

	/* check for problems with range restriction and per-event plm */
	if (mod_in && check_range_plm(inp, mod_in) != PFMLIB_SUCCESS) return PFMLIB_ERR_FEATCOMB;

	ret = pfm_ita_dispatch_counters(inp, mod_in, outp);
	if (ret != PFMLIB_SUCCESS) return ret;

	/* now check for I-EAR */
	ret = pfm_dispatch_iear(inp, mod_in, outp);
	if (ret != PFMLIB_SUCCESS) return ret;

	/* now check for D-EAR */
	ret = pfm_dispatch_dear(inp, mod_in, outp);
	if (ret != PFMLIB_SUCCESS) return ret;

	/* now check for Opcode matchers */
	ret
= pfm_dispatch_opcm(inp, mod_in, outp);
	if (ret != PFMLIB_SUCCESS) return ret;

	ret = pfm_dispatch_btb(inp, mod_in, outp);
	if (ret != PFMLIB_SUCCESS) return ret;

	ret = pfm_dispatch_irange(inp, mod_in, outp, mod_out);
	if (ret != PFMLIB_SUCCESS) return ret;

	ret = pfm_dispatch_drange(inp, mod_in, outp, mod_out);

	return ret;
}

/* XXX: return value is also error code */
int
pfm_ita_get_event_maxincr(unsigned int i, unsigned int *maxincr)
{
	if (i >= PME_ITA_EVENT_COUNT || maxincr == NULL) return PFMLIB_ERR_INVAL;
	*maxincr = itanium_pe[i].pme_maxincr;
	return PFMLIB_SUCCESS;
}

int
pfm_ita_is_ear(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! is_ear(i) ? 0 : 1;
}

int
pfm_ita_is_dear(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! is_dear(i) ? 0 : 1;
}

int
pfm_ita_is_dear_tlb(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! (is_dear(i) && is_ear_tlb(i)) ? 0 : 1;
}

int
pfm_ita_is_dear_cache(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! (is_dear(i) && !is_ear_tlb(i)) ? 0 : 1;
}

int
pfm_ita_is_iear(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! is_iear(i) ? 0 : 1;
}

int
pfm_ita_is_iear_tlb(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! (is_iear(i) && is_ear_tlb(i)) ? 0 : 1;
}

int
pfm_ita_is_iear_cache(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! (is_iear(i) && !is_ear_tlb(i)) ? 0 : 1;
}

int
pfm_ita_is_btb(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! is_btb(i) ? 0 : 1;
}

int
pfm_ita_support_iarr(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! has_iarr(i) ? 0 : 1;
}

int
pfm_ita_support_darr(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! has_darr(i) ? 0 : 1;
}

int
pfm_ita_support_opcm(unsigned int i)
{
	return i >= PME_ITA_EVENT_COUNT || ! has_opcm(i) ? 0 : 1;
}

int
pfm_ita_get_ear_mode(unsigned int i, pfmlib_ita_ear_mode_t *m)
{
	if (!is_ear(i) || m == NULL) return PFMLIB_ERR_INVAL;

	*m = is_ear_tlb(i) ?
PFMLIB_ITA_EAR_TLB_MODE : PFMLIB_ITA_EAR_CACHE_MODE; return PFMLIB_SUCCESS; } static int pfm_ita_get_event_code(unsigned int i, unsigned int cnt, int *code) { if (cnt != PFMLIB_CNT_FIRST && (cnt < 4 || cnt > 7)) return PFMLIB_ERR_INVAL; *code = (int)itanium_pe[i].pme_code; return PFMLIB_SUCCESS; } /* * This function is accessible directly to the user */ int pfm_ita_get_event_umask(unsigned int i, unsigned long *umask) { if (i >= PME_ITA_EVENT_COUNT || umask == NULL) return PFMLIB_ERR_INVAL; *umask = evt_umask(i); return PFMLIB_SUCCESS; } static char * pfm_ita_get_event_name(unsigned int i) { return itanium_pe[i].pme_name; } static void pfm_ita_get_event_counters(unsigned int j, pfmlib_regmask_t *counters) { unsigned int i; unsigned long m; memset(counters, 0, sizeof(*counters)); m =itanium_pe[j].pme_counters; for(i=0; m ; i++, m>>=1) { if (m & 0x1) pfm_regmask_set(counters, i); } } static void pfm_ita_get_impl_pmcs(pfmlib_regmask_t *impl_pmcs) { unsigned int i = 0; /* all pmcs are contiguous */ for(i=0; i < PMU_ITA_NUM_PMCS; i++) pfm_regmask_set(impl_pmcs, i); } static void pfm_ita_get_impl_pmds(pfmlib_regmask_t *impl_pmds) { unsigned int i = 0; /* all pmds are contiguous */ for(i=0; i < PMU_ITA_NUM_PMDS; i++) pfm_regmask_set(impl_pmds, i); } static void pfm_ita_get_impl_counters(pfmlib_regmask_t *impl_counters) { unsigned int i = 0; /* counting pmds are contiguous */ for(i=4; i < 8; i++) pfm_regmask_set(impl_counters, i); } static void pfm_ita_get_hw_counter_width(unsigned int *width) { *width = PMU_ITA_COUNTER_WIDTH; } static int pfm_ita_get_cycle_event(pfmlib_event_t *e) { e->event = PME_ITA_CPU_CYCLES; return PFMLIB_SUCCESS; } static int pfm_ita_get_inst_retired(pfmlib_event_t *e) { e->event = PME_ITA_IA64_INST_RETIRED; return PFMLIB_SUCCESS; } pfm_pmu_support_t itanium_support={ .pmu_name = "itanium", .pmu_type = PFMLIB_ITANIUM_PMU, .pme_count = PME_ITA_EVENT_COUNT, .pmc_count = PMU_ITA_NUM_PMCS, .pmd_count = PMU_ITA_NUM_PMDS, .num_cnt = PMU_ITA_NUM_COUNTERS, 
	.get_event_code = pfm_ita_get_event_code,
	.get_event_name = pfm_ita_get_event_name,
	.get_event_counters = pfm_ita_get_event_counters,
	.dispatch_events = pfm_ita_dispatch_events,
	.pmu_detect = pfm_ita_detect,
	.get_impl_pmcs = pfm_ita_get_impl_pmcs,
	.get_impl_pmds = pfm_ita_get_impl_pmds,
	.get_impl_counters = pfm_ita_get_impl_counters,
	.get_hw_counter_width = pfm_ita_get_hw_counter_width,
	.get_cycle_event = pfm_ita_get_cycle_event,
	.get_inst_retired_event = pfm_ita_get_inst_retired
	/* no event description available for Itanium */
};
papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc.c
/*
 * pfmlib_intel_snbep_unc.c : Intel SandyBridge-EP uncore PMU common code
 *
 * Copyright (c) 2012 Google, Inc
 * Contributed by Stephane Eranian
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
#include <sys/types.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

/* private headers */
#include "pfmlib_priv.h"
#include "pfmlib_intel_x86_priv.h"
#include "pfmlib_intel_snbep_unc_priv.h"

const pfmlib_attr_desc_t snbep_unc_mods[]={
	PFM_ATTR_B("e", "edge detect"),				/* edge */
	PFM_ATTR_B("i", "invert"),				/* invert */
	PFM_ATTR_I("t", "threshold in range [0-255]"),		/* threshold */
	PFM_ATTR_I("t", "threshold in range [0-31]"),		/* threshold */
	PFM_ATTR_I("tf", "thread id filter [0-1]"),		/* thread id */
	PFM_ATTR_I("cf", "core id filter [0-7]"),		/* core id */
	PFM_ATTR_I("nf", "node id bitmask filter [0-255]"),	/* nodeid mask */
	PFM_ATTR_I("ff", "frequency >= 100Mhz * [0-255]"),	/* freq filter */
	PFM_ATTR_I("addr", "physical address matcher [40 bits]"),/* address matcher */
	PFM_ATTR_NULL
};

int
pfm_intel_snbep_unc_detect(void *this)
{
	int ret;

	ret = pfm_intel_x86_detect();
	if (ret != PFM_SUCCESS)
		return ret; /* the return was missing, so the family check only ran on failure */

	if (pfm_intel_x86_cfg.family != 6)
		return PFM_ERR_NOTSUPP;

	switch(pfm_intel_x86_cfg.model) {
		case 45: /* SandyBridge-EP */
			break;
		default:
			return PFM_ERR_NOTSUPP;
	}
	return PFM_SUCCESS;
}

static void
display_com(void *this, pfmlib_event_desc_t *e, void *val)
{
	const intel_x86_entry_t *pe = this_pe(this);
	pfm_snbep_unc_reg_t *reg = val;

	__pfm_vbprintf("[UNC=0x%"PRIx64" event=0x%x umask=0x%x en=%d "
		       "inv=%d edge=%d thres=%d] %s\n",
			reg->val,
			reg->com.unc_event,
			reg->com.unc_umask,
			reg->com.unc_en,
			reg->com.unc_inv,
			reg->com.unc_edge,
			reg->com.unc_thres,
			pe[e->event].name);
}

static void
display_reg(void *this, pfmlib_event_desc_t *e, pfm_snbep_unc_reg_t reg)
{
	pfmlib_pmu_t *pmu = this;
	if (pmu->display_reg)
		pmu->display_reg(this, e, &reg);
	else
		display_com(this, e, &reg);
}

static inline int
is_occ_event(void *this, int idx)
{
	pfmlib_pmu_t *pmu = this;
	const intel_x86_entry_t *pe = this_pe(this);

	return (pmu->flags & INTEL_PMU_FL_UNC_OCC) && (pe[idx].code & 0x80);
}

static inline int
get_pcu_filt_band(void *this, pfm_snbep_unc_reg_t reg)
{
#define PCU_FREQ_BAND0_CODE	0xb /* event code for
UNC_P_FREQ_BAND0_CYCLES */ return reg.pcu.unc_event - PCU_FREQ_BAND0_CODE; } int snbep_unc_add_defaults(void *this, pfmlib_event_desc_t *e, unsigned int msk, uint64_t *umask, pfm_snbep_unc_reg_t *filter, unsigned int max_grpid) { const intel_x86_entry_t *pe = this_pe(this); const intel_x86_entry_t *ent; unsigned int i; int j, k, added, skip; int idx; k = e->nattrs; ent = pe+e->event; for(i=0; msk; msk >>=1, i++) { if (!(msk & 0x1)) continue; added = skip = 0; for (j = 0; j < e->npattrs; j++) { if (e->pattrs[j].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[j].type != PFM_ATTR_UMASK) continue; idx = e->pattrs[j].idx; if (ent->umasks[idx].grpid != i) continue; if (max_grpid != INTEL_X86_MAX_GRPID && i > max_grpid) { skip = 1; continue; } /* umask is default for group */ if (intel_x86_uflag(this, e->event, idx, INTEL_X86_DFL)) { DPRINT("added default %s for group %d j=%d idx=%d\n", ent->umasks[idx].uname, i, j, idx); /* * default could be an alias, but * ucode must reflect actual code */ *umask |= ent->umasks[idx].ucode >> 8; filter->val |= pe[e->event].umasks[idx].ufilters[0]; e->attrs[k].id = j; /* pattrs index */ e->attrs[k].ival = 0; k++; added++; if (intel_x86_eflag(this, e->event, INTEL_X86_GRP_EXCL)) goto done; if (intel_x86_uflag(this, e->event, idx, INTEL_X86_EXCL_GRP_GT)) { if (max_grpid != INTEL_X86_MAX_GRPID) { DPRINT("two max_grpid, old=%d new=%d\n", max_grpid, ent->umasks[idx].grpid); return PFM_ERR_UMASK; } max_grpid = ent->umasks[idx].grpid; } } } if (!added && !skip) { DPRINT("no default found for event %s unit mask group %d (max_grpid=%d)\n", ent->name, i, max_grpid); return PFM_ERR_UMASK; } } DPRINT("max_grpid=%d nattrs=%d k=%d\n", max_grpid, e->nattrs, k); done: e->nattrs = k; return PFM_SUCCESS; } /* * common encoding routine */ int pfm_intel_snbep_unc_get_encoding(void *this, pfmlib_event_desc_t *e) { const intel_x86_entry_t *pe = this_pe(this); unsigned int grpmsk, ugrpmsk = 0; unsigned int max_grpid = INTEL_X86_MAX_GRPID; unsigned int 
last_grpid = INTEL_X86_MAX_GRPID; int umodmsk = 0, modmsk_r = 0; int pcu_filt_band = -1; pfm_snbep_unc_reg_t reg; pfm_snbep_unc_reg_t filter; pfm_snbep_unc_reg_t addr; pfm_event_attr_info_t *a; uint64_t val, umask1, umask2; int k, ret; int has_cbo_tid = 0; unsigned int grpid; int grpcounts[INTEL_X86_NUM_GRP]; int ncombo[INTEL_X86_NUM_GRP]; char umask_str[PFMLIB_EVT_MAX_NAME_LEN]; memset(grpcounts, 0, sizeof(grpcounts)); memset(ncombo, 0, sizeof(ncombo)); filter.val = 0; addr.val = 0; pe = this_pe(this); umask_str[0] = e->fstr[0] = '\0'; reg.val = val = pe[e->event].code; /* take into account hardcoded umask */ umask1 = (val >> 8) & 0xff; umask2 = umask1; grpmsk = (1 << pe[e->event].ngrp)-1; modmsk_r = pe[e->event].modmsk_req; for(k=0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) { uint64_t um; grpid = pe[e->event].umasks[a->idx].grpid; /* * certain event groups are meant to be * exclusive, i.e., only unit masks of one group * can be used */ if (last_grpid != INTEL_X86_MAX_GRPID && grpid != last_grpid && intel_x86_eflag(this, e->event, INTEL_X86_GRP_EXCL)) { DPRINT("exclusive unit mask group error\n"); return PFM_ERR_FEATCOMB; } /* * selecting certain umasks in a group may exclude any umasks * from any groups with a higher index * * enforcement requires looking at the grpid of all the umasks */ if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_EXCL_GRP_GT)) max_grpid = grpid; /* * upper layer has removed duplicates * so if we come here more than once, it is for two * distinct umasks * * NCOMBO=no combination of unit masks within the same * umask group */ ++grpcounts[grpid]; /* mark that we have a umask
with NCOMBO in this group */ if (intel_x86_uflag(this, e->event, a->idx, INTEL_X86_NCOMBO)) ncombo[grpid] = 1; /* * if more than one umask in this group but one is marked * with ncombo, then fail. It is okay to combine umask within * a group as long as none is tagged with NCOMBO */ if (grpcounts[grpid] > 1 && ncombo[grpid]) { DPRINT("umask %s does not support unit mask combination within group %d\n", pe[e->event].umasks[a->idx].uname, grpid); return PFM_ERR_FEATCOMB; } last_grpid = grpid; um = pe[e->event].umasks[a->idx].ucode; filter.val |= pe[e->event].umasks[a->idx].ufilters[0]; um >>= 8; umask2 |= um; ugrpmsk |= 1 << pe[e->event].umasks[a->idx].grpid; /* PCU occ event */ if (is_occ_event(this, e->event)) { reg.pcu.unc_occ = umask2 >> 6; umask2 = 0; } else reg.val |= umask2 << 8; evt_strcat(umask_str, ":%s", pe[e->event].umasks[a->idx].uname); modmsk_r |= pe[e->event].umasks[a->idx].umodmsk_req; } else if (a->type == PFM_ATTR_RAW_UMASK) { /* there can only be one RAW_UMASK per event */ /* sanity check */ if (a->idx & ~0xff) { DPRINT("raw umask is 8-bit wide\n"); return PFM_ERR_ATTR; } /* override umask */ umask2 = a->idx & 0xff; ugrpmsk = grpmsk; } else { uint64_t ival = e->attrs[k].ival; switch(a->idx) { case SNBEP_UNC_ATTR_I: /* invert */ if (is_occ_event(this, e->event)) reg.pcu.unc_occ_inv = !!ival; else reg.com.unc_inv = !!ival; umodmsk |= _SNBEP_UNC_ATTR_I; break; case SNBEP_UNC_ATTR_E: /* edge */ if (is_occ_event(this, e->event)) reg.pcu.unc_occ_edge = !!ival; else reg.com.unc_edge = !!ival; umodmsk |= _SNBEP_UNC_ATTR_E; break; case SNBEP_UNC_ATTR_T8: /* counter-mask */ /* already forced, cannot overwrite */ if (ival > 255) return PFM_ERR_ATTR_VAL; reg.com.unc_thres = ival; umodmsk |= _SNBEP_UNC_ATTR_T8; break; case SNBEP_UNC_ATTR_T5: /* pcu counter-mask */ /* already forced, cannot overwrite */ if (ival > 31) return PFM_ERR_ATTR_VAL; reg.pcu.unc_thres = ival; umodmsk |= _SNBEP_UNC_ATTR_T5; break; case SNBEP_UNC_ATTR_TF: /* thread id */ if (ival > 1) { 
DPRINT("invalid thread id, must be 0 or 1\n"); return PFM_ERR_ATTR_VAL; } reg.cbo.unc_tid = 1; has_cbo_tid = 1; filter.cbo_filt.tid = ival; umodmsk |= _SNBEP_UNC_ATTR_TF; break; case SNBEP_UNC_ATTR_CF: /* core id */ if (ival > 7) return PFM_ERR_ATTR_VAL; reg.cbo.unc_tid = 1; filter.cbo_filt.cid = ival; has_cbo_tid = 1; umodmsk |= _SNBEP_UNC_ATTR_CF; break; case SNBEP_UNC_ATTR_NF: /* node id */ if (ival > 255 || ival == 0) { DPRINT("invalid nf, 0 < nf < 256\n"); return PFM_ERR_ATTR_VAL; } filter.cbo_filt.nid = ival; umodmsk |= _SNBEP_UNC_ATTR_NF; break; case SNBEP_UNC_ATTR_FF: /* freq band filter */ if (ival > 255) return PFM_ERR_ATTR_VAL; pcu_filt_band = get_pcu_filt_band(this, reg); filter.val = ival << (pcu_filt_band * 8); umodmsk |= _SNBEP_UNC_ATTR_FF; break; case SNBEP_UNC_ATTR_A: /* addr filter */ if (ival & ~((1ULL << 40)-1)) { DPRINT("address filter 40 bits max\n"); return PFM_ERR_ATTR_VAL; } addr.ha_addr.lo_addr = ival; /* LSB 26 bits */ addr.ha_addr.hi_addr = (ival >> 26) & ((1ULL << 14)-1); umodmsk |= _SNBEP_UNC_ATTR_A; break; } } } /* * check that there is at least one unit mask in each unit mask group */ if (pe[e->event].numasks && (ugrpmsk != grpmsk || ugrpmsk == 0)) { uint64_t um = 0; ugrpmsk ^= grpmsk; ret = snbep_unc_add_defaults(this, e, ugrpmsk, &um, &filter, max_grpid); if (ret != PFM_SUCCESS) return ret; um >>= 8; umask2 = um; } /* * nf= is only required on some events in CBO */ if (!(modmsk_r & _SNBEP_UNC_ATTR_NF) && (umodmsk & _SNBEP_UNC_ATTR_NF)) { DPRINT("using nf= on an umask which does not require it\n"); return PFM_ERR_ATTR; } if (modmsk_r && !(umodmsk & modmsk_r)) { DPRINT("required modifiers missing: 0x%x\n", modmsk_r); return PFM_ERR_ATTR; } evt_strcat(e->fstr, "%s", pe[e->event].name); pfmlib_sort_attr(e); for(k = 0; k < e->nattrs; k++) { a = attr(e, k); if (a->ctrl != PFM_ATTR_CTRL_PMU) continue; if (a->type == PFM_ATTR_UMASK) evt_strcat(e->fstr, ":%s", pe[e->event].umasks[a->idx].uname); else if (a->type == PFM_ATTR_RAW_UMASK)
evt_strcat(e->fstr, ":0x%x", a->idx); } e->count = 0; reg.val |= (umask1 | umask2) << 8; e->codes[e->count++] = reg.val; /* * handles C-box filter */ if (filter.val || has_cbo_tid) e->codes[e->count++] = filter.val; /* HA address matcher */ if (addr.val) e->codes[e->count++] = addr.val; for (k = 0; k < e->npattrs; k++) { int idx; if (e->pattrs[k].ctrl != PFM_ATTR_CTRL_PMU) continue; if (e->pattrs[k].type == PFM_ATTR_UMASK) continue; idx = e->pattrs[k].idx; switch(idx) { case SNBEP_UNC_ATTR_E: if (is_occ_event(this, e->event)) evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, reg.pcu.unc_occ_edge); else evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, reg.com.unc_edge); break; case SNBEP_UNC_ATTR_I: if (is_occ_event(this, e->event)) evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, reg.pcu.unc_occ_inv); else evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, reg.com.unc_inv); break; case SNBEP_UNC_ATTR_T8: evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, reg.com.unc_thres); break; case SNBEP_UNC_ATTR_T5: evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, reg.pcu.unc_thres); break; case SNBEP_UNC_ATTR_TF: evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, reg.cbo.unc_tid); break; case SNBEP_UNC_ATTR_FF: evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, (filter.val >> (pcu_filt_band*8)) & 0xff); break; case SNBEP_UNC_ATTR_NF: evt_strcat(e->fstr, ":%s=%lu", snbep_unc_mods[idx].name, filter.cbo_filt.nid); break; case SNBEP_UNC_ATTR_A: evt_strcat(e->fstr, ":%s=0x%lx", snbep_unc_mods[idx].name, addr.ha_addr.hi_addr << 26 | addr.ha_addr.lo_addr); break; } } display_reg(this, e, reg); return PFM_SUCCESS; } int pfm_intel_snbep_unc_can_auto_encode(void *this, int pidx, int uidx) { if (intel_x86_eflag(this, pidx, INTEL_X86_NO_AUTOENCODE)) return 0; return !intel_x86_uflag(this, pidx, uidx, INTEL_X86_NO_AUTOENCODE); } int pfm_intel_snbep_unc_get_event_attr_info(void *this, int pidx, int attr_idx, pfm_event_attr_info_t 
*info) { const intel_x86_entry_t *pe = this_pe(this); const pfmlib_attr_desc_t *atdesc = this_atdesc(this); int numasks, idx; numasks = intel_x86_num_umasks(this, pidx); if (attr_idx < numasks) { idx = intel_x86_attr2umask(this, pidx, attr_idx); info->name = pe[pidx].umasks[idx].uname; info->desc = pe[pidx].umasks[idx].udesc; info->equiv= pe[pidx].umasks[idx].uequiv; info->code = pe[pidx].umasks[idx].ucode; if (!intel_x86_uflag(this, pidx, idx, INTEL_X86_CODE_OVERRIDE)) info->code >>= 8; if (info->code == 0) info->code = pe[pidx].umasks[idx].ufilters[0]; info->type = PFM_ATTR_UMASK; info->is_dfl = intel_x86_uflag(this, pidx, idx, INTEL_X86_DFL); info->is_precise = intel_x86_uflag(this, pidx, idx, INTEL_X86_PEBS); } else { idx = intel_x86_attr2mod(this, pidx, attr_idx); info->name = atdesc[idx].name; info->desc = atdesc[idx].desc; info->type = atdesc[idx].type; info->equiv= NULL; info->code = idx; info->is_dfl = 0; info->is_precise = 0; } info->ctrl = PFM_ATTR_CTRL_PMU; info->idx = idx; /* namespace specific index */ info->dfl_val64 = 0; return PFM_SUCCESS; } papi-5.3.0/src/libpfm4/lib/pfmlib_perf_event_priv.h0000600003276200002170000000434312247131124021734 0ustar ralphundrgrad/* * pfmlib_perf_events_priv.h: perf_event public attributes * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef __PERF_EVENT_PRIV_H__ #define __PERF_EVENT_PRIV_H__ #include "pfmlib_priv.h" #include "perfmon/pfmlib_perf_event.h" #define PERF_ATTR_U 0 /* monitor at user privilege levels */ #define PERF_ATTR_K 1 /* monitor at kernel privilege levels */ #define PERF_ATTR_H 2 /* monitor at hypervisor levels */ #define PERF_ATTR_PE 3 /* sampling period */ #define PERF_ATTR_FR 4 /* average target sampling rate */ #define PERF_ATTR_PR 5 /* precise sampling mode */ #define PERF_ATTR_EX 6 /* exclusive event */ #define PERF_ATTR_MG 7 /* monitor guest execution */ #define PERF_ATTR_MH 8 /* monitor host execution */ #define _PERF_ATTR_U (1 << PERF_ATTR_U) #define _PERF_ATTR_K (1 << PERF_ATTR_K) #define _PERF_ATTR_H (1 << PERF_ATTR_H) #define _PERF_ATTR_PE (1 << PERF_ATTR_PE) #define _PERF_ATTR_FR (1 << PERF_ATTR_FR) #define _PERF_ATTR_PR (1 << PERF_ATTR_PR) #define _PERF_ATTR_EX (1 << PERF_ATTR_EX) #define _PERF_ATTR_MG (1 << PERF_ATTR_MG) #define _PERF_ATTR_MH (1 << PERF_ATTR_MH) #define PERF_PLM_ALL (PFM_PLM0|PFM_PLM3|PFM_PLMH) #endif papi-5.3.0/src/libpfm4/lib/pfmlib_intel_snbep_unc_r3qpi.c0000600003276200002170000000523512247131124023020 0ustar ralphundrgrad/* * pfmlib_intel_snbep_r3qpi.c : Intel SandyBridge-EP R3QPI uncore PMU * * Copyright (c) 2012 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including 
without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #include "pfmlib_intel_snbep_unc_priv.h" #include "events/intel_snbep_unc_r3qpi_events.h" #define DEFINE_R3QPI_BOX(n) \ pfmlib_pmu_t intel_snbep_unc_r3qpi##n##_support = {\ .desc = "Intel Sandy Bridge-EP R3QPI"#n" uncore", \ .name = "snbep_unc_r3qpi"#n,\ .perf_name = "uncore_r3qpi_"#n, \ .pmu = PFM_PMU_INTEL_SNBEP_UNC_R3QPI##n, \ .pme_count = LIBPFM_ARRAY_SIZE(intel_snbep_unc_r3_pe),\ .type = PFM_PMU_TYPE_UNCORE,\ .num_cntrs = 3,\ .num_fixed_cntrs = 0,\ .max_encoding = 1,\ .pe = intel_snbep_unc_r3_pe,\ .atdesc = snbep_unc_mods,\ .flags = PFMLIB_PMU_FL_RAW_UMASK\ | PFMLIB_PMU_FL_NO_SMPL,\ .pmu_detect = pfm_intel_snbep_unc_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_intel_snbep_unc_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_intel_snbep_unc_get_perf_encoding),\ .get_event_first = pfm_intel_x86_get_event_first,\ .get_event_next = pfm_intel_x86_get_event_next,\ .event_is_valid = pfm_intel_x86_event_is_valid,\ .validate_table = pfm_intel_x86_validate_table,\ .get_event_info = pfm_intel_x86_get_event_info,\ 
.get_event_attr_info = pfm_intel_x86_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_intel_snbep_unc_perf_validate_pattrs),\ .get_event_nattrs = pfm_intel_x86_get_event_nattrs,\ } DEFINE_R3QPI_BOX(0); DEFINE_R3QPI_BOX(1); papi-5.3.0/src/libpfm4/lib/pfmlib_mips_74k.c0000600003276200002170000000551412247131124020170 0ustar ralphundrgrad/* * pfmlib_mips_74k.c : support for MIPS chips * * Copyright (c) 2011 Samara Technology Group, Inc * Contributed by Philip Mucci * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
* */ #include #include #include "pfmlib_priv.h" /* library private */ #include "pfmlib_mips_priv.h" #include "events/mips_74k_events.h" /* event tables */ /* root@redhawk_RT-N16:/proc# more cpuinfo system type : Broadcom BCM4716 chip rev 1 processor : 0 cpu model : MIPS 74K V4.0 BogoMIPS : 239.20 wait instruction : no microsecond timers : yes tlb_entries : 64 */ static int pfm_mips_detect_74k(void *this) { int ret; DPRINT("mips_detect_74k\n"); ret = pfm_mips_detect(this); if (ret != PFM_SUCCESS) return PFM_ERR_NOTSUPP; if (strstr(pfm_mips_cfg.model,"MIPS 74K")) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } /* MIPS 74K support */ pfmlib_pmu_t mips_74k_support={ .desc = "MIPS 74k", .name = "mips_74k", .pmu = PFM_PMU_MIPS_74K, .pme_count = LIBPFM_ARRAY_SIZE(mips_74k_pe), .type = PFM_PMU_TYPE_CORE, .pe = mips_74k_pe, .pmu_detect = pfm_mips_detect_74k, .max_encoding = 2, /* event encoding + counter bitmask */ .num_cntrs = 4, .get_event_encoding[PFM_OS_NONE] = pfm_mips_get_encoding, PFMLIB_ENCODE_PERF(pfm_mips_get_perf_encoding), .get_event_first = pfm_mips_get_event_first, .get_event_next = pfm_mips_get_event_next, .event_is_valid = pfm_mips_event_is_valid, .validate_table = pfm_mips_validate_table, .get_event_info = pfm_mips_get_event_info, .get_event_attr_info = pfm_mips_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_mips_perf_validate_pattrs), .get_event_nattrs = pfm_mips_get_event_nattrs, .supported_plm = PFM_PLM0|PFM_PLM3|PFM_PLMH, }; papi-5.3.0/src/libpfm4/lib/pfmlib_ppc970.c0000600003276200002170000000615712247131124017561 0ustar ralphundrgrad/* * pfmlib_ppc970.c : IBM Power 970/970mp support * * Copyright (C) IBM Corporation, 2009. All rights reserved.
* Contributed by Corey Ashford (cjashfor@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" #include "events/ppc970_events.h" #include "events/ppc970mp_events.h" static int pfm_ppc970_detect(void* this) { if (__is_processor(PV_970) || __is_processor(PV_970FX) || __is_processor(PV_970GX)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } static int pfm_ppc970mp_detect(void* this) { if (__is_processor(PV_970MP)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } pfmlib_pmu_t ppc970_support={ .desc = "PPC970", .name = "ppc970", .pmu = PFM_PMU_PPC970, .pme_count = LIBPFM_ARRAY_SIZE(ppc970_pe), .max_encoding = 1, .pe = ppc970_pe, .pmu_detect = pfm_ppc970_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; pfmlib_pmu_t ppc970mp_support={ .desc = "PPC970MP", .name = "ppc970mp", .pmu = PFM_PMU_PPC970MP, .pme_count = LIBPFM_ARRAY_SIZE(ppc970mp_pe), .max_encoding = 1, .pe = ppc970mp_pe, .pmu_detect = pfm_ppc970mp_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_fam10h.c0000600003276200002170000000566312247131124020607 0ustar ralphundrgrad/* * 
pfmlib_amd64_fam10h.c : AMD64 Family 10h * * Copyright (c) 2010 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_amd64_priv.h" #include "events/amd64_events_fam10h.h" #define DEFINE_FAM10H_REV(d, n, r, pmuid) \ static int \ pfm_amd64_fam10h_##n##_detect(void *this) \ { \ int ret; \ ret = pfm_amd64_detect(this); \ if (ret != PFM_SUCCESS) \ return ret; \ ret = pfm_amd64_cfg.revision; \ return ret == pmuid ? 
PFM_SUCCESS : PFM_ERR_NOTSUPP; \ } \ pfmlib_pmu_t amd64_fam10h_##n##_support={ \ .desc = "AMD64 Fam10h "#d, \ .name = "amd64_fam10h_"#n, \ .pmu = pmuid, \ .pmu_rev = r, \ .pme_count = LIBPFM_ARRAY_SIZE(amd64_fam10h_pe),\ .type = PFM_PMU_TYPE_CORE, \ .supported_plm = AMD64_FAM10H_PLM, \ .num_cntrs = 4, \ .max_encoding = 1, \ .pe = amd64_fam10h_pe, \ .atdesc = amd64_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK, \ \ .pmu_detect = pfm_amd64_fam10h_##n##_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_amd64_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_amd64_get_perf_encoding), \ .get_event_first = pfm_amd64_get_event_first, \ .get_event_next = pfm_amd64_get_event_next, \ .event_is_valid = pfm_amd64_event_is_valid, \ .validate_table = pfm_amd64_validate_table, \ .get_event_info = pfm_amd64_get_event_info, \ .get_event_attr_info = pfm_amd64_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_amd64_perf_validate_pattrs),\ .get_event_nattrs = pfm_amd64_get_event_nattrs, \ .get_num_events = pfm_amd64_get_num_events, \ } DEFINE_FAM10H_REV(Barcelona, barcelona, AMD64_FAM10H_REV_B, PFM_PMU_AMD64_FAM10H_BARCELONA); DEFINE_FAM10H_REV(Shanghai, shanghai, AMD64_FAM10H_REV_C, PFM_PMU_AMD64_FAM10H_SHANGHAI); DEFINE_FAM10H_REV(Istanbul, istanbul, AMD64_FAM10H_REV_D, PFM_PMU_AMD64_FAM10H_ISTANBUL); papi-5.3.0/src/libpfm4/lib/pfmlib_power5.c0000600003276200002170000000626612247131124017761 0ustar ralphundrgrad/* * pfmlib_power5.c : IBM Power5 support * * Copyright (C) IBM Corporation, 2009. All rights reserved. 
* Contributed by Corey Ashford (cjashfor@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" #include "events/power5_events.h" #include "events/power5+_events.h" static int pfm_power5_detect(void* this) { if (__is_processor(PV_POWER5)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } static int pfm_power5p_detect(void* this) { if (__is_processor(PV_POWER5p)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } pfmlib_pmu_t power5_support={ .desc = "POWER5", .name = "power5", .pmu = PFM_PMU_POWER5, .pme_count = LIBPFM_ARRAY_SIZE(power5_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 4, .num_fixed_cntrs = 2, .max_encoding = 1, .pe = power5_pe, .pmu_detect = pfm_power5_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; pfmlib_pmu_t power5p_support={ .desc = "POWER5+", .name = "power5p", .pmu = PFM_PMU_POWER5p, .pme_count = LIBPFM_ARRAY_SIZE(power5p_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 4, .num_fixed_cntrs = 2, .max_encoding = 1, .pe = power5p_pe, .pmu_detect = pfm_power5p_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; 
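/*
 * Illustrative sketch only, not part of libpfm4. The POWER5 and POWER5+
 * descriptors above differ mainly in their pmu_detect callback, so the
 * library can pick the right table by calling each callback in turn.
 * The block below mimics that pattern in isolation. Every sk_* name is
 * hypothetical, the version ids 1 and 2 are placeholders rather than
 * real PVR values, and the callback takes the version as a parameter
 * (the real callbacks take a void *this and query the processor
 * themselves via __is_processor()).
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_SUCCESS      0
#define SK_ERR_NOTSUPP  (-1)

/* hypothetical, much reduced stand-in for pfmlib_pmu_t */
typedef struct {
	const char *name;                       /* short PMU name */
	int (*detect)(unsigned int version);    /* SK_SUCCESS if this PMU matches */
} sk_pmu_t;

/* placeholder version ids, NOT real processor version register values */
static int sk_power5_detect(unsigned int version)
{
	return version == 1 ? SK_SUCCESS : SK_ERR_NOTSUPP;
}

static int sk_power5p_detect(unsigned int version)
{
	return version == 2 ? SK_SUCCESS : SK_ERR_NOTSUPP;
}

static const sk_pmu_t sk_table[] = {
	{ "power5",  sk_power5_detect  },
	{ "power5p", sk_power5p_detect },
};

/* return the first descriptor whose detect callback succeeds, else NULL */
static const sk_pmu_t *sk_find_pmu(const sk_pmu_t *tbl, size_t n,
				   unsigned int version)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].detect(version) == SK_SUCCESS)
			return &tbl[i];
	}
	return NULL;
}
```

/*
 * Usage: sk_find_pmu(sk_table, 2, 2) yields the "power5p" entry, and an
 * unknown version yields NULL, which a caller would treat like
 * PFM_ERR_NOTSUPP.
 */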
papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_fam15h.c0000600003276200002170000000530112247131124020601 0ustar ralphundrgrad/* * pfmlib_amd64_fam15h.c : AMD64 Family 15h * * Copyright (c) 2011 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_amd64_priv.h" #include "events/amd64_events_fam15h.h" #define DEFINE_FAM15H_REV(d, n, r, pmuid) \ static int \ pfm_amd64_fam15h_##n##_detect(void *this) \ { \ int ret; \ ret = pfm_amd64_detect(this); \ if (ret != PFM_SUCCESS) \ return ret; \ ret = pfm_amd64_cfg.revision; \ return ret == pmuid ? 
PFM_SUCCESS : PFM_ERR_NOTSUPP; \ } \ pfmlib_pmu_t amd64_fam15h_##n##_support={ \ .desc = "AMD64 Fam15h "#d, \ .name = "amd64_fam15h_"#n, \ .pmu = pmuid, \ .pmu_rev = r, \ .pme_count = LIBPFM_ARRAY_SIZE(amd64_fam15h_pe),\ .type = PFM_PMU_TYPE_CORE, \ .supported_plm = AMD64_FAM10H_PLM, \ .num_cntrs = 6, \ .max_encoding = 1, \ .pe = amd64_fam15h_pe, \ .atdesc = amd64_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK, \ \ .pmu_detect = pfm_amd64_fam15h_##n##_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_amd64_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_amd64_get_perf_encoding), \ .get_event_first = pfm_amd64_get_event_first, \ .get_event_next = pfm_amd64_get_event_next, \ .event_is_valid = pfm_amd64_event_is_valid, \ .validate_table = pfm_amd64_validate_table, \ .get_event_info = pfm_amd64_get_event_info, \ .get_event_attr_info = pfm_amd64_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_amd64_perf_validate_pattrs),\ .get_event_nattrs = pfm_amd64_get_event_nattrs, \ } DEFINE_FAM15H_REV(Interlagos, interlagos, 0, PFM_PMU_AMD64_FAM15H_INTERLAGOS); papi-5.3.0/src/libpfm4/lib/pfmlib_s390x_priv.h0000600003276200002170000000071212247131124020461 0ustar ralphundrgrad#ifndef __PFMLIB_S390X_PRIV_H__ #define __PFMLIB_S390X_PRIV_H__ #define CPUMF_COUNTER_MAX 160 typedef struct { uint64_t ctrnum; /* counter number */ unsigned int ctrset; /* counter set */ char *name; /* counter ID */ char *desc; /* short description */ } pme_cpumf_ctr_t; #define min(a, b) ((a) < (b) ? (a) : (b)) extern int pfm_s390x_get_perf_encoding(void *this, pfmlib_event_desc_t *e); #endif /* __PFMLIB_S390X_PRIV_H__ */ papi-5.3.0/src/libpfm4/lib/pfmlib_intel_x86_arch.c0000600003276200002170000001366512247131124021356 0ustar ralphundrgrad/* * pfmlib_intel_x86_arch.c : Intel architectural PMU v1, v2, v3 * * Copyright (c) 2005-2007 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * * This file implements support for the IA-32 architectural PMU as specified * in the following document: * "IA-32 Intel Architecture Software Developer's Manual - Volume 3B: System * Programming Guide" */ #include #include #include #include #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_intel_x86_priv.h" /* architecture private */ #include "events/intel_x86_arch_events.h" /* architected event table */ extern pfmlib_pmu_t intel_x86_arch_support; static intel_x86_entry_t *x86_arch_pe; /* * .byte 0x53 == push ebx. It is valid in both 32-bit and 64-bit mode. * .byte 0x5b == pop ebx. * Some gcc versions (4.1.2 on Core2) object to pairing push/pop and ebx in 64-bit mode. * Using the opcode directly avoids this problem.
*/ static inline void cpuid(unsigned int op, unsigned int *a, unsigned int *b, unsigned int *c, unsigned int *d) { __asm__ __volatile__ (".byte 0x53\n\tcpuid\n\tmovl %%ebx, %%esi\n\t.byte 0x5b" : "=a" (*a), "=S" (*b), "=c" (*c), "=d" (*d) : "a" (op)); } /* * create architected event table */ static int create_arch_event_table(unsigned int mask, int version) { intel_x86_entry_t *pe; int i, num_events = 0; int m; DPRINT("version=%d evt_msk=0x%x\n", version, mask); /* * first pass: count the number of supported events */ m = mask; for(i=0; i < 7; i++, m>>=1) { if ((m & 0x1) == 0) num_events++; } intel_x86_arch_support.pme_count = num_events; pe = calloc(num_events, sizeof(intel_x86_entry_t)); if (pe == NULL) return PFM_ERR_NOTSUPP; x86_arch_pe = pe; intel_x86_arch_support.pe = pe; /* * second pass: populate the table */ m = mask; for(i=0; i < 7; i++, m>>=1) { if (!(m & 0x1)) { *pe = intel_x86_arch_pe[i]; switch(version) { case 3: pe->modmsk = INTEL_V3_ATTRS; break; default: pe->modmsk = INTEL_V2_ATTRS; break; } pe++; } } return PFM_SUCCESS; } static int check_arch_pmu(int family) { union { unsigned int val; intel_x86_pmu_eax_t eax; intel_x86_pmu_edx_t edx; } eax, ecx, edx, ebx; /* * check family number to reject for processors * older than Pentium (family=5). Those processors * did not have the CPUID instruction */ if (family < 5 || family == 15) return PFM_ERR_NOTSUPP; /* * check if CPU supports 0xa function of CPUID * 0xa started with Core Duo. Needed to detect if * architected PMU is present */ cpuid(0x0, &eax.val, &ebx.val, &ecx.val, &edx.val); if (eax.val < 0xa) return PFM_ERR_NOTSUPP; /* * extract architected PMU information */ cpuid(0xa, &eax.val, &ebx.val, &ecx.val, &edx.val); /* * version must be greater than zero */ return eax.eax.version < 1 ? 
PFM_ERR_NOTSUPP : PFM_SUCCESS; } static int pfm_intel_x86_arch_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; return check_arch_pmu(pfm_intel_x86_cfg.family); } static int pfm_intel_x86_arch_init(void *this) { union { unsigned int val; intel_x86_pmu_eax_t eax; intel_x86_pmu_edx_t edx; } eax, ecx, edx, ebx; /* * extract architected PMU information */ if (!pfm_cfg.forced_pmu) { cpuid(0xa, &eax.val, &ebx.val, &ecx.val, &edx.val); intel_x86_arch_support.num_cntrs = eax.eax.num_cnt; intel_x86_arch_support.num_fixed_cntrs = edx.edx.num_cnt; } else { eax.eax.version = 3; ebx.val = 0; /* no restriction */ intel_x86_arch_support.num_cntrs = 0; intel_x86_arch_support.num_fixed_cntrs = 0; } /* * must be called after impl_cntrs has been initialized */ return create_arch_event_table(ebx.val, eax.eax.version); } void pfm_intel_x86_arch_terminate(void *this) { /* workaround const void for intel_x86_arch_support.pe */ if (x86_arch_pe) free(x86_arch_pe); } /* architected PMU */ pfmlib_pmu_t intel_x86_arch_support={ .desc = "Intel X86 architectural PMU", .name = "ix86arch", .pmu = PFM_PMU_INTEL_X86_ARCH, .pme_count = 0, .pe = NULL, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | PFMLIB_PMU_FL_ARCH_DFL, .type = PFM_PMU_TYPE_CORE, .max_encoding = 1, .pmu_detect = pfm_intel_x86_arch_detect, .pmu_init = pfm_intel_x86_arch_init, .pmu_terminate = pfm_intel_x86_arch_terminate, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, }; 
papi-5.3.0/src/libpfm4/lib/pfmlib_montecito.c0000600003276200002170000021172612247131124020540 0ustar ralphundrgrad/* * pfmlib_montecito.c : support for the Dual-Core Itanium2 processor * * Copyright (c) 2005-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include /* public headers */ #include /* private headers */ #include "pfmlib_priv.h" /* library private */ #include "pfmlib_priv_ia64.h" /* architecture private */ #include "pfmlib_montecito_priv.h" /* PMU private */ #include "montecito_events.h" /* PMU private */ #define is_ear(i) event_is_ear(montecito_pe+(i)) #define is_ear_tlb(i) event_is_ear_tlb(montecito_pe+(i)) #define is_ear_alat(i) event_is_ear_alat(montecito_pe+(i)) #define is_ear_cache(i) event_is_ear_cache(montecito_pe+(i)) #define is_iear(i) event_is_iear(montecito_pe+(i)) #define is_dear(i) event_is_dear(montecito_pe+(i)) #define is_etb(i) event_is_etb(montecito_pe+(i)) #define has_opcm(i) event_opcm_ok(montecito_pe+(i)) #define has_iarr(i) event_iarr_ok(montecito_pe+(i)) #define has_darr(i) event_darr_ok(montecito_pe+(i)) #define has_all(i) event_all_ok(montecito_pe+(i)) #define has_mesi(i) event_mesi_ok(montecito_pe+(i)) #define evt_use_opcm(e) ((e)->pfp_mont_opcm1.opcm_used != 0 || (e)->pfp_mont_opcm2.opcm_used !=0) #define evt_use_irange(e) ((e)->pfp_mont_irange.rr_used) #define evt_use_drange(e) ((e)->pfp_mont_drange.rr_used) #define evt_grp(e) (int)montecito_pe[e].pme_qualifiers.pme_qual.pme_group #define evt_set(e) (int)montecito_pe[e].pme_qualifiers.pme_qual.pme_set #define evt_umask(e) montecito_pe[e].pme_umask #define evt_type(e) (int)montecito_pe[e].pme_type #define evt_caf(e) (int)montecito_pe[e].pme_caf #define FINE_MODE_BOUNDARY_BITS 16 #define FINE_MODE_MASK ~((1U< PMC0 * 1 -> PMC1 * n -> PMCn * * The following are in the model specific rr_br[]: * IBR0 -> 0 * IBR1 -> 1 * ... * IBR7 -> 7 * DBR0 -> 0 * DBR1 -> 1 * ... * DBR7 -> 7 * * We do not use a mapping table, instead we make up the * values on the fly given the base. */ static int pfm_mont_detect(void) { int tmp; int ret = PFMLIB_ERR_NOTSUPP; tmp = pfm_ia64_get_cpu_family(); if (tmp == 0x20) { ret = PFMLIB_SUCCESS; } return ret; } /* * Check the event for incompatibilities. 
This is useful * for L1D and L2D related events. Due to wire limitations, * some cache events are separated into sets. There * are 6 sets for the L1D cache group and 8 sets for L2D group. * It is NOT possible to simultaneously measure events from * different sets for L1D. For instance, you cannot * measure events from set0 and set1 in L1D cache group. The L2D * group allows up to two different sets to be active at the same * time. The first set is selected by the event in PMC4 and the second * set by the event in PMC6. Once the set is selected for PMC4, * the same set is locked for PMC5 and PMC8. Similarly, once the * set is selected for PMC6, the same set is locked for PMC7 and * PMC9. * * This function verifies that only one set of L1D is selected * and that no more than 2 sets are selected for L2D */ static int check_cross_groups(pfmlib_input_param_t *inp, unsigned int *l1d_event, unsigned long *l2d_set1_mask, unsigned long *l2d_set2_mask) { int g, s, s1, s2; unsigned int cnt = inp->pfp_event_count; pfmlib_event_t *e = inp->pfp_events; unsigned int i, j; unsigned long l2d_mask1 = 0, l2d_mask2 = 0; unsigned int l1d_event_idx = UNEXISTING_SET; /* * Let's check the L1D constraint first * * There is no umask restriction for this group */ for (i=0; i < cnt; i++) { g = evt_grp(e[i].event); s = evt_set(e[i].event); if (g != PFMLIB_MONT_EVT_L1D_CACHE_GRP) continue; DPRINT("i=%u g=%d s=%d\n", i, g, s); l1d_event_idx = i; for (j=i+1; j < cnt; j++) { if (evt_grp(e[j].event) != g) continue; /* * if there is another event from the same group * but with a different set, then we return an error */ if (evt_set(e[j].event) != s) return PFMLIB_ERR_EVTSET; } } /* * Check that we have only up to two distinct * sets for L2D */ s1 = s2 = -1; for (i=0; i < cnt; i++) { g = evt_grp(e[i].event); if (g != PFMLIB_MONT_EVT_L2D_CACHE_GRP) continue; s = evt_set(e[i].event); /* * we have seen this set before, continue */ if (s1 == s) { l2d_mask1 |= 1UL << i; continue; } if (s2 == s) { 
l2d_mask2 |= 1UL << i; continue; } /* * record first of second set seen */ if (s1 == -1) { s1 = s; l2d_mask1 |= 1UL << i; } else if (s2 == -1) { s2 = s; l2d_mask2 |= 1UL << i; } else { /* * found a third set, that's not possible */ return PFMLIB_ERR_EVTSET; } } *l1d_event = l1d_event_idx; *l2d_set1_mask = l2d_mask1; *l2d_set2_mask = l2d_mask2; return PFMLIB_SUCCESS; } /* * Certain prefetch events must be treated specially when instruction range restriction * is used because they can only be constrained by IBRP1 in fine-mode. Other events * will use IBRP0 if tagged as a demand fetch OR IBPR1 if tagged as a prefetch match. * * Events which can be qualified by the two pairs depending on their tag: * - ISB_BUNPAIRS_IN * - L1I_FETCH_RAB_HIT * - L1I_FETCH_ISB_HIT * - L1I_FILLS * * This function returns the number of qualifying prefetch events found */ static int prefetch_events[]={ PME_MONT_L1I_PREFETCHES, PME_MONT_L1I_STRM_PREFETCHES, PME_MONT_L2I_PREFETCHES }; #define NPREFETCH_EVENTS sizeof(prefetch_events)/sizeof(int) static int prefetch_dual_events[]= { PME_MONT_ISB_BUNPAIRS_IN, PME_MONT_L1I_FETCH_RAB_HIT, PME_MONT_L1I_FETCH_ISB_HIT, PME_MONT_L1I_FILLS }; #define NPREFETCH_DUAL_EVENTS sizeof(prefetch_dual_events)/sizeof(int) /* * prefetch events must use IBRP1, unless they are dual and the user specified * PFMLIB_MONT_IRR_DEMAND_FETCH in rr_flags */ static int check_prefetch_events(pfmlib_input_param_t *inp, pfmlib_mont_input_rr_t *irr, unsigned int *count, int *base_idx, int *dup) { int code; int prefetch_codes[NPREFETCH_EVENTS]; int prefetch_dual_codes[NPREFETCH_DUAL_EVENTS]; unsigned int i, j; int c, flags; int found = 0, found_ibrp0 = 0, found_ibrp1 = 0; flags = irr->rr_flags & (PFMLIB_MONT_IRR_DEMAND_FETCH|PFMLIB_MONT_IRR_PREFETCH_MATCH); for(i=0; i < NPREFETCH_EVENTS; i++) { pfm_get_event_code(prefetch_events[i], &code); prefetch_codes[i] = code; } for(i=0; i < NPREFETCH_DUAL_EVENTS; i++) { pfm_get_event_code(prefetch_dual_events[i], &code); 
prefetch_dual_codes[i] = code; } for(i=0; i < inp->pfp_event_count; i++) { pfm_get_event_code(inp->pfp_events[i].event, &c); for(j=0; j < NPREFETCH_EVENTS; j++) { if (c == prefetch_codes[j]) { found++; found_ibrp1++; } } /* * for the dual events, users must specify one or both of the * PFMLIB_MONT_IRR_DEMAND_FETCH or PFMLIB_MONT_IRR_PREFETCH_MATCH */ for(j=0; j < NPREFETCH_DUAL_EVENTS; j++) { if (c == prefetch_dual_codes[j]) { found++; if (flags == 0) return PFMLIB_ERR_IRRFLAGS; if (flags & PFMLIB_MONT_IRR_DEMAND_FETCH) found_ibrp0++; if (flags & PFMLIB_MONT_IRR_PREFETCH_MATCH) found_ibrp1++; } } } *count = found; *dup = 0; /* * if both found_ibrp0 and found_ibrp1 > 0, then we need to duplicate * the range in ibrp0 to ibrp1. */ if (found) { *base_idx = found_ibrp0 ? 0 : 2; if (found_ibrp1 && found_ibrp0) *dup = 1; } return 0; } /* * look for CPU_OP_CYCLES_QUAL * Return: * 1 if found * 0 otherwise */ static int has_cpu_cycles_qual(pfmlib_input_param_t *inp) { unsigned int i; int code, c; pfm_get_event_code(PME_MONT_CPU_OP_CYCLES_QUAL, &code); for(i=0; i < inp->pfp_event_count; i++) { pfm_get_event_code(inp->pfp_events[i].event, &c); if (c == code) return 1; } return 0; } /* * IA64_INST_RETIRED (and subevents) is the only event which can be measured on all * 4 IBR when non-fine mode is not possible. 
* * This function returns: * - the number of events match the IA64_INST_RETIRED code * - in retired_mask to bottom 4 bits indicates which of the 4 INST_RETIRED event * is present */ static unsigned int check_inst_retired_events(pfmlib_input_param_t *inp, unsigned long *retired_mask) { int code; int c; unsigned int i, count, found = 0; unsigned long umask, mask; pfm_get_event_code(PME_MONT_IA64_INST_RETIRED, &code); count = inp->pfp_event_count; mask = 0; for(i=0; i < count; i++) { pfm_get_event_code(inp->pfp_events[i].event, &c); if (c == code) { pfm_mont_get_event_umask(inp->pfp_events[i].event, &umask); switch(umask) { case 0: mask |= 1; break; case 1: mask |= 2; break; case 2: mask |= 4; break; case 3: mask |= 8; break; } found++; } } if (retired_mask) *retired_mask = mask; return found; } static int check_fine_mode_possible(pfmlib_mont_input_rr_t *rr, int n) { pfmlib_mont_input_rr_desc_t *lim = rr->rr_limits; int i; for(i=0; i < n; i++) { if ((lim[i].rr_start & FINE_MODE_MASK) != (lim[i].rr_end & FINE_MODE_MASK)) return 0; } return 1; } /* * mode = 0 -> check code (enforce bundle alignment) * mode = 1 -> check data */ static int check_intervals(pfmlib_mont_input_rr_t *irr, int mode, unsigned int *n_intervals) { unsigned int i; pfmlib_mont_input_rr_desc_t *lim = irr->rr_limits; for(i=0; i < 4; i++) { /* end marker */ if (lim[i].rr_start == 0 && lim[i].rr_end == 0) break; /* invalid entry */ if (lim[i].rr_start >= lim[i].rr_end) return PFMLIB_ERR_IRRINVAL; if (mode == 0 && (lim[i].rr_start & 0xf || lim[i].rr_end & 0xf)) return PFMLIB_ERR_IRRALIGN; } *n_intervals = i; return PFMLIB_SUCCESS; } /* * It is not possible to measure more than one of the * L2D_OZQ_CANCELS0, L2D_OZQ_CANCELS1 at the same time. 
*/ static int cancel_events[]= { PME_MONT_L2D_OZQ_CANCELS0_ACQ, PME_MONT_L2D_OZQ_CANCELS1_ANY }; #define NCANCEL_EVENTS sizeof(cancel_events)/sizeof(int) static int check_cancel_events(pfmlib_input_param_t *inp) { unsigned int i, j, count; int code; int cancel_codes[NCANCEL_EVENTS]; int idx = -1; for(i=0; i < NCANCEL_EVENTS; i++) { pfm_get_event_code(cancel_events[i], &code); cancel_codes[i] = code; } count = inp->pfp_event_count; for(i=0; i < count; i++) { for (j=0; j < NCANCEL_EVENTS; j++) { pfm_get_event_code(inp->pfp_events[i].event, &code); if (code == cancel_codes[j]) { if (idx != -1) { return PFMLIB_ERR_INVAL; } idx = inp->pfp_events[i].event; } } } return PFMLIB_SUCCESS; } /* * Automatically dispatch events to corresponding counters following constraints. */ static unsigned int l2d_set1_cnts[]={ 4, 5, 8 }; static unsigned int l2d_set2_cnts[]={ 6, 7, 9 }; static int pfm_mont_dispatch_counters(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfmlib_mont_input_param_t *param = mod_in; pfm_mont_pmc_reg_t reg; pfmlib_event_t *e; pfmlib_reg_t *pc, *pd; pfmlib_regmask_t avail_cntrs, impl_cntrs; unsigned int i,j, k, max_cnt; unsigned int assign[PMU_MONT_NUM_COUNTERS]; unsigned int m, cnt; unsigned int l1d_set; unsigned long l2d_set1_mask, l2d_set2_mask, evt_mask, mesi; unsigned long not_assigned_events, cnt_mask; int l2d_set1_p, l2d_set2_p; int ret; e = inp->pfp_events; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; cnt = inp->pfp_event_count; if (PFMLIB_DEBUG()) for (m=0; m < cnt; m++) { DPRINT("ev[%d]=%s counters=0x%lx\n", m, montecito_pe[e[m].event].pme_name, montecito_pe[e[m].event].pme_counters); } if (cnt > PMU_MONT_NUM_COUNTERS) return PFMLIB_ERR_TOOMANY; l1d_set = UNEXISTING_SET; ret = check_cross_groups(inp, &l1d_set, &l2d_set1_mask, &l2d_set2_mask); if (ret != PFMLIB_SUCCESS) return ret; ret = check_cancel_events(inp); if (ret != PFMLIB_SUCCESS) return ret; /* * at this point, we know that: * - we have at most 1 L1D 
set * - we have at most 2 L2D sets * - cancel events are compatible */ DPRINT("l1d_set=%u l2d_set1_mask=0x%lx l2d_set2_mask=0x%lx\n", l1d_set, l2d_set1_mask, l2d_set2_mask); /* * first, place L1D cache event in PMC5 * * this is the strongest constraint */ pfm_get_impl_counters(&impl_cntrs); pfm_regmask_andnot(&avail_cntrs, &impl_cntrs, &inp->pfp_unavail_pmcs); not_assigned_events = 0; DPRINT("avail_cntrs=0x%lx\n", avail_cntrs.bits[0]); /* * we do not check ALL_THRD here because at least * one event has to be in PMC5 for this group */ if (l1d_set != UNEXISTING_SET) { if (!pfm_regmask_isset(&avail_cntrs, 5)) return PFMLIB_ERR_NOASSIGN; assign[l1d_set] = 5; pfm_regmask_clr(&avail_cntrs, 5); } l2d_set1_p = l2d_set2_p = 0; /* * assign L2D set1 and set2 counters */ for (i=0; i < cnt ; i++) { evt_mask = 1UL << i; /* * place l2d set1 events. First 3 go to designated * counters, the rest is placed elsewhere in the final * pass */ if (l2d_set1_p < 3 && (l2d_set1_mask & evt_mask)) { assign[i] = l2d_set1_cnts[l2d_set1_p]; if (!pfm_regmask_isset(&avail_cntrs, assign[i])) return PFMLIB_ERR_NOASSIGN; pfm_regmask_clr(&avail_cntrs, assign[i]); l2d_set1_p++; continue; } /* * same as above but for l2d set2 */ if (l2d_set2_p < 3 && (l2d_set2_mask & evt_mask)) { assign[i] = l2d_set2_cnts[l2d_set2_p]; if (!pfm_regmask_isset(&avail_cntrs, assign[i])) return PFMLIB_ERR_NOASSIGN; pfm_regmask_clr(&avail_cntrs, assign[i]); l2d_set2_p++; continue; } /* * if not l2d nor l1d, then defer placement until final pass */ if (i != l1d_set) not_assigned_events |= evt_mask; DPRINT("phase 1: i=%u avail_cntrs=0x%lx l2d_set1_p=%d l2d_set2_p=%d not_assigned=0x%lx\n", i, avail_cntrs.bits[0], l2d_set1_p, l2d_set2_p, not_assigned_events); } /* * assign BUS_* ER_* events (work only in PMC4-PMC9) */ evt_mask = not_assigned_events; for (i=0; evt_mask ; i++, evt_mask >>=1) { if ((evt_mask & 0x1) == 0) continue; cnt_mask = montecito_pe[e[i].event].pme_counters; /* * only interested in events with restricted set of 
counters */ if (cnt_mask == 0xfff0) continue; for(j=0; cnt_mask; j++, cnt_mask >>=1) { if ((cnt_mask & 0x1) == 0) continue; DPRINT("phase 2: i=%d j=%d cnt_mask=0x%lx avail_cntrs=0x%lx not_assigned_evnts=0x%lx\n", i, j, cnt_mask, avail_cntrs.bits[0], not_assigned_events); if (!pfm_regmask_isset(&avail_cntrs, j)) continue; assign[i] = j; not_assigned_events &= ~(1UL << i); pfm_regmask_clr(&avail_cntrs, j); break; } if (cnt_mask == 0) return PFMLIB_ERR_NOASSIGN; } /* * assign the rest of the events (no constraints) */ evt_mask = not_assigned_events; max_cnt = PMU_MONT_FIRST_COUNTER + PMU_MONT_NUM_COUNTERS; for (i=0, j=0; evt_mask ; i++, evt_mask >>=1) { DPRINT("phase 3a: i=%d j=%d evt_mask=0x%lx avail_cntrs=0x%lx not_assigned_evnts=0x%lx\n", i, j, evt_mask, avail_cntrs.bits[0], not_assigned_events); if ((evt_mask & 0x1) == 0) continue; while(j < max_cnt && !pfm_regmask_isset(&avail_cntrs, j)) { DPRINT("phase 3: i=%d j=%d evt_mask=0x%lx avail_cntrs=0x%lx not_assigned_evnts=0x%lx\n", i, j, evt_mask, avail_cntrs.bits[0], not_assigned_events); j++; } if (j == max_cnt) return PFMLIB_ERR_NOASSIGN; assign[i] = j; j++; } for (j=0; j < cnt ; j++ ) { mesi = 0; /* * XXX: we do not support .all placement just yet */ if (param && param->pfp_mont_counters[j].flags & PFMLIB_MONT_FL_EVT_ALL_THRD) { DPRINT(".all mode is not yet supported by libpfm\n"); return PFMLIB_ERR_NOTSUPP; } if (has_mesi(e[j].event)) { for(k=0;k< e[j].num_masks; k++) { mesi |= 1UL << e[j].unit_masks[k]; } /* by default we capture everything */ if (mesi == 0) mesi = 0xf; } reg.pmc_val = 0; /* clear all, bits 26-27 must be zero for proper operations */ /* if plm is 0, then assume not specified per-event and use default */ reg.pmc_plm = inp->pfp_events[j].plm ? inp->pfp_events[j].plm : inp->pfp_dfl_plm; reg.pmc_oi = 0; /* let the user/OS deal with this field */ reg.pmc_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc_thres = param ? 
param->pfp_mont_counters[j].thres: 0; reg.pmc_ism = 0x2; /* force IA-64 mode */ reg.pmc_umask = is_ear(e[j].event) ? 0x0 : montecito_pe[e[j].event].pme_umask; reg.pmc_es = montecito_pe[e[j].event].pme_code; reg.pmc_all = 0; /* XXX force self for now */ reg.pmc_m = (mesi>>3) & 0x1; reg.pmc_e = (mesi>>2) & 0x1; reg.pmc_s = (mesi>>1) & 0x1; reg.pmc_i = mesi & 0x1; /* * Note that we don't force PMC4.pmc_ena = 1 because the kernel takes care of this for us. * This way we don't have to program something in PMC4 even when we don't use it */ pc[j].reg_num = assign[j]; pc[j].reg_value = reg.pmc_val; pc[j].reg_addr = pc[j].reg_alt_addr = assign[j]; pd[j].reg_num = assign[j]; pd[j].reg_addr = pd[j].reg_alt_addr = assign[j]; __pfm_vbprintf("[PMC%u(pmc%u)=0x%06lx m=%d e=%d s=%d i=%d thres=%d all=%d es=0x%02x plm=%d umask=0x%x pm=%d ism=0x%x oi=%d] %s\n", assign[j], assign[j], reg.pmc_val, reg.pmc_m, reg.pmc_e, reg.pmc_s, reg.pmc_i, reg.pmc_thres, reg.pmc_all, reg.pmc_es,reg.pmc_plm, reg.pmc_umask, reg.pmc_pm, reg.pmc_ism, reg.pmc_oi, montecito_pe[e[j].event].pme_name); __pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[j].reg_num, pd[j].reg_num); } /* number of PMC registers programmed */ outp->pfp_pmc_count = cnt; outp->pfp_pmd_count = cnt; return PFMLIB_SUCCESS; } static int pfm_dispatch_iear(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfm_mont_pmc_reg_t reg; pfmlib_mont_input_param_t *param = mod_in; pfmlib_reg_t *pc, *pd; pfmlib_mont_input_param_t fake_param; unsigned int pos1, pos2; unsigned int i, count; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; count = inp->pfp_event_count; for (i=0; i < count; i++) { if (is_iear(inp->pfp_events[i].event)) break; } if (param == NULL || param->pfp_mont_iear.ear_used == 0) { /* * case 3: no I-EAR event, no (or nothing) in param->pfp_mont_iear.ear_used */ if (i == count) return PFMLIB_SUCCESS; memset(&fake_param, 0, sizeof(fake_param)); param = 
&fake_param; /* * case 1: extract all information for event (name) */ pfm_mont_get_ear_mode(inp->pfp_events[i].event, &param->pfp_mont_iear.ear_mode); param->pfp_mont_iear.ear_umask = evt_umask(inp->pfp_events[i].event); DPRINT("I-EAR event with no info\n"); } /* * case 2: ear_used=1, event is defined, we use the param info as it is more precise * case 4: ear_used=1, no event (free running I-EAR), use param info */ reg.pmc_val = 0; if (param->pfp_mont_iear.ear_mode == PFMLIB_MONT_EAR_TLB_MODE) { /* if plm is 0, then assume not specified per-event and use default */ reg.pmc37_mont_tlb_reg.iear_plm = param->pfp_mont_iear.ear_plm ? param->pfp_mont_iear.ear_plm : inp->pfp_dfl_plm; reg.pmc37_mont_tlb_reg.iear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc37_mont_tlb_reg.iear_ct = 0x0; reg.pmc37_mont_tlb_reg.iear_umask = param->pfp_mont_iear.ear_umask; } else if (param->pfp_mont_iear.ear_mode == PFMLIB_MONT_EAR_CACHE_MODE) { /* if plm is 0, then assume not specified per-event and use default */ reg.pmc37_mont_cache_reg.iear_plm = param->pfp_mont_iear.ear_plm ? param->pfp_mont_iear.ear_plm : inp->pfp_dfl_plm; reg.pmc37_mont_cache_reg.iear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 
1 : 0; reg.pmc37_mont_cache_reg.iear_ct = 0x1; reg.pmc37_mont_cache_reg.iear_umask = param->pfp_mont_iear.ear_umask; } else { DPRINT("ALAT mode not supported in I-EAR mode\n"); return PFMLIB_ERR_INVAL; } if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 37)) return PFMLIB_ERR_NOASSIGN; pc[pos1].reg_num = 37; /* PMC37 is I-EAR config register */ pc[pos1].reg_value = reg.pmc_val; pc[pos1].reg_addr = pc[pos1].reg_alt_addr = 37; pos1++; pd[pos2].reg_num = 34; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 34; pos2++; pd[pos2].reg_num = 35; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 35; pos2++; if (param->pfp_mont_iear.ear_mode == PFMLIB_MONT_EAR_TLB_MODE) { __pfm_vbprintf("[PMC37(pmc37)=0x%lx ctb=tlb plm=%d pm=%d umask=0x%x]\n", reg.pmc_val, reg.pmc37_mont_tlb_reg.iear_plm, reg.pmc37_mont_tlb_reg.iear_pm, reg.pmc37_mont_tlb_reg.iear_umask); } else { __pfm_vbprintf("[PMC37(pmc37)=0x%lx ctb=cache plm=%d pm=%d umask=0x%x]\n", reg.pmc_val, reg.pmc37_mont_cache_reg.iear_plm, reg.pmc37_mont_cache_reg.iear_pm, reg.pmc37_mont_cache_reg.iear_umask); } __pfm_vbprintf("[PMD34(pmd34)]\n[PMD35(pmd35)]\n"); /* update final number of entries used */ outp->pfp_pmc_count = pos1; outp->pfp_pmd_count = pos2; return PFMLIB_SUCCESS; } static int pfm_dispatch_dear(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfm_mont_pmc_reg_t reg; pfmlib_mont_input_param_t *param = mod_in; pfmlib_reg_t *pc, *pd; pfmlib_mont_input_param_t fake_param; unsigned int pos1, pos2; unsigned int i, count; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; count = inp->pfp_event_count; for (i=0; i < count; i++) { if (is_dear(inp->pfp_events[i].event)) break; } if (param == NULL || param->pfp_mont_dear.ear_used == 0) { /* * case 3: no D-EAR event, no (or nothing) in param->pfp_mont_dear.ear_used */ if (i == count) return PFMLIB_SUCCESS; memset(&fake_param, 0, sizeof(fake_param)); param = &fake_param; /* * case 1: extract all 
information for event (name) */ pfm_mont_get_ear_mode(inp->pfp_events[i].event, &param->pfp_mont_dear.ear_mode); param->pfp_mont_dear.ear_umask = evt_umask(inp->pfp_events[i].event); DPRINT("D-EAR event with no info\n"); } /* sanity check on the mode */ if ( param->pfp_mont_dear.ear_mode != PFMLIB_MONT_EAR_CACHE_MODE && param->pfp_mont_dear.ear_mode != PFMLIB_MONT_EAR_TLB_MODE && param->pfp_mont_dear.ear_mode != PFMLIB_MONT_EAR_ALAT_MODE) return PFMLIB_ERR_INVAL; /* * case 2: ear_used=1, event is defined, we use the param info as it is more precise * case 4: ear_used=1, no event (free running D-EAR), use param info */ reg.pmc_val = 0; /* if plm is 0, then assume not specified per-event and use default */ reg.pmc40_mont_reg.dear_plm = param->pfp_mont_dear.ear_plm ? param->pfp_mont_dear.ear_plm : inp->pfp_dfl_plm; reg.pmc40_mont_reg.dear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc40_mont_reg.dear_mode = param->pfp_mont_dear.ear_mode; reg.pmc40_mont_reg.dear_umask = param->pfp_mont_dear.ear_umask; reg.pmc40_mont_reg.dear_ism = 0x2; /* force IA-64 mode */ if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 40)) return PFMLIB_ERR_NOASSIGN; pc[pos1].reg_num = 40; /* PMC40 is the D-EAR config register */ pc[pos1].reg_value = reg.pmc_val; pc[pos1].reg_addr = pc[pos1].reg_alt_addr = 40; pos1++; pd[pos2].reg_num = 32; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 32; pos2++; pd[pos2].reg_num = 33; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 33; pos2++; pd[pos2].reg_num = 36; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 36; pos2++; __pfm_vbprintf("[PMC40(pmc40)=0x%lx mode=%s plm=%d pm=%d ism=0x%x umask=0x%x]\n", reg.pmc_val, reg.pmc40_mont_reg.dear_mode == 0 ? "L1D" : (reg.pmc40_mont_reg.dear_mode == 1 ? 
"L1DTLB" : "ALAT"), reg.pmc40_mont_reg.dear_plm, reg.pmc40_mont_reg.dear_pm, reg.pmc40_mont_reg.dear_ism, reg.pmc40_mont_reg.dear_umask); __pfm_vbprintf("[PMD32(pmd32)]\n[PMD33(pmd33)\nPMD36(pmd36)\n"); /* update final number of entries used */ outp->pfp_pmc_count = pos1; outp->pfp_pmd_count = pos2; return PFMLIB_SUCCESS; } static int pfm_dispatch_opcm(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_mont_output_param_t *mod_out) { pfmlib_mont_input_param_t *param = mod_in; pfmlib_reg_t *pc = outp->pfp_pmcs; pfm_mont_pmc_reg_t reg1, reg2, pmc36; unsigned int i, has_1st_pair, has_2nd_pair, count; unsigned int pos = outp->pfp_pmc_count; int used_pmc32, used_pmc34; if (param == NULL) return PFMLIB_SUCCESS; #define PMC36_DFL_VAL 0xfffffff0 /* * mandatory default value for PMC36 as described in the documentation * all monitoring is opcode constrained. Better make sure the match/mask * is set to match everything! It looks weird for the default value! */ pmc36.pmc_val = PMC36_DFL_VAL; reg1.pmc_val = 0x030f01ffffffffff; reg2.pmc_val = 0; used_pmc32 = param->pfp_mont_opcm1.opcm_used; used_pmc34 = param->pfp_mont_opcm2.opcm_used; /* * check in any feature is used. 
* PMC36 must be setup when opcode matching is used OR when code range restriction is used */ if (used_pmc32 == 0 && used_pmc34 == 0 && param->pfp_mont_irange.rr_used == 0) return 0; /* * check for rr_nbr_used to make sure that the range request produced something on output */ if (used_pmc32 || (param->pfp_mont_irange.rr_used && mod_out->pfp_mont_irange.rr_nbr_used) ) { /* * if not used, ignore all bits */ if (used_pmc32) { reg1.pmc32_34_mont_reg.opcm_mask = param->pfp_mont_opcm1.opcm_mask; reg1.pmc32_34_mont_reg.opcm_b = param->pfp_mont_opcm1.opcm_b; reg1.pmc32_34_mont_reg.opcm_f = param->pfp_mont_opcm1.opcm_f; reg1.pmc32_34_mont_reg.opcm_i = param->pfp_mont_opcm1.opcm_i; reg1.pmc32_34_mont_reg.opcm_m = param->pfp_mont_opcm1.opcm_m; reg2.pmc33_35_mont_reg.opcm_match = param->pfp_mont_opcm1.opcm_match; } if (param->pfp_mont_irange.rr_used) { reg1.pmc32_34_mont_reg.opcm_ig_ad = 0; reg1.pmc32_34_mont_reg.opcm_inv = param->pfp_mont_irange.rr_flags & PFMLIB_MONT_RR_INV ? 1 : 0; } else { /* clear range restriction fields when none is used */ reg1.pmc32_34_mont_reg.opcm_ig_ad = 1; reg1.pmc32_34_mont_reg.opcm_inv = 0; } if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 32)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 32; pc[pos].reg_value = reg1.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 32; pos++; /* * will be constrained by PMC32 */ if (used_pmc32) { if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 33)) return PFMLIB_ERR_NOASSIGN; /* * used pmc33 only when we have active opcode matching */ pc[pos].reg_num = 33; pc[pos].reg_value = reg2.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 33; pos++; has_1st_pair = has_2nd_pair = 0; count = inp->pfp_event_count; for(i=0; i < count; i++) { if (inp->pfp_events[i].event == PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP0_PMC32_33) has_1st_pair=1; if (inp->pfp_events[i].event == PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP2_PMC32_33) has_2nd_pair=1; } if (has_1st_pair || has_2nd_pair == 0) pmc36.pmc36_mont_reg.opcm_ch0_ig_opcm = 0; if 
(has_2nd_pair || has_1st_pair == 0) pmc36.pmc36_mont_reg.opcm_ch2_ig_opcm = 0; } __pfm_vbprintf("[PMC32(pmc32)=0x%lx m=%d i=%d f=%d b=%d mask=0x%lx inv=%d ig_ad=%d]\n", reg1.pmc_val, reg1.pmc32_34_mont_reg.opcm_m, reg1.pmc32_34_mont_reg.opcm_i, reg1.pmc32_34_mont_reg.opcm_f, reg1.pmc32_34_mont_reg.opcm_b, reg1.pmc32_34_mont_reg.opcm_mask, reg1.pmc32_34_mont_reg.opcm_inv, reg1.pmc32_34_mont_reg.opcm_ig_ad); if (used_pmc32) __pfm_vbprintf("[PMC33(pmc33)=0x%lx match=0x%lx]\n", reg2.pmc_val, reg2.pmc33_35_mont_reg.opcm_match); } /* * will be constrained by PMC34 */ if (used_pmc34) { reg1.pmc_val = 0x01ffffffffff; /* pmc34 default value */ reg2.pmc_val = 0; reg1.pmc32_34_mont_reg.opcm_mask = param->pfp_mont_opcm2.opcm_mask; reg1.pmc32_34_mont_reg.opcm_b = param->pfp_mont_opcm2.opcm_b; reg1.pmc32_34_mont_reg.opcm_f = param->pfp_mont_opcm2.opcm_f; reg1.pmc32_34_mont_reg.opcm_i = param->pfp_mont_opcm2.opcm_i; reg1.pmc32_34_mont_reg.opcm_m = param->pfp_mont_opcm2.opcm_m; reg2.pmc33_35_mont_reg.opcm_match = param->pfp_mont_opcm2.opcm_match; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 34)) return PFMLIB_ERR_NOASSIGN; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 35)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 34; pc[pos].reg_value = reg1.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 34; pos++; pc[pos].reg_num = 35; pc[pos].reg_value = reg2.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 35; pos++; has_1st_pair = has_2nd_pair = 0; count = inp->pfp_event_count; for(i=0; i < count; i++) { if (inp->pfp_events[i].event == PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP1_PMC34_35) has_1st_pair=1; if (inp->pfp_events[i].event == PME_MONT_IA64_TAGGED_INST_RETIRED_IBRP3_PMC34_35) has_2nd_pair=1; } if (has_1st_pair || has_2nd_pair == 0) pmc36.pmc36_mont_reg.opcm_ch1_ig_opcm = 0; if (has_2nd_pair || has_1st_pair == 0) pmc36.pmc36_mont_reg.opcm_ch3_ig_opcm = 0; __pfm_vbprintf("[PMC34(pmc34)=0x%lx m=%d i=%d f=%d b=%d mask=0x%lx]\n", reg1.pmc_val, reg1.pmc32_34_mont_reg.opcm_m, 
reg1.pmc32_34_mont_reg.opcm_i, reg1.pmc32_34_mont_reg.opcm_f, reg1.pmc32_34_mont_reg.opcm_b, reg1.pmc32_34_mont_reg.opcm_mask); __pfm_vbprintf("[PMC35(pmc35)=0x%lx match=0x%lx]\n", reg2.pmc_val, reg2.pmc33_35_mont_reg.opcm_match); } if (pmc36.pmc_val != PMC36_DFL_VAL) { if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 36)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 36; pc[pos].reg_value = pmc36.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 36; pos++; __pfm_vbprintf("[PMC36(pmc36)=0x%lx ch0_ig_op=%d ch1_ig_op=%d ch2_ig_op=%d ch3_ig_op=%d]\n", pmc36.pmc_val, pmc36.pmc36_mont_reg.opcm_ch0_ig_opcm, pmc36.pmc36_mont_reg.opcm_ch1_ig_opcm, pmc36.pmc36_mont_reg.opcm_ch2_ig_opcm, pmc36.pmc36_mont_reg.opcm_ch3_ig_opcm); } outp->pfp_pmc_count = pos; return PFMLIB_SUCCESS; } static int pfm_dispatch_etb(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfmlib_event_t *e= inp->pfp_events; pfm_mont_pmc_reg_t reg; pfmlib_mont_input_param_t *param = mod_in; pfmlib_reg_t *pc, *pd; pfmlib_mont_input_param_t fake_param; int found_etb = 0, found_bad_dear = 0; int has_etb_param; unsigned int i, pos1, pos2; unsigned int count; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; /* * explicit ETB settings */ has_etb_param = param && param->pfp_mont_etb.etb_used; reg.pmc_val = 0UL; /* * we need to scan all events looking for DEAR ALAT/TLB due to incompatibility. 
* In this case PMC39 must be forced to zero */ count = inp->pfp_event_count; for (i=0; i < count; i++) { if (is_etb(e[i].event)) found_etb = 1; /* * keep track of the first ETB event */ /* look only for DEAR TLB */ if (is_dear(e[i].event) && (is_ear_tlb(e[i].event) || is_ear_alat(e[i].event))) { found_bad_dear = 1; } } DPRINT("found_etb=%d found_bar_dear=%d\n", found_etb, found_bad_dear); /* * did not find D-EAR TLB/ALAT event, need to check param structure */ if (found_bad_dear == 0 && param && param->pfp_mont_dear.ear_used == 1) { if ( param->pfp_mont_dear.ear_mode == PFMLIB_MONT_EAR_TLB_MODE || param->pfp_mont_dear.ear_mode == PFMLIB_MONT_EAR_ALAT_MODE) found_bad_dear = 1; } /* * no explicit ETB event and no special case to deal with (cover part of case 3) */ if (found_etb == 0 && has_etb_param == 0 && found_bad_dear == 0) return PFMLIB_SUCCESS; if (has_etb_param == 0) { /* * case 3: no ETB event, etb_used=0 but found_bad_dear=1, need to cleanup PMC12 */ if (found_etb == 0) goto assign_zero; /* * case 1: we have a ETB event but no param, default setting is to capture * all branches. */ memset(&fake_param, 0, sizeof(fake_param)); param = &fake_param; param->pfp_mont_etb.etb_tm = 0x3; /* all branches */ param->pfp_mont_etb.etb_ptm = 0x3; /* all branches */ param->pfp_mont_etb.etb_ppm = 0x3; /* all branches */ param->pfp_mont_etb.etb_brt = 0x0; /* all branches */ DPRINT("ETB event with no info\n"); } /* * case 2: ETB event in the list, param provided * case 4: no ETB event, param provided (free running mode) */ reg.pmc39_mont_reg.etbc_plm = param->pfp_mont_etb.etb_plm ? param->pfp_mont_etb.etb_plm : inp->pfp_dfl_plm; reg.pmc39_mont_reg.etbc_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 
1 : 0; reg.pmc39_mont_reg.etbc_ds = 0; /* 1 is reserved */ reg.pmc39_mont_reg.etbc_tm = param->pfp_mont_etb.etb_tm & 0x3; reg.pmc39_mont_reg.etbc_ptm = param->pfp_mont_etb.etb_ptm & 0x3; reg.pmc39_mont_reg.etbc_ppm = param->pfp_mont_etb.etb_ppm & 0x3; reg.pmc39_mont_reg.etbc_brt = param->pfp_mont_etb.etb_brt & 0x3; /* * if DEAR-ALAT or DEAR-TLB is set then PMC12 must be set to zero (see documentation p. 87) * * D-EAR ALAT/TLB and ETB cannot be used at the same time. * From documentation: PMC12 must be zero in this mode; else the wrong IP for misses * coming right after a mispredicted branch. * * D-EAR cache is fine. */ assign_zero: if (found_bad_dear && reg.pmc_val != 0UL) return PFMLIB_ERR_EVTINCOMP; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 39)) return PFMLIB_ERR_NOASSIGN; pc[pos1].reg_num = 39; pc[pos1].reg_value = reg.pmc_val; pc[pos1].reg_addr = pc[pos1].reg_alt_addr = 39; pos1++; __pfm_vbprintf("[PMC39(pmc39)=0x%lx plm=%d pm=%d ds=%d tm=%d ptm=%d ppm=%d brt=%d]\n", reg.pmc_val, reg.pmc39_mont_reg.etbc_plm, reg.pmc39_mont_reg.etbc_pm, reg.pmc39_mont_reg.etbc_ds, reg.pmc39_mont_reg.etbc_tm, reg.pmc39_mont_reg.etbc_ptm, reg.pmc39_mont_reg.etbc_ppm, reg.pmc39_mont_reg.etbc_brt); /* * only add ETB PMDs when actually using BTB. 
* Not needed when dealing with D-EAR TLB and DEAR-ALAT * PMC39 restriction */ if (found_etb || has_etb_param) { pd[pos2].reg_num = 38; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 38; pos2++; pd[pos2].reg_num = 39; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 39; pos2++; __pfm_vbprintf("[PMD38(pmd38)]\n[PMD39(pmd39)\n"); for(i=48; i < 64; i++, pos2++) { pd[pos2].reg_num = i; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = i; __pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[pos2].reg_num, pd[pos2].reg_num); } } /* update final number of entries used */ outp->pfp_pmc_count = pos1; outp->pfp_pmd_count = pos2; return PFMLIB_SUCCESS; } static void do_normal_rr(unsigned long start, unsigned long end, pfmlib_reg_t *br, int nbr, int dir, int *idx, int *reg_idx, int plm) { unsigned long size, l_addr, c; unsigned long l_offs = 0, r_offs = 0; unsigned long l_size, r_size; dbreg_t db; int p2; if (nbr < 1 || end <= start) return; size = end - start; DPRINT("start=0x%016lx end=0x%016lx size=0x%lx bytes (%lu bundles) nbr=%d dir=%d\n", start, end, size, size >> 4, nbr, dir); p2 = pfm_ia64_fls(size); c = ALIGN_DOWN(end, p2); DPRINT("largest power of two possible: 2^%d=0x%lx, crossing=0x%016lx\n", p2, 1UL << p2, c); if ((c - (1UL<= start) { l_addr = c - (1UL << p2); } else { p2--; if ((c + (1UL<>l_offs: 0x%lx\n", l_offs); } } else if (dir == 1 && r_size != 0 && nbr == 1) { p2++; l_addr = start; if (PFMLIB_DEBUG()) { r_offs = l_addr+(1UL<>r_offs: 0x%lx\n", r_offs); } } l_size = l_addr - start; r_size = end - l_addr-(1UL<>largest chunk: 2^%d @0x%016lx-0x%016lx\n", p2, l_addr, l_addr+(1UL<>before: 0x%016lx-0x%016lx\n", start, l_addr); if (r_size && !r_offs) printf(">>after : 0x%016lx-0x%016lx\n", l_addr+(1UL<>1; if (nbr & 0x1) { /* * our simple heuristic is: * we assign the largest number of registers to the largest * of the two chunks */ if (l_size > r_size) { l_nbr++; } else { r_nbr++; } } do_normal_rr(start, l_addr, br, l_nbr, 0, idx, reg_idx, plm); do_normal_rr(l_addr+(1UL<rr_start, in_rr->rr_end, 
n_pairs, fine_mode ? ", fine_mode" : "", rr_flags & PFMLIB_MONT_RR_INV ? ", inversed" : ""); __pfm_vbprintf("start offset: -0x%lx end_offset: +0x%lx\n", out_rr->rr_soff, out_rr->rr_eoff); for (j=0; j < n_pairs; j++, base_idx+=2) { d.val = dbr[base_idx+1].reg_value; r_end = dbr[base_idx].reg_value+((~(d.db.db_mask)) & ~(0xffUL << 56)); if (fine_mode) __pfm_vbprintf("brp%u: db%u: 0x%016lx db%u: plm=0x%x mask=0x%016lx\n", dbr[base_idx].reg_num>>1, dbr[base_idx].reg_num, dbr[base_idx].reg_value, dbr[base_idx+1].reg_num, d.db.db_plm, d.db.db_mask); else __pfm_vbprintf("brp%u: db%u: 0x%016lx db%u: plm=0x%x mask=0x%016lx end=0x%016lx\n", dbr[base_idx].reg_num>>1, dbr[base_idx].reg_num, dbr[base_idx].reg_value, dbr[base_idx+1].reg_num, d.db.db_plm, d.db.db_mask, r_end); } } /* * base_idx = base register index to use (for IBRP1, base_idx = 2) */ static int compute_fine_rr(pfmlib_mont_input_rr_t *irr, int dfl_plm, int n, int *base_idx, pfmlib_mont_output_rr_t *orr) { int i; pfmlib_reg_t *br; pfmlib_mont_input_rr_desc_t *in_rr; pfmlib_mont_output_rr_desc_t *out_rr; unsigned long addr; int reg_idx; dbreg_t db; in_rr = irr->rr_limits; out_rr = orr->rr_infos; br = orr->rr_br+orr->rr_nbr_used; reg_idx = *base_idx; db.val = 0; db.db.db_mask = FINE_MODE_MASK; if (n > 2) return PFMLIB_ERR_IRRTOOMANY; for (i=0; i < n; i++, reg_idx += 2, in_rr++, br+= 4) { /* * setup lower limit pair * * because of the PMU can only see addresses on a 2-bundle boundary, we must align * down to the closest bundle-pair aligned address. 5 => 32-byte aligned address */ addr = ALIGN_DOWN(in_rr->rr_start, 5); out_rr->rr_soff = in_rr->rr_start - addr; /* * adjust plm for each range */ db.db.db_plm = in_rr->rr_plm ? 
in_rr->rr_plm : (unsigned long)dfl_plm; br[0].reg_num = reg_idx; br[0].reg_value = addr; br[0].reg_addr = br[0].reg_alt_addr = 1+reg_idx; br[1].reg_num = reg_idx+1; br[1].reg_value = db.val; br[1].reg_addr = br[1].reg_alt_addr = 1+reg_idx+1; /* * setup upper limit pair * * * In fine mode, the bundle address stored in the upper limit debug * registers is included in the count, so we substract 0x10 to exclude it. * * because of the PMU bug, we align the (corrected) end to the nearest * 32-byte aligned address + 0x10. With this correction and depending * on the correction, we may count one * * */ addr = in_rr->rr_end - 0x10; if ((addr & 0x1f) == 0) addr += 0x10; out_rr->rr_eoff = addr - in_rr->rr_end + 0x10; br[2].reg_num = reg_idx+4; br[2].reg_value = addr; br[2].reg_addr = br[2].reg_alt_addr = 1+reg_idx+4; br[3].reg_num = reg_idx+5; br[3].reg_value = db.val; br[3].reg_addr = br[3].reg_alt_addr = 1+reg_idx+5; if (PFMLIB_VERBOSE()) print_one_range(in_rr, out_rr, br, 0, 2, 1, irr->rr_flags); } orr->rr_nbr_used += i<<2; /* update base_idx, for subsequent calls */ *base_idx = reg_idx; return PFMLIB_SUCCESS; } /* * base_idx = base register index to use (for IBRP1, base_idx = 2) */ static int compute_single_rr(pfmlib_mont_input_rr_t *irr, int dfl_plm, int *base_idx, pfmlib_mont_output_rr_t *orr) { unsigned long size, end, start; unsigned long p_start, p_end; pfmlib_mont_input_rr_desc_t *in_rr; pfmlib_mont_output_rr_desc_t *out_rr; pfmlib_reg_t *br; dbreg_t db; int reg_idx; int l, m; in_rr = irr->rr_limits; out_rr = orr->rr_infos; br = orr->rr_br+orr->rr_nbr_used; start = in_rr->rr_start; end = in_rr->rr_end; size = end - start; reg_idx = *base_idx; l = pfm_ia64_fls(size); m = l; if (size & ((1UL << l)-1)) { if (l>62) { printf("range: [0x%lx-0x%lx] too big\n", start, end); return PFMLIB_ERR_IRRTOOBIG; } m++; } DPRINT("size=%ld, l=%d m=%d, internal: 0x%lx full: 0x%lx\n", size, l, m, 1UL << l, 1UL << m); for (; m < 64; m++) { p_start = ALIGN_DOWN(start, m); p_end = 
p_start+(1UL<<m); if (p_end >= end) goto found; } return PFMLIB_ERR_IRRINVAL; found: DPRINT("m=%d p_start=0x%lx p_end=0x%lx\n", m, p_start,p_end); /* when the event is not IA64_INST_RETIRED, then we MUST use ibrp0 */ br[0].reg_num = reg_idx; br[0].reg_value = p_start; br[0].reg_addr = br[0].reg_alt_addr = 1+reg_idx; db.val = 0; db.db.db_mask = ~((1UL << m)-1); db.db.db_plm = in_rr->rr_plm ? in_rr->rr_plm : (unsigned long)dfl_plm; br[1].reg_num = reg_idx + 1; br[1].reg_value = db.val; br[1].reg_addr = br[1].reg_alt_addr = 1+reg_idx+1; out_rr->rr_soff = start - p_start; out_rr->rr_eoff = p_end - end; if (PFMLIB_VERBOSE()) print_one_range(in_rr, out_rr, br, 0, 1, 0, irr->rr_flags); orr->rr_nbr_used += 2; /* update base_idx, for subsequent calls */ *base_idx = reg_idx; return PFMLIB_SUCCESS; } static int compute_normal_rr(pfmlib_mont_input_rr_t *irr, int dfl_plm, int n, int *base_idx, pfmlib_mont_output_rr_t *orr) { pfmlib_mont_input_rr_desc_t *in_rr; pfmlib_mont_output_rr_desc_t *out_rr; unsigned long r_end; pfmlib_reg_t *br; dbreg_t d; int i, j; int br_index, reg_idx, prev_index; in_rr = irr->rr_limits; out_rr = orr->rr_infos; br = orr->rr_br+orr->rr_nbr_used; reg_idx = *base_idx; br_index = 0; for (i=0; i < n; i++, in_rr++, out_rr++) { /* * running out of registers */ if (br_index == 8) break; prev_index = br_index; do_normal_rr( in_rr->rr_start, in_rr->rr_end, br, 4 - (reg_idx>>1), /* how many pairs available */ 0, &br_index, &reg_idx, in_rr->rr_plm ? 
in_rr->rr_plm : dfl_plm); DPRINT("br_index=%d reg_idx=%d\n", br_index, reg_idx); /* * compute offsets */ out_rr->rr_soff = out_rr->rr_eoff = 0; for(j=prev_index; j < br_index; j+=2) { d.val = br[j+1].reg_value; r_end = br[j].reg_value+((~(d.db.db_mask)+1) & ~(0xffUL << 56)); if (br[j].reg_value <= in_rr->rr_start) out_rr->rr_soff = in_rr->rr_start - br[j].reg_value; if (r_end >= in_rr->rr_end) out_rr->rr_eoff = r_end - in_rr->rr_end; } if (PFMLIB_VERBOSE()) print_one_range(in_rr, out_rr, br, prev_index, (br_index-prev_index)>>1, 0, irr->rr_flags); } /* do not have enough registers to cover all the ranges */ if (br_index == 8 && i < n) return PFMLIB_ERR_TOOMANY; orr->rr_nbr_used += br_index; /* update base_idx, for subsequent calls */ *base_idx = reg_idx; return PFMLIB_SUCCESS; } static int pfm_dispatch_irange(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_mont_output_param_t *mod_out) { pfm_mont_pmc_reg_t reg; pfmlib_mont_input_param_t *param = mod_in; pfmlib_mont_input_rr_t *irr; pfmlib_mont_output_rr_t *orr; pfmlib_reg_t *pc = outp->pfp_pmcs; unsigned long retired_mask; unsigned int i, pos = outp->pfp_pmc_count, count; unsigned int retired_only, retired_count, fine_mode, prefetch_count; unsigned int n_intervals; int base_idx = 0, dup = 0; int ret; if (param == NULL) return PFMLIB_SUCCESS; if (param->pfp_mont_irange.rr_used == 0) return PFMLIB_SUCCESS; if (mod_out == NULL) return PFMLIB_ERR_INVAL; irr = &param->pfp_mont_irange; orr = &mod_out->pfp_mont_irange; ret = check_intervals(irr, 0, &n_intervals); if (ret != PFMLIB_SUCCESS) return ret; if (n_intervals < 1) return PFMLIB_ERR_IRRINVAL; retired_count = check_inst_retired_events(inp, &retired_mask); retired_only = retired_count == inp->pfp_event_count; fine_mode = irr->rr_flags & PFMLIB_MONT_RR_NO_FINE_MODE ? 
0 : check_fine_mode_possible(irr, n_intervals); DPRINT("n_intervals=%d retired_only=%d retired_count=%d fine_mode=%d\n", n_intervals, retired_only, retired_count, fine_mode); /* * On montecito, there are more constraints on what can be measured with irange. * * - The fine mode is the best because you directly set the lower and upper limits of * the range. This uses 2 ibr pairs for range (ibrp0/ibrp2 and ibrp1/ibrp3). Therefore * at most 2 fine mode ranges can be defined. The boundaries of the range must be in the * same 64KB page. The fine mode works with all events. * * - if the fine mode fails, then for all events, except IA64_TAGGED_INST_RETIRED_*, only * the first pair of ibr is available: ibrp0. This imposes some severe restrictions on the * size and alignment of the range. It can be bigger than 64KB and must be properly aligned * on its size. The library relaxes these constraints by allowing the covered areas to be * larger than the expected range. It may start before and end after the requested range. * You can determine the amount of overrun in either direction for each range by looking at * the rr_soff (start offset) and rr_eoff (end offset). * * - if the events include certain prefetch events then only IBRP1 can be used. * See 3.3.5.2 Exception 1. * * - Finally, when the events are ONLY IA64_TAGGED_INST_RETIRED_* then all IBR pairs can be used * to cover the range giving us more flexibility to approximate the range when it is not * properly aligned on its size (see 10.3.5.2 Exception 2). But the corresponding * IA64_TAGGED_INST_RETIRED_* must be present. 
*/ if (fine_mode == 0 && retired_only == 0 && n_intervals > 1) return PFMLIB_ERR_IRRTOOMANY; /* we do not default to non-fine mode to support more ranges */ if (n_intervals > 2 && fine_mode == 1) return PFMLIB_ERR_IRRTOOMANY; ret = check_prefetch_events(inp, irr, &prefetch_count, &base_idx, &dup); if (ret) return ret; DPRINT("prefetch_count=%u base_idx=%d dup=%d\n", prefetch_count, base_idx, dup); /* * CPU_OP_CYCLES.QUAL supports code range restrictions but it returns * meaningful values (fine/coarse mode) only when IBRP1 is not used. */ if ((base_idx > 0 || dup) && has_cpu_cycles_qual(inp)) return PFMLIB_ERR_FEATCOMB; if (fine_mode == 0) { if (retired_only) { /* can take multiple intervals */ ret = compute_normal_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr); } else { /* unless we have only prefetch and instruction retired events, * we cannot satisfy the request because the other events cannot * be measured on anything but IBRP0. */ if ((prefetch_count+retired_count) != inp->pfp_event_count) return PFMLIB_ERR_FEATCOMB; ret = compute_single_rr(irr, inp->pfp_dfl_plm, &base_idx, orr); if (ret == PFMLIB_SUCCESS && dup) ret = compute_single_rr(irr, inp->pfp_dfl_plm, &base_idx, orr); } } else { if (prefetch_count && n_intervals != 1) return PFMLIB_ERR_IRRTOOMANY; /* except is retired_only, can take only one interval */ ret = compute_fine_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr); if (ret == PFMLIB_SUCCESS && dup) ret = compute_fine_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr); } if (ret != PFMLIB_SUCCESS) return ret == PFMLIB_ERR_TOOMANY ? 
PFMLIB_ERR_IRRTOOMANY : ret; reg.pmc_val = 0xdb6; /* default value */ count = orr->rr_nbr_used; for (i=0; i < count; i++) { switch(orr->rr_br[i].reg_num) { case 0: reg.pmc38_mont_reg.iarc_ig_ibrp0 = 0; break; case 2: reg.pmc38_mont_reg.iarc_ig_ibrp1 = 0; break; case 4: reg.pmc38_mont_reg.iarc_ig_ibrp2 = 0; break; case 6: reg.pmc38_mont_reg.iarc_ig_ibrp3 = 0; break; } } if (fine_mode) { reg.pmc38_mont_reg.iarc_fine = 1; } else if (retired_only) { /* * we need to check that the user provided all the events needed to cover * all the ibr pairs used to cover the range */ if ((retired_mask & 0x1) == 0 && reg.pmc38_mont_reg.iarc_ig_ibrp0 == 0) return PFMLIB_ERR_IRRINVAL; if ((retired_mask & 0x2) == 0 && reg.pmc38_mont_reg.iarc_ig_ibrp1 == 0) return PFMLIB_ERR_IRRINVAL; if ((retired_mask & 0x4) == 0 && reg.pmc38_mont_reg.iarc_ig_ibrp2 == 0) return PFMLIB_ERR_IRRINVAL; if ((retired_mask & 0x8) == 0 && reg.pmc38_mont_reg.iarc_ig_ibrp3 == 0) return PFMLIB_ERR_IRRINVAL; } if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 38)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 38; pc[pos].reg_value = reg.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 38; pos++; __pfm_vbprintf("[PMC38(pmc38)=0x%lx ig_ibrp0=%d ig_ibrp1=%d ig_ibrp2=%d ig_ibrp3=%d fine=%d]\n", reg.pmc_val, reg.pmc38_mont_reg.iarc_ig_ibrp0, reg.pmc38_mont_reg.iarc_ig_ibrp1, reg.pmc38_mont_reg.iarc_ig_ibrp2, reg.pmc38_mont_reg.iarc_ig_ibrp3, reg.pmc38_mont_reg.iarc_fine); outp->pfp_pmc_count = pos; return PFMLIB_SUCCESS; } static const unsigned long iod_tab[8]={ /* --- */ 3, /* --D */ 2, /* -O- */ 3, /* should not be used */ /* -OD */ 0, /* =IOD safe because default IBR is harmless */ /* I-- */ 1, /* =IO safe because by defaut OPC is turned off */ /* I-D */ 0, /* =IOD safe because by default opc is turned off */ /* IO- */ 1, /* IOD */ 0 }; /* * IMPORTANT: MUST BE CALLED *AFTER* pfm_dispatch_irange() to make sure we see * the irange programming to adjust pmc41. 
*/ static int pfm_dispatch_drange(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp, pfmlib_mont_output_param_t *mod_out) { pfmlib_mont_input_param_t *param = mod_in; pfmlib_reg_t *pc = outp->pfp_pmcs; pfmlib_mont_input_rr_t *irr; pfmlib_mont_output_rr_t *orr, *orr2; pfm_mont_pmc_reg_t pmc38; pfm_mont_pmc_reg_t reg; unsigned int i, pos = outp->pfp_pmc_count; int iod_codes[4], dfl_val_pmc32, dfl_val_pmc34; unsigned int n_intervals; int ret; int base_idx = 0; int fine_mode = 0; #define DR_USED 0x1 /* data range is used */ #define OP_USED 0x2 /* opcode matching is used */ #define IR_USED 0x4 /* code range is used */ if (param == NULL) return PFMLIB_SUCCESS; /* * if only pmc32/pmc33 opcode matching is used, we do not need to change * the default value of pmc41 regardless of the events being measured. */ if ( param->pfp_mont_drange.rr_used == 0 && param->pfp_mont_irange.rr_used == 0) return PFMLIB_SUCCESS; /* * it seems like the ignored bits need to have special values * otherwise this does not work. */ reg.pmc_val = 0x2078fefefefe; /* * initialize iod codes */ iod_codes[0] = iod_codes[1] = iod_codes[2] = iod_codes[3] = 0; /* * setup default iod value, we need to separate because * if drange is used we do not know in advance which DBR will be used * therefore we need to apply dfl_val later */ dfl_val_pmc32 = param->pfp_mont_opcm1.opcm_used ? OP_USED : 0; dfl_val_pmc34 = param->pfp_mont_opcm2.opcm_used ? OP_USED : 0; if (param->pfp_mont_drange.rr_used == 1) { if (mod_out == NULL) return PFMLIB_ERR_INVAL; irr = &param->pfp_mont_drange; orr = &mod_out->pfp_mont_drange; ret = check_intervals(irr, 1, &n_intervals); if (ret != PFMLIB_SUCCESS) return ret; if (n_intervals < 1) return PFMLIB_ERR_DRRINVAL; ret = compute_normal_rr(irr, inp->pfp_dfl_plm, n_intervals, &base_idx, orr); if (ret != PFMLIB_SUCCESS) { return ret == PFMLIB_ERR_TOOMANY ? PFMLIB_ERR_DRRTOOMANY : ret; } /* * Update iod_codes to reflect the use of the DBR constraint. 
*/ for (i=0; i < orr->rr_nbr_used; i++) { if (orr->rr_br[i].reg_num == 0) iod_codes[0] |= DR_USED | dfl_val_pmc32; if (orr->rr_br[i].reg_num == 2) iod_codes[1] |= DR_USED | dfl_val_pmc34; if (orr->rr_br[i].reg_num == 4) iod_codes[2] |= DR_USED | dfl_val_pmc32; if (orr->rr_br[i].reg_num == 6) iod_codes[3] |= DR_USED | dfl_val_pmc34; } } /* * XXX: assume dispatch_irange executed before calling this function */ if (param->pfp_mont_irange.rr_used == 1) { orr2 = &mod_out->pfp_mont_irange; if (mod_out == NULL) return PFMLIB_ERR_INVAL; /* * we need to find out whether or not the irange is using * fine mode. If this is the case, then we only need to * program pmc41 for the ibr pairs which designate the lower * bounds of a range. For instance, if IBRP0/IBRP2 are used, * then we only need to program pmc13.cfg_dbrp0 and pmc13.ena_dbrp0, * the PMU will automatically use IBRP2, even though pmc13.ena_dbrp2=0. */ for(i=0; i <= pos; i++) { if (pc[i].reg_num == 38) { pmc38.pmc_val = pc[i].reg_value; if (pmc38.pmc38_mont_reg.iarc_fine == 1) fine_mode = 1; break; } } /* * Update to reflect the use of the IBR constraint */ for (i=0; i < orr2->rr_nbr_used; i++) { if (orr2->rr_br[i].reg_num == 0) iod_codes[0] |= IR_USED | dfl_val_pmc32; if (orr2->rr_br[i].reg_num == 2) iod_codes[1] |= IR_USED | dfl_val_pmc34; if (fine_mode == 0 && orr2->rr_br[i].reg_num == 4) iod_codes[2] |= IR_USED | dfl_val_pmc32; if (fine_mode == 0 && orr2->rr_br[i].reg_num == 6) iod_codes[3] |= IR_USED | dfl_val_pmc34; } } if (param->pfp_mont_irange.rr_used == 0 && param->pfp_mont_drange.rr_used ==0) { iod_codes[0] = iod_codes[2] = dfl_val_pmc32; iod_codes[1] = iod_codes[3] = dfl_val_pmc34; } /* * update the cfg dbrpX field. If we put a constraint on a cfg dbrp, then * we must enable it in the corresponding ena_dbrpX */ reg.pmc41_mont_reg.darc_ena_dbrp0 = iod_codes[0] ? 1 : 0; reg.pmc41_mont_reg.darc_cfg_dtag0 = iod_tab[iod_codes[0]]; reg.pmc41_mont_reg.darc_ena_dbrp1 = iod_codes[1] ? 
1 : 0; reg.pmc41_mont_reg.darc_cfg_dtag1 = iod_tab[iod_codes[1]]; reg.pmc41_mont_reg.darc_ena_dbrp2 = iod_codes[2] ? 1 : 0; reg.pmc41_mont_reg.darc_cfg_dtag2 = iod_tab[iod_codes[2]]; reg.pmc41_mont_reg.darc_ena_dbrp3 = iod_codes[3] ? 1 : 0; reg.pmc41_mont_reg.darc_cfg_dtag3 = iod_tab[iod_codes[3]]; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 41)) return PFMLIB_ERR_NOASSIGN; pc[pos].reg_num = 41; pc[pos].reg_value = reg.pmc_val; pc[pos].reg_addr = pc[pos].reg_alt_addr = 41; pos++; __pfm_vbprintf("[PMC41(pmc41)=0x%lx cfg_dtag0=%d cfg_dtag1=%d cfg_dtag2=%d cfg_dtag3=%d ena_dbrp0=%d ena_dbrp1=%d ena_dbrp2=%d ena_dbrp3=%d]\n", reg.pmc_val, reg.pmc41_mont_reg.darc_cfg_dtag0, reg.pmc41_mont_reg.darc_cfg_dtag1, reg.pmc41_mont_reg.darc_cfg_dtag2, reg.pmc41_mont_reg.darc_cfg_dtag3, reg.pmc41_mont_reg.darc_ena_dbrp0, reg.pmc41_mont_reg.darc_ena_dbrp1, reg.pmc41_mont_reg.darc_ena_dbrp2, reg.pmc41_mont_reg.darc_ena_dbrp3); outp->pfp_pmc_count = pos; return PFMLIB_SUCCESS; } static int check_qualifier_constraints(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in) { pfmlib_mont_input_param_t *param = mod_in; pfmlib_event_t *e = inp->pfp_events; unsigned int i, count; count = inp->pfp_event_count; for(i=0; i < count; i++) { /* * skip check for counter which requested it. Use at your own risk. * No all counters have necessarily been validated for use with * qualifiers. Typically the event is counted as if no constraint * existed. 
*/ if (param->pfp_mont_counters[i].flags & PFMLIB_MONT_FL_EVT_NO_QUALCHECK) continue; if (evt_use_irange(param) && has_iarr(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB; if (evt_use_drange(param) && has_darr(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB; if (evt_use_opcm(param) && has_opcm(e[i].event) == 0) return PFMLIB_ERR_FEATCOMB; } return PFMLIB_SUCCESS; } static int check_range_plm(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in) { pfmlib_mont_input_param_t *param = mod_in; unsigned int i, count; if (param->pfp_mont_drange.rr_used == 0 && param->pfp_mont_irange.rr_used == 0) return PFMLIB_SUCCESS; /* * range restriction applies to all events, therefore we must have a consistent * set of plm and they must match the pfp_dfl_plm which is used to setup the debug * registers */ count = inp->pfp_event_count; for(i=0; i < count; i++) { if (inp->pfp_events[i].plm && inp->pfp_events[i].plm != inp->pfp_dfl_plm) return PFMLIB_ERR_FEATCOMB; } return PFMLIB_SUCCESS; } static int pfm_dispatch_ipear(pfmlib_input_param_t *inp, pfmlib_mont_input_param_t *mod_in, pfmlib_output_param_t *outp) { pfm_mont_pmc_reg_t reg; pfmlib_mont_input_param_t *param = mod_in; pfmlib_event_t *e = inp->pfp_events; pfmlib_reg_t *pc, *pd; unsigned int pos1, pos2; unsigned int i, count; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; pos1 = outp->pfp_pmc_count; pos2 = outp->pfp_pmd_count; /* * check if there is something to do */ if (param == NULL || param->pfp_mont_ipear.ipear_used == 0) return PFMLIB_SUCCESS; /* * we need to look for use of ETB, because IP-EAR and ETB cannot be used at the * same time */ if (param->pfp_mont_etb.etb_used) return PFMLIB_ERR_FEATCOMB; /* * look for implicit ETB used because of BRANCH_EVENT */ count = inp->pfp_event_count; for (i=0; i < count; i++) { if (is_etb(e[i].event)) return PFMLIB_ERR_FEATCOMB; } reg.pmc_val = 0; reg.pmc42_mont_reg.ipear_plm = param->pfp_mont_ipear.ipear_plm ? 
param->pfp_mont_ipear.ipear_plm : inp->pfp_dfl_plm; reg.pmc42_mont_reg.ipear_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE ? 1 : 0; reg.pmc42_mont_reg.ipear_mode = 4; reg.pmc42_mont_reg.ipear_delay = param->pfp_mont_ipear.ipear_delay; if (pfm_regmask_isset(&inp->pfp_unavail_pmcs, 42)) return PFMLIB_ERR_NOASSIGN; pc[pos1].reg_num = 42; pc[pos1].reg_value = reg.pmc_val; pc[pos1].reg_addr = pc[pos1].reg_alt_addr = 42; pos1++; __pfm_vbprintf("[PMC42(pmc42)=0x%lx plm=%d pm=%d mode=%d delay=%d]\n", reg.pmc_val, reg.pmc42_mont_reg.ipear_plm, reg.pmc42_mont_reg.ipear_pm, reg.pmc42_mont_reg.ipear_mode, reg.pmc42_mont_reg.ipear_delay); pd[pos2].reg_num = 38; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 38; pos2++; pd[pos2].reg_num = 39; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = 39; pos2++; __pfm_vbprintf("[PMD38(pmd38)]\n[PMD39(pmd39)\n"); for(i=48; i < 64; i++, pos2++) { pd[pos2].reg_num = i; pd[pos2].reg_addr = pd[pos2].reg_alt_addr = i; __pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[pos2].reg_num, pd[pos2].reg_num); } outp->pfp_pmc_count = pos1; outp->pfp_pmd_count = pos2; return PFMLIB_SUCCESS; } static int pfm_mont_dispatch_events(pfmlib_input_param_t *inp, void *model_in, pfmlib_output_param_t *outp, void *model_out) { int ret; pfmlib_mont_input_param_t *mod_in = (pfmlib_mont_input_param_t *)model_in; pfmlib_mont_output_param_t *mod_out = (pfmlib_mont_output_param_t *)model_out; /* * nothing will come out of this combination */ if (mod_out && mod_in == NULL) return PFMLIB_ERR_INVAL; /* check opcode match, range restriction qualifiers */ if (mod_in && check_qualifier_constraints(inp, mod_in) != PFMLIB_SUCCESS) return PFMLIB_ERR_FEATCOMB; /* check for problems with range restriction and per-event plm */ if (mod_in && check_range_plm(inp, mod_in) != PFMLIB_SUCCESS) return PFMLIB_ERR_FEATCOMB; ret = pfm_mont_dispatch_counters(inp, mod_in, outp); if (ret != PFMLIB_SUCCESS) return ret; /* now check for I-EAR */ ret = pfm_dispatch_iear(inp, mod_in, outp); if (ret != PFMLIB_SUCCESS) 
return ret; /* now check for D-EAR */ ret = pfm_dispatch_dear(inp, mod_in, outp); if (ret != PFMLIB_SUCCESS) return ret; /* XXX: must be done before dispatch_opcm() and dispatch_drange() */ ret = pfm_dispatch_irange(inp, mod_in, outp, mod_out);; if (ret != PFMLIB_SUCCESS) return ret; ret = pfm_dispatch_drange(inp, mod_in, outp, mod_out);; if (ret != PFMLIB_SUCCESS) return ret; /* now check for Opcode matchers */ ret = pfm_dispatch_opcm(inp, mod_in, outp, mod_out); if (ret != PFMLIB_SUCCESS) return ret; /* now check for ETB */ ret = pfm_dispatch_etb(inp, mod_in, outp); if (ret != PFMLIB_SUCCESS) return ret; /* now check for IP-EAR */ ret = pfm_dispatch_ipear(inp, mod_in, outp); return ret; } /* XXX: return value is also error code */ int pfm_mont_get_event_maxincr(unsigned int i, unsigned int *maxincr) { if (i >= PME_MONT_EVENT_COUNT || maxincr == NULL) return PFMLIB_ERR_INVAL; *maxincr = montecito_pe[i].pme_maxincr; return PFMLIB_SUCCESS; } int pfm_mont_is_ear(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_ear(i); } int pfm_mont_is_dear(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_dear(i); } int pfm_mont_is_dear_tlb(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_dear(i) && is_ear_tlb(i); } int pfm_mont_is_dear_cache(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_dear(i) && is_ear_cache(i); } int pfm_mont_is_dear_alat(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_ear_alat(i); } int pfm_mont_is_iear(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_iear(i); } int pfm_mont_is_iear_tlb(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_iear(i) && is_ear_tlb(i); } int pfm_mont_is_iear_cache(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_iear(i) && is_ear_cache(i); } int pfm_mont_is_etb(unsigned int i) { return i < PME_MONT_EVENT_COUNT && is_etb(i); } int pfm_mont_support_iarr(unsigned int i) { return i < PME_MONT_EVENT_COUNT && has_iarr(i); } int pfm_mont_support_darr(unsigned int i) { return i < 
PME_MONT_EVENT_COUNT && has_darr(i); } int pfm_mont_support_opcm(unsigned int i) { return i < PME_MONT_EVENT_COUNT && has_opcm(i); } int pfm_mont_support_all(unsigned int i) { return i < PME_MONT_EVENT_COUNT && has_all(i); } int pfm_mont_get_ear_mode(unsigned int i, pfmlib_mont_ear_mode_t *m) { pfmlib_mont_ear_mode_t r; if (!is_ear(i) || m == NULL) return PFMLIB_ERR_INVAL; r = PFMLIB_MONT_EAR_TLB_MODE; if (is_ear_tlb(i)) goto done; r = PFMLIB_MONT_EAR_CACHE_MODE; if (is_ear_cache(i)) goto done; r = PFMLIB_MONT_EAR_ALAT_MODE; if (is_ear_alat(i)) goto done; return PFMLIB_ERR_INVAL; done: *m = r; return PFMLIB_SUCCESS; } static int pfm_mont_get_event_code(unsigned int i, unsigned int cnt, int *code) { if (cnt != PFMLIB_CNT_FIRST && (cnt < 4 || cnt > 15)) return PFMLIB_ERR_INVAL; *code = (int)montecito_pe[i].pme_code; return PFMLIB_SUCCESS; } /* * This function is accessible directly to the user */ int pfm_mont_get_event_umask(unsigned int i, unsigned long *umask) { if (i >= PME_MONT_EVENT_COUNT || umask == NULL) return PFMLIB_ERR_INVAL; *umask = evt_umask(i); return PFMLIB_SUCCESS; } int pfm_mont_get_event_group(unsigned int i, int *grp) { if (i >= PME_MONT_EVENT_COUNT || grp == NULL) return PFMLIB_ERR_INVAL; *grp = evt_grp(i); return PFMLIB_SUCCESS; } int pfm_mont_get_event_set(unsigned int i, int *set) { if (i >= PME_MONT_EVENT_COUNT || set == NULL) return PFMLIB_ERR_INVAL; *set = evt_set(i) == 0xf ? 
PFMLIB_MONT_EVT_NO_SET : evt_set(i); return PFMLIB_SUCCESS; } int pfm_mont_get_event_type(unsigned int i, int *type) { if (i >= PME_MONT_EVENT_COUNT || type == NULL) return PFMLIB_ERR_INVAL; *type = evt_caf(i); return PFMLIB_SUCCESS; } /* external interface */ int pfm_mont_irange_is_fine(pfmlib_output_param_t *outp, pfmlib_mont_output_param_t *mod_out) { pfmlib_mont_output_param_t *param = mod_out; pfm_mont_pmc_reg_t reg; unsigned int i, count; /* some sanity checks */ if (outp == NULL || param == NULL) return 0; if (outp->pfp_pmc_count >= PFMLIB_MAX_PMCS) return 0; if (param->pfp_mont_irange.rr_nbr_used == 0) return 0; /* * we look for pmc38 as it contains the bit indicating if fine mode is used */ count = outp->pfp_pmc_count; for(i=0; i < count; i++) { if (outp->pfp_pmcs[i].reg_num == 38) goto found; } return 0; found: reg.pmc_val = outp->pfp_pmcs[i].reg_value; return reg.pmc38_mont_reg.iarc_fine ? 1 : 0; } static char * pfm_mont_get_event_name(unsigned int i) { return montecito_pe[i].pme_name; } static void pfm_mont_get_event_counters(unsigned int j, pfmlib_regmask_t *counters) { unsigned int i; unsigned long m; memset(counters, 0, sizeof(*counters)); m =montecito_pe[j].pme_counters; for(i=0; m ; i++, m>>=1) { if (m & 0x1) pfm_regmask_set(counters, i); } } static void pfm_mont_get_impl_pmcs(pfmlib_regmask_t *impl_pmcs) { unsigned int i = 0; for(i=0; i < 16; i++) pfm_regmask_set(impl_pmcs, i); for(i=32; i < 43; i++) pfm_regmask_set(impl_pmcs, i); } static void pfm_mont_get_impl_pmds(pfmlib_regmask_t *impl_pmds) { unsigned int i = 0; for(i=4; i < 16; i++) pfm_regmask_set(impl_pmds, i); for(i=32; i < 40; i++) pfm_regmask_set(impl_pmds, i); for(i=48; i < 64; i++) pfm_regmask_set(impl_pmds, i); } static void pfm_mont_get_impl_counters(pfmlib_regmask_t *impl_counters) { unsigned int i = 0; /* counter pmds are contiguous */ for(i=4; i < 16; i++) pfm_regmask_set(impl_counters, i); } static void pfm_mont_get_hw_counter_width(unsigned int *width) { *width = 
PMU_MONT_COUNTER_WIDTH; } static int pfm_mont_get_event_description(unsigned int ev, char **str) { char *s; s = montecito_pe[ev].pme_desc; if (s) { *str = strdup(s); } else { *str = NULL; } return PFMLIB_SUCCESS; } static int pfm_mont_get_cycle_event(pfmlib_event_t *e) { e->event = PME_MONT_CPU_OP_CYCLES_ALL; return PFMLIB_SUCCESS; } static int pfm_mont_get_inst_retired(pfmlib_event_t *e) { e->event = PME_MONT_IA64_INST_RETIRED; return PFMLIB_SUCCESS; } static unsigned int pfm_mont_get_num_event_masks(unsigned int event) { return has_mesi(event) ? 4 : 0; } static char * pfm_mont_get_event_mask_name(unsigned int event, unsigned int mask) { switch(mask) { case 0: return "I"; case 1: return "S"; case 2: return "E"; case 3: return "M"; } return NULL; } static int pfm_mont_get_event_mask_desc(unsigned int event, unsigned int mask, char **desc) { switch(mask) { case 0: *desc = strdup("invalid"); break; case 1: *desc = strdup("shared"); break; case 2: *desc = strdup("exclusive"); break; case 3: *desc = strdup("modified"); break; default: return PFMLIB_ERR_INVAL; } return PFMLIB_SUCCESS; } static int pfm_mont_get_event_mask_code(unsigned int event, unsigned int mask, unsigned int *code) { *code = mask; return PFMLIB_SUCCESS; } pfm_pmu_support_t montecito_support={ .pmu_name = "dual-core Itanium 2", .pmu_type = PFMLIB_MONTECITO_PMU, .pme_count = PME_MONT_EVENT_COUNT, .pmc_count = PMU_MONT_NUM_PMCS, .pmd_count = PMU_MONT_NUM_PMDS, .num_cnt = PMU_MONT_NUM_COUNTERS, .get_event_code = pfm_mont_get_event_code, .get_event_name = pfm_mont_get_event_name, .get_event_counters = pfm_mont_get_event_counters, .dispatch_events = pfm_mont_dispatch_events, .pmu_detect = pfm_mont_detect, .get_impl_pmcs = pfm_mont_get_impl_pmcs, .get_impl_pmds = pfm_mont_get_impl_pmds, .get_impl_counters = pfm_mont_get_impl_counters, .get_hw_counter_width = pfm_mont_get_hw_counter_width, .get_event_desc = pfm_mont_get_event_description, .get_cycle_event = pfm_mont_get_cycle_event, .get_inst_retired_event = 
pfm_mont_get_inst_retired, .get_num_event_masks = pfm_mont_get_num_event_masks, .get_event_mask_name = pfm_mont_get_event_mask_name, .get_event_mask_desc = pfm_mont_get_event_mask_desc, .get_event_mask_code = pfm_mont_get_event_mask_code }; papi-5.3.0/src/libpfm4/lib/pfmlib_power6.c0000600003276200002170000000435012247131124017752 0ustar ralphundrgrad/* * pfmlib_power6.c : IBM Power6 support * * Copyright (C) IBM Corporation, 2009. All rights reserved. * Contributed by Corey Ashford (cjashfor@us.ibm.com) * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_power_priv.h" #include "events/power6_events.h" static int pfm_power6_detect(void* this) { if (__is_processor(PV_POWER6)) return PFM_SUCCESS; return PFM_ERR_NOTSUPP; } pfmlib_pmu_t power6_support={ .desc = "POWER6", .name = "power6", .pmu = PFM_PMU_POWER6, .pme_count = LIBPFM_ARRAY_SIZE(power6_pe), .type = PFM_PMU_TYPE_CORE, .num_cntrs = 4, .num_fixed_cntrs = 2, .max_encoding = 1, .pe = power6_pe, .pmu_detect = pfm_power6_detect, .get_event_encoding[PFM_OS_NONE] = pfm_gen_powerpc_get_encoding, PFMLIB_ENCODE_PERF(pfm_gen_powerpc_get_perf_encoding), PFMLIB_VALID_PERF_PATTRS(pfm_gen_powerpc_perf_validate_pattrs), .get_event_first = pfm_gen_powerpc_get_event_first, .get_event_next = pfm_gen_powerpc_get_event_next, .event_is_valid = pfm_gen_powerpc_event_is_valid, .validate_table = pfm_gen_powerpc_validate_table, .get_event_info = pfm_gen_powerpc_get_event_info, .get_event_attr_info = pfm_gen_powerpc_get_event_attr_info, }; papi-5.3.0/src/libpfm4/lib/pfmlib_amd64_fam12h.c0000600003276200002170000000527212247131124020605 0ustar ralphundrgrad/* * pfmlib_amd64_fam12h.c : AMD64 Family 12h * * Copyright (c) 2011 University of Tennessee * Contributed by Vince Weaver * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_amd64_priv.h" #include "events/amd64_events_fam12h.h" #define DEFINE_FAM12H_REV(d, n, r, pmuid) \ static int \ pfm_amd64_fam12h_##n##_detect(void *this) \ { \ int ret; \ ret = pfm_amd64_detect(this); \ if (ret != PFM_SUCCESS) \ return ret; \ ret = pfm_amd64_cfg.revision; \ return ret == pmuid ? PFM_SUCCESS : PFM_ERR_NOTSUPP; \ } \ pfmlib_pmu_t amd64_fam12h_##n##_support={ \ .desc = "AMD64 Fam12h "#d, \ .name = "amd64_fam12h_"#n, \ .pmu = pmuid, \ .pmu_rev = r, \ .pme_count = LIBPFM_ARRAY_SIZE(amd64_fam12h_pe),\ .type = PFM_PMU_TYPE_CORE, \ .supported_plm = AMD64_FAM10H_PLM, \ .num_cntrs = 4, \ .max_encoding = 1, \ .pe = amd64_fam12h_pe, \ .atdesc = amd64_mods, \ .flags = PFMLIB_PMU_FL_RAW_UMASK, \ \ .pmu_detect = pfm_amd64_fam12h_##n##_detect,\ .get_event_encoding[PFM_OS_NONE] = pfm_amd64_get_encoding,\ PFMLIB_ENCODE_PERF(pfm_amd64_get_perf_encoding), \ .get_event_first = pfm_amd64_get_event_first, \ .get_event_next = pfm_amd64_get_event_next, \ .event_is_valid = pfm_amd64_event_is_valid, \ .validate_table = pfm_amd64_validate_table, \ .get_event_info = pfm_amd64_get_event_info, \ .get_event_attr_info = pfm_amd64_get_event_attr_info,\ PFMLIB_VALID_PERF_PATTRS(pfm_amd64_perf_validate_pattrs),\ .get_event_nattrs = pfm_amd64_get_event_nattrs, \ } DEFINE_FAM12H_REV(Llano, llano, 0, PFM_PMU_AMD64_FAM12H_LLANO); papi-5.3.0/src/libpfm4/lib/pfmlib_gen_ia64.c0000600003276200002170000003250612247131124020130 0ustar ralphundrgrad/* * 
pfmlib_gen_ia64.c : support default architected IA-64 PMU features * * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include "pfmlib_priv.h" /* library private */ #include "pfmlib_priv_ia64.h" /* architecture private */ #define PMU_GEN_IA64_MAX_COUNTERS 4 /* * number of architected events */ #define PME_GEN_COUNT 2 /* * Description of the PMC register mappings used by * this module (as reported in pfmlib_reg_t.reg_num): * * 0 -> PMC0 * 1 -> PMC1 * n -> PMCn */ #define PFMLIB_GEN_IA64_PMC_BASE 0 /* * generic event as described by architecture */ typedef struct { unsigned long pme_code:8; /* major event code */ unsigned long pme_ig:56; /* ignored */ } pme_gen_ia64_code_t; /* * union of all possible entry codes. 
All encodings must fit in 64 bits */ typedef union { unsigned long pme_vcode; pme_gen_ia64_code_t pme_gen_code; } pme_gen_ia64_entry_code_t; /* * entry in the event table (one table per implementation) */ typedef struct pme_entry { char *pme_name; pme_gen_ia64_entry_code_t pme_entry_code; /* event code */ pfmlib_regmask_t pme_counters; /* counter bitmask */ } pme_gen_ia64_entry_t; /* let's define some handy shortcuts ! */ #define pmc_plm pmc_gen_count_reg.pmc_plm #define pmc_ev pmc_gen_count_reg.pmc_ev #define pmc_oi pmc_gen_count_reg.pmc_oi #define pmc_pm pmc_gen_count_reg.pmc_pm #define pmc_es pmc_gen_count_reg.pmc_es /* * this table is patched by initialization code */ static pme_gen_ia64_entry_t generic_pe[PME_GEN_COUNT]={ #define PME_IA64_GEN_CPU_CYCLES 0 { "CPU_CYCLES", }, #define PME_IA64_GEN_INST_RETIRED 1 { "IA64_INST_RETIRED", }, }; static int pfm_gen_ia64_counter_width; static int pfm_gen_ia64_counters; static pfmlib_regmask_t pfm_gen_ia64_impl_pmcs; static pfmlib_regmask_t pfm_gen_ia64_impl_pmds; /* * Description of the PMC register mappings used by * this module (as reported in pfmlib_reg_t.reg_num): * * 0 -> PMC0 * 1 -> PMC1 * n -> PMCn * We do not use a mapping table, instead we make up the * values on the fly given the base. */ #define PFMLIB_GEN_IA64_PMC_BASE 0 /* * convert text range (e.g. 
4-15 18 12-26) into actual bitmask * range argument is modified */ static int parse_counter_range(char *range, pfmlib_regmask_t *b) { char *p, c; int start, end; if (range[strlen(range)-1] == '\n') range[strlen(range)-1] = '\0'; while(range) { p = range; while (*p && *p != ' ' && *p != '-') p++; if (*p == '\0') break; c = *p; *p = '\0'; start = atoi(range); range = p+1; if (c == '-') { p++; while (*p && *p != ' ' && *p != '-') p++; if (*p) *p++ = '\0'; end = atoi(range); range = p; } else { end = start; } if (end >= PFMLIB_REG_MAX|| start >= PFMLIB_REG_MAX) goto invalid; for (; start <= end; start++) pfm_regmask_set(b, start); } return 0; invalid: fprintf(stderr, "%s.%s : bitmask too small need %d bits\n", __FILE__, __FUNCTION__, start); return -1; } static int pfm_gen_ia64_initialize(void) { FILE *fp; char *p; char buffer[64]; int matches = 0; fp = fopen("/proc/pal/cpu0/perfmon_info", "r"); if (fp == NULL) return PFMLIB_ERR_NOTSUPP; for (;;) { p = fgets(buffer, sizeof(buffer)-1, fp); if (p == NULL) break; if ((p = strchr(buffer, ':')) == NULL) break; *p = '\0'; if (!strncmp("Counter width", buffer, 13)) { pfm_gen_ia64_counter_width = atoi(p+2); matches++; continue; } if (!strncmp("PMC/PMD pairs", buffer, 13)) { pfm_gen_ia64_counters = atoi(p+2); matches++; continue; } if (!strncmp("Cycle event number", buffer, 18)) { generic_pe[0].pme_entry_code.pme_vcode = atoi(p+2); matches++; continue; } if (!strncmp("Retired event number", buffer, 20)) { generic_pe[1].pme_entry_code.pme_vcode = atoi(p+2); matches++; continue; } if (!strncmp("Cycles count capable", buffer, 20)) { if (parse_counter_range(p+2, &generic_pe[0].pme_counters) == -1) return -1; matches++; continue; } if (!strncmp("Retired bundles count capable", buffer, 29)) { if (parse_counter_range(p+2, &generic_pe[1].pme_counters) == -1) return -1; matches++; continue; } if (!strncmp("Implemented PMC", buffer, 15)) { if (parse_counter_range(p+2, &pfm_gen_ia64_impl_pmcs) == -1) return -1; matches++; continue; } if 
(!strncmp("Implemented PMD", buffer, 15)) { if (parse_counter_range(p+2, &pfm_gen_ia64_impl_pmds) == -1) return -1; matches++; continue; } } pfm_regmask_weight(&pfm_gen_ia64_impl_pmcs, &generic_ia64_support.pmc_count); pfm_regmask_weight(&pfm_gen_ia64_impl_pmds, &generic_ia64_support.pmd_count); fclose(fp); return matches == 8 ? PFMLIB_SUCCESS : PFMLIB_ERR_NOTSUPP; } static void pfm_gen_ia64_forced_initialize(void) { unsigned int i; pfm_gen_ia64_counter_width = 47; pfm_gen_ia64_counters = 4; generic_pe[0].pme_entry_code.pme_vcode = 18; generic_pe[1].pme_entry_code.pme_vcode = 8; memset(&pfm_gen_ia64_impl_pmcs, 0, sizeof(pfmlib_regmask_t)); memset(&pfm_gen_ia64_impl_pmds, 0, sizeof(pfmlib_regmask_t)); for(i=0; i < 8; i++) pfm_regmask_set(&pfm_gen_ia64_impl_pmcs, i); for(i=4; i < 8; i++) pfm_regmask_set(&pfm_gen_ia64_impl_pmds, i); memset(&generic_pe[0].pme_counters, 0, sizeof(pfmlib_regmask_t)); memset(&generic_pe[1].pme_counters, 0, sizeof(pfmlib_regmask_t)); for(i=4; i < 8; i++) { pfm_regmask_set(&generic_pe[0].pme_counters, i); pfm_regmask_set(&generic_pe[1].pme_counters, i); } generic_ia64_support.pmc_count = 8; generic_ia64_support.pmd_count = 4; generic_ia64_support.num_cnt = 4; } static int pfm_gen_ia64_detect(void) { /* PMU is architected, so guaranteed to be present */ return PFMLIB_SUCCESS; } static int pfm_gen_ia64_init(void) { if (forced_pmu != PFMLIB_NO_PMU) { pfm_gen_ia64_forced_initialize(); } else if (pfm_gen_ia64_initialize() == -1) return PFMLIB_ERR_NOTSUPP; return PFMLIB_SUCCESS; } static int valid_assign(unsigned int *as, pfmlib_regmask_t *r_pmcs, unsigned int cnt) { unsigned int i; for(i=0; i < cnt; i++) { if (as[i]==0) return 0; /* * take care of restricted PMC registers */ if (pfm_regmask_isset(r_pmcs, as[i])) return 0; } return 1; } /* * Automatically dispatch events to corresponding counters following constraints. 
* Upon return the pfarg_reg_t structure is ready to be submitted to kernel */ static int pfm_gen_ia64_dispatch_counters(pfmlib_input_param_t *inp, pfmlib_output_param_t *outp) { #define has_counter(e,b) (pfm_regmask_isset(&generic_pe[e].pme_counters, b) ? b : 0) unsigned int max_l0, max_l1, max_l2, max_l3; unsigned int assign[PMU_GEN_IA64_MAX_COUNTERS]; pfm_gen_ia64_pmc_reg_t reg; pfmlib_event_t *e; pfmlib_reg_t *pc, *pd; pfmlib_regmask_t *r_pmcs; unsigned int i,j,k,l; unsigned int cnt; e = inp->pfp_events; pc = outp->pfp_pmcs; pd = outp->pfp_pmds; cnt = inp->pfp_event_count; r_pmcs = &inp->pfp_unavail_pmcs; if (cnt > PMU_GEN_IA64_MAX_COUNTERS) return PFMLIB_ERR_TOOMANY; max_l0 = PMU_GEN_IA64_FIRST_COUNTER + PMU_GEN_IA64_MAX_COUNTERS; max_l1 = PMU_GEN_IA64_FIRST_COUNTER + PMU_GEN_IA64_MAX_COUNTERS*(cnt>1); max_l2 = PMU_GEN_IA64_FIRST_COUNTER + PMU_GEN_IA64_MAX_COUNTERS*(cnt>2); max_l3 = PMU_GEN_IA64_FIRST_COUNTER + PMU_GEN_IA64_MAX_COUNTERS*(cnt>3); if (PFMLIB_DEBUG()) { DPRINT("max_l0=%u max_l1=%u max_l2=%u max_l3=%u\n", max_l0, max_l1, max_l2, max_l3); } /* * This code needs fixing. It is not very pretty and * won't handle more than 4 counters if more become * available ! * For now, worst case in the loop nest: 4! 
(factorial) */ for (i=PMU_GEN_IA64_FIRST_COUNTER; i < max_l0; i++) { assign[0]= has_counter(e[0].event,i); if (max_l1 == PMU_GEN_IA64_FIRST_COUNTER && valid_assign(assign, r_pmcs, cnt)) goto done; for (j=PMU_GEN_IA64_FIRST_COUNTER; j < max_l1; j++) { if (j == i) continue; assign[1] = has_counter(e[1].event,j); if (max_l2 == PMU_GEN_IA64_FIRST_COUNTER && valid_assign(assign, r_pmcs, cnt)) goto done; for (k=PMU_GEN_IA64_FIRST_COUNTER; k < max_l2; k++) { if(k == i || k == j) continue; assign[2] = has_counter(e[2].event,k); if (max_l3 == PMU_GEN_IA64_FIRST_COUNTER && valid_assign(assign, r_pmcs, cnt)) goto done; for (l=PMU_GEN_IA64_FIRST_COUNTER; l < max_l3; l++) { if(l == i || l == j || l == k) continue; assign[3] = has_counter(e[3].event,l); if (valid_assign(assign, r_pmcs, cnt)) goto done; } } } } /* we cannot satisfy the constraints */ return PFMLIB_ERR_NOASSIGN; done: memset(pc, 0, cnt*sizeof(pfmlib_reg_t)); memset(pd, 0, cnt*sizeof(pfmlib_reg_t)); for (j=0; j < cnt ; j++ ) { reg.pmc_val = 0; /* clear all */ /* if not specified per event, then use default (could be zero: measure nothing) */ reg.pmc_plm = e[j].plm ? e[j].plm: inp->pfp_dfl_plm; reg.pmc_oi = 1; /* overflow interrupt */ reg.pmc_pm = inp->pfp_flags & PFMLIB_PFP_SYSTEMWIDE? 
1 : 0; reg.pmc_es = generic_pe[e[j].event].pme_entry_code.pme_gen_code.pme_code; pc[j].reg_num = assign[j]; pc[j].reg_value = reg.pmc_val; pc[j].reg_addr = PFMLIB_GEN_IA64_PMC_BASE+j; pd[j].reg_num = assign[j]; pd[j].reg_addr = assign[j]; __pfm_vbprintf("[PMC%u(pmc%u)=0x%lx,es=0x%02x,plm=%d pm=%d] %s\n", assign[j], assign[j], reg.pmc_val, reg.pmc_es,reg.pmc_plm, reg.pmc_pm, generic_pe[e[j].event].pme_name); __pfm_vbprintf("[PMD%u(pmd%u)]\n", pd[j].reg_num, pd[j].reg_num); } /* number of PMC programmed */ outp->pfp_pmc_count = cnt; outp->pfp_pmd_count = cnt; return PFMLIB_SUCCESS; } static int pfm_gen_ia64_dispatch_events(pfmlib_input_param_t *inp, void *dummy1, pfmlib_output_param_t *outp, void *dummy2) { return pfm_gen_ia64_dispatch_counters(inp, outp); } static int pfm_gen_ia64_get_event_code(unsigned int i, unsigned int cnt, int *code) { if (cnt != PFMLIB_CNT_FIRST && (cnt < 4 || cnt > 7)) return PFMLIB_ERR_INVAL; *code = (int)generic_pe[i].pme_entry_code.pme_gen_code.pme_code; return PFMLIB_SUCCESS; } static char * pfm_gen_ia64_get_event_name(unsigned int i) { return generic_pe[i].pme_name; } static void pfm_gen_ia64_get_event_counters(unsigned int j, pfmlib_regmask_t *counters) { unsigned int i; memset(counters, 0, sizeof(*counters)); for(i=0; i < pfm_gen_ia64_counters; i++) { if (pfm_regmask_isset(&generic_pe[j].pme_counters, i)) pfm_regmask_set(counters, i); } } static void pfm_gen_ia64_get_impl_pmcs(pfmlib_regmask_t *impl_pmcs) { *impl_pmcs = pfm_gen_ia64_impl_pmcs; } static void pfm_gen_ia64_get_impl_pmds(pfmlib_regmask_t *impl_pmds) { *impl_pmds = pfm_gen_ia64_impl_pmds; } static void pfm_gen_ia64_get_impl_counters(pfmlib_regmask_t *impl_counters) { unsigned int i = 0; /* pmd4-pmd7 */ for(i=4; i < 8; i++) pfm_regmask_set(impl_counters, i); } static void pfm_gen_ia64_get_hw_counter_width(unsigned int *width) { *width = pfm_gen_ia64_counter_width; } static int pfm_gen_ia64_get_event_desc(unsigned int ev, char **str) { switch(ev) { case 
PME_IA64_GEN_CPU_CYCLES: *str = strdup("CPU cycles"); break; case PME_IA64_GEN_INST_RETIRED: *str = strdup("IA-64 instructions retired"); break; default: *str = NULL; } return PFMLIB_SUCCESS; } static int pfm_gen_ia64_get_cycle_event(pfmlib_event_t *e) { e->event = PME_IA64_GEN_CPU_CYCLES; return PFMLIB_SUCCESS; } static int pfm_gen_ia64_get_inst_retired(pfmlib_event_t *e) { e->event = PME_IA64_GEN_INST_RETIRED; return PFMLIB_SUCCESS; } pfm_pmu_support_t generic_ia64_support={ .pmu_name ="IA-64", .pmu_type = PFMLIB_GEN_IA64_PMU, .pme_count = PME_GEN_COUNT, .pmc_count = 4+4, .pmd_count = PMU_GEN_IA64_MAX_COUNTERS, .num_cnt = PMU_GEN_IA64_MAX_COUNTERS, .get_event_code = pfm_gen_ia64_get_event_code, .get_event_name = pfm_gen_ia64_get_event_name, .get_event_counters = pfm_gen_ia64_get_event_counters, .dispatch_events = pfm_gen_ia64_dispatch_events, .pmu_detect = pfm_gen_ia64_detect, .pmu_init = pfm_gen_ia64_init, .get_impl_pmcs = pfm_gen_ia64_get_impl_pmcs, .get_impl_pmds = pfm_gen_ia64_get_impl_pmds, .get_impl_counters = pfm_gen_ia64_get_impl_counters, .get_hw_counter_width = pfm_gen_ia64_get_hw_counter_width, .get_event_desc = pfm_gen_ia64_get_event_desc, .get_cycle_event = pfm_gen_ia64_get_cycle_event, .get_inst_retired_event = pfm_gen_ia64_get_inst_retired }; papi-5.3.0/src/libpfm4/lib/pfmlib_intel_nhm.c0000600003276200002170000001521712247131124020511 0ustar ralphundrgrad/* * pfmlib_intel_nhm.c : Intel Nehalem core PMU * * Copyright (c) 2008 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this 
permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * Nehalem PMU = architectural perfmon v3 + OFFCORE + PEBS v2 + LBR */ /* private headers */ #include "pfmlib_priv.h" #include "pfmlib_intel_x86_priv.h" #if 0 static int pfm_nhm_lbr_encode(void *this, pfmlib_event_desc_t *e, uint64_t *codes, int *count, pfmlib_perf_attr_t *attrs); static int pfm_nhm_offcore_encode(void *this, pfmlib_event_desc_t *e, uint64_t *codes, int *count, pfmlib_perf_attr_t *attrs); #endif #include "events/intel_nhm_events.h" static int pfm_nhm_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch(pfm_intel_x86_cfg.model) { case 26: case 30: case 31: break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_nhm_ex_detect(void *this) { int ret; ret = pfm_intel_x86_detect(); if (ret != PFM_SUCCESS) return ret; if (pfm_intel_x86_cfg.family != 6) return PFM_ERR_NOTSUPP; switch(pfm_intel_x86_cfg.model) { case 46: break; default: return PFM_ERR_NOTSUPP; } return PFM_SUCCESS; } static int pfm_nhm_init(void *this) { pfm_intel_x86_cfg.arch_version = 3; return PFM_SUCCESS; } /* * the following functions implement the model * specific API directly available to the user */ static const char *data_src_encodings[]={ /* 0 */ "unknown L3 cache miss", /* 1 */ "minimal latency core cache hit. Request was satisfied by L1 data cache", /* 2 */ "pending core cache HIT. 
Outstanding core cache miss to same cacheline address already underway", /* 3 */ "data request satisfied by the L2", /* 4 */ "L3 HIT. Local or remote home request that hit L3 in the uncore with no coherency actions required (snooping)", /* 5 */ "L3 HIT. Local or remote home request that hit L3 and was serviced by another core with a cross core snoop where no modified copy was found (clean)", /* 6 */ "L3 HIT. Local or remote home request that hit L3 and was serviced by another core with a cross core snoop where modified copies were found (HITM)", /* 7 */ "reserved", /* 8 */ "L3 MISS. Local homed request that missed L3 and was serviced by forwarded data following a cross package snoop where no modified copy was found (remote home requests are not counted)", /* 9 */ "reserved", /* 10 */ "L3 MISS. Local homed request that missed L3 and was serviced by local DRAM (go to shared state)", /* 11 */ "L3 MISS. Remote homed request that missed L3 and was serviced by remote DRAM (go to shared state)", /* 12 */ "L3 MISS. Local homed request that missed L3 and was serviced by local DRAM (go to exclusive state)", /* 13 */ "L3 MISS. 
Remote homed request that missed L3 and was serviced by remote DRAM (go to exclusive state)", /* 14 */ "reserved", /* 15 */ "request to uncacheable memory" }; /* * return data source encoding based on index in val * To be used with PEBS load latency filtering to decode * source of the load miss */ const char * pfm_nhm_data_src_desc(int val) { if (val > 15 || val < 0) return NULL; return data_src_encodings[val]; } #if 0 static int pfm_nhm_lbr_encode(void *this, pfmlib_event_desc_t *e, uint64_t *codes, int *count, pfmlib_perf_attr_t *attrs) { return PFM_ERR_NOTSUPP; } static int pfm_nhm_offcore_encode(void *this, pfmlib_event_desc_t *e, uint64_t *codes, int *count, pfmlib_perf_attr_t *attrs) { return PFM_ERR_NOTSUPP; } #endif pfmlib_pmu_t intel_nhm_support={ .desc = "Intel Nehalem", .name = "nhm", .pmu = PFM_PMU_INTEL_NHM, .pme_count = LIBPFM_ARRAY_SIZE(intel_nhm_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, .num_cntrs = 4, .num_fixed_cntrs = 3, .max_encoding = 2, /* because of OFFCORE_RESPONSE */ .pe = intel_nhm_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_nhm_detect, .pmu_init = pfm_nhm_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; pfmlib_pmu_t intel_nhm_ex_support={ .desc = "Intel Nehalem EX", .name = "nhm_ex", .pmu = PFM_PMU_INTEL_NHM_EX, .pme_count = LIBPFM_ARRAY_SIZE(intel_nhm_pe), .type = PFM_PMU_TYPE_CORE, .supported_plm = INTEL_X86_PLM, 
.num_cntrs = 4, .num_fixed_cntrs = 3, .max_encoding = 2, /* because of OFFCORE_RESPONSE */ .pe = intel_nhm_pe, .atdesc = intel_x86_mods, .flags = PFMLIB_PMU_FL_RAW_UMASK | INTEL_X86_PMU_FL_ECMASK, .pmu_detect = pfm_nhm_ex_detect, .pmu_init = pfm_nhm_init, .get_event_encoding[PFM_OS_NONE] = pfm_intel_x86_get_encoding, PFMLIB_ENCODE_PERF(pfm_intel_x86_get_perf_encoding), .get_event_first = pfm_intel_x86_get_event_first, .get_event_next = pfm_intel_x86_get_event_next, .event_is_valid = pfm_intel_x86_event_is_valid, .validate_table = pfm_intel_x86_validate_table, .get_event_info = pfm_intel_x86_get_event_info, .get_event_attr_info = pfm_intel_x86_get_event_attr_info, PFMLIB_VALID_PERF_PATTRS(pfm_intel_x86_perf_validate_pattrs), .get_event_nattrs = pfm_intel_x86_get_event_nattrs, .can_auto_encode = pfm_intel_x86_can_auto_encode, }; papi-5.3.0/src/libpfm4/COPYING0000600003276200002170000000217612247131123015323 0ustar ralphundrgradAll other files are published under the following license: Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. papi-5.3.0/src/freebsd-lock.h0000600003276200002170000000010312247131121015426 0ustar ralphundrgrad #define _papi_hwd_lock(a) { ; } #define _papi_hwd_unlock(a) { ; } papi-5.3.0/src/papi_vector.c0000600003276200002170000002335112247131124015411 0ustar ralphundrgrad/* * File: papi_vector.c * Author: Kevin London * london@cs.utk.edu * Mods: Haihang You * you@cs.utk.edu * Mods: * */ #include "papi.h" #include "papi_internal.h" #include "papi_vector.h" #include "papi_memory.h" #include #ifdef _AIX /* needed because the get_virt_usec() code uses a hardware context */ /* which is a pmapi definition on AIX. */ #include #endif void _vectors_error( ) { SUBDBG( "function is not implemented in the component!\n" ); exit( PAPI_ECMP ); } int vec_int_ok_dummy( ) { return PAPI_OK; } int vec_int_one_dummy( ) { return 1; } int vec_int_dummy( ) { return PAPI_ECMP; } void * vec_void_star_dummy( ) { return NULL; } void vec_void_dummy( ) { return; } long long vec_long_long_dummy( ) { return PAPI_ECMP; } long long vec_long_long_context_dummy( hwd_context_t *ignored ) { (void) ignored; return PAPI_ECMP; } char * vec_char_star_dummy( ) { return NULL; } long vec_long_dummy( ) { return PAPI_ECMP; } long long vec_virt_cycles(void) { return ((long long) _papi_os_vector.get_virt_usec() * _papi_hwi_system_info.hw_info.cpu_max_mhz); } long long vec_real_nsec_dummy(void) { return ((long long) _papi_os_vector.get_real_usec() * 1000); } long long vec_virt_nsec_dummy(void) { return ((long long) _papi_os_vector.get_virt_usec() * 1000); } int _papi_hwi_innoculate_vector( papi_vector_t * v ) { if ( !v ) return ( PAPI_EINVAL ); /* component function pointers */ if ( !v->dispatch_timer ) v->dispatch_timer = ( void ( * )( int, hwd_siginfo_t *, void * ) 
) vec_void_dummy; if ( !v->get_overflow_address ) v->get_overflow_address = ( void *( * )( int, char *, int ) ) vec_void_star_dummy; if ( !v->start ) v->start = ( int ( * )( hwd_context_t *, hwd_control_state_t * ) ) vec_int_dummy; if ( !v->stop ) v->stop = ( int ( * )( hwd_context_t *, hwd_control_state_t * ) ) vec_int_dummy; if ( !v->read ) v->read = ( int ( * ) ( hwd_context_t *, hwd_control_state_t *, long long **, int ) ) vec_int_dummy; if ( !v->reset ) v->reset = ( int ( * )( hwd_context_t *, hwd_control_state_t * ) ) vec_int_dummy; if ( !v->write ) v->write = ( int ( * )( hwd_context_t *, hwd_control_state_t *, long long[] ) ) vec_int_dummy; if ( !v->cleanup_eventset ) v->cleanup_eventset = ( int ( * )( hwd_control_state_t * ) ) vec_int_ok_dummy; if ( !v->stop_profiling ) v->stop_profiling = ( int ( * )( ThreadInfo_t *, EventSetInfo_t * ) ) vec_int_dummy; if ( !v->init_component ) v->init_component = ( int ( * )( int ) ) vec_int_ok_dummy; if ( !v->init_thread ) v->init_thread = ( int ( * )( hwd_context_t * ) ) vec_int_ok_dummy; if ( !v->init_control_state ) v->init_control_state = ( int ( * )( hwd_control_state_t * ptr ) ) vec_void_dummy; if ( !v->update_control_state ) v->update_control_state = ( int ( * ) ( hwd_control_state_t *, NativeInfo_t *, int, hwd_context_t * ) ) vec_int_dummy; if ( !v->ctl ) v->ctl = ( int ( * )( hwd_context_t *, int, _papi_int_option_t * ) ) vec_int_dummy; if ( !v->set_overflow ) v->set_overflow = ( int ( * )( EventSetInfo_t *, int, int ) ) vec_int_dummy; if ( !v->set_profile ) v->set_profile = ( int ( * )( EventSetInfo_t *, int, int ) ) vec_int_dummy; if ( !v->set_domain ) v->set_domain = ( int ( * )( hwd_control_state_t *, int ) ) vec_int_dummy; if ( !v->ntv_enum_events ) v->ntv_enum_events = ( int ( * )( unsigned int *, int ) ) vec_int_dummy; if ( !v->ntv_name_to_code ) v->ntv_name_to_code = ( int ( * )( char *, unsigned int * ) ) vec_int_dummy; if ( !v->ntv_code_to_name ) v->ntv_code_to_name = ( int ( * )( unsigned int, char 
*, int ) ) vec_int_dummy; if ( !v->ntv_code_to_descr ) v->ntv_code_to_descr = ( int ( * )( unsigned int, char *, int ) ) vec_int_ok_dummy; if ( !v->ntv_code_to_bits ) v->ntv_code_to_bits = ( int ( * )( unsigned int, hwd_register_t * ) ) vec_int_dummy; if ( !v->ntv_code_to_info ) v->ntv_code_to_info = ( int ( * )( unsigned int, PAPI_event_info_t * ) ) vec_int_dummy; if ( !v->allocate_registers ) v->allocate_registers = ( int ( * )( EventSetInfo_t * ) ) vec_int_ok_dummy; if ( !v->shutdown_thread ) v->shutdown_thread = ( int ( * )( hwd_context_t * ) ) vec_int_dummy; if ( !v->shutdown_component ) v->shutdown_component = ( int ( * )( void ) ) vec_int_ok_dummy; if ( !v->user ) v->user = ( int ( * )( int, void *, void * ) ) vec_int_dummy; return PAPI_OK; } int _papi_hwi_innoculate_os_vector( papi_os_vector_t * v ) { if ( !v ) return ( PAPI_EINVAL ); if ( !v->get_real_cycles ) v->get_real_cycles = vec_long_long_dummy; if ( !v->get_real_usec ) v->get_real_usec = vec_long_long_dummy; if ( !v->get_real_nsec ) v->get_real_nsec = vec_real_nsec_dummy; if ( !v->get_virt_cycles ) v->get_virt_cycles = vec_virt_cycles; if ( !v->get_virt_usec ) v->get_virt_usec = vec_long_long_dummy; if ( !v->get_virt_nsec ) v->get_virt_nsec = vec_virt_nsec_dummy; if ( !v->update_shlib_info ) v->update_shlib_info = ( int ( * )( papi_mdi_t * ) ) vec_int_dummy; if ( !v->get_system_info ) v->get_system_info = ( int ( * )( ) ) vec_int_dummy; if ( !v->get_memory_info ) v->get_memory_info = ( int ( * )( PAPI_hw_info_t *, int ) ) vec_int_dummy; if ( !v->get_dmem_info ) v->get_dmem_info = ( int ( * )( PAPI_dmem_info_t * ) ) vec_int_dummy; return PAPI_OK; } /* not used? debug only? 
*/ #if 0 static void * vector_find_dummy( void *func, char **buf ) { void *ptr = NULL; if ( vec_int_ok_dummy == ( int ( * )( ) ) func ) { ptr = ( void * ) vec_int_ok_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_int_ok_dummy" ); } else if ( vec_int_one_dummy == ( int ( * )( ) ) func ) { ptr = ( void * ) vec_int_one_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_int_one_dummy" ); } else if ( vec_int_dummy == ( int ( * )( ) ) func ) { ptr = ( void * ) vec_int_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_int_dummy" ); } else if ( vec_void_dummy == ( void ( * )( ) ) func ) { ptr = ( void * ) vec_void_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_void_dummy" ); } else if ( vec_void_star_dummy == ( void *( * )( ) ) func ) { ptr = ( void * ) vec_void_star_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_void_star_dummy" ); } else if ( vec_long_long_dummy == ( long long ( * )( ) ) func ) { ptr = ( void * ) vec_long_long_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_long_long_dummy" ); } else if ( vec_char_star_dummy == ( char *( * )( ) ) func ) { ptr = ( void * ) vec_char_star_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_char_star_dummy" ); } else if ( vec_long_dummy == ( long ( * )( ) ) func ) { ptr = ( void * ) vec_long_dummy; if ( buf != NULL ) *buf = papi_strdup( "vec_long_dummy" ); } else { ptr = NULL; } return ( ptr ); } static void vector_print_routine( void *func, char *fname, int pfunc ) { void *ptr = NULL; char *buf = NULL; ptr = vector_find_dummy( func, &buf ); if ( ptr ) { printf( "DUMMY: %s is mapped to %s.\n", fname, buf ); papi_free( buf ); } else if ( ( !ptr && pfunc ) ) printf( "function: %s is mapped to %p.\n", fname, func ); } static void vector_print_table( papi_vector_t * v, int print_func ) { if ( !v ) return; vector_print_routine( ( void * ) v->dispatch_timer, "_papi_hwd_dispatch_timer", print_func ); vector_print_routine( ( void * ) v->get_overflow_address, "_papi_hwd_get_overflow_address", print_func ); vector_print_routine( ( void * 
) v->start, "_papi_hwd_start", print_func ); vector_print_routine( ( void * ) v->stop, "_papi_hwd_stop", print_func ); vector_print_routine( ( void * ) v->read, "_papi_hwd_read", print_func ); vector_print_routine( ( void * ) v->reset, "_papi_hwd_reset", print_func ); vector_print_routine( ( void * ) v->write, "_papi_hwd_write", print_func ); vector_print_routine( ( void * ) v->cleanup_eventset, "_papi_hwd_cleanup_eventset", print_func ); vector_print_routine( ( void * ) v->stop_profiling, "_papi_hwd_stop_profiling", print_func ); vector_print_routine( ( void * ) v->init_component, "_papi_hwd_init_component", print_func ); vector_print_routine( ( void * ) v->init_thread, "_papi_hwd_init_thread", print_func ); vector_print_routine( ( void * ) v->init_control_state, "_papi_hwd_init_control_state", print_func ); vector_print_routine( ( void * ) v->ctl, "_papi_hwd_ctl", print_func ); vector_print_routine( ( void * ) v->set_overflow, "_papi_hwd_set_overflow", print_func ); vector_print_routine( ( void * ) v->set_profile, "_papi_hwd_set_profile", print_func ); vector_print_routine( ( void * ) v->set_domain, "_papi_hwd_set_domain", print_func ); vector_print_routine( ( void * ) v->ntv_enum_events, "_papi_hwd_ntv_enum_events", print_func ); vector_print_routine( ( void * ) v->ntv_name_to_code, "_papi_hwd_ntv_name_to_code", print_func ); vector_print_routine( ( void * ) v->ntv_code_to_name, "_papi_hwd_ntv_code_to_name", print_func ); vector_print_routine( ( void * ) v->ntv_code_to_descr, "_papi_hwd_ntv_code_to_descr", print_func ); vector_print_routine( ( void * ) v->ntv_code_to_bits, "_papi_hwd_ntv_code_to_bits", print_func ); vector_print_routine( ( void * ) v->ntv_code_to_info, "_papi_hwd_ntv_code_to_info", print_func ); vector_print_routine( ( void * ) v->allocate_registers, "_papi_hwd_allocate_registers", print_func ); vector_print_routine( ( void * ) v->shutdown_thread, "_papi_hwd_shutdown_thread", print_func ); vector_print_routine( ( void * ) v->shutdown_component, 
"_papi_hwd_shutdown_component", print_func ); vector_print_routine( ( void * ) v->user, "_papi_hwd_user", print_func ); } #endif papi-5.3.0/src/linux-bgq-memory.c0000600003276200002170000000360012247131124016305 0ustar ralphundrgrad/* * File: linux-bgq-memory.c * CVS: $Id$ * Author: Heike Jagode * jagode@eecs.utk.edu * Mods: * */ #include "papi.h" #include "papi_internal.h" #include "linux-bgq.h" #ifdef __LINUX__ #include #endif #include /* * Prototypes... */ int init_bgq( PAPI_mh_info_t * pMem_Info ); // inline void cpuid(unsigned int *, unsigned int *,unsigned int *,unsigned int *); /* * Get Memory Information * * Fills in memory information - effectively set to all 0x00's */ extern int _bgq_get_memory_info( PAPI_hw_info_t * pHwInfo, int pCPU_Type ) { int retval = 0; switch ( pCPU_Type ) { default: //fprintf(stderr,"Default CPU type in %s (%d)\n",__FUNCTION__,__LINE__); retval = init_bgq( &pHwInfo->mem_hierarchy ); break; } return retval; } /* * Get DMem Information for BG/Q * * NOTE: Currently, all values set to -1 */ extern int _bgq_get_dmem_info( PAPI_dmem_info_t * pDmemInfo ) { // pid_t xPID = getpid(); // prpsinfo_t xInfo; // char xFile[256]; // int xFD; // sprintf(xFile, "/proc/%05d", xPID); // if ((fd = open(xFile, O_RDONLY)) < 0) { // SUBDBG("PAPI_get_dmem_info can't open /proc/%d\n", xPID); // return (PAPI_ESYS); // } // if (ioctl(xFD, PIOCPSINFO, &xInfo) < 0) { // return (PAPI_ESYS); // } // close(xFD); pDmemInfo->size = PAPI_EINVAL; pDmemInfo->resident = PAPI_EINVAL; pDmemInfo->high_water_mark = PAPI_EINVAL; pDmemInfo->shared = PAPI_EINVAL; pDmemInfo->text = PAPI_EINVAL; pDmemInfo->library = PAPI_EINVAL; pDmemInfo->heap = PAPI_EINVAL; pDmemInfo->locked = PAPI_EINVAL; pDmemInfo->stack = PAPI_EINVAL; pDmemInfo->pagesize = PAPI_EINVAL; return PAPI_OK; } /* * Cache configuration for BG/Q */ int init_bgq( PAPI_mh_info_t * pMem_Info ) { memset( pMem_Info, 0x0, sizeof ( *pMem_Info ) ); //fprintf(stderr,"mem_info not est up [%s (%d)]\n",__FUNCTION__,__LINE__); 
return PAPI_OK; } papi-5.3.0/src/mb.h0000600003276200002170000000356612247131124013507 0ustar ralphundrgrad#ifndef _MB_H #define _MB_H /* These definitions are not yet in distros, so I have cut and pasted just the needed definitions in here */ #ifdef __powerpc__ #define rmb() asm volatile ("sync" : : : "memory") #elif defined (__s390__) #define rmb() asm volatile("bcr 15,0" ::: "memory") #elif defined (__sh__) #if defined(__SH4A__) || defined(__SH5__) #define rmb() asm volatile("synco" ::: "memory") #else #define rmb() asm volatile("" ::: "memory") #endif #elif defined (__hppa__) #define rmb() asm volatile("" ::: "memory") #elif defined (__sparc__) #define rmb() asm volatile("":::"memory") #elif defined (__alpha__) #define rmb() asm volatile("mb" ::: "memory") #elif defined(__ia64__) #define rmb() asm volatile ("mf" ::: "memory") #elif defined(__arm__) /* * Use the __kuser_memory_barrier helper in the CPU helper page. See * arch/arm/kernel/entry-armv.S in the kernel source for details. */ #define rmb() ((void(*)(void))0xffff0fa0)() #elif defined(__aarch64__) #define rmb() asm volatile("dmb ld" ::: "memory") #elif defined(__mips__) #define rmb() asm volatile( \ ".set mips2\n\t" \ "sync\n\t" \ ".set mips0" \ : /* no output */ \ : /* no input */ \ : "memory") #elif defined(__i386__) #define rmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory") #elif defined(__x86_64) #if defined(__KNC__) #define rmb() __sync_synchronize() #else #define rmb() asm volatile("lfence":::"memory") #endif #else #error Need to define rmb for this architecture! #error See the kernel source directory: tools/perf/perf.h file #endif #endif papi-5.3.0/src/config.h.in0000600003276200002170000000773412247131121014761 0ustar ralphundrgrad/* config.h.in. Generated from configure.in by autoheader. */ /* cpu type */ #undef CPU /* Define to 1 if you have the `clock_gettime' function. 
*/ #undef HAVE_CLOCK_GETTIME /* POSIX 1b realtime clock */ #undef HAVE_CLOCK_GETTIME_REALTIME /* POSIX 1b per-thread clock */ #undef HAVE_CLOCK_GETTIME_THREAD /* Native access to a hardware cycle counter */ #undef HAVE_CYCLE /* Define to 1 if you have the header file. */ #undef HAVE_C_ASM_H /* This platform has the ffsll() function */ #undef HAVE_FFSLL /* Define to 1 if you have the `gethrtime' function. */ #undef HAVE_GETHRTIME /* Full gettid function */ #undef HAVE_GETTID /* Normal gettimeofday timer */ #undef HAVE_GETTIMEOFDAY /* Define if hrtime_t is defined in */ #undef HAVE_HRTIME_T /* Define to 1 if you have the header file. */ #undef HAVE_INTRINSICS_H /* Define to 1 if you have the header file. */ #undef HAVE_INTTYPES_H /* Define to 1 if you have the `cpc' library (-lcpc). */ #undef HAVE_LIBCPC /* perfctr header file */ #undef HAVE_LIBPERFCTR_H /* Define to 1 if you have the `mach_absolute_time' function. */ #undef HAVE_MACH_ABSOLUTE_TIME /* Define to 1 if you have the header file. */ #undef HAVE_MACH_MACH_TIME_H /* Define to 1 if you have the header file. */ #undef HAVE_MEMORY_H /* Altix memory mapped global cycle counter */ #undef HAVE_MMTIMER /* Define to 1 if you have the header file. */ #undef HAVE_PERFMON_PFMLIB_H /* Montecito headers */ #undef HAVE_PERFMON_PFMLIB_MONTECITO_H /* Working per thread getrusage */ #undef HAVE_PER_THREAD_GETRUSAGE /* Working per thread timer */ #undef HAVE_PER_THREAD_TIMES /* new pfmlib_output_param_t */ #undef HAVE_PFMLIB_OUTPUT_PFP_PMD_COUNT /* event description function */ #undef HAVE_PFM_GET_EVENT_DESCRIPTION /* new pfm_msg_t */ #undef HAVE_PFM_MSG_TYPE /* old reg_evt_idx */ #undef HAVE_PFM_REG_EVT_IDX /* Define to 1 if you have the `read_real_time' function. */ #undef HAVE_READ_REAL_TIME /* Define to 1 if you have the header file. */ #undef HAVE_STDINT_H /* Define to 1 if you have the header file. */ #undef HAVE_STDLIB_H /* Define to 1 if you have the header file. 
*/ #undef HAVE_STRINGS_H /* Define to 1 if you have the header file. */ #undef HAVE_STRING_H /* gettid syscall function */ #undef HAVE_SYSCALL_GETTID /* Define to 1 if you have the header file. */ #undef HAVE_SYS_STAT_H /* Define to 1 if you have the header file. */ #undef HAVE_SYS_TIME_H /* Define to 1 if you have the header file. */ #undef HAVE_SYS_TYPES_H /* Keyword for per-thread variables */ #undef HAVE_THREAD_LOCAL_STORAGE /* Define to 1 if you have the `time_base_to_time' function. */ #undef HAVE_TIME_BASE_TO_TIME /* Define to 1 if you have the header file. */ #undef HAVE_UNISTD_H /* Define for _rtc() intrinsic. */ #undef HAVE__RTC /* Define if _rtc() is not found. */ #undef NO_RTC_INTRINSIC /* Define to the address where bug reports for this package should be sent. */ #undef PACKAGE_BUGREPORT /* Define to the full name of this package. */ #undef PACKAGE_NAME /* Define to the full name and version of this package. */ #undef PACKAGE_STRING /* Define to the one symbol short name of this package. */ #undef PACKAGE_TARNAME /* Define to the version of this package. */ #undef PACKAGE_VERSION /* Define to 1 if you have the ANSI C header files. */ #undef STDC_HEADERS /* Define to 1 if you can safely include both and . */ #undef TIME_WITH_SYS_TIME /* Use the perfctr virtual TSC for per-thread times */ #undef USE_PERFCTR_PTTIMER /* Use /proc for per-thread times */ #undef USE_PROC_PTTIMER /* Enable GNU extensions on systems that have them. */ #ifndef _GNU_SOURCE # undef _GNU_SOURCE #endif /* Define to `__inline__' or `__inline' if that's what the C compiler calls it, or to nothing if 'inline' is not supported under any name. 
*/ #ifndef __cplusplus #undef inline #endif papi-5.3.0/src/libpfm-3.y/0000700003276200002170000000000012247131123014603 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/ChangeLog0000600003276200002170000003343712247131122016370 0ustar ralphundrgrad2006-08-21 Stephane Eranian This file will not be updated anymore, Refer to SF.net CVS log for diff information 2006-07-10 Stephane Eranian * removed PFM_FL_X86_INSECURE because it is not needed anymore * removed perfmon_i386.h and perfmon_mips64.h because empty 2006-06-28 Stephane Eranian * added pfmsetup.c (Kevin Corry IBM) * fixed pfmsetup.c to correctly handle sampling format uuid 2006-06-28 Stephane Eranian * added libpfm_montecito.3 man page * updated libpfm_itanium2.3 man page * removed pfm_print_event_info() and related calls from library * removed unused pfmlib_mont_ipear_mode_t struct * remove etb_ds from Montecito ETB struct as it can only have one value * added showevtinfo.c example * added PFMLIB_ITA2_EVT_NO_SET to pfmlib_itanium2.h * added PFMLIB_MONT_EVT_NO_SET to pfmlib_montecito.h * replaced pfm_mont_get_event_caf() by pfm_mont_get_event_type() * added missing perfmon_compat.h from include install (Will Cohen) * fortify showreginfo.c for FC6 (Will Cohen) 2006-06-13 Stephane Eranian * added generic support or event umask (Kevin Corry from IBM) * changed detect_pmcs.c to use pfm-getinfo_evtsets() * updated all examples to use the new detect_unavailable_pmcs() * the examples require 2.6.17-rc6 to run 2006-05-22 Stephane Eranian * corrected architected IA-32 PMU detection code, e.g., PIC assembly * fixed counter width of IA-32 architected PMU to 32 * fixed definition of perfevtsel to 64-bit wide for IA-32 architected PMU 2006-05-11 Stephane Eranian * added support for IA-32 architected PMU as specified in the latest IA-32 architecure manuals. 
There is enough to support minimal functionality on Core Duo/Solo processors * updated system call number to match those used with 2.6.17-rc4 * enhanced i386_p6 model detection code 2006-04-25 Stephane Eranian * updated pfmlib_gen_mips64.c with latest code from Phil Mucci * introduced get_event_code_counter() internal method to handle the fact that on some PMUs (MIPS) an event may have a different value based on the counter it is assigned to. This is a superset of the previous get_event_code(). added PFMLIB_CNT_FIRST to ask for first value (or don't care) 2006-04-05 Stephane Eranian * added support for install_prefix in makefile * fixed broken ETB_EVENT (not reported as ETB event) * added BRANCH_EVENT as alias to ETB_EVENT for Montecito * added support for unavailable PMC registers to pfm_dispatch_events() * added detect_pmcs.c, detect_pmcs.h in examples * updated all generic examples to use detect_unavail_pmcs() helper function * updated pfm_dispatch_events() man pages * cleanup PFMLIB_REGMASK_*, change to pfm_regmask_* * created a separate set of man pages for all pfm_regmask_* functions 2006-04-04 Stephane Eranian * fixed makefile in include to install perfmon_i386.h for x86_64 install (Will Cohen from Redhat) * install pfmlib_montecito.h on IA64 2006-04-05 Stephane Eranian * updated system call numbers to 2.6.17-rc1 * incorporated a type change for reg_value in pfmlib.h (Kevin Corry from IBM) 2006-03-22 Stephane Eranian * changed HT detection for PEBS examples 2006-03-07 Stephane Eranian * updated to 2.6.16-rc5 new perfmon code base support * added preliminary Montecito support * incorporated AMD provided event list for X86-64 (Ray Bryant) * renamed all GEN_X86_64 gen_x86_64 to amd_x86_64 * removed PFM_32BIT_ABI_64BIT_OS, ABI now supports ILP32,LP64 without special compilation 2006-01-16 Stephane Eranian * added PFM_32BIT_ABI_64BIT_OS to allow 32-bit compile (32-bit ABI) for a 64-bit OS * added C++ support to perfmon header files * added MIPS64 (5K,20K) support 
(provided by Phil Mucci) * restructured *_standalone.c examples * added pfm_get_event_code_counter() and man page * changed implementation of pfm_get_num_pm*() * remove non-sense example task_view.c * added support for MIPS in some examples 2006-01-09 Stephane Eranian * examples code cleanups * example support up to 2048 CPU (syst.c) * portable sampling examples support more than 64 PMDs 2005-12-15 Stephane Eranian * updated all examples to new pfm_create_context() prototype * fixed some type mismatch in pfmlib_itanium2.c * required for 2.6.15-rc5-git3 kernel patch 2005-10-18 Stephane Eranian * forced perfsel.en bit to 1 for X86-64 and i386/p6 * inverted reset mask to be more familiar in examples/showreginfo.c * updated P4 examples to force enable bit to 1 2005-09-28 Stephane Eranian * split p6/pentium M event tables. Pentium M adds a few more events and changes the semantic of some. * added smpl_standalone.c, notify_standalone.c and ia32/smpl_pebs.c * cleanup the examples some more * updated multiplex. to match structure of multiplex2.c * updated perfmon2 kernel headers to match 2.6.14-rc2-mm1 release * added man pages for libpfm_p6 and libpfm_x86_64 * fixed handling of edge field for P6 2005-08-01 Stephane Eranian * switch all examples in examples/dir to use the multi system call interface. * updated perfmon.h/perfmon_compat.h to latest kernel interface (multi syscall) 2004-06-24 Stephane Eranian * fixed Itanium2 events tables L2_FORCE_RECIRC_* and L2_L3ACCESS_* events can only be measured by PMC4 * fixed pfm_*_get_event_counters(). It would always return the counter mask for event index 0. 
2004-06-24 Stephane Eranian * fixed pfm_print_event_info_*() because it would not print the PMC/PMD mask correctly * updated pfm_dispatch_*ear() for Itanium2 * updated pfm_dispatch_irange() for Itanium2 * updated pfm_ita2_print_info() * updated pfm_ita2_num_pmcs() and pfm_ita2_num_pmds() 2004-02-12 Stephane Eranian * fixed a bug in pfmlib_itanium2.c which caused measurements using opcode matching with an event different from IA64_TAGGED_INST_RETIRED* to return wrong results, i.e., opcode filter was ignored. 2003-11-21 Stephane Eranian * changed interface to pfm_get_impl_*() to use a cleaner definition for bitmasks. pfmlib_regmask_t is now a struct and applications must use accessor macros PFMLIB_REGMASK_*() * added pfm_get_num_pmcs(), pfm_get_num_pmds(), pfm_get_num_counters() * updated man pages to reflect changes * cleaned up all examples to reflect bitmask changes 2003-10-24 Stephane Eranian * added reserved fields to the key pfmlib structure for future extensions (recompilation from beta required). 2003-10-24 Stephane Eranian * released beta of version 3.0 * some of the changes not reported by older entries: * removed freesmpl.c example * added ita2_btb.c, ita2_dear.c, ita_dear.c, multiplex.c * added task_attach.c, task_attach_timeout.c, task_smpl.c * added missing itanium2 events, mostly subevent combinations for SYLL_NOT_DISPERSED, EXTERN_DP_PINS_0_TO_3, and EXTERN_DP_PINS_4_TO_5 * got rid of pfm_get_first_event(), pfm_get_next_event(). 
First valid index is always 0; use pfm_get_num_events() to find last event index * renamed pfm_stop() to pfm_self_stop(), pfm_start() to pfm_self_start() * updated all examples to perfmon2 interface * added notify_self2.c, notify_self3.c examples * updated perfmon.h/perfmon_default_smpl.h to reflect latest perfmon-2 changes (2.6.0-test8) 2003-08-25 Stephane Eranian * allowed multiple EAR/BTB events * really implemented the 4 different ways of programming EAR/BTB 2003-07-30 Stephane Eranian * updated all man pages to reflect changes for 3.0 * more cleanups in the examples to make the whole package compile without warnings with ecc 2003-07-29 Stephane Eranian * fixed a limitation in the iod_table[] used in dispatch_drange(). Pure Opc mode is possible using the IBR/Opc mode. Reported by Geoff Kent at UIUC. * cleaned up all functions using a bitmask as arguments 2003-06-30 Stephane Eranian * added pfm_get_max_event_name_len() * unsigned vs. int cleanups * introduced pfm_*_pmc_reg_t and pfm_*_pmd_reg_t * cleaned up calls using bitmasks * renamed PMU_MAX_* to PFMLIB_MAX_* * got rid of PMU_FIRST_COUNTER * introduced pfmlib_counter_t * internal interface changes, renaming: pmu_name vs name * got rid of char **name and replaced with char *name, int maxlen * added pfm_start(), pfm_stop() as real functions * changed interface of pfm_dispatch_events to make input vs. output parameters more explicit * model-specific input/output to pfm_dispatch_event() now arguments instead of being linked from the generic argument. 2003-06-27 Stephane Eranian * added missing const to char arguments for pfm_find_event, pfm_find_event_byname, pfm_print_event_info. Suggestion by Hans * renamed pfp_pc to pfp_pmc * renamed pfp_pc_count to pfp_pmc_count 2003-06-11 Stephane Eranian * updated manuals to reflect library changes * updated all examples to match the new Linux/ia64 kernel interface (perfmon2). 
2003-06-10 Stephane Eranian * fix pfmlib_itanium.c: dispatch_dear(), dispatch_iear() to set up EAR when there is an EAR event but no detailed setting in ita_param. * added pfm_ita_ear_mode_t to pfmlib_itanium.h * added pfm_ita_get_ear_mode() to pfmlib_itanium.h 2003-06-06 Stephane Eranian * add a generic call to return hardware counter width: pfm_get_hw_counter_width() * updated perfmon.h to perfmon2 * added flag to itanium/itanium2 specific parameter to tell the library to ignore per-event qualifier constraints. See PFMLIB_ITA_FL_CNT_NO_QUALCHECK and PFMLIB_ITA2_FL_CNT_NO_QUALCHECK. 2003-05-06 Stephane Eranian * got rid of all connections to perfmon.h. The library is now fully self-contained. pfarg_reg_t has been replaced by pfmlib_reg_t. 2002-03-20 Stephane Eranian * fix %x vs. %lx for pmc8/9 in pfmlib_itanium.c and pfmlib_itanium2.c 2002-12-20 Stephane Eranian * added PFM_FL_EXCL_IDLE to perfmon.h 2002-12-18 Stephane Eranian * clear ig_ad, inv fields in PMC8,9 when no code range restriction is used. 2002-12-17 Stephane Eranian * update pfm_initialize.3 to clarify when this function needs to be called. 2002-12-10 Stephane Eranian * changed _SYS_PERFMON.h to _PERFMON_PERFMON.h 2002-12-06 Stephane Eranian * integrated Peter Chubb's Debian script fixes * fixed the Debian script to include the examples 2002-12-05 Stephane Eranian * added man pages for pfm_start() and pfm_stop() * release 2.0 beta for review 2002-12-04 Stephane Eranian * the pfmlib_param_t structure now contains the pmc array (pfp_pc[]) as well as a counter representing the number of valid entries written to pfp_pc[]. Cleaned up all modules and headers to reflect changes. * added pfm_ita2_is_fine_mode() to test whether or not fine mode was used for code ranges. 
2002-12-03 Stephane Eranian * removed pfm_ita_ism from pfmlib_ita_param_t * removed pfm_ita2_ism from pfmlib_ita2_param_t * added libpfm.3, libpfm_itanium.3, libpfm_itanium2.3 * enabled per-range privilege level mask in pfmlib_itanium.c and pfmlib_itanium2.c 2002-11-21 Stephane Eranian * added pfmlib_generic.h to cleanup pfmlib.h * dropped retry argument to pfm_find_event() * got rid of the pfm_find_byvcode*() interface (internal only) * cleanup up interface code is int not unsigned long * added man pages in docs/man for the generic library interface * moved the PMU specific handy shortcuts for register struct to module specific file. Avoid possible conflicts in applications using different PMU models in one source file. 2002-11-20 Stephane Eranian * separated the library, headers, examples from the pfmon tool * changed license of library to MIT-style license * set version number to 2.0 * added support to generate a shared version of libpfm * fix pfm_dispatch_opcm() to check for effective use of IA64_TAGGED_INST_IBRPX_PMCY before setting the bits in PMC15 (spotted by UIUC Impact Team). * cleaned up error messages in the examples * fix bug in pfm_ita2_print_info() which caused extra umask bits to be displayed for EAR. 2002-11-19 Stephane Eranian * added pfm_get_impl_counters() to library interface and PMU models * added missing support for pfm_get_impl_pmds(), pfm_get_impl_pmcs() to pfmlib_generic.c * created pfmlib_compiler.h to encapsulate inline assembly differences between compilers. * created pfmlib_compiler_priv.h to encapsulate the inline assembly differences for library private code. 2002-11-13 Stephane Eranian * fixed definition of pmc10 in pfmlib_itanium2.h to account for a layout difference between cache and TLB mode (spotted by UIUC Impact Team). Was causing problems with some latency values in IEAR cache mode. * fixed initialization of pmc10 in pfmlib_itanium2.c to reflect above change. 
2002-10-14 Stephane Eranian * fixed impl_pmds[] in pfmlib_itanium.c and pfmlib_itanium2.c. PMD17 was missing. 2002-09-09 Stephane Eranian * updated include/perfmon/perfmon.h to include sampling period randomization. 2002-08-14 Stephane Eranian * fix bitfield length for pmc14_ita2_reg and pmd3_ita2_reg in pfmlib_itanium2.h (David Mosberger) papi-5.3.0/src/libpfm-3.y/docs/0000700003276200002170000000000012247131122015532 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/docs/Makefile0000600003276200002170000000610012247131122017171 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. 
include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk ifeq ($(CONFIG_PFMLIB_ARCH_IA64),y) ARCH_MAN=libpfm_itanium.3 libpfm_itanium2.3 libpfm_montecito.3 endif ifeq ($(CONFIG_PFMLIB_ARCH_I386),y) ARCH_MAN=libpfm_p6.3 libpfm_core.3 libpfm_amd64.3 libpfm_atom.3 libpfm_nehalem.3 endif ifeq ($(CONFIG_PFMLIB_ARCH_X86_64),y) ARCH_MAN=libpfm_amd64.3 libpfm_core.3 libpfm_atom.3 libpfm_nehalem.3 endif ifeq ($(CONFIG_PFMLIB_ARCH_MIPS64),y) endif ifeq ($(CONFIG_PFMLIB_ARCH_POWERPC),y) ARCH_MAN=libpfm_powerpc.3 endif ifeq ($(CONFIG_PFMLIB_ARCH_CRAYXT),y) endif ifeq ($(CONFIG_PFMLIB_CELL),y) endif GEN_MAN= libpfm.3 pfm_dispatch_events.3 pfm_find_event.3 pfm_find_event_bycode.3 \ pfm_find_event_bycode_next.3 pfm_find_event_mask.3 pfm_find_full_event.3 \ pfm_force_pmu.3 pfm_get_cycle_event.3 pfm_get_event_code.3 pfm_get_event_code_counter.3 \ pfm_get_event_counters.3 pfm_get_event_description.3 pfm_get_event_mask_code.3 \ pfm_get_event_mask_description.3 pfm_get_event_mask_name.3 pfm_get_event_name.3 \ pfm_get_full_event_name.3 pfm_get_hw_counter_width.3 pfm_get_impl_counters.3 \ pfm_get_impl_pmcs.3 pfm_get_impl_pmds.3 pfm_get_inst_retired.3 pfm_get_max_event_name_len.3 \ pfm_get_num_counters.3 pfm_get_num_events.3 pfm_get_num_pmcs.3 \ pfm_get_num_pmds.3 pfm_get_pmu_name.3 pfm_get_pmu_name_bytype.3 \ pfm_get_pmu_type.3 pfm_get_version.3 pfm_initialize.3 \ pfm_list_supported_pmus.3 pfm_pmu_is_supported.3 pfm_regmask_and.3 \ pfm_regmask_clr.3 pfm_regmask_copy.3 pfm_regmask_eq.3 pfm_regmask_isset.3 \ pfm_regmask_or.3 pfm_regmask_set.3 pfm_regmask_weight.3 pfm_set_options.3 \ pfm_strerror.3 MAN=$(GEN_MAN) $(ARCH_MAN) install: -mkdir -p $(DESTDIR)$(MANDIR)/man3 ( cd man3; $(INSTALL) -m 644 $(MAN) $(DESTDIR)$(MANDIR)/man3 ) papi-5.3.0/src/libpfm-3.y/docs/man3/0000700003276200002170000000000012247131122016370 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/docs/man3/pfm_set_options.30000600003276200002170000000345412247131122021674 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" 
"Linux Programmer's Manual" .SH NAME pfm_set_options \- set performance monitoring library debug options .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_set_options(pfmlib_options_t *"opt); .sp .SH DESCRIPTION This function can be called at any time to adjust the level of debug of the library. In both cases, extra output will be generated on standard error when the library gets called. This can be useful to figure out how the PMC registers are initialized for instance. .sp The opt argument to this function is a pointer to a .B pfmlib_options_t structure which is defined as follows: .sp .nf typedef struct { unsigned int pfm_debug:1; unsigned int pfm_verbose:1; } pfmlib_options_t; .fi .sp .sp Setting \fBpfm_debug\fR to 1 will enable debug messages whereas setting \fBpfm_verbose\fR will enable verbose messages. .SH ENVIRONMENT VARIABLES Setting library options with this function has lower priority than with environment variables. As such, the call to this function may not have any actual effects. A user can set the following environment variables to control verbosity and debug output: .TP .B LIBPFM_VERBOSE Enable verbose output. Value must be 0 or 1. When not set, verbosity level can be controlled with this function. .TP .B LIBPFM_DEBUG Enable debug output. Value must be 0 or 1. When not set, debug level can be controlled with this function. .LP .SH RETURN The function returns whether or not it was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code. .sp When environment variables exist, they take precedence and this function returns \fBPFMLIB_SUCCESS\fR. .SH ERRORS .TP .B PFMLIB_ERR_INVAL the argument is invalid, most likely a NULL pointer. 
.SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_num_events.30000600003276200002170000000003612247131122022341 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_version.30000600003276200002170000000202212247131122021640 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" "Linux Programmer's Manual" .SH NAME pfm_get_version \- get performance monitoring library version .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_get_version(unsigned int *"version); .sp .SH DESCRIPTION This function can be called at any time to get the revision level of the library. The version is encoded into an unsigned integer and returned in the \fBversion\fR argument. A revision number is composed of two fields: a major number and a minor number. Both can be extracted from the returned argument using macros provided in the header file: .TP .B PFMLIB_MAJ_VERSION(v) returns the major number encoded in v. .TP .B PFMLIB_MIN_VERSION(v) returns the minor number encoded in v. .SH RETURN The function returns whether or not it was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code. .SH ERRORS .TP .B PFMLIB_ERR_INVAL the argument is invalid, most likely a NULL pointer. .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_core.30000600003276200002170000000567012247131122020747 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2006" "" "Linux Programmer's Manual" .SH NAME libpfm_core - support for Intel Core processor family .SH SYNOPSIS .nf .B #include .B #include .sp .SH DESCRIPTION The libpfm library provides full support for the Intel Core processor family, including the Core 2 Duo and Quad series. The interface is defined in \fBpfmlib_core.h\fR. It consists of a set of functions and structures which describe and allow access to the Intel Core processors specific PMU features. 
.sp When Intel Core processor specific features are needed to support a measurement, their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. The Intel Core processors specific input arguments are described in the \fBpfmlib_core_input_param_t\fR structure. No output parameters are currently defined. The input parameters are defined as follows: .sp .nf typedef struct { unsigned int cnt_mask; unsigned int flags; } pfmlib_core_counter_t; typedef struct { unsigned int pebs_used; } pfmlib_core_pebs_t; typedef struct { pfmlib_core_counter_t pfp_core_counters[PMU_CORE_NUM_COUNTERS]; pfmlib_core_pebs_t pfp_core_pebs; uint64_t reserved[4]; } pfmlib_core_input_param_t; .fi .sp .sp The Intel Core processor provides a few additional per-event features for counters: thresholding, inversion, edge detection. They can be set using the \fBpfp_core_counters\fR data structure for each event. The \fBflags\fR field can be initialized with any combinations of the following values: .TP .B PFMLIB_CORE_SEL_INV Inverse the results of the \fBcnt_mask\fR comparison when set .TP .B PFMLIB_CORE_SEL_EDGE Enables edge detection of events. .LP The \fBcnt_mask\fR field is used to set the event threshold. The value of the counter is incremented each time the number of occurrences per cycle of the event is greater or equal to the value of the field. Thus the event is modified to actually measure the number of qualifying cycles. When zero all occurrences are counted (this is the default). .sp .SH Support for Precise-Event Based Sampling (PEBS) The library can be used to setup the PMC registers when using PEBS. In this case, the \fBpfp_core_pebs\fR structure must be used and the \fBpebs_used\fR field must be set to 1. When using PEBS, it is not possible to use more than one event. .SH Support for Intel Core 2 Duo and Quad processors The Intel Core 2 Duo and Quad processors are based on the Intel Core micro-architecture. 
They implement the Intel architectural PMU and some extensions such as PEBS. They support all the architectural events and a lot more Core 2 specific events. The library auto-detects the processor and provides access to Core 2 events whenever possible. .LP .SH ERRORS Refer to the description of the \fBpfm_dispatch_events()\fR function for errors. .SH SEE ALSO pfm_dispatch_events(3) and set of examples shipped with the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_counters.30000600003276200002170000000003612247131122023221 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_code.30000600003276200002170000000003612247131122022271 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_copy.30000600003276200002170000000003312247131122021777 0ustar ralphundrgrad.so man3/pfm_regmask_set.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_num_pmds.30000600003276200002170000000003512247131122021777 0ustar ralphundrgrad.so man3/pfm_get_impl_pmcs.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_find_full_event.30000600003276200002170000000003212247131122022456 0ustar ralphundrgrad.so man3/pfm_find_event.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_max_event_name_len.30000600003276200002170000000003612247131122024002 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm.30000600003276200002170000001141312247131122017727 0ustar ralphundrgrad.TH LIBPFM 3 "March, 2008" "" "Linux Programmer's Manual" .SH NAME libpfm \- a helper library to program Hardware Performance Units (PMUs) .SH SYNOPSIS .nf .B #include .SH DESCRIPTION The libpfm library is a helper library which is used by applications to help program the Performance Monitoring Unit (PMU), i.e., the hardware performance counters of modern processors. 
It provides a generic and portable programming interface to help set up the PMU configuration registers given a list of events to measure. A diversity of PMU hardware is supported; a list can be found below under \fBSUPPORTED HARDWARE\fR. The library is primarily designed to be used in conjunction with the Perfmon2 Linux kernel interface. However, at its core, it is totally independent of that interface and could equally well be used on other operating systems. It is important to realize that the library does not make the actual kernel calls to program the PMU. It simply helps applications figure out which PMU registers to use to measure certain events or access certain advanced PMU features. The library logically divides PMU registers into two categories. The performance monitoring data registers (PMD) are used to collect results, e.g., counts. The performance monitoring configuration registers (PMC) are used to indicate what events to measure or what feature to enable. Programming the PMU consists of setting up the PMC registers and collecting the results in the PMD registers. The central piece of the library is the \fBpfm_dispatch_events\fR function. The number of PMC and PMD registers varies between architectures and CPU models. The association of PMC to PMD can also change. Moreover, the number and encoding of events can vary widely. Finally, the structure of a PMC register can also change. All these factors make it quite difficult to write monitoring tools. This library is designed to simplify the programming of the PMC registers by hiding the complexity behind a simple interface. The library does this without limiting accessibility to model-specific features by using a layered design. The library is structured in two layers. The common layer provides an interface that is shared across all PMU models. This layer is good enough to set up simple monitoring sessions which count occurrences of events.
Then, there is a model-specific layer which gives access to the model-specific features. For instance, on Itanium, applications can use the library to set up the registers for the Branch Trace Buffer. Model-specific interfaces have the abbreviated PMU model name in their names. For instance, \fBpfm_ita2_get_event_umask()\fR is an Itanium 2 (ita2) specific function. When the library is initialized, it automatically probes the host CPU and enables the right set of interfaces. The common interface is defined in the \fBpfmlib.h\fR header file. Model-specific interfaces are defined in model-specific header files. For instance, \fBpfmlib_amd64.h\fR provides the AMD64 interface. .SH ENVIRONMENT VARIABLES It is possible to enable certain debug output of the library using environment variables. The following variables are defined: .TP .B LIBPFM_VERBOSE Enable verbose output. Value must be 0 or 1. When not set, the verbosity level can be controlled with the \fBpfm_set_options\fR function. .TP .B LIBPFM_DEBUG Enable debug output. Value must be 0 or 1. When not set, the debug level can be controlled with the \fBpfm_set_options\fR function. .TP .B LIBPFM_DEBUG_STDOUT Redirect verbose and debug output to the standard output file descriptor (stdout). By default, the output is directed to the standard error file descriptor (stderr). .sp Alternatively, it is possible to control verbosity and debug output using the \fBpfm_set_options\fR function.
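As a minimal sketch of the environment-variable route, an application can set these variables itself before it initializes the library. The helper name enable_libpfm_diagnostics is hypothetical and not part of libpfm. The sketch also assumes that the variables are read when the library is initialized and that "1" is an acceptable value for LIBPFM_DEBUG_STDOUT; neither point is stated above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* makes setenv() visible on strict compilers */
#endif
#include <stdlib.h>

/*
 * Hypothetical helper, not a libpfm function: turn on verbose and debug
 * output through the documented environment variables.
 * Assumption: it is called before the library is initialized.
 * Returns 0 on success, -1 if setenv() fails.
 */
int enable_libpfm_diagnostics(int to_stdout)
{
	if (setenv("LIBPFM_VERBOSE", "1", 1) != 0)
		return -1;
	if (setenv("LIBPFM_DEBUG", "1", 1) != 0)
		return -1;
	/* Assumption: the value "1" is acceptable for this variable. */
	if (to_stdout && setenv("LIBPFM_DEBUG_STDOUT", "1", 1) != 0)
		return -1;
	return 0;
}
```

Setting the variables from the shell before launching the tool has the same effect and needs no code change. The helper is only useful when the application wants to decide at run time.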
.LP .SH SUPPORTED HARDWARE .nf libpfm_amd64(3) AMD64 processors K8 and Barcelona (families 0Fh and 10h) libpfm_core(3) Intel Core processor family libpfm_atom(3) Intel Atom processor family libpfm_itanium(3) Intel Itanium libpfm_itanium2(3) Intel Itanium 2 libpfm_montecito(3) Intel dual-core Itanium 2 9000 (Montecito) libpfm_p6(3) P6 processor family including the Pentium M processor libpfm_powerpc(3) IBM PowerPC and POWER processor families (PPC970(FX,GX), PPC970MP, POWER4, POWER4+, POWER5, POWER5+, and POWER6) .fi .SH AUTHORS .nf Stephane Eranian Robert Richter .fi .PP .SH SEE ALSO libpfm(3), libpfm_amd64(3), libpfm_core(3), libpfm_itanium2(3), libpfm_itanium(3), libpfm_montecito(3), libpfm_p6(3), libpfm_powerpc(3). .nf pfm_dispatch_events(3), pfm_find_event(3), pfm_set_options(3), pfm_get_cycle_event(3), pfm_get_event_name(3), pfm_get_impl_pmcs(3), pfm_get_pmu_name(3), pfm_get_version(3), pfm_initialize(3), pfm_regmask_set(3), pfm_strerror(3). .fi .sp Examples shipped with the library papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_and.30000600003276200002170000000003312247131122021567 0ustar ralphundrgrad.so man3/pfm_regmask_set.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_pmu_is_supported.30000600003276200002170000000003412247131122022716 0ustar ralphundrgrad.so man3/pfm_get_pmu_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_clr.30000600003276200002170000000003312247131122021605 0ustar ralphundrgrad.so man3/pfm_regmask_set.3 papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_itanium.30000600003276200002170000004616412247131122021470 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" "Linux Programmer's Manual" .SH NAME libpfm_itanium - support for Itanium specific PMU features .SH SYNOPSIS .nf .B #include .B #include .sp .BI "int pfm_ita_is_ear(unsigned int " i ");" .BI "int pfm_ita_is_dear(unsigned int " i ");" .BI "int pfm_ita_is_dear_tlb(unsigned int " i ");" .BI "int pfm_ita_is_dear_cache(unsigned int " i ");" .BI "int
pfm_ita_is_iear(unsigned int " i ");" .BI "int pfm_ita_is_iear_tlb(unsigned int " i ");" .BI "int pfm_ita_is_iear_cache(unsigned int " i ");" .BI "int pfm_ita_is_btb(unsigned int " i ");" .BI "int pfm_ita_support_opcm(unsigned int " i ");" .BI "int pfm_ita_support_iarr(unsigned int " i ");" .BI "int pfm_ita_support_darr(unsigned int " i ");" .BI "int pfm_ita_get_event_maxincr(unsigned int " i ", unsigned int *"maxincr ");" .BI "int pfm_ita_get_event_umask(unsigned int " i ", unsigned long *"umask ");" .sp .SH DESCRIPTION The libpfm library provides full support for all the Itanium-specific features of the PMU. The interface is defined in \fBpfmlib_itanium.h\fR. It consists of a set of functions and structures which describe and allow access to the Itanium-specific PMU features. .sp The Itanium-specific functions presented here are mostly used to retrieve the characteristics of an event. Given an opaque event descriptor, obtained from \fBpfm_find_event()\fR or one of its derivative functions, they return a boolean value indicating whether this event supports a given feature or is of a particular kind. .sp The \fBpfm_ita_is_ear()\fR function returns 1 if the event designated by \fBi\fR corresponds to an EAR event, i.e., an Event Address Register type of event. Otherwise 0 is returned. For instance, \fBDATA_EAR_CACHE_LAT4\fR is an EAR event, but \fBCPU_CYCLES\fR is not. It can be a data or instruction EAR event. .sp The \fBpfm_ita_is_dear()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR event. Otherwise 0 is returned. It can be a cache or TLB EAR event. .sp The \fBpfm_ita_is_dear_tlb()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR TLB event. Otherwise 0 is returned. .sp The \fBpfm_ita_is_dear_cache()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR cache event. Otherwise 0 is returned.
.sp The \fBpfm_ita_is_iear()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR event. Otherwise 0 is returned. It can be a cache or TLB instruction EAR event. .sp The \fBpfm_ita_is_iear_tlb()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR TLB event. Otherwise 0 is returned. .sp The \fBpfm_ita_is_iear_cache()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR cache event. Otherwise 0 is returned. .sp The \fBpfm_ita_support_opcm()\fR function returns 1 if the event designated by \fBi\fR supports opcode matching, i.e., if this event can be measured accurately when opcode matching via PMC8/PMC9 is active. Otherwise 0 is returned. Not all events support this feature. .sp The \fBpfm_ita_support_iarr()\fR function returns 1 if the event designated by \fBi\fR supports code address range restrictions, i.e., if this event can be measured accurately when code range restriction is active. Otherwise 0 is returned. Not all events support this feature. .sp The \fBpfm_ita_support_darr()\fR function returns 1 if the event designated by \fBi\fR supports data address range restrictions, i.e., if this event can be measured accurately when data range restriction is active. Otherwise 0 is returned. Not all events support this feature. .sp The \fBpfm_ita_get_event_maxincr()\fR function returns in \fBmaxincr\fR the maximum number of occurrences per cycle for the event designated by \fBi\fR. Certain Itanium events can occur more than once per cycle. When an event occurs more than once per cycle, the PMD counter will be incremented accordingly. It is possible to restrict the measurement to cycles in which the event occurs more than a given number of times. For instance, \fBNOPS_RETIRED\fR can happen up to 6 times/cycle which means that the threshold can be adjusted between 0 and 5, where 5 would mean that the PMD counter would be incremented by 1 only when the nop instruction is executed more than 5 times/cycle.
This function returns the maximum number of occurrences of the event per cycle, and is the non-inclusive upper bound for the threshold to program in the PMC register. .sp The \fBpfm_ita_get_event_umask()\fR function returns in \fBumask\fR the umask for the event designated by \fBi\fR. .sp When the Itanium specific features are needed to support a measurement their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. The Itanium specific input arguments are described in the \fBpfmlib_ita_input_param_t\fR structure and the output parameters in \fBpfmlib_ita_output_param_t\fR. They are defined as follows: .sp .nf typedef enum { PFMLIB_ITA_ISM_BOTH=0, PFMLIB_ITA_ISM_IA32=1, PFMLIB_ITA_ISM_IA64=2 } pfmlib_ita_ism_t; typedef struct { unsigned int flags; unsigned int thres; pfmlib_ita_ism_t ism; } pfmlib_ita_counter_t; typedef struct { unsigned char opcm_used; unsigned long pmc_val; } pfmlib_ita_opcm_t; typedef struct { unsigned char btb_used; unsigned char btb_tar; unsigned char btb_tac; unsigned char btb_bac; unsigned char btb_tm; unsigned char btb_ptm; unsigned char btb_ppm; unsigned int btb_plm; } pfmlib_ita_btb_t; typedef enum { PFMLIB_ITA_EAR_CACHE_MODE= 0, PFMLIB_ITA_EAR_TLB_MODE = 1, } pfmlib_ita_ear_mode_t; typedef struct { unsigned char ear_used; pfmlib_ita_ear_mode_t ear_mode; pfmlib_ita_ism_t ear_ism; unsigned int ear_plm; unsigned long ear_umask; } pfmlib_ita_ear_t; typedef struct { unsigned int rr_plm; unsigned long rr_start; unsigned long rr_end; } pfmlib_ita_input_rr_desc_t; typedef struct { unsigned long rr_soff; unsigned long rr_eoff; } pfmlib_ita_output_rr_desc_t; typedef struct { unsigned int rr_flags; pfmlib_ita_input_rr_desc_t rr_limits[4]; unsigned char rr_used; } pfmlib_ita_input_rr_t; typedef struct { unsigned int rr_nbr_used; pfmlib_ita_output_rr_desc_t rr_infos[4]; pfmlib_reg_t rr_br[8]; } pfmlib_ita_output_rr_t; typedef struct { pfmlib_ita_counter_t pfp_ita_counters[PMU_ITA_NUM_COUNTERS]; 
unsigned long pfp_ita_flags; pfmlib_ita_opcm_t pfp_ita_pmc8; pfmlib_ita_opcm_t pfp_ita_pmc9; pfmlib_ita_ear_t pfp_ita_iear; pfmlib_ita_ear_t pfp_ita_dear; pfmlib_ita_btb_t pfp_ita_btb; pfmlib_ita_input_rr_t pfp_ita_drange; pfmlib_ita_input_rr_t pfp_ita_irange; } pfmlib_ita_input_param_t; typedef struct { pfmlib_ita_output_rr_t pfp_ita_drange; pfmlib_ita_output_rr_t pfp_ita_irange; } pfmlib_ita_output_param_t; .fi .sp .SH INSTRUCTION SET .sp The Itanium processor provides two additional per-event features for counters: thresholding and instruction set selection. They can be set using the \fBpfp_ita_counters\fR data structure for each event. The \fBism\fR field can be initialized as follows: .TP .B PFMLIB_ITA_ISM_BOTH The event will be monitored during IA-64 and IA-32 execution .TP .B PFMLIB_ITA_ISM_IA32 The event will only be monitored during IA-32 execution .TP .B PFMLIB_ITA_ISM_IA64 The event will only be monitored during IA-64 execution .sp .LP If \fBism\fR has a value of zero, it will default to PFMLIB_ITA_ISM_BOTH. .sp The \fBthres\fR field indicates the threshold for the event. A threshold of \fBn\fR means that the counter will be incremented by one only when the event occurs more than \fBn\fR times per cycle. The \fBflags\fR field contains event-specific flags. The currently defined flags are: .sp .TP .B PFMLIB_ITA_FL_EVT_NO_QUALCHECK When this flag is set, it indicates that the library should ignore the qualifier constraints for this event. Qualifiers include opcode matching and code and data range restrictions. When an event is marked as not supporting a particular qualifier, it usually means that the qualifier is ignored, i.e., the extra level of filtering is not applied. For instance, the CPU_CYCLES event does not support code range restrictions and by default the library will refuse to program it if range restriction is also requested. Using the flag will override the check and the call to the \fBpfm_dispatch_events()\fR function will succeed.
In this case, CPU_CYCLES will be measured for the entire program and not just for the code range requested. For certain measurements this is perfectly acceptable as the range restriction is applied only to the events which support it. Make sure you understand which events do not support certain qualifiers before using this flag. .LP .SH OPCODE MATCHING .sp The \fBpfp_ita_pmc8\fR and \fBpfp_ita_pmc9\fR fields of type \fBpfmlib_ita_opcm_t\fR contain the description of what to do with the opcode matchers. Itanium supports opcode matching via PMC8 and PMC9. When this feature is used the \fBopcm_used\fR field must be set to 1, otherwise it is ignored by the library. The \fBpmc_val\fR field simply contains the raw value to store in PMC8 or PMC9. The library does not modify the values for PMC8 and PMC9; they will be stored in the \fBpfp_pmcs\fR table of the generic output parameters. .SH EVENT ADDRESS REGISTERS .sp The \fBpfp_ita_iear\fR field of type \fBpfmlib_ita_ear_t\fR describes what to do with instruction Event Address Registers (I-EARs). Again, if this feature is used the \fBear_used\fR field must be set to 1, otherwise it will be ignored by the library. The \fBear_mode\fR field must be set to either \fBPFMLIB_ITA_EAR_TLB_MODE\fR or \fBPFMLIB_ITA_EAR_CACHE_MODE\fR to indicate the type of EAR to program. The umask to store into PMC10 must be in \fBear_umask\fR. The privilege level mask at which the I-EAR will be monitored must be set in \fBear_plm\fR which can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBear_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. Finally, the instruction set to monitor is given in \fBear_ism\fR and can be any one of \fBPFMLIB_ITA_ISM_BOTH\fR, \fBPFMLIB_ITA_ISM_IA32\fR, or \fBPFMLIB_ITA_ISM_IA64\fR. .sp The \fBpfp_ita_dear\fR field of type \fBpfmlib_ita_ear_t\fR describes what to do with data Event Address Registers (D-EARs).
The description is identical to the I-EARs except that it applies to PMC11. In general, there are four different methods to program the EAR (data or instruction): .TP .B Method 1 There is an EAR event in the list of events to monitor and \fBear_used\fR is cleared. In this case the EAR will be programmed (PMC10 or PMC11) based on the information encoded in the event. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count DATA_EAR_EVENT or INSTRUCTION_EAR_EVENTS depending on the type of EAR. .TP .B Method 2 There is an EAR event in the list of events to monitor and \fBear_used\fR is set. In this case the EAR will be programmed (PMC10 or PMC11) using the information in the \fBpfp_ita_iear\fR or \fBpfp_ita_dear\fR structure because it contains more detailed information, such as privilege level and instruction set. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count DATA_EAR_EVENT or INSTRUCTION_EAR_EVENTS depending on the type of EAR. .TP .B Method 3 There is no EAR event in the list of events to monitor and \fBear_used\fR is cleared. In this case no EAR is programmed. .TP .B Method 4 There is no EAR event in the list of events to monitor and \fBear_used\fR is set. In this case the EAR will be programmed (PMC10 or PMC11) using the information in the \fBpfp_ita_iear\fR or \fBpfp_ita_dear\fR structure. This is the free-running mode for the EAR. .sp .SH BRANCH TRACE BUFFER The \fBpfp_ita_btb\fR field of type \fBpfmlib_ita_btb_t\fR is used to configure the Branch Trace Buffer (BTB). If the \fBbtb_used\fR field is set, then the library will take the configuration into account, otherwise any BTB configuration will be ignored. The various fields in this structure provide a means to filter the kinds of branches that get recorded in the BTB. Each one represents an element of the branch architecture of the Itanium processor. Refer to the Itanium-specific documentation for more details on the branch architecture.
The fields are as follows: .TP .B btb_tar If the value of this field is 1, then branches predicted by the Target Address Register (TAR) predictions are captured. If 0 no branch predicted by the TAR is included. .TP .B btb_tac If this field is 1, then branches predicted by the Target Address Cache (TAC) are captured. If 0 no branch predicted by the TAC is included. .TP .B btb_bac If this field is 1, then branches predicted by the Branch Address Corrector (BAC) are captured. If 0 no branch predicted by the BAC is included. .TP .B btb_tm If this field is 0, then no branch is captured. If this field is 1, then non taken branches are captured. If this field is 2, then taken branches are captured. Finally if this field is 3 then all branches are captured. .TP .B btb_ptm If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted target address are captured. If this field is 2, then branches with correctly predicted target address are captured. Finally if this field is 3 then all branches are captured regardless of target address prediction. .TP .B btb_ppm If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted path (taken/non taken) are captured. If this field is 2, then branches with correctly predicted path are captured. Finally if this field is 3 then all branches are captured regardless of their path prediction. .TP .B btb_plm This is the privilege level mask at which the BTB captures branches. It can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBbtb_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. .sp There are 4 methods to program the BTB and they are as follows: .sp .TP .B Method 1 The \fBBRANCH_EVENT\fR is in the list of events to monitor and \fBbtb_used\fR is cleared. In this case, the BTB will be configured (PMC12) to record ALL branches. 
A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBBRANCH_EVENT\fR. .TP .B Method 2 The \fBBRANCH_EVENT\fR is in the list of events to monitor and \fBbtb_used\fR is set. In this case, the BTB will be configured (PMC12) using the information in the \fBpfp_ita_btb\fR structure. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBBRANCH_EVENT\fR. .TP .B Method 3 The \fBBRANCH_EVENT\fR is not in the list of events to monitor and \fBbtb_used\fR is set. In this case, the BTB will be configured (PMC12) using the information in the \fBpfp_ita_btb\fR structure. This is the free-running mode for the BTB. .TP .B Method 4 The \fBBRANCH_EVENT\fR is not in the list of events to monitor and \fBbtb_used\fR is cleared. In this case, the BTB is not programmed. .sp .SH DATA AND CODE RANGE RESTRICTIONS The \fBpfp_ita_drange\fR and \fBpfp_ita_irange\fR fields control the range restrictions for the data and code respectively. The idea is that the application passes a set of ranges, each designated by a start and end address. Upon return from the \fBpfm_dispatch_events()\fR function, the application gets back the set of registers and their values that need to be programmed via a kernel interface. Range restriction is implemented using the debug registers. There is a limited number of debug registers and they go in pairs. With 8 data debug registers, a maximum of 4 distinct ranges can be specified. The same applies to code range restrictions. Moreover, there are some severe constraints on the alignment and size of the range. Given that the size of the range is specified using a bitmask, there can be situations where the actual range is larger than the requested range. The library will make the best effort to cover only what is requested. It will never cover less than what is requested. The algorithm uses more than one pair of debug registers to get a more precise range if necessary. Hence, up to 4 pairs can be used to describe a single range.
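To illustrate why the covered range can exceed the requested one, the following sketch finds the smallest naturally aligned power-of-two block that contains a requested range, which is what a single address/mask pair can express, and reports how far the block overshoots at each end. It is a simplified model under stated assumptions, not the library's algorithm: the end address is treated as exclusive, only one pair is used, and the names rr_cover_t and cover_with_one_pair are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type, loosely modelled on the soff/eoff idea. */
typedef struct {
	uint64_t base;	/* first address of the covering block */
	uint64_t size;	/* block size in bytes, always a power of two */
	uint64_t soff;	/* bytes covered before the requested start */
	uint64_t eoff;	/* bytes covered after the requested end */
} rr_cover_t;

/*
 * Cover [start, end) with one naturally aligned power-of-two block.
 * Assumption: addresses stay at or below 2^63 so the arithmetic
 * cannot overflow.
 */
rr_cover_t cover_with_one_pair(uint64_t start, uint64_t end)
{
	rr_cover_t c;
	uint64_t size = 1;

	assert(start < end);
	assert(end <= (UINT64_C(1) << 63));

	/* Double the block until the aligned block holding start reaches end. */
	while ((start & ~(size - 1)) + size < end)
		size <<= 1;

	c.size = size;
	c.base = start & ~(size - 1);
	c.soff = start - c.base;
	c.eoff = (c.base + size) - end;
	return c;
}
```

For a request of 0x1010 to 0x1030, the sketch yields a 64-byte block at 0x1000, so 16 extra bytes are covered on each side. Splitting the request across several smaller aligned blocks, as the text describes, is what reduces this overshoot.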
The library returns the start and end offsets of the actual range compared to the requested range. If range restriction is to be used, the \fBrr_used\fR field must be set to one, otherwise settings will be ignored. The ranges are described by the \fBpfmlib_ita_input_rr_t\fR structure. Up to 4 ranges can be defined. Each range is described by an entry in \fBrr_limits\fR. The \fBpfmlib_ita_input_rr_desc_t\fR structure is defined as follows: .TP .B rr_plm The privilege level at which the range is active. It can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBrr_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. The privilege level is only relevant for code ranges; data ranges ignore the setting. .TP .B rr_start This is the start address of the range. Any address is supported but for a code range it must be bundle aligned, i.e., 16-byte aligned. .TP .B rr_end This is the end address of the range. Any address is supported but for a code range it must be bundle aligned, i.e., 16-byte aligned. .LP .sp The library will provide the values for the debug registers as well as some information about the actual ranges in the output parameters and more precisely in the \fBpfmlib_ita_output_rr_t\fR structure for each range. The structure is defined as follows: .TP .B rr_nbr_used Contains the number of debug registers used to cover the range. This is necessarily an even number as debug registers always go in pairs. The value of this field is between 0 and 8. .TP .B rr_br This table contains the list of debug registers necessary to cover the ranges. Each element is of type \fBpfmlib_reg_t\fR. The \fBreg_num\fR field contains the debug register index while \fBreg_value\fR contains the debug register value. Both the index and value must be copied into the kernel-specific argument to program the debug registers. The library never programs them. .TP .B rr_infos Contains information about the ranges defined.
Because of alignment restrictions, the actual range covered by the debug registers may be larger than the requested range. This table describes the differences between the requested and actual ranges expressed as offsets: .TP .B rr_soff Contains the start offset of the actual range described by the debug registers. If zero, it means the library was able to match exactly the beginning of the range. Otherwise it represents the number of bytes by which the actual range precedes the requested range. .TP .B rr_eoff Contains the end offset of the actual range described by the debug registers. If zero, it means the library was able to match exactly the end of the range. Otherwise it represents the number of bytes by which the actual range exceeds the requested range. .sp .LP .SH ERRORS Refer to the description of the \fBpfm_dispatch_events()\fR function for errors when using the Itanium-specific input and output arguments. .SH SEE ALSO pfm_dispatch_events(3) and the set of examples shipped with the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_montecito.30000600003276200002170000006167012247131122022022 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" "Linux Programmer's Manual" .SH NAME libpfm_montecito - support for Itanium 2 9000 (Montecito) processor specific PMU features .SH SYNOPSIS .nf .B #include .B #include .sp .BI "int pfm_mont_is_ear(unsigned int " i ");" .BI "int pfm_mont_is_dear(unsigned int " i ");" .BI "int pfm_mont_is_dear_tlb(unsigned int " i ");" .BI "int pfm_mont_is_dear_cache(unsigned int " i ");" .BI "int pfm_mont_is_dear_alat(unsigned int " i ");" .BI "int pfm_mont_is_iear(unsigned int " i ");" .BI "int pfm_mont_is_iear_tlb(unsigned int " i ");" .BI "int pfm_mont_is_iear_cache(unsigned int " i ");" .BI "int pfm_mont_is_etb(unsigned int " i ");" .BI "int pfm_mont_support_opcm(unsigned int " i ");" .BI "int pfm_mont_support_iarr(unsigned int " i ");" .BI "int pfm_mont_support_darr(unsigned int " i ");" .BI "int
pfm_mont_get_event_maxincr(unsigned int "i ", unsigned int *"maxincr ");" .BI "int pfm_mont_get_event_umask(unsigned int "i ", unsigned long *"umask ");" .BI "int pfm_mont_get_event_group(unsigned int "i ", int *"grp ");" .BI "int pfm_mont_get_event_set(unsigned int "i ", int *"set ");" .BI "int pfm_mont_get_event_type(unsigned int "i ", int *"type ");" .BI "int pfm_mont_get_ear_mode(unsigned int "i ", pfmlib_mont_ear_mode_t *"mode ");" .BI "int pfm_mont_irange_is_fine(pfmlib_output_param_t *"outp ", pfmlib_mont_output_param_t *"mod_out ");" .sp .SH DESCRIPTION The libpfm library provides full support for all the Itanium 2 9000 (Montecito) processor-specific features of the PMU. The interface is defined in \fBpfmlib_montecito.h\fR. It consists of a set of functions and structures which describe and allow access to the model-specific PMU features. .sp The Itanium 2 9000 (Montecito) processor-specific functions presented here are mostly used to retrieve the characteristics of an event. Given an opaque event descriptor, obtained from \fBpfm_find_event()\fR or one of its derivative functions, they return a boolean value indicating whether this event supports a given feature or is of a particular kind. .sp The \fBpfm_mont_is_ear()\fR function returns 1 if the event designated by \fBi\fR corresponds to an EAR event, i.e., an Event Address Register type of event. Otherwise 0 is returned. For instance, \fBDATA_EAR_CACHE_LAT4\fR is an EAR event, but \fBCPU_OP_CYCLES_ALL\fR is not. It can be a data or instruction EAR event. .sp The \fBpfm_mont_is_dear()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR event. Otherwise 0 is returned. It can be a cache or TLB EAR event. .sp The \fBpfm_mont_is_dear_tlb()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR TLB event. Otherwise 0 is returned. .sp The \fBpfm_mont_is_dear_cache()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR cache event.
Otherwise 0 is returned. .sp The \fBpfm_mont_is_dear_alat()\fR function returns 1 if the event designated by \fBi\fR corresponds to a ALAT EAR cache event. Otherwise 0 is returned. .sp The \fBpfm_mont_is_iear()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR event. Otherwise 0 is returned. It can be a cache or TLB instruction EAR event. .sp The \fBpfm_mont_is_iear_tlb()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR TLB event. Otherwise 0 is returned. .sp The \fBpfm_mont_is_iear_cache()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR cache event. Otherwise 0 is returned. .sp The \fBpfm_mont_support_opcm()\fR function returns 1 if the event designated by \fBi\fR supports opcode matching, i.e., can this event be measured accurately when opcode matching via PMC32/PMC34 is active. Not all events supports this feature. .sp The \fBpfm_mont_support_iarr()\fR function returns 1 if the event designated by \fBi\fR supports code address range restrictions, i.e., can this event be measured accurately when code range restriction is active. Otherwise 0 is returned. Not all events supports this feature. .sp The \fBpfm_mont_support_darr()\fR function returns 1 if the event designated by \fBi\fR supports data address range restrictions, i.e., can this event be measured accurately when data range restriction is active. Otherwise 0 is returned. Not all events supports this feature. .sp The \fBpfm_mont_get_event_maxincr()\fR function returns in \fBmaxincr\fR the maximum number of occurrences per cycle for the event designated by \fBi\fR. Certain Itanium 2 9000 (Montecito) events can occur more than once per cycle. When an event occurs more than once per cycle, the PMD counter will be incremented accordingly. It is possible to restrict measurement when event occur more than once per cycle. 
For instance, \fBNOPS_RETIRED\fR can happen up to 6 times/cycle which means that the threshold can be adjusted between 0 and 5, where 5 would mean that the PMD counter would be incremented by 1 only when the nop instruction is executed more than 5 times/cycle. This function returns the maximum number of occurrences of the event per cycle, which is the non-inclusive upper bound for the threshold to program in the PMC register. .sp The \fBpfm_mont_get_event_umask()\fR function returns in \fBumask\fR the umask for the event designated by \fBi\fR. .sp The \fBpfm_mont_get_event_group()\fR function returns in \fBgrp\fR the group to which the event designated by \fBi\fR belongs. The notion of group is used for L1D and L2D cache events only. For all other events, a group is irrelevant and can be ignored. If the event is an L2D cache event then the value of \fBgrp\fR will be \fBPFMLIB_MONT_EVT_L2D_CACHE_GRP\fR. Similarly, if the event is an L1D cache event, the value of \fBgrp\fR will be \fBPFMLIB_MONT_EVT_L1D_CACHE_GRP\fR. In all other cases, the value of \fBgrp\fR will be \fBPFMLIB_MONT_EVT_NO_GRP\fR. .sp The \fBpfm_mont_get_event_set()\fR function returns in \fBset\fR the set to which the event designated by \fBi\fR belongs. A set is a subdivision of a group and is therefore only relevant for L1D and L2D cache events. An event can only belong to one group and one set. This partitioning of the cache events is due to some hardware limitations which impose some restrictions on events. For a given group, events from different sets cannot be measured at the same time. If the event does not belong to a group then the value of \fBset\fR is \fBPFMLIB_MONT_EVT_NO_SET\fR. .sp The \fBpfm_mont_get_event_type()\fR function returns in \fBtype\fR the type of the event designated by \fBi\fR.
The Itanium 2 9000 (Montecito) events can have any one of the following types: .sp .TP .B PFMLIB_MONT_EVT_ACTIVE The event can only occur when the processor thread that generated it is currently active .TP .B PFMLIB_MONT_EVT_FLOATING The event can be generated when the processor thread is inactive .TP .B PFMLIB_MONT_EVT_CAUSAL The event does not belong to a processor thread .TP .B PFMLIB_MONT_EVT_SELF_FLOATING Hybrid event. It is floating if measured with .me. It is causal otherwise. .LP .sp The \fBpfm_mont_irange_is_fine()\fR function returns 1 if the configuration description passed in \fBoutp\fR, the generic output parameters and \fBmod_out\fR, the Itanium 2 9000 (Montecito) specific output parameters, use code range restriction in fine mode. Otherwise the function returns 0. This function can only be called after a successful call to the \fBpfm_dispatch_events()\fR function that was passed the data structures pointed to by \fBoutp\fR and \fBmod_out\fR as output parameters. .sp The \fBpfm_mont_get_event_ear_mode()\fR function returns in \fBmode\fR the EAR mode of the event designated by \fBi\fR. If the event is not an EAR event, then \fBPFMLIB_ERR_INVAL\fR is returned and \fBmode\fR is not updated. Otherwise \fBmode\fR can have the following values: .TP .B PFMLIB_MONT_EAR_TLB_MODE The event is a TLB EAR. It can be either data or instruction TLB EAR. .TP .B PFMLIB_MONT_EAR_CACHE_MODE The event is a cache EAR. It can be either data or instruction cache EAR. .TP .B PFMLIB_MONT_EAR_ALAT_MODE The event is an ALAT EAR. It can only be a data EAR event. .sp .LP When the Itanium 2 9000 (Montecito) specific features are needed to support a measurement their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. The Itanium 2 9000 (Montecito) specific input arguments are described in the \fBpfmlib_mont_input_param_t\fR structure and the output parameters in \fBpfmlib_mont_output_param_t\fR.
They are defined as follows: .sp .nf typedef struct { unsigned int flags; unsigned int thres; } pfmlib_mont_counter_t; typedef struct { unsigned char opcm_used; unsigned char opcm_m; unsigned char opcm_i; unsigned char opcm_f; unsigned char opcm_b; unsigned long opcm_match; unsigned long opcm_mask; } pfmlib_mont_opcm_t; typedef struct { unsigned char etb_used; unsigned int etb_plm; unsigned char etb_ds; unsigned char etb_tm; unsigned char etb_ptm; unsigned char etb_ppm; unsigned char etb_brt; } pfmlib_mont_etb_t; typedef struct { unsigned char ipear_used; unsigned int ipear_plm; unsigned short ipear_delay; } pfmlib_mont_ipear_t; typedef enum { PFMLIB_MONT_EAR_CACHE_MODE= 0, PFMLIB_MONT_EAR_TLB_MODE = 1, PFMLIB_MONT_EAR_ALAT_MODE = 2 } pfmlib_mont_ear_mode_t; typedef struct { unsigned char ear_used; pfmlib_mont_ear_mode_t ear_mode; unsigned int ear_plm; unsigned long ear_umask; } pfmlib_mont_ear_t; typedef struct { unsigned int rr_plm; unsigned long rr_start; unsigned long rr_end; } pfmlib_mont_input_rr_desc_t; typedef struct { unsigned long rr_soff; unsigned long rr_eoff; } pfmlib_mont_output_rr_desc_t; typedef struct { unsigned int rr_flags; pfmlib_mont_input_rr_desc_t rr_limits[4]; unsigned char rr_used; } pfmlib_mont_input_rr_t; typedef struct { unsigned int rr_nbr_used; pfmlib_mont_output_rr_desc_t rr_infos[4]; pfmlib_reg_t rr_br[8]; } pfmlib_mont_output_rr_t; typedef struct { pfmlib_mont_counter_t pfp_mont_counters[PMU_MONT_NUM_COUNTERS]; unsigned long pfp_mont_flags; pfmlib_mont_opcm_t pfp_mont_opcm1; pfmlib_mont_opcm_t pfp_mont_opcm2; pfmlib_mont_ear_t pfp_mont_iear; pfmlib_mont_ear_t pfp_mont_dear; pfmlib_mont_ipear_t pfp_mont_ipear; pfmlib_mont_etb_t pfp_mont_etb; pfmlib_mont_input_rr_t pfp_mont_drange; pfmlib_mont_input_rr_t pfp_mont_irange; } pfmlib_mont_input_param_t; typedef struct { pfmlib_mont_output_rr_t pfp_mont_drange; pfmlib_mont_output_rr_t pfp_mont_irange; } pfmlib_mont_output_param_t; .fi .sp .SH PER-EVENT OPTIONS .sp The Itanium 2 9000 
(Montecito) processor provides one per-event feature for counters: thresholding. It can be set using the \fBpfp_mont_counters\fR data structure for each event. .sp The \fBthres\fR field indicates the threshold for the event. A threshold of \fBn\fR means that the counter will be incremented by one only when the event occurs more than \fBn\fR times per cycle. The \fBflags\fR field contains event-specific flags. The currently defined flags are: .sp .TP PFMLIB_MONT_FL_EVT_NO_QUALCHECK When this flag is set it indicates that the library should ignore the qualifier constraints for this event. Qualifiers include opcode matching, code and data range restrictions. When an event is marked as not supporting a particular qualifier, it usually means that the qualifier is ignored, i.e., the extra level of filtering is not applied. For instance, the FE_BUBBLE_ALL event does not support code range restrictions and by default the library will refuse to program it if range restriction is also requested. Using the flag will override the check and the call to the \fBpfm_dispatch_events()\fR function will succeed. In this case, FE_BUBBLE_ALL will be measured for the entire program and not just for the code range requested. For certain measurements this is perfectly acceptable as the range restriction will only be applied to events which support it. Make sure you understand which events do not support certain qualifiers before using this flag. .LP .SH OPCODE MATCHING .sp The \fBpfp_mont_opcm1\fR and \fBpfp_mont_opcm2\fR fields of type \fBpfmlib_mont_opcm_t\fR contain the description of what to do with the opcode matchers. The Itanium 2 9000 (Montecito) processor supports opcode matching via PMC32 and PMC34. When this feature is used the \fBopcm_used\fR field must be set to 1, otherwise it is ignored by the library. The Itanium 2 9000 (Montecito) processor implements two full 41-bit opcode matchers. As such, it is possible to match all instructions individually.
It is possible to match a single instruction or an instruction pattern based on opcode or slot type. The slots are specified in: .TP .B opcm_m Match when the instruction is in an M-slot (memory) .TP .B opcm_i Match when the instruction is in an I-slot (ALU) .TP .B opcm_f Match when the instruction is in an F-slot (FPU) .TP .B opcm_b Match when the instruction is in a B-slot (Branch) .sp .LP Any combination of slot settings is supported. To match all slot types, simply set all fields to 1. .sp The 41-bit opcode is specified in \fBopcm_match\fR and a 41-bit mask is passed in \fBopcm_mask\fR. When a bit is set in \fBopcm_mask\fR the corresponding bit is ignored in \fBopcm_match\fR. .SH EVENT ADDRESS REGISTERS .sp The \fBpfp_mont_iear\fR field of type \fBpfmlib_mont_ear_t\fR describes what to do with instruction Event Address Registers (I-EARs). Again if this feature is used the \fBear_used\fR field must be set to 1, otherwise it will be ignored by the library. The \fBear_mode\fR field must be set to either \fBPFMLIB_MONT_EAR_TLB_MODE\fR or \fBPFMLIB_MONT_EAR_CACHE_MODE\fR to indicate the type of EAR to program. The umask to store into PMC10 must be in \fBear_umask\fR. The privilege level mask at which the I-EAR will be monitored must be set in \fBear_plm\fR which can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBear_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. .sp The \fBpfp_mont_dear\fR field of type \fBpfmlib_mont_ear_t\fR describes what to do with data Event Address Registers (D-EARs). The description is identical to the I-EARs except that it applies to PMC11 and that an \fBear_mode\fR of \fBPFMLIB_MONT_EAR_ALAT_MODE\fR is possible. In general, there are four different methods to program the EAR (data or instruction): .TP .B Method 1 There is an EAR event in the list of events to monitor and \fBear_used\fR is cleared.
In this case the EAR will be programmed (PMC10 or PMC11) based on the information encoded in the event. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBDATA_EAR_EVENT\fR or \fBL1I_EAR_EVENTS\fR depending on the type of EAR. .TP .B Method 2 There is an EAR event in the list of events to monitor and \fBear_used\fR is set. In this case the EAR will be programmed (PMC10 or PMC11) using the information in the \fBpfp_mont_iear\fR or \fBpfp_mont_dear\fR structure because it contains more detailed information, such as privilege level and instruction set. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBDATA_EAR_EVENT\fR or \fBL1I_EAR_EVENTS\fR depending on the type of EAR. .TP .B Method 3 There is no EAR event in the list of events to monitor and \fBear_used\fR is cleared. In this case no EAR is programmed. .TP .B Method 4 There is no EAR event in the list of events to monitor and \fBear_used\fR is set. In this case the EAR will be programmed (PMC10 or PMC11) using the information in the \fBpfp_mont_iear\fR or \fBpfp_mont_dear\fR structure. This is the free running mode for the EAR. .sp .SH EXECUTION TRACE BUFFER The \fBpfp_mont_etb\fR field of type \fBpfmlib_mont_etb_t\fR is used to configure the Execution Trace Buffer (ETB). If \fBetb_used\fR is set, then the library will take the configuration into account, otherwise any ETB configuration will be ignored. The various fields in this structure provide means to filter out the kind of changes in the control flow (branches, traps, rfi, ...) that get recorded in the ETB. Each one represents an element of the branch architecture of the Itanium 2 9000 (Montecito) processor. Refer to the Itanium 2 9000 (Montecito) specific documentation for more details on the branch architecture. The fields are as follows: .TP .B etb_tm If this field is 0, then no branch is captured. If this field is 1, then non taken branches are captured. If this field is 2, then taken branches are captured.
Finally if this field is 3 then all branches are captured. .TP .B etb_ptm If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted target address are captured. If this field is 2, then branches with correctly predicted target address are captured. Finally if this field is 3 then all branches are captured regardless of target address prediction. .TP .B etb_ppm If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted path (taken/non taken) are captured. If this field is 2, then branches with correctly predicted path are captured. Finally if this field is 3 then all branches are captured regardless of their path prediction. .TP .B etb_brt If this field is 0, then no branch is captured. If this field is 1, then only IP-relative branches are captured. If this field is 2, then only return branches are captured. Finally if this field is 3 then only non-return indirect branches are captured. .TP .B etb_plm This is the privilege level mask at which the ETB captures branches. It can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBetb_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. .sp There are 4 methods to program the ETB and they are as follows: .sp .TP .B Method 1 The \fBETB_EVENT\fR is in the list of events to monitor and \fBetb_used\fR is cleared. In this case, the ETB will be configured (PMC39) to record ALL branches. A counting monitor will be programmed to count \fBETB_EVENT\fR. .TP .B Method 2 The \fBETB_EVENT\fR is in the list of events to monitor and \fBetb_used\fR is set. In this case, the ETB will be configured (PMC39) using the information in the \fBpfp_mont_etb\fR structure. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBETB_EVENT\fR. .TP .B Method 3 The \fBETB_EVENT\fR is not in the list of events to monitor and \fBetb_used\fR is set.
In this case, the ETB will be configured (PMC39) using the information in the \fBpfp_mont_etb\fR structure. This is the free running mode for the ETB. .TP .B Method 4 The \fBETB_EVENT\fR is not in the list of events to monitor and \fBetb_used\fR is cleared. In this case, the ETB is not programmed. .SH DATA AND CODE RANGE RESTRICTIONS The \fBpfp_mont_drange\fR and \fBpfp_mont_irange\fR fields control the range restrictions for the data and code respectively. The idea is that the application passes a set of ranges, each designated by a start and end address. Upon return from the \fBpfm_dispatch_events()\fR function, the application gets back the set of registers and their values that need to be programmed via a kernel interface. Range restriction is implemented using the debug registers. There is a limited number of debug registers and they go in pairs. With 8 data debug registers, a maximum of 4 distinct ranges can be specified. The same applies to code range restrictions. Moreover, there are some severe constraints on the alignment and size of the ranges. Given that the size of a range is specified using a bitmask, there can be situations where the actual range is larger than the requested range. For code ranges, the Itanium 2 9000 (Montecito) processor can use what is called a fine mode, where a range is designated using two pairs of code debug registers. In this mode, the bitmask is not used and the start and end addresses are directly specified. Not all code ranges qualify for fine mode: the size of the range must be 64KB or less and the range cannot cross a 64KB page boundary. The library will make a best effort in choosing the right mode for each range. For code ranges, it will try the fine mode first and will default to using the bitmask mode otherwise. Fine mode applies to all code debug registers or none, i.e., you cannot have a range using fine mode and another using the bitmask.
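.sp
The fine mode conditions stated above (bundle-aligned addresses, a size of 64KB or less, no crossing of a 64KB page boundary, and all-or-nothing across ranges) can be sketched as a small standalone check. This is only an illustration and not the algorithm used by the library: the names \fBcode_range_fits_fine_mode()\fR, \fBall_ranges_fit_fine_mode()\fR and \fBstruct code_range\fR are hypothetical, and the end address is assumed to be exclusive, which this page does not specify.
.sp
.nf
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants taken from the text, not from the libpfm headers. */
#define BUNDLE_SIZE    16ULL            /* code addresses are 16-byte aligned */
#define FINE_PAGE_SIZE (64ULL * 1024)   /* fine mode limit: one 64KB page */

/* Hypothetical range type; end is ASSUMED to be exclusive. */
struct code_range {
    uint64_t start;
    uint64_t end;
};

/* True if [start, end) satisfies the fine mode conditions described above. */
static bool code_range_fits_fine_mode(uint64_t start, uint64_t end)
{
    if (start >= end)
        return false;
    if ((start % BUNDLE_SIZE) != 0 || (end % BUNDLE_SIZE) != 0)
        return false;
    /* First and last byte in the same 64KB page implies size <= 64KB. */
    return (start / FINE_PAGE_SIZE) == ((end - 1) / FINE_PAGE_SIZE);
}

/* Fine mode is all-or-nothing: every requested range must qualify. */
static bool all_ranges_fit_fine_mode(const struct code_range *r, unsigned int n)
{
    assert(r || n == 0);
    for (unsigned int i = 0; i < n; i++) {
        if (!code_range_fits_fine_mode(r[i].start, r[i].end))
            return false;
    }
    return n > 0;
}
```
.fi
.sp
Under these assumptions, the range 0x10000-0x20000 qualifies because it fills exactly one 64KB page, whereas 0x1fff0-0x20010 does not because it straddles a page boundary. A single non-qualifying range makes \fBall_ranges_fit_fine_mode()\fR return false, mirroring the rule that fine mode and bitmask mode cannot be mixed.
.sp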
The Itanium 2 9000 (Montecito) processor somewhat limits the use of multiple pairs to accurately cover a code range. This can only be done for \fBIA64_INST_RETIRED\fR and even then, you need several events to collect the counts. For all other events, only one pair can be used, which leads to more inaccuracy due to approximation. Data ranges can use multiple debug register pairs to gain more accuracy. The library will never cover less than what is requested. The algorithm will use more than one pair of debug registers whenever possible to get a more precise range. Hence, up to 4 pairs can be used to describe a single range. If range restriction is to be used, the \fBrr_used\fR field must be set to one, otherwise settings will be ignored. The ranges are described by the \fBpfmlib_mont_input_rr_t\fR structure. Up to 4 ranges can be defined. Each range is described by an entry in \fBrr_limits\fR. Some flags for all ranges can be defined in \fBrr_flags\fR. Currently defined flags are: .sp .TP .B PFMLIB_MONT_RR_INV Invert the code ranges. The qualifying events will be measured when executing outside the specified ranges. .TP .B PFMLIB_MONT_RR_NO_FINE_MODE Force non fine mode for all code ranges (mostly for debug) .sp .LP The \fBpfmlib_mont_input_rr_desc_t\fR structure is defined as follows: .TP .B rr_plm The privilege level at which the range is active. It can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBrr_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. The privilege level is only relevant for code ranges; data ranges ignore the setting. .TP .B rr_start This is the start address of the range. Any address is supported but for code ranges it must be bundle aligned, i.e., 16-byte aligned. .TP .B rr_end This is the end address of the range. Any address is supported but for code ranges it must be bundle aligned, i.e., 16-byte aligned.
.sp .LP The library will provide the values for the debug registers as well as some information about the actual ranges in the output parameters and more precisely in the \fBpfmlib_mont_output_rr_t\fR structure for each range. The structure is defined as follows: .TP .B rr_nbr_used Contains the number of debug registers used to cover the range. This is necessarily an even number as debug registers always go in pairs. The value of this field is between 0 and 8. .TP .B rr_br This table contains the list of debug registers necessary to cover the ranges. Each element is of type \fBpfmlib_reg_t\fR. The \fBreg_num\fR field contains the debug register index while \fBreg_value\fR contains the debug register value. Both the index and value must be copied into the kernel specific argument to program the debug registers. The library never programs them. .TP .B rr_infos Contains information about the ranges defined. Because of alignment restrictions, the actual range covered by the debug registers may be larger than the requested range. This table describes the differences between the requested and actual ranges expressed as offsets: .TP .B rr_soff Contains the start offset of the actual range described by the debug registers. If zero, it means the library was able to match exactly the beginning of the range. Otherwise it represents the number of bytes by which the actual range precedes the requested range. .TP .B rr_eoff Contains the end offset of the actual range described by the debug registers. If zero, it means the library was able to match exactly the end of the range. Otherwise it represents the number of bytes by which the actual range exceeds the requested range. .sp .LP .SH IP EVENT CAPTURE (IP-EAR) The Execution Trace Buffer (ETB) can be configured to record the addresses of consecutive retiring instructions. In this case the ETB contains IP addresses and not branch-related information.
This feature cannot be used in conjunction with regular branch captures as described above. To activate this feature the \fBipear_used\fR field of the \fBpfmlib_mont_ipear_t\fR structure must be set to 1. The other fields in this structure are used as follows: .sp .TP .B ipear_plm The privilege level of the instructions to capture. It can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBipear_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. .TP .B ipear_delay The number of cycles by which to delay the freeze of the ETB after a PMU interrupt (which freezes the rest of the counters). .LP .sp .SH ERRORS Refer to the description of the \fBpfm_dispatch_events()\fR function for errors when using the Itanium 2 9000 (Montecito) specific input and output arguments. .SH SEE ALSO pfm_dispatch_events(3) and set of examples shipped with the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_weight.30000600003276200002170000000003312247131122022314 0ustar ralphundrgrad.so man3/pfm_regmask_set.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_strerror.30000600003276200002170000000145212247131122021204 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" "Linux Programmer's Manual" .SH NAME pfm_strerror \- return string describing error code .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "char *pfm_strerror(int "code); .sp .SH DESCRIPTION This function returns a string which describes the libpfm error value in \fBcode\fR. The string returned by the call must be considered as read only. The function must \fBonly\fR be used with error codes returned by libpfm calls. It is not designed to handle OS system call errors. .SH RETURN The function returns a pointer to the string describing the error code. If \fBcode\fR is invalid then the default error message is returned. .SH ERRORS If the error code is invalid, then the function returns a pointer to a string which says "unknown error code".
.SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_find_event_bycode.30000600003276200002170000000003212247131122022761 0ustar ralphundrgrad.so man3/pfm_find_event.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_mask_name.30000600003276200002170000000003612247131122023312 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_mask_code.30000600003276200002170000000003612247131122023304 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_or.30000600003276200002170000000003312247131122021445 0ustar ralphundrgrad.so man3/pfm_regmask_set.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_full_event_name.30000600003276200002170000000003612247131122023321 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_mask_description.30000600003276200002170000000003612247131122024715 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_pmu_name_bytype.30000600003276200002170000000003412247131122023351 0ustar ralphundrgrad.so man3/pfm_get_pmu_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_impl_pmds.30000600003276200002170000000003512247131122022141 0ustar ralphundrgrad.so man3/pfm_get_impl_pmcs.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_find_event_bycode_next.30000600003276200002170000000003212247131122024017 0ustar ralphundrgrad.so man3/pfm_find_event.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_num_pmcs.30000600003276200002170000000003512247131122021776 0ustar ralphundrgrad.so man3/pfm_get_impl_pmcs.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_eq.30000600003276200002170000000003312247131122021432 0ustar ralphundrgrad.so man3/pfm_regmask_set.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_set.30000600003276200002170000000507012247131122021626 0ustar ralphundrgrad.TH LIBPFM 3 "Apr, 2006" "" "Linux Programmer's Manual" .SH NAME pfm_regmask_set, pfm_regmask_isset, 
pfm_regmask_clr, pfm_regmask_weight, pfm_regmask_eq, pfm_regmask_and, pfm_regmask_or, pfm_regmask_copy \- operations on pfmlib_regmask_t bitmasks .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "int pfm_regmask_isset(pfmlib_regmask_t *"mask ", unsigned int "b ");" .BI "int pfm_regmask_set(pfmlib_regmask_t *"mask ", unsigned int "b ");" .BI "int pfm_regmask_clr(pfmlib_regmask_t *"mask ", unsigned int "b ");" .BI "int pfm_regmask_weight(pfmlib_regmask_t *"mask ", unsigned int *"w ");" .BI "int pfm_regmask_eq(pfmlib_regmask_t *"mask1 ", pfmlib_regmask_t *"mask2 ");" .BI "int pfm_regmask_and(pfmlib_regmask_t *"dest ", pfmlib_regmask_t *"m1 ", pfmlib_regmask_t *"m2 ");" .BI "int pfm_regmask_or(pfmlib_regmask_t *"dest ", pfmlib_regmask_t *"m1 ", pfmlib_regmask_t *"m2 ");" .BI "int pfm_regmask_copy(pfmlib_regmask_t *"dest ", pfmlib_regmask_t *"src ");" .sp .SH DESCRIPTION This set of functions is used to operate on the \fBpfmlib_regmask_t\fR bitmasks that are returned by certain functions or passed to the \fBpfm_dispatch_events()\fR function. To ensure portability, it is important that applications use \fBonly\fR the functions specified here to access the bitmasks. It is strongly discouraged to access the internal fields of the \fBpfmlib_regmask_t\fR structure. The \fBpfm_regmask_set()\fR function is used to set bit \fBb\fR in the bitmask \fBmask\fR. The \fBpfm_regmask_clr()\fR function is used to clear bit \fBb\fR in the bitmask \fBmask\fR. The \fBpfm_regmask_isset()\fR function returns a non-zero value if \fBb\fR is set in the bitmask \fBmask\fR. The \fBpfm_regmask_weight()\fR function returns in \fBw\fR the number of bits set in the bitmask \fBmask\fR. The \fBpfm_regmask_eq()\fR function returns a non-zero value if the bitmasks \fBmask1\fR and \fBmask2\fR are identical. The \fBpfm_regmask_and()\fR function returns in bitmask \fBdest\fR the result of the logical AND operation between bitmask \fBm1\fR and bitmask \fBm2\fR.
The \fBpfm_regmask_or()\fR function returns in bitmask \fBdest\fR the result of the logical OR operation between bitmask \fBm1\fR and bitmask \fBm2\fR. The \fBpfm_regmask_copy()\fR function copies bitmask \fBsrc\fR into bitmask \fBdest\fR. .SH RETURN The function returns whether or not it was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code. .SH ERRORS .B PFMLIB_ERR_INVAL the bit \fBb\fR exceeds the limit supported by the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_westmere.30000600003276200002170000001410212247131122021640 0ustar ralphundrgrad.TH LIBPFM 3 "January, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_nehalem - support for Intel Nehalem processor family .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .B #include <perfmon/pfmlib_intel_nhm.h> .sp .SH DESCRIPTION The libpfm library provides full support for the Intel Nehalem processor family, such as Intel Core i7. The interface is defined in \fBpfmlib_intel_nhm.h\fR. It consists of a set of functions and structures describing the Intel Nehalem processor specific PMU features. The Intel Nehalem processor is a quad core, dual thread processor. It includes two types of PMU: core and uncore. The latter measures events at the socket level and is therefore disconnected from any of the four cores. The core PMU implements Intel architectural perfmon version 3 with four generic counters and three fixed counters. The uncore has eight generic counters and one fixed counter. Each Intel Nehalem core also implements a 16-deep branch trace buffer, called Last Branch Record (LBR), which can be used in combination with the core PMU. Intel Nehalem implements a newer version of the Precise Event-Based Sampling (PEBS) mechanism which has the ability to capture where cache misses occur.
.sp When Intel Nehalem processor specific features are needed to support a measurement, their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. The Intel Nehalem processors specific input arguments are described in the \fBpfmlib_nhm_input_param_t\fR structure. No output parameters are currently defined. The input parameters are defined as follows: .sp .nf typedef struct { unsigned long cnt_mask; unsigned int flags; } pfmlib_nhm_counter_t; typedef struct { unsigned int lbr_used; unsigned int lbr_plm; unsigned int lbr_filter; } pfmlib_nhm_lbr_t; typedef struct { unsigned int pebs_used; unsigned int ld_lat_thres; } pfmlib_nhm_pebs_t; typedef struct { pfmlib_nhm_counter_t pfp_nhm_counters[PMU_NHM_NUM_COUNTERS]; pfmlib_nhm_pebs_t pfp_nhm_pebs; pfmlib_nhm_lbr_t pfm_nhm_lbr; uint64_t reserved[4]; } pfmlib_nhm_input_param_t; .fi .sp .sp The Intel Nehalem processor provides a few additional per-event features for counters: thresholding, inversion, edge detection, monitoring of both threads, occupancy. They can be set using the \fBpfp_nhm_counters\fR data structure for each event. The \fBflags\fR field can be initialized with the following values, depending on the event: .TP .B PFMLIB_NHM_SEL_INV Inverse the results of the \fBcnt_mask\fR comparison when set. This flag is supported for core and uncore PMU events. .TP .B PFMLIB_NHM_SEL_EDGE Enables edge detection of events. This flag is supported for core and uncore PMU events. .TP .B PFMLIB_NHM_SEL_ANYTHR Enable measuring the event in any of the two processor threads assuming hyper-threading is enabled. By default, only the current thread is measured. This flag is restricted to core PMU events. .TP .B PFMLIB_NHM_SEL_OCC_RST When set, the queue occupancy counter associated with the event is cleared. This flag is only available to uncore PMU events. .LP The \fBcnt_mask\fR field is used to set the event threshold. 
The value of the counter is incremented for each cycle in which the number of occurrences of the event is greater than or equal to the value of the field. Thus, the event is modified to actually measure the number of qualifying cycles. When zero all occurrences are counted (this is the default). This field is supported for core and uncore PMU events. .sp .SH Support for Precise-Event Based Sampling (PEBS) The library can be used to set up the PMC registers associated with PEBS. In this case, the \fBpfmlib_nhm_pebs_t\fR structure must be used and the \fBpebs_used\fR field must be set to 1. .sp To enable the PEBS load latency filtering capability, it is necessary to program the \fBMEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD\fR event into one generic counter. The latency threshold must be passed to the library in the \fBld_lat_thres\fR field. It is expressed in core cycles and \fBmust\fR be greater than 3. Note that \fBpebs_used\fR must be set as well. .SH Support for Last Branch Record (LBR) The library can be used to set up LBR registers. On Intel Nehalem processors, the LBR is 16-entry deep and it is possible to filter branches, based on privilege level or type. To configure the LBR, the \fBpfmlib_nhm_lbr_t\fR structure must be used. .sp Like core PMU counters, LBR only distinguishes two privilege levels, 0 and the rest (1,2,3). When running Linux natively, the kernel is at privilege level 0, applications at level 3. It is possible to specify the privilege level of LBR using the \fBlbr_plm\fR field. Any attempt to pass \fBPFM_PLM1\fR or \fBPFM_PLM2\fR will be rejected. If \fBlbr_plm\fR is 0, then the global value in the \fBpfp_dfl_plm\fR field of \fBpfmlib_input_param_t\fR is used. .sp By default, LBR captures all branches. It is possible to filter out branches by passing a set of flags in \fBlbr_filter\fR. The flags are as follows: .TP .B PFMLIB_NHM_LBR_JCC When set, LBR does not capture conditional branches. Default: off.
.TP .B PFM_NHM_LBR_NEAR_REL_CALL When set, LBR does not capture near calls. Default: off. .TP .B PFM_NHM_LBR_NEAR_IND_CALL When set, LBR does not capture indirect calls. Default: off. .TP .B PFM_NHM_LBR_NEAR_RET When set, LBR does not capture return branches. Default: off. .TP .B PFM_NHM_LBR_NEAR_IND_JMP When set, LBR does not capture indirect branches. Default: off. .TP .B PFM_NHM_LBR_NEAR_REL_JMP When set, LBR does not capture relative branches. Default: off. .TP .B PFM_NHM_LBR_FAR_BRANCH When set, LBR does not capture far branches. Default: off. .SH Support for uncore PMU By nature, the uncore PMU does not distinguish privilege levels, therefore it captures events at all privilege levels. To avoid any misinterpretation, the library enforces that uncore events be measured with both \fBPFM_PLM0\fR and \fBPFM_PLM3\fR set. Tools and operating system kernel interfaces may impose further restrictions on how the uncore PMU can be accessed. .SH SEE ALSO pfm_dispatch_events(3) and set of examples shipped with the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_pmu_name.30000600003276200002170000001345312247131122021766 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" "Linux Programmer's Manual" .SH NAME pfm_get_pmu_name, pfm_get_pmu_type, pfm_get_pmu_name_bytype, pfm_pmu_is_supported, pfm_force_pmu, pfm_list_supported_pmus \- query library about supported PMU models .SH SYNOPSIS .nf .B #include <perfmon/pfmlib.h> .sp .BI "int pfm_get_pmu_name(char *"name ", int " maxlen); .BI "int pfm_get_pmu_type(int *" type); .BI "int pfm_get_pmu_name_bytype(int " type ", char *" name ", int " maxlen); .BI "int pfm_pmu_is_supported(int " type); .BI "int pfm_force_pmu(int " type); .BI "int pfm_list_supported_pmus(int (*" pf ")(const char *"fmt ",...));" .sp .SH DESCRIPTION These functions retrieve information about the detected host PMU and the PMU models supported by the library. More than one model can be supported by the same library.
Each PMU model is assigned a type and a name. The latter is just a string and the former is a unique identifier. The currently supported types are: .TP .B PFMLIB_GENERIC_PMU Intel Itanium default architected PMU model, i.e., the basic model. .TP .B PFMLIB_ITANIUM_PMU Intel Itanium processor PMU model. The model is found in the first implementation of the IA-64 architecture, code name Merced. .TP .B PFMLIB_ITANIUM2_PMU Intel Itanium 2 processor PMU model. This is the model provided by McKinley, Madison, and Deerfield processors. .TP .B PFMLIB_MONTECITO_PMU Intel Dual-core Itanium 2 processor PMU model. This is the model provided by Montecito and Montvale processors. .TP .B PFMLIB_AMD64_PMU AMD AMD64 processors (family 15 and 16) .TP .B PFMLIB_GEN_IA32_PMU Intel X86 architectural PMU v1, v2, v3 .TP .B PFMLIB_I386_P6_PMU Intel P6 processors. That includes Pentium Pro, Pentium II, Pentium III, but excludes Pentium M .TP .B PFMLIB_I386_PM_PMU Intel Pentium M processors. .TP .B PFMLIB_INTEL_PII_PMU Intel Pentium II processors. .TP .B PFMLIB_PENTIUM4_PMU Intel processors based on Netburst micro-architecture. That includes Pentium 4. .TP .B PFMLIB_COREDUO_PMU Intel processors based on Yonah micro-architecture. That includes Intel Core Duo/Core Solo processors .TP .B PFMLIB_INTEL_CORE_PMU Intel processors based on the Core micro-architecture. That includes Intel Core 2 Duo/Quad processors .TP .B PFMLIB_INTEL_ATOM_PMU Intel processors based on the Atom micro-architecture. .TP .B PFMLIB_INTEL_NHM_PMU Intel processors based on the Nehalem micro-architecture. That includes Intel Core i7 processors.
.TP
.B PFMLIB_MIPS_20KC_PMU
MIPS 20KC processors
.TP
.B PFMLIB_MIPS_24K_PMU
MIPS 24K processors
.TP
.B PFMLIB_MIPS_25KF_PMU
MIPS 25KF processors
.TP
.B PFMLIB_MIPS_34K_PMU
MIPS 34K processors
.TP
.B PFMLIB_MIPS_5KC_PMU
MIPS 5KC processors
.TP
.B PFMLIB_MIPS_74K_PMU
MIPS 74K processors
.TP
.B PFMLIB_MIPS_R10000_PMU
MIPS R10000 processors
.TP
.B PFMLIB_MIPS_R12000_PMU
MIPS R12000 processors
.TP
.B PFMLIB_MIPS_RM7000_PMU
MIPS RM7000 processors
.TP
.B PFMLIB_MIPS_RM9000_PMU
MIPS RM9000 processors
.TP
.B PFMLIB_MIPS_SB1_PMU
MIPS SB1/SB1A processors
.TP
.B PFMLIB_MIPS_VR5432_PMU
MIPS VR5432 processors
.TP
.B PFMLIB_MIPS_VR5500_PMU
MIPS VR5500 processors
.TP
.B PFMLIB_MIPS_ICE9A_PMU
SiCortex ICE9A
.TP
.B PFMLIB_MIPS_ICE9B_PMU
SiCortex ICE9B
.TP
.B PFMLIB_POWERPC_PMU
IBM POWERPC processors
.TP
.B PFMLIB_CRAYX2_PMU
Cray X2 processors
.TP
.B PFMLIB_CELL_PMU
IBM Cell processors
.TP
.B PFMLIB_PPC970_PMU
IBM PowerPC 970(FX,GX) processors
.TP
.B PFMLIB_PPC970MP_PMU
IBM PowerPC 970MP processors
.TP
.B PFMLIB_POWER3_PMU
IBM POWER3 processors
.TP
.B PFMLIB_POWER4_PMU
IBM POWER4 processors
.TP
.B PFMLIB_POWER5_PMU
IBM POWER5 processors
.TP
.B PFMLIB_POWER5p_PMU
IBM POWER5+ processors
.TP
.B PFMLIB_POWER6_PMU
IBM POWER6 processors
.LP
The \fBpfm_get_pmu_name()\fR function returns the name of the detected host PMU. The library must have been initialized properly before making this call. The name is returned in the \fBname\fR argument. The \fBmaxlen\fR argument indicates the maximum length of the buffer provided for \fBname\fR. Up to \fBmaxlen-1\fR characters will be returned, not including the termination character.
.sp
The \fBpfm_get_pmu_type()\fR function returns the type of the detected host PMU. The library must have been initialized properly before making this call. The type returned in \fBtype\fR can be any one of the types listed above.
.sp
The \fBpfm_get_pmu_name_bytype()\fR function returns the name of a PMU model in \fBname\fR given a type in the \fBtype\fR argument.
The \fBmaxlen\fR argument indicates the maximum length of the buffer provided for \fBname\fR. Up to \fBmaxlen-1\fR characters will be returned, not including the termination character.
.sp
The \fBpfm_pmu_is_supported()\fR function returns \fBPFMLIB_SUCCESS\fR if the given PMU type is supported by the library independently of what the host PMU model is.
.sp
The \fBpfm_force_pmu()\fR function is used to force the library to use a particular PMU model instead of the one it has detected. The library checks that the selected type can be supported by the host PMU. This is mostly useful to force the library to use the generic PMU model \fBPFMLIB_GENERIC_PMU\fR. This function can be called at any time and upon return the library is considered initialized.
.sp
The \fBpfm_list_supported_pmus()\fR function is used to print the list of PMU types that the library supports. The result is printed using the function provided in the \fBpf\fR argument, which must be a printf-style function.
.SH RETURN
Each function returns whether or not it was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code.
.SH ERRORS
.TP
.B PFMLIB_ERR_NOINIT
the library has not been initialized properly.
.TP
.B PFMLIB_ERR_INVAL
invalid argument was given, most likely invalid pointer or invalid PMU type.
.TP
.B PFMLIB_ERR_NOTSUPP
the selected PMU type cannot be used on the host CPU.
.SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_inst_retired.30000600003276200002170000000003712247131122022652 0ustar ralphundrgrad.so man3/pfm_get_cycle_event.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_regmask_isset.30000600003276200002170000000003312247131122022154 0ustar ralphundrgrad.so man3/pfm_regmask_set.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_pmu_type.30000600003276200002170000000003412247131122022016 0ustar ralphundrgrad.so man3/pfm_get_pmu_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_name.30000600003276200002170000001521012247131122022277 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2006" "" "Linux Programmer's Manual" .SH NAME pfm_get_event_name, pfm_get_full_event_name, pfm_get_event_mask_name, pfm_get_event_code, pfm_get_event_mask_code, pfm_get_event_counters, pfm_get_num_events, pfm_get_max_event_name_len, pfm_get_event_description, pfm_get_event_mask_description \- get event information .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_get_event_name(unsigned int " e ", char *"name ", size_t " maxlen ");" .BI "int pfm_get_full_event_name(pfmlib_event_t *" ev ", char *"name ", size_t " maxlen ");" .BI "int pfm_get_event_mask_name(unsigned int " e ", unsigned int "mask ", char *"name ", size_t " maxlen ");" .BI "int pfm_get_event_code(unsigned int " e ", int *"code ");" .BI "int pfm_get_event_mask_code(unsigned int " e ", unsigned int "mask ", int *"code ");" .BI "int pfm_get_event_code_counter(unsigned int " e ", unsigned int " cnt ", int *"code ");" .BI "int pfm_get_event_counters(int " e ", pfmlib_regmask_t "counters ");" .BI "int pfm_get_num_events(unsigned int *" count ");" .BI "int pfm_get_max_event_name_len(size_t *" len ");" .BI "int pfm_get_event_description(unsigned int " ev ", char **" str ");" .BI "int pfm_get_event_mask_description(unsigned int " ev ", unsigned int "mask ", char **" str ");" .sp .SH DESCRIPTION The \fBpfm_get_event_name()\fR function returns in \fBname\fR the event name given its 
opaque descriptor in \fBe\fR. The \fBmaxlen\fR argument indicates the maximum length of the buffer provided for \fBname\fR. Up to \fBmaxlen\fR-1 characters are stored in the buffer. The buffer size must be large enough to store the event name, otherwise an error is returned. This behavior is required to avoid returning partial names with no way for the caller to verify this is not the full name, except by failing other calls. The buffer can be appropriately sized using the \fBpfm_get_max_event_name_len()\fR function. The returned name is a null terminated string with all upper-case characters and no spaces.
.sp
The \fBpfm_get_full_event_name()\fR function returns in \fBname\fR the event name given the full event description in \fBev\fR. The description contains the event code in \fBev->event\fR and optional unit masks descriptors in \fBev->unit_masks\fR. The \fBmaxlen\fR argument indicates the maximum length of the buffer provided for \fBname\fR. If more than \fBmaxlen\fR-1 characters are needed to represent the event, an error is returned. Applications may use the \fBpfm_get_max_event_name_len()\fR function to size the buffer correctly. In case unit masks are provided, the final event name string is structured as: event_name:unit_masks1[:unit_masks2]. Event names and unit masks names are returned in all upper case.
.sp
The \fBpfm_get_event_code()\fR function returns the event code in \fBcode\fR given its opaque descriptor \fBe\fR.
.sp
On some PMU models, the code associated with an event is different based on the counter it is programmed into. The \fBpfm_get_event_code_counter()\fR function is used to retrieve the event code in \fBcode\fR when the event \fBe\fR is programmed into counter \fBcnt\fR. The counter index \fBcnt\fR must correspond to a counting PMD register.
.sp
Given an opaque event descriptor \fBe\fR, the \fBpfm_get_event_counters()\fR function returns in \fBcounters\fR a bitmask of type \fBpfmlib_regmask_t\fR where each bit set represents a PMU config register which can be used to program this event. The bitmask must be accessed using accessor macros defined by the library.
.sp
The \fBpfm_get_num_events()\fR function returns in \fBcount\fR the total number of events available for the PMU model. On some PMU models, however, not all events in the table may be usable due to processor stepping changes. However, the library guarantees that no more than \fBcount\fR events are available.
.sp
It is possible to list all existing events for the detected host PMU using accessor functions, as the full table of events is not accessible to applications. The index of the first event is always zero, and the \fBpfm_get_num_events()\fR function gives the total number of events. On some PMU models, e.g., AMD64, not all events are necessarily supported by the host PMU, therefore the count returned by this call may not be the actual number of available events. Event descriptors are contiguous, therefore a simple loop will allow complete scanning. The typical scan loop is constructed as follows:
.sp
.nf
unsigned int i, count;
char name[256];
int ret;

pfm_get_num_events(&count);

for (i = 0; i < count; i++) {
	ret = pfm_get_event_name(i, name, sizeof(name));
	if (ret != PFMLIB_SUCCESS)
		continue; /* skip descriptors the library cannot name */
	printf("%s\\n", name);
}
.fi
.sp
The \fBpfm_get_max_event_name_len()\fR function returns in \fBlen\fR the maximum length in bytes for the name of the events or its unit masks, if any, available on one PMU implementation. The value excludes the string termination character ('\\0').
.sp
The \fBpfm_get_event_description()\fR function returns in \fBstr\fR the description string associated with the event specified in \fBev\fR. The description is returned into a buffer that is allocated to hold the entire description text.
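.sp
The naming rule and the all-or-nothing buffer behavior described above for \fBpfm_get_full_event_name()\fR can be illustrated with a small stand-alone sketch. It does not call the library: the function name, the status codes and the event and unit mask names used in any call are hypothetical and only model the documented behavior (upper-case output, ':' separators, an error instead of a truncated name).
.sp
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch only; they are NOT the
 * library's PFMLIB_SUCCESS / PFMLIB_ERR_FULL values. */
enum { SKETCH_OK = 0, SKETCH_ERR_FULL = -1 };

/* Append src to dst in upper case, starting at offset *pos. */
static void sketch_append_upper(char *dst, size_t *pos, const char *src)
{
    for (; *src != 0; src++)
        dst[(*pos)++] = (char)toupper((unsigned char)*src);
}

/*
 * Build "EVENT[:MASK1[:MASK2...]]" into name, which holds maxlen bytes.
 * Nothing is written unless the whole string plus its terminator fits,
 * so the caller never receives a partial name.
 */
static int sketch_full_event_name(const char *event,
                                  const char *const *masks, size_t num_masks,
                                  char *name, size_t maxlen)
{
    size_t needed, pos = 0, i;

    assert(event != NULL && name != NULL);
    assert(masks != NULL || num_masks == 0);

    needed = strlen(event);
    for (i = 0; i < num_masks; i++)
        needed += 1 + strlen(masks[i]);   /* ':' separator plus mask name */

    if (needed + 1 > maxlen)              /* +1 for the terminator */
        return SKETCH_ERR_FULL;

    sketch_append_upper(name, &pos, event);
    for (i = 0; i < num_masks; i++) {
        name[pos++] = ':';
        sketch_append_upper(name, &pos, masks[i]);
    }
    name[pos] = 0;
    return SKETCH_OK;
}
```
.fi
.sp
With a made-up event "l2_rqsts" and made-up unit masks "self" and "any", the sketch needs 17 characters plus the terminator, so a buffer of 18 bytes succeeds and a buffer of 17 bytes is rejected. Sizing the buffer from \fBpfm_get_max_event_name_len()\fR is how a real caller is expected to avoid the error case.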
It is the responsibility of the caller to free the buffer when it becomes useless by calling the \fBfree(3)\fR function. .sp The \fBpfm_get_event_mask_code()\fR function must be used to retrieve the actual unit mask value given a event descriptor in \fBe\fR and a unit mask descriptor in \fBmask\fR. The value is returned in \fBcode\fR. .sp The \fBpfm_get_event_mask_name()\fR function must be used to retrieve the name associated with a unit mask specified in \fBmask\fR for event \fBe\fR. The name is returned in the buffer specified in \fBname\fR. The maximum size of the buffer must be specified in \fBmaxlen\fR. .sp The \fBpfm_get_event_mask_description()\fR function returns in \fBstr\fR the description string associated with the unit mask specified in \fBmask\fR for the event specified in \fBev\fR. The description is returned into a buffer that is allocated to hold the entire description text. It is the responsibility of the caller to free the buffer when it becomes useless by calling the \fBfree(3)\fR function. .SH RETURN All functions return whether or not the call was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code. .SH ERRORS .B PFMLIB_ERR_NOINIT the library has not been initialized properly. .TP .B PFMLIB_ERR_FULL the string buffer provided is too small .TP .B PFMLIB_ERR_INVAL the event or unit mask descriptor, or the \fBcnt\fR argument is invalid, or a pointer argument is NULL. .SH SEE ALSO pfm_get_impl_counters(3), pfm_get_max_event_name_len(3), free(3) .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_powerpc.30000600003276200002170000000331112247131122021464 0ustar ralphundrgrad.TH LIBPFM 3 "October, 2007" "" "Linux Programmer's Manual" .SH NAME libpfm_powerpc - support for IBM PowerPC and POWER processor families .SH SYNOPSIS .nf .B #include .B #include .sp .SH DESCRIPTION The libpfm library provides support for the IBM PowerPC and POWER processor families. 
Specifically, it currently provides support for the following processors: PPC970(FX,GX), PPC970MP POWER4, POWER4+, POWER5, POWER5+, and POWER6. .sp .SH MODEL-SPECIFIC PARAMETERS At present, the model_in and model_out model-specific input and output parameters are not used by \fBpfm_dispatch_events()\fR function. For future compatibility, NULLs must be passed for these arguments. .sp .SH COMBINING EVENTS IN A SET As with many architecture's PMU hardware design, events can not be combined together arbitrarily in the same event set, even if there are a sufficient number of counters available. This implementation for IBM PowerPC/POWER bases the event compatibility on a set of previously-defined compatible event groups. If the events placed in an event set are all members of one of the predefined event groups, a call to the \fBpfm_dispatch_events()\fR function will be successful. With the current interface, there is no way to discover apriori which events are compatible, so application software that wishes to combine events must do so by trial and error, possibly using multiplexed event sets to count events that cannot otherwise be combined in the same set. .sp .SH ERRORS Refer to the description of the \fBpfm_dispatch_events()\fR function for errors. .SH SEE ALSO pfm_dispatch_events(3) and set of examples shipped with the library .SH AUTHOR Corey Ashford .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_num_counters.30000600003276200002170000000003512247131122022676 0ustar ralphundrgrad.so man3/pfm_get_impl_pmcs.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_initialize.30000600003276200002170000000163212247131122021463 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" "Linux Programmer's Manual" .SH NAME pfm_initialize \- initialize performance monitoring library .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_initialize(void);" .sp .SH DESCRIPTION This is the first function that a program using the library \fBmust\fR call otherwise the library will not function at all. 
This function probes the host PMU and initialize the internal state of the library. In the case of a multi-threaded application, this function needs to be called only once, most likely by the initial thread. .SH RETURN The function returns whether or not it was successful, i.e., the host PMU has been correctly identified and is supported. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is an error code. .SH ERRORS .TP .B PFMLIB_ERR_NOTSUPP the host PMU is not supported. .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_description.30000600003276200002170000000003612247131122023702 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_event_code_counter.30000600003276200002170000000003612247131122024030 0ustar ralphundrgrad.so man3/pfm_get_event_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_list_supported_pmus.30000600003276200002170000000003412247131122023441 0ustar ralphundrgrad.so man3/pfm_get_pmu_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_atom.30000600003276200002170000000555212247131122020756 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2006" "" "Linux Programmer's Manual" .SH NAME libpfm_core - support for Intel Atom processor family .SH SYNOPSIS .nf .B #include .B #include .sp .SH DESCRIPTION The libpfm library provides full support for the Intel Atom processor. This processor implements Intel architectural perfmon v3 with Precise Event-Based Sampling (PEBS) support. It also implements all architected events to which it adds lots of Atom specific events. .sp The libpfm interface is defined in \fBpfmlib_intel_atom.h\fR. It consists of a set of functions and structures which describe and allow access to the Intel Atom processor specific PMU features. .sp When Intel Atom processor specific features are needed to support a measurement, their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. 
The Intel Atom processor specific input arguments are described in the \fBpfmlib_intel_atom_input_param_t\fR structure. No output parameters are currently defined. The input parameters are defined as follows:
.sp
.nf
typedef struct {
	unsigned int cnt_mask;
	unsigned int flags;
} pfmlib_intel_atom_counter_t;

typedef struct {
	pfmlib_intel_atom_counter_t pfp_intel_atom_counters[PMU_INTEL_ATOM_NUM_COUNTERS];
	unsigned int pfp_intel_atom_pebs_used;
	uint64_t reserved[4];
} pfmlib_intel_atom_input_param_t;
.fi
.sp
The Intel Atom processor provides several additional per-event features for counters: thresholding, inversion, edge detection, and monitoring of both threads. They can be set using the \fBpfp_intel_atom_counters\fR data structure for each event. The \fBflags\fR field can be initialized with any combination of the following values:
.TP
.B PFMLIB_INTEL_ATOM_SEL_INV
Invert the result of the \fBcnt_mask\fR comparison when set.
.TP
.B PFMLIB_INTEL_ATOM_SEL_EDGE
Enable edge detection of events.
.TP
.B PFMLIB_INTEL_ATOM_SEL_ANYTHR
Enable measuring the event in either of the two threads. By default only the current thread is measured.
.LP
The \fBcnt_mask\fR field is used to set the event threshold. The value of the counter is incremented each time the number of occurrences per cycle of the event is greater than or equal to the value of the field. Thus the event is modified to actually measure the number of qualifying cycles. When zero, all occurrences are counted (this is the default).
.sp
.SH Support for Precise-Event Based Sampling (PEBS)
The library can be used to set up the PMC registers when using PEBS. In this case, the \fBpfp_intel_atom_pebs_used\fR field must be set to 1. When using PEBS, it is not possible to use more than one event.
.LP
.SH ERRORS
Refer to the description of the \fBpfm_dispatch_events()\fR function for errors.
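.sp
The effect of the \fBcnt_mask\fR threshold and of \fBPFMLIB_INTEL_ATOM_SEL_INV\fR described in this page can be modeled in plain C. The sketch below is only a software model of a single counter: the function name is hypothetical, no hardware or library call is involved, and it assumes that inversion has an effect only when \fBcnt_mask\fR is non-zero.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

/*
 * Software model of one counter. occurrences[i] is the number of times
 * the event fired in cycle i. cnt_mask and invert mirror the cnt_mask
 * field and the inversion flag; with cnt_mask == 0 every occurrence is
 * counted and invert is ignored (an assumption of this sketch).
 */
static unsigned long long sketch_model_counter(const unsigned int *occurrences,
                                               size_t num_cycles,
                                               unsigned int cnt_mask,
                                               int invert)
{
    unsigned long long count = 0;
    size_t i;

    assert(occurrences != NULL || num_cycles == 0);

    for (i = 0; i < num_cycles; i++) {
        int qualifies;

        if (cnt_mask == 0) {
            count += occurrences[i];      /* default: count occurrences */
            continue;
        }
        qualifies = occurrences[i] >= cnt_mask;
        if (invert)
            qualifies = !qualifies;       /* inverted comparison */
        if (qualifies)
            count++;                      /* one qualifying cycle */
    }
    return count;
}
```
.fi
.sp
For the per-cycle trace 0, 1, 3, 2, 0, 4, 1 the model yields 11 occurrences with a \fBcnt_mask\fR of 0, 3 qualifying cycles with a \fBcnt_mask\fR of 2, and 4 cycles with the same threshold inverted.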
.SH SEE ALSO pfm_dispatch_events(3) and set of examples shipped with the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_dispatch_events.30000600003276200002170000002510412247131122022505 0ustar ralphundrgrad.TH LIBPFM 3 "July , 2003" "" "Linux Programmer's Manual" .SH NAME pfm_dispatch_events \- determine PMC registers values for a set of events to measure .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_dispatch_events(pfmlib_input_param_t *"p ", void *" mod_in ", pfmlib_output_param_t *" q, "void *" mod_out ");" .sp .SH DESCRIPTION This function is the central piece of the library. It is important to understand that the library does not effectively program the PMU, i.e., it does not make the operating system calls. The PMU is never actually accessed by the library. Instead, the library helps applications prepare the arguments to pass to the kernel. In particular, it sets up the values to program into the PMU configuration registers (PMC). The list of used data registers (PMD) is also returned. .sp The input argument are divided into two categories: the generic arguments in \fBp\fR and the optional PMU model specific arguments in \fBmod_in\fR. The same applies for the output arguments: \fBq\fR contains the generic output arguments and \fBmod_out\fR the optional PMU model specific arguments. .sp An application describes what it wants to measure in the \fBin\fR and if it uses some model specific features, such as opcode matching on Itanium 2 processors, it must pass a pointer to the relevant model-specific input parameters in \fBmod_in\fR. The generic output parameters contains the register index and values for the PMC and PMD registers needed to make the measurement. The index mapping is guaranteed to match the mapping used by the Linux perfmon2 interface. In case the library is not used on this system, the hardware register addresses or indexes can also be retrieved from the output structure. 
.sp
The \fBpfmlib_input_param_t\fR structure is defined as follows:
.sp
.nf
typedef struct {
	int event;
	unsigned int plm;
	unsigned long flags;
	unsigned int unit_masks[PFMLIB_MAX_MASKS_PER_EVENT];
	unsigned int num_masks;
} pfmlib_event_t;

typedef struct {
	unsigned int pfp_event_count;
	unsigned int pfp_dfl_plm;
	unsigned int pfp_flags;
	pfmlib_event_t pfp_events[PFMLIB_MAX_PMCS];
	pfmlib_regmask_t pfp_unavail_pmcs;
} pfmlib_input_param_t;
.fi
.sp
The structure mostly contains one table, called \fBpfp_events\fR, which describes the events to be measured. The number of submitted events is indicated by \fBpfp_event_count\fR. Each event is described in the \fBpfp_events\fR table by an opaque descriptor stored in the \fBevent\fR field. This descriptor is obtained with the \fBpfm_find_full_event()\fR or derivative functions. For some events, it may be necessary to specify at least one unit mask in the \fBunit_masks\fR table. A unit mask is yet another opaque descriptor obtained via the \fBpfm_find_event_mask()\fR or \fBpfm_find_full_event()\fR functions. Typically, if an event supports multiple unit masks, they can be combined, in which case more than one entry in \fBunit_masks\fR must be specified. The actual number of unit mask descriptors passed must be indicated in \fBnum_masks\fR. When no unit mask is used, this field must be set to 0. A privilege level mask for the event can be provided in \fBplm\fR. This is a bitmask where each bit indicates a privilege level at which to monitor; more than one bit can be set. The library supports up to four levels, but depending on the PMU model, some levels may not be available. The levels are as follows:
.TP
.B PFM_PLM0
monitor at the privilege level 0. For many architectures, this means kernel level
.TP
.B PFM_PLM1
monitor at privilege level 1
.TP
.B PFM_PLM2
monitor at privilege level 2
.TP
.B PFM_PLM3
monitor at the privilege level 3.
For many architectures, this means user level
.LP
.sp
Events with a \fBplm\fR value of 0 will use the default privilege level mask as indicated by \fBpfp_dfl_plm\fR, which must be set to any combination of the values described above. It is illegal to have a value of 0 for this field.
.sp
The \fBpfp_flags\fR field contains a set of flags that affect the whole set of events to be monitored. The currently defined flags are:
.TP
.B PFMLIB_PFP_SYSTEMWIDE
indicates that the monitors are to be used in a system-wide monitoring session. This could influence the way the library sets up some register values.
.sp
.LP
The \fBpfp_unavail_pmcs\fR bitmask can be used by applications to communicate to the library the list of PMC registers which are not available on the system. Some kernels may allocate certain PMC registers (and associated data registers) for other purposes. Those registers must not be used by the library, otherwise the assignment of events to PMC registers may be rejected by the kernel. Applications must figure out which registers are available using a kernel interface at their disposal; the library does not provide this service. The library expects the restrictions to be expressed using the Linux perfmon2 PMC register mapping.
.LP
Refer to the PMU specific manual for a description of the model-specific input parameters to be passed in \fBmod_in\fR. The generic output parameters are contained in the \fBpfmlib_output_param_t\fR structure which is defined as:
.sp
.nf
typedef struct {
	unsigned long long reg_value;
	unsigned int reg_num;
	unsigned long reg_addr;
} pfmlib_reg_t;

typedef struct {
	unsigned int pfp_pmc_count;
	unsigned int pfp_pmd_count;
	pfmlib_reg_t pfp_pmcs[PFMLIB_MAX_PMCS];
	pfmlib_reg_t pfp_pmds[PFMLIB_MAX_PMDS];
} pfmlib_output_param_t;
.fi
.sp
The number of valid entries in the \fBpfp_pmcs\fR table is indicated by \fBpfp_pmc_count\fR. The number of valid entries in the \fBpfp_pmds\fR table is indicated by \fBpfp_pmd_count\fR.
Each entry in both tables is of type \fBpfmlib_reg_t\fR.
.sp
In the \fBpfp_pmcs\fR table, the \fBreg_num\fR contains the PMC register index (perfmon2 mapping), and the \fBreg_value\fR contains a 64-bit value to be used to program the PMC register. The \fBreg_addr\fR indicates the hardware address or index for the PMC register.
.sp
In the \fBpfp_pmds\fR table, the \fBreg_num\fR contains the PMD register index (perfmon2 mapping). The \fBreg_value\fR is ignored. The \fBreg_addr\fR indicates the hardware address or index for the PMD register.
.sp
Refer to the PMU specific manual for a description of the model-specific output parameters to be returned in \fBmod_out\fR.
.sp
The current implementation of the \fBpfm_dispatch_events()\fR function completely overwrites the \fBpfmlib_output_param_t\fR structure. In other words, results do not accumulate into the \fBpfp_pmcs\fR table across multiple calls. Unused fields are guaranteed to be zeroed upon successful return.
.sp
Depending on the PMU model, there may not always be a one to one mapping between a PMC register and a data register. Register dependencies may be more intricate. However, the \fBpfm_dispatch_events()\fR function guarantees certain ordering between the \fBpfp_pmcs\fR and \fBpfp_pmds\fR tables. In particular, it guarantees that the \fBpfp_pmds\fR table always starts with the counters corresponding, in the same order, to the events as provided in the \fBpfp_events\fR table on input. There is always one counter per event. Additional PMD registers, if any, come after.
.SH EXAMPLE
Here is a typical sequence using the perfmon2 interface:
.nf
	#include <perfmon/pfmlib.h>
	...
	pfmlib_input_param_t inp;
	pfmlib_output_param_t outp;
	pfarg_ctx_t ctx;
	pfarg_pmd_t pd[1];
	pfarg_pmc_t pc[1];
	pfarg_load_t load_arg;
	int fd, i;
	int ret;

	if (pfm_initialize() != PFMLIB_SUCCESS) {
		fprintf(stderr, "can't initialize library\\n");
		exit(1);
	}
	memset(&ctx, 0, sizeof(ctx));
	memset(&inp, 0, sizeof(inp));
	memset(&outp, 0, sizeof(outp));
	memset(pd, 0, sizeof(pd));
	memset(pc, 0, sizeof(pc));
	memset(&load_arg, 0, sizeof(load_arg));

	ret = pfm_get_cycle_event(&inp.pfp_events[0]);
	if (ret != PFMLIB_SUCCESS) {
		fprintf(stderr, "cannot find cycle event\\n");
		exit(1);
	}
	inp.pfp_dfl_plm = PFM_PLM3;
	inp.pfp_event_count = 1;

	ret = pfm_dispatch_events(&inp, NULL, &outp, NULL);
	if (ret != PFMLIB_SUCCESS) {
		fprintf(stderr, "cannot dispatch events: %s\\n", pfm_strerror(ret));
		exit(1);
	}
	/* propagate pmc value to perfmon2 structures */
	for (i = 0; i < outp.pfp_pmc_count; i++) {
		pc[i].reg_num = outp.pfp_pmcs[i].reg_num;
		pc[i].reg_value = outp.pfp_pmcs[i].reg_value;
	}
	/* counters start from zero */
	for (i = 0; i < outp.pfp_pmd_count; i++) {
		pd[i].reg_num = outp.pfp_pmds[i].reg_num;
		pd[i].reg_value = 0;
	}
	...
	if (pfm_create_context(&ctx, NULL, 0) == -1 ) {
		...
	}
	fd = ctx.ctx_fd;

	if (pfm_write_pmcs(fd, pc, outp.pfp_pmc_count) == -1) {
		...
	}
	if (pfm_write_pmds(fd, pd, outp.pfp_pmd_count) == -1) {
		...
	}
	load_arg.load_pid = getpid();
	if (pfm_load_context(fd, &load_arg) == -1) {
		...
	}
	pfm_start(fd, NULL);
	/* code to monitor */
	pfm_stop(fd);

	/* one counter per submitted event, in submission order */
	if (pfm_read_pmds(fd, pd, inp.pfp_event_count) == -1) {
		...
	}
	printf("results: %llu\\n", pd[0].reg_value);
	...
	close(fd);
	...
.fi
.SH RETURN
The function returns whether or not the call was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code.
.SH ERRORS
.TP
.B PFMLIB_ERR_NOINIT
The library has not been initialized properly.
.TP
.B PFMLIB_ERR_INVAL
Some arguments were invalid. For instance the value of *count is zero. This can also be due to the content of the \fBpfmlib_input_param_t\fR structure.
.TP .B PFMLIB_ERR_NOTFOUND No matching event was found. .TP .B PFMLIB_ERR_TOOMANY The number of events to monitor exceed the number of implemented counters. .TP .B PFMLIB_ERR_NOASSIGN The events cannot be dispatched to the PMC because events have conflicting constraints. .TP .B PFMLIB_ERR_MAGIC The model specific extension does not have the right magic number. .TP .B PFMLIB_ERR_FEATCOMB The set of events and features cannot be combined. .TP .B PFMLIB_ERR_EVTMANY An event has been supplied more than once and is causing resource (PMC) conflicts. .TP .B PFMLIB_ERR_IRRINVAL Invalid code range restriction (Itanium, Itanium 2). .TP .B PFMLIB_ERR_IRRALIGN Code range has invalid alignment (Itanium, Itanium 2). .TP .B PFMLIB_ERR_IRRTOOMANY Cannot satisfy all the code ranges (Itanium, Itanium 2). .TP .B PFMLIB_ERR_DRRTOOMANY Cannot satisfy all the data ranges (Itanium, Itanium 2). .TP .B PFMLIB_ERR_DRRINVAL Invalid data range restriction (Itanium, Itanium 2). .TP .B PFMLIB_ERR_EVTSET Some events belong to incompatible sets (Itanium 2). .TP .B PFMLIB_ERR_EVTINCOMP Some events cannot be measured at the same time (Itanium 2). .TP .B PFMLIB_ERR_IRRTOOBIG Code range is too big (Itanium 2). .TP .B PFMLIB_ERR_UMASK Invalid or missing unit mask. 
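.sp
The privilege level defaulting rule given in the DESCRIPTION (an event \fBplm\fR of 0 falls back to \fBpfp_dfl_plm\fR, which itself must not be 0) can be sketched as follows. This is an illustration only: the function and the SKETCH_ constants are hypothetical, the bit values are assumed for the sketch, and the real \fBPFM_PLM0\fR to \fBPFM_PLM3\fR definitions should be taken from the library header.
.sp
.nf
```c
#include <assert.h>

/* Assumed one-bit-per-level encoding, for this sketch only. */
#define SKETCH_PLM0    0x1u
#define SKETCH_PLM1    0x2u
#define SKETCH_PLM2    0x4u
#define SKETCH_PLM3    0x8u
#define SKETCH_PLM_ALL (SKETCH_PLM0 | SKETCH_PLM1 | SKETCH_PLM2 | SKETCH_PLM3)

/*
 * Return the mask an event would be monitored at, or 0 when the request
 * is invalid: a default mask of 0 is illegal, and unknown bits are
 * rejected in either mask.
 */
static unsigned int sketch_effective_plm(unsigned int event_plm,
                                         unsigned int dfl_plm)
{
    if (dfl_plm == 0 || (dfl_plm & ~SKETCH_PLM_ALL) != 0)
        return 0;
    if ((event_plm & ~SKETCH_PLM_ALL) != 0)
        return 0;
    return event_plm != 0 ? event_plm : dfl_plm;
}

/* Usage example: three requests and the masks they resolve to. */
static void sketch_plm_example(void)
{
    /* no per-event mask: the default (user level here) applies */
    assert(sketch_effective_plm(0, SKETCH_PLM3) == SKETCH_PLM3);
    /* a per-event mask overrides the default */
    assert(sketch_effective_plm(SKETCH_PLM0 | SKETCH_PLM3, SKETCH_PLM3)
           == (SKETCH_PLM0 | SKETCH_PLM3));
    /* a default of 0 is rejected even if the event has its own mask */
    assert(sketch_effective_plm(SKETCH_PLM0, 0) == 0);
}
```
.fi
.sp
A PMU model may reject some levels even when the mask is well formed, so a check of this kind does not replace the result of \fBpfm_dispatch_events()\fR.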
.SH SEE ALSO libpfm_itanium(3), libpfm_itanium2(3), pfm_regmask_set(3), pfm_regmask_clr(3), pfm_find_event_code_mask(3) .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_force_pmu.30000600003276200002170000000003412247131122021274 0ustar ralphundrgrad.so man3/pfm_get_pmu_name.3 papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_itanium2.30000600003276200002170000005646712247131122021561 0ustar ralphundrgrad.TH LIBPFM 3 "November, 2003" "" "Linux Programmer's Manual" .SH NAME libpfm_itanium2 - support for Itanium2 specific PMU features .SH SYNOPSIS .nf .B #include .B #include .sp .BI "int pfm_ita2_is_ear(unsigned int " i ");" .BI "int pfm_ita2_is_dear(unsigned int " i ");" .BI "int pfm_ita2_is_dear_tlb(unsigned int " i ");" .BI "int pfm_ita2_is_dear_cache(unsigned int " i ");" .BI "int pfm_ita2_is_dear_alat(unsigned int " i ");" .BI "int pfm_ita2_is_iear(unsigned int " i ");" .BI "int pfm_ita2_is_iear_tlb(unsigned int " i ");" .BI "int pfm_ita2_is_iear_cache(unsigned int " i ");" .BI "int pfm_ita2_is_btb(unsigned int " i ");" .BI "int pfm_ita2_support_opcm(unsigned int " i ");" .BI "int pfm_ita2_support_iarr(unsigned int " i ");" .BI "int pfm_ita2_support_darr(unsigned int " i ");" .BI "int pfm_ita2_get_event_maxincr(unsigned int "i ", unsigned int *"maxincr ");" .BI "int pfm_ita2_get_event_umask(unsigned int "i ", unsigned long *"umask ");" .BI "int pfm_ita2_get_event_group(unsigned int "i ", int *"grp ");" .BI "int pfm_ita2_get_event_set(unsigned int "i ", int *"set ");" .BI "int pfm_ita2_get_ear_mode(unsigned int "i ", pfmlib_ita2_ear_mode_t *"mode ");" .BI "int pfm_ita2_irange_is_fine(pfmlib_output_param_t *"outp ", pfmlib_ita2_output_param_t *"mod_out ");" .sp .SH DESCRIPTION The libpfm library provides full support for all the Itanium 2 specific features of the PMU. The interface is defined in \fBpfmlib_itanium2.h\fR. It consists of a set of functions and structures which describe and allow access to the Itanium 2 specific PMU features. 
.sp The Itanium 2 specific functions presented here are mostly used to retrieve the characteristics of an event. Given a opaque event descriptor, obtained by the \fBpfm_find_event()\fR or its derivative functions, they return a boolean value indicating whether this event support this feature or is of a particular kind. .sp The \fBpfm_ita2_is_ear()\fR function returns 1 if the event designated by \fBi\fR corresponds to a EAR event, i.e., an Event Address Register type of events. Otherwise 0 is returned. For instance, \fBDATA_EAR_CACHE_LAT4\fR is an ear event, but \fBCPU_CYCLES\fR is not. It can be a data or instruction EAR event. .sp The \fBpfm_ita2_is_dear()\fR function returns 1 if the event designated by \fBi\fR corresponds to an Data EAR event. Otherwise 0 is returned. It can be a cache or TLB EAR event. .sp The \fBpfm_ita2_is_dear_tlb()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR TLB event. Otherwise 0 is returned. .sp The \fBpfm_ita2_is_dear_cache()\fR function returns 1 if the event designated by \fBi\fR corresponds to a Data EAR cache event. Otherwise 0 is returned. .sp The \fBpfm_ita2_is_dear_alat()\fR function returns 1 if the event designated by \fBi\fR corresponds to a ALAT EAR cache event. Otherwise 0 is returned. .sp The \fBpfm_ita2_is_iear()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR event. Otherwise 0 is returned. It can be a cache or TLB instruction EAR event. .sp The \fBpfm_ita2_is_iear_tlb()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR TLB event. Otherwise 0 is returned. .sp The \fBpfm_ita2_is_iear_cache()\fR function returns 1 if the event designated by \fBi\fR corresponds to an instruction EAR cache event. Otherwise 0 is returned. 
.sp The \fBpfm_ita2_support_opcm()\fR function returns 1 if the event designated by \fBi\fR supports opcode matching, i.e., can this event be measured accurately when opcode matching via PMC8/PMC9 is active. Not all events supports this feature. .sp The \fBpfm_ita2_support_iarr()\fR function returns 1 if the event designated by \fBi\fR supports code address range restrictions, i.e., can this event be measured accurately when code range restriction is active. Otherwise 0 is returned. Not all events supports this feature. .sp The \fBpfm_ita2_support_darr()\fR function returns 1 if the event designated by \fBi\fR supports data address range restrictions, i.e., can this event be measured accurately when data range restriction is active. Otherwise 0 is returned. Not all events supports this feature. .sp The \fBpfm_ita2_get_event_maxincr()\fR function returns in \fBmaxincr\fR the maximum number of occurrences per cycle for the event designated by \fBi\fR. Certain Itanium 2 events can occur more than once per cycle. When an event occurs more than once per cycle, the PMD counter will be incremented accordingly. It is possible to restrict measurement when event occur more than once per cycle. For instance, \fBNOPS_RETIRED\fR can happen up to 6 times/cycle which means that the threshold can be adjusted between 0 and 5, where 5 would mean that the PMD counter would be incremented by 1 only when the nop instruction is executed more than 5 times/cycle. This function returns the maximum number of occurrences of the event per cycle, and is the non-inclusive upper bound for the threshold to program in the PMC register. .sp The \fBpfm_ita2_get_event_umask()\fR function returns in \fBumask\fR the umask for the event designated by \fBi\fR. .sp The \fBpfm_ita2_get_event_grp()\fR function returns in \fBgrp\fR the group to which the event designated by \fBi\fR belongs. The notion of group is used for L1 and L2 cache events only. 
For all other events, a group is irrelevant and can be ignored. If the event is an L2 cache event then the value of \fBgrp\fR will be \fBPFMLIB_ITA2_EVT_L2_CACHE_GRP\fR. Similarly, if the event is an L1 cache event, the value of \fBgrp\fR will be \fBPFMLIB_ITA2_EVT_L1_CACHE_GRP\fR. In all other cases, the value of \fBgrp\fR will be \fBPFMLIB_ITA2_EVT_NO_GRP\fR. .sp The \fBpfm_ita2_get_event_set()\fR function returns in \fBset\fR the set to which the event designated by \fBi\fR belongs. A set is a subdivision of a group and is therefore only relevant for L1 and L2 cache events. An event can only belong to one group and one set. This partitioning of the cache events is due to some hardware limitations which impose some restrictions on events. For a given group, events from different sets cannot be measured at the same time. If the event does not belong to a group then the value of \fBset\fR is \fBPFMLIB_ITA2_EVT_NO_SET\fR. .sp The \fBpfm_ita2_irange_is_fine()\fR function returns 1 if the configuration description passed in \fBoutp\fR, the generic output parameters and \fBmod_out\fR, the Itanium 2 specific output parameters, use code range restriction in fine mode. Otherwise the function returns 0. This function can only be called after a call to the \fBpfm_dispatch_events()\fR function has returned successfully with the data structures pointed to by \fBoutp\fR and \fBmod_out\fR as output parameters. .sp The \fBpfm_ita2_get_event_ear_mode()\fR function returns in \fBmode\fR the EAR mode of the event designated by \fBi\fR. If the event is not an EAR event, then \fBPFMLIB_ERR_INVAL\fR is returned and mode is not updated. Otherwise mode can have the following values: .TP .B PFMLIB_ITA2_EAR_TLB_MODE The event is a TLB EAR. It can be either data or instruction TLB EAR. .TP .B PFMLIB_ITA2_EAR_CACHE_MODE The event is a cache EAR. It can be either data or instruction cache EAR. .TP .B PFMLIB_ITA2_EAR_ALAT_MODE The event is an ALAT EAR. It can only be a data EAR event. 
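.LP
The group and set rule described above (events of the same cache group but of different sets cannot be measured together) can be sketched as a small stand-alone check. This is only an illustration: the \fBex_\fR names and the numeric values of the \fBEX_\fR constants are assumptions made for this sketch and are not part of libpfm. A real application would fill in the group and set with \fBpfm_ita2_get_event_grp()\fR and \fBpfm_ita2_get_event_set()\fR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the library's group and set values.
 * The real constants come from the libpfm headers and may differ. */
enum { EX_NO_GRP = 0, EX_L1_CACHE_GRP = 1, EX_L2_CACHE_GRP = 2 };
enum { EX_NO_SET = -1 };

/* Group and set of one event, as an application might cache them. */
struct ex_evt_class {
    int grp;
    int set;
};

/* Two events conflict only if they share a cache group but belong
 * to different sets of that group. */
bool ex_events_conflict(const struct ex_evt_class *a,
                        const struct ex_evt_class *b)
{
    if (a->grp == EX_NO_GRP || b->grp == EX_NO_GRP)
        return false;
    if (a->grp != b->grp)
        return false;
    return a->set != b->set;
}

/* Return true if no pair of events in the list conflicts. */
bool ex_event_list_ok(const struct ex_evt_class *evts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (ex_events_conflict(&evts[i], &evts[j]))
                return false;
    return true;
}
```

Running such a check before calling \fBpfm_dispatch_events()\fR is optional; it only helps an application report the incompatible pair by name.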
.sp .LP When the Itanium 2 specific features are needed to support a measurement their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. The Itanium 2 specific input arguments are described in the \fBpfmlib_ita2_input_param_t\fR structure and the output parameters in \fBpfmlib_ita2_output_param_t\fR. They are defined as follows: .sp .nf typedef enum { PFMLIB_ITA2_ISM_BOTH=0, PFMLIB_ITA2_ISM_IA32=1, PFMLIB_ITA2_ISM_IA64=2 } pfmlib_ita2_ism_t; typedef struct { unsigned int flags; unsigned int thres; pfmlib_ita2_ism_t ism; } pfmlib_ita2_counter_t; typedef struct { unsigned char opcm_used; unsigned long pmc_val; } pfmlib_ita2_opcm_t; typedef struct { unsigned char btb_used; unsigned char btb_ds; unsigned char btb_tm; unsigned char btb_ptm; unsigned char btb_ppm; unsigned char btb_brt; unsigned int btb_plm; } pfmlib_ita2_btb_t; typedef enum { PFMLIB_ITA2_EAR_CACHE_MODE= 0, PFMLIB_ITA2_EAR_TLB_MODE = 1, PFMLIB_ITA2_EAR_ALAT_MODE = 2 } pfmlib_ita2_ear_mode_t; typedef struct { unsigned char ear_used; pfmlib_ita2_ear_mode_t ear_mode; pfmlib_ita2_ism_t ear_ism; unsigned int ear_plm; unsigned long ear_umask; } pfmlib_ita2_ear_t; typedef struct { unsigned int rr_plm; unsigned long rr_start; unsigned long rr_end; } pfmlib_ita2_input_rr_desc_t; typedef struct { unsigned long rr_soff; unsigned long rr_eoff; } pfmlib_ita2_output_rr_desc_t; typedef struct { unsigned int rr_flags; pfmlib_ita2_input_rr_desc_t rr_limits[4]; unsigned char rr_used; } pfmlib_ita2_input_rr_t; typedef struct { unsigned int rr_nbr_used; pfmlib_ita2_output_rr_desc_t rr_infos[4]; pfmlib_reg_t rr_br[8]; } pfmlib_ita2_output_rr_t; typedef struct { pfmlib_ita2_counter_t pfp_ita2_counters[PMU_ITA2_NUM_COUNTERS]; unsigned long pfp_ita2_flags; pfmlib_ita2_opcm_t pfp_ita2_pmc8; pfmlib_ita2_opcm_t pfp_ita2_pmc9; pfmlib_ita2_ear_t pfp_ita2_iear; pfmlib_ita2_ear_t pfp_ita2_dear; pfmlib_ita2_btb_t pfp_ita2_btb; pfmlib_ita2_input_rr_t pfp_ita2_drange; 
pfmlib_ita2_input_rr_t pfp_ita2_irange; } pfmlib_ita2_input_param_t; typedef struct { pfmlib_ita2_output_rr_t pfp_ita2_drange; pfmlib_ita2_output_rr_t pfp_ita2_irange; } pfmlib_ita2_output_param_t; .fi .sp .SH PER-EVENT OPTIONS .sp The Itanium 2 processor provides two additional per-event features for counters: thresholding and instruction set selection. They can be set using the \fBpfp_ita2_counters\fR data structure for each event. The \fBism\fR field can be initialized as follows: .TP .B PFMLIB_ITA2_ISM_BOTH The event will be monitored during IA-64 and IA-32 execution .TP .B PFMLIB_ITA2_ISM_IA32 The event will only be monitored during IA-32 execution .TP .B PFMLIB_ITA2_ISM_IA64 The event will only be monitored during IA-64 execution .sp .LP If \fBism\fR has a value of zero, it will default to PFMLIB_ITA2_ISM_BOTH. The \fBthres\fR field indicates the threshold for the event. A threshold of \fBn\fR means that the counter will be incremented by one only when the event occurs more than \fBn\fR times per cycle. The \fBflags\fR field contains event-specific flags. The currently defined flags are: .sp .TP PFMLIB_ITA2_FL_EVT_NO_QUALCHECK When this flag is set it indicates that the library should ignore the qualifier constraints for this event. Qualifiers include opcode matching, code and data range restrictions. When an event is marked as not supporting a particular qualifier, it usually means that the qualifier is ignored, i.e., the extra level of filtering is not applied. For instance, the CPU_CYCLES event does not support code range restrictions and by default the library will refuse to program it if range restriction is also requested. Using the flag will override the check and the call to the \fBpfm_dispatch_events()\fR function will succeed. In this case, CPU_CYCLES will be measured for the entire program and not just for the code range requested. For certain measurements this is perfectly acceptable as the range restriction will only be applied to events which support it. 
Make sure you understand which events do not support certain qualifiers before using this flag. .LP .SH OPCODE MATCHING .sp The \fBpfp_ita2_pmc8\fR and \fBpfp_ita2_pmc9\fR fields of type \fBpfmlib_ita2_opcm_t\fR contain the description of what to do with the opcode matchers. Itanium 2 supports opcode matching via PMC8 and PMC9. When this feature is used the \fBopcm_used\fR field must be set to 1, otherwise it is ignored by the library. The \fBpmc_val\fR field simply contains the raw value to store in PMC8 or PMC9. The library may adjust the value to enable/disable some options depending on the set of features being used. The final value for PMC8 and PMC9 will be stored in the \fBpfp_pmcs\fR table of the generic output parameters. .SH EVENT ADDRESS REGISTERS .sp The \fBpfp_ita2_iear\fR field of type \fBpfmlib_ita2_ear_t\fR describes what to do with instruction Event Address Registers (I-EARs). Again if this feature is used the \fBear_used\fR field must be set to 1, otherwise it will be ignored by the library. The \fBear_mode\fR field must be set to either \fBPFMLIB_ITA2_EAR_TLB_MODE\fR or \fBPFMLIB_ITA2_EAR_CACHE_MODE\fR to indicate the type of EAR to program. The umask to store into PMC10 must be in \fBear_umask\fR. The privilege level mask at which the I-EAR will be monitored must be set in \fBear_plm\fR which can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBear_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. Finally the instruction set for which to monitor is in \fBear_ism\fR and can be any one of \fBPFMLIB_ITA2_ISM_BOTH\fR, \fBPFMLIB_ITA2_ISM_IA32\fR, or \fBPFMLIB_ITA2_ISM_IA64\fR. .sp The \fBpfp_ita2_dear\fR field of type \fBpfmlib_ita2_ear_t\fR describes what to do with data Event Address Registers (D-EARs). The description is identical to the I-EARs except that it applies to PMC11 and that a \fBear_mode\fR of \fBPFMLIB_ITA2_EAR_ALAT_MODE\fR is possible. 
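.sp
As a rough sketch of how the EAR fields described above fit together, the fragment below fills in a \fBpfmlib_ita2_ear_t\fR. The type definitions are copied from this page so that the fragment stands alone. The helper \fBex_setup_ear()\fR and the \fBEX_PLM\fR bit values are assumptions made for illustration only; real code would include the libpfm headers and use \fBPFM_PLM0\fR to \fBPFM_PLM3\fR.

```c
#include <string.h>

/* Types copied from the definitions given earlier in this page. */
typedef enum {
    PFMLIB_ITA2_ISM_BOTH = 0,
    PFMLIB_ITA2_ISM_IA32 = 1,
    PFMLIB_ITA2_ISM_IA64 = 2
} pfmlib_ita2_ism_t;

typedef enum {
    PFMLIB_ITA2_EAR_CACHE_MODE = 0,
    PFMLIB_ITA2_EAR_TLB_MODE   = 1,
    PFMLIB_ITA2_EAR_ALAT_MODE  = 2
} pfmlib_ita2_ear_mode_t;

typedef struct {
    unsigned char          ear_used;
    pfmlib_ita2_ear_mode_t ear_mode;
    pfmlib_ita2_ism_t      ear_ism;
    unsigned int           ear_plm;
    unsigned long          ear_umask;
} pfmlib_ita2_ear_t;

/* Placeholder privilege level bits, assumed for this sketch only. */
#define EX_PLM0 0x1u
#define EX_PLM3 0x8u

/* Hypothetical helper: describe an I-EAR (is_data == 0) or a D-EAR
 * (is_data != 0). Returns 0 on success and -1 if ALAT mode is
 * requested for an I-EAR, since ALAT mode only exists for D-EARs. */
int ex_setup_ear(pfmlib_ita2_ear_t *ear, int is_data,
                 pfmlib_ita2_ear_mode_t mode,
                 unsigned long umask, unsigned int plm)
{
    if (!is_data && mode == PFMLIB_ITA2_EAR_ALAT_MODE)
        return -1;

    memset(ear, 0, sizeof(*ear));
    ear->ear_used  = 1;                    /* otherwise the library ignores it */
    ear->ear_mode  = mode;
    ear->ear_ism   = PFMLIB_ITA2_ISM_BOTH; /* monitor IA-64 and IA-32 */
    ear->ear_plm   = plm;                  /* 0 selects the default mask */
    ear->ear_umask = umask;                /* stored into PMC10 or PMC11 */
    return 0;
}
```

The filled structure would then be placed in \fBpfp_ita2_iear\fR or \fBpfp_ita2_dear\fR of the input parameters before calling \fBpfm_dispatch_events()\fR.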
In general, there are four different methods to program the EAR (data or instruction): .TP .B Method 1 There is an EAR event in the list of events to monitor and \fBear_used\fR is cleared. In this case the EAR will be programmed (PMC10 or PMC11) based on the information encoded in the event. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBDATA_EAR_EVENT\fR or \fBL1I_EAR_EVENTS\fR depending on the type of EAR. .TP .B Method 2 There is an EAR event in the list of events to monitor and \fBear_used\fR is set. In this case the EAR will be programmed (PMC10 or PMC11) using the information in the \fBpfp_ita2_iear\fR or \fBpfp_ita2_dear\fR structure because it contains more detailed information, such as privilege level and instruction set. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count DATA_EAR_EVENT or L1I_EAR_EVENTS depending on the type of EAR. .TP .B Method 3 There is no EAR event in the list of events to monitor and \fBear_used\fR is cleared. In this case no EAR is programmed. .TP .B Method 4 There is no EAR event in the list of events to monitor and \fBear_used\fR is set. In this case the EAR will be programmed (PMC10 or PMC11) using the information in the \fBpfp_ita2_iear\fR or \fBpfp_ita2_dear\fR structure. This is the free running mode for the EAR. .sp .SH BRANCH TRACE BUFFER The \fBpfp_ita2_btb\fR field of type \fBpfmlib_ita2_btb_t\fR is used to configure the Branch Trace Buffer (BTB). If the \fBbtb_used\fR field is set, then the library will take the configuration into account, otherwise any BTB configuration will be ignored. The various fields in this structure provide means to filter the kind of branches that get recorded in the BTB. Each one represents an element of the branch architecture of the Itanium 2 processor. Refer to the Itanium 2 specific documentation for more details on the branch architecture. 
The fields are as follows: .TP .B btb_ds If the value of this field is 1, then detailed information about the branch prediction is recorded in place of information about the target address. If the value is 0, then information about the target address of the branch is recorded instead. .TP .B btb_tm If this field is 0, then no branch is captured. If this field is 1, then non taken branches are captured. If this field is 2, then taken branches are captured. Finally if this field is 3 then all branches are captured. .TP .B btb_ptm If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted target address are captured. If this field is 2, then branches with correctly predicted target address are captured. Finally if this field is 3 then all branches are captured regardless of target address prediction. .TP .B btb_ppm If this field is 0, then no branch is captured. If this field is 1, then branches with a mispredicted path (taken/non taken) are captured. If this field is 2, then branches with correctly predicted path are captured. Finally if this field is 3 then all branches are captured regardless of their path prediction. .TP .B btb_brt If this field is 0, then all branches are captured. If this field is 1, then only IP-relative branches are captured. If this field is 2, then only return branches are captured. Finally if this field is 3 then only non-return indirect branches are captured. .TP .B btb_plm This is the privilege level mask at which the BTB captures branches. It can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBbtb_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. .sp There are 4 methods to program the BTB and they are as follows: .sp .TP .B Method 1 The \fBBRANCH_EVENT\fR is in the list of events to monitor and \fBbtb_used\fR is cleared. In this case, the BTB will be configured (PMC12) to record ALL branches. 
A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBBRANCH_EVENT\fR. .TP .B Method 2 The \fBBRANCH_EVENT\fR is in the list of events to monitor and \fBbtb_used\fR is set. In this case, the BTB will be configured (PMC12) using the information in the \fBpfp_ita2_btb\fR structure. A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count \fBBRANCH_EVENT\fR. .TP .B Method 3 The \fBBRANCH_EVENT\fR is not in the list of events to monitor and \fBbtb_used\fR is set. In this case, the BTB will be configured (PMC12) using the information in the \fBpfp_ita2_btb\fR structure. This is the free running mode for the BTB. .TP .B Method 4 The \fBBRANCH_EVENT\fR is not in the list of events to monitor and \fBbtb_used\fR is cleared. In this case, the BTB is not programmed. .SH DATA AND CODE RANGE RESTRICTIONS The \fBpfp_ita2_drange\fR and \fBpfp_ita2_irange\fR fields control the range restrictions for the data and code respectively. The idea is that the application passes a set of ranges, each designated by a start and end address. Upon return from the \fBpfm_dispatch_events()\fR function, the application gets back the set of registers and their values that need to be programmed via a kernel interface. Range restriction is implemented using the debug registers. There is a limited number of debug registers and they come in pairs. With 8 data debug registers, a maximum of 4 distinct ranges can be specified. The same applies to code range restrictions. Moreover, there are some severe constraints on the alignment and size of the ranges. Given that the size of a range is specified using a bitmask, there can be situations where the actual range is larger than the requested range. For code ranges, the Itanium 2 processor can use what is called a fine mode, where a range is designated using two pairs of code debug registers. In this mode, the bitmask is not used; the start and end addresses are directly specified. 
Not all code ranges qualify for fine mode: the size of the range must be 4KB or less and the range cannot cross a 4KB page boundary. The library will make a best effort in choosing the right mode for each range. For code ranges, it will try the fine mode first and will default to using the bitmask mode otherwise. Fine mode applies to all code debug registers or none, i.e., you cannot have a range using fine mode and another using the bitmask. The Itanium 2 processor limits the use of multiple pairs to accurately cover a code range. This can only be done for \fBIA64_INST_RETIRED\fR and even then, you need several events to collect the counts. For all other events, only one pair can be used, which leads to more inaccuracy due to approximation. Data ranges can use multiple debug register pairs to gain more accuracy. The library will never cover less than what is requested. The algorithm will use more than one pair of debug registers whenever possible to get a more precise range. Hence, up to 4 pairs can be used to describe a single range. If range restriction is to be used, the \fBrr_used\fR field must be set to one, otherwise the settings will be ignored. The ranges are described by the \fBpfmlib_ita2_input_rr_t\fR structure. Up to 4 ranges can be defined. Each range is described by an entry in \fBrr_limits\fR. Some flags for all ranges can be defined in \fBrr_flags\fR. Currently defined flags are: .sp .TP .B PFMLIB_ITA2_RR_INV Invert the code ranges. The qualifying events will be measured when executing outside the specified ranges. .TP .B PFMLIB_ITA2_RR_NO_FINE_MODE Force non fine mode for all code ranges (mostly for debug) .sp .LP The \fBpfmlib_ita2_input_rr_desc_t\fR structure is defined as follows: .TP .B rr_plm The privilege level at which the range is active. It can be any combination of \fBPFM_PLM0\fR, \fBPFM_PLM1\fR, \fBPFM_PLM2\fR, \fBPFM_PLM3\fR. If \fBrr_plm\fR is 0 then the default privilege level mask in \fBpfp_dfl_plm\fR is used. 
The privilege level is only relevant for code ranges; data ranges ignore the setting. .TP .B rr_start This is the start address of the range. Any address is supported but for code ranges it must be bundle aligned, i.e., 16-byte aligned. .TP .B rr_end This is the end address of the range. Any address is supported but for code ranges it must be bundle aligned, i.e., 16-byte aligned. .sp .LP The library will provide the values for the debug registers as well as some information about the actual ranges in the output parameters and more precisely in the \fBpfmlib_ita2_output_rr_t\fR structure for each range. The structure is defined as follows: .TP .B rr_nbr_used Contains the number of debug registers used to cover the range. This is necessarily an even number as debug registers always come in pairs. The value of this field is between 0 and 7. .TP .B rr_br This table contains the list of debug registers necessary to cover the ranges. Each element is of type \fBpfmlib_reg_t\fR. The \fBreg_num\fR field contains the debug register index while \fBreg_value\fR contains the debug register value. Both the index and value must be copied into the kernel specific argument to program the debug registers. The library never programs them. .TP .B rr_infos Contains information about the ranges defined. Because of alignment restrictions, the actual range covered by the debug registers may be larger than the requested range. This table describes the differences between the requested and actual ranges expressed as offsets: .TP .B rr_soff Contains the start offset of the actual range described by the debug registers. If zero, it means the library was able to match exactly the beginning of the range. Otherwise it represents the number of bytes by which the actual range precedes the requested range. .TP .B rr_eoff Contains the end offset of the actual range described by the debug registers. If zero, it means the library was able to match exactly the end of the range. 
Otherwise it represents the number of bytes by which the actual range exceeds the requested range. .sp .LP .SH ERRORS Refer to the description of the \fBpfm_dispatch_events()\fR function for errors when using the Itanium 2 specific input and output arguments. .SH SEE ALSO pfm_dispatch_events(3) and set of examples shipped with the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_find_event.30000600003276200002170000001007112247131122021440 0ustar ralphundrgrad.TH LIBPFM 3 "August, 2006" "" "Linux Programmer's Manual" .SH NAME pfm_find_event, pfm_find_full_event, pfm_find_event_bycode, pfm_find_event_bycode_next, pfm_find_event_mask \- search for events and unit masks .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_find_event(const char *"str ", unsigned int *"desc ");" .BI "int pfm_find_full_event(const char *"str ", pfmlib_event_t *"e ");" .BI "int pfm_find_event_bycode(int "code ", unsigned int *"desc ");" .BI "int pfm_find_event_bycode_next(unsigned int "desc1 ", int "code ", unsigned int *"desc ");" .BI "int pfm_find_event_mask(unsigned int "idx ", const char *"str ", unsigned int *"mask_idx ");" .sp .SH DESCRIPTION The PMU counters can be programmed to count the number of occurrences of certain events. The number of events varies from one PMU model to the other. Each event has a name and a code which is used to program the actual PMU register. Some event may need to be further qualified with unit masks. .sp The library does not directly expose the event code, nor unit mask code, to user applications because it is not necessary. Instead applications use names to query the library for particular information about events. Given an event name, the library returns an opaque descriptor. Each descriptor is unique and has no relationship to the event code. .sp The set of functions described here can be used to get an event descriptor given either the name of the event or its code. Several events may share the same code. 
An event name is a string structured as: event_name[:unit_mask1[:unit_mask2]]. .sp The \fBpfm_find_event()\fR function is a general purpose search routine. Given an event name in \fBstr\fR, it returns the descriptor for the corresponding event. If unit masks are provided, they are not taken into account. This function is being \fBdeprecated\fR in favor of the \fBpfm_find_full_event()\fR function. .sp The \fBpfm_find_full_event()\fR function is the general purpose search routine. Given an event name in \fBstr\fR, it returns in \fBev\fR, the full event descriptor that includes the event descriptor in \fBev->event\fR and the unit mask descriptors in \fBev->unit_masks\fR. The number of unit mask descriptors returned is indicated in \fBev->num_masks\fR. Unit masks are specified as a colon separated list of unit mask names, exact values or value combinations. For instance, if event A supports unit masks M1 (0x1) and M2 (0x40), and both unit masks are to be measured, then the following values for \fBstr\fR are valid: "A:M1:M2", "A:M1:0x40", "A:M2:0x1", "A:0x1:0x40", "A:0x41". .sp The \fBpfm_find_event_bycode()\fR function searches for an event given its \fBcode\fR represented as an integer. It returns in \fBdesc\fR, the event descriptor. Unit masks are ignored. .sp Because there can be several events with the same code, the library provides the \fBpfm_find_event_bycode_next()\fR function to search for other events with the same code. Given an event \fBdesc1\fR and a \fBcode\fR, this function will look for the next event with the same code. If such an event exists, its descriptor will be stored into \fBdesc\fR. It is not necessary to have called the \fBpfm_find_event_bycode()\fR function prior to calling this function. This function is fully threadsafe as it does not maintain any state between calls. .sp The \fBpfm_find_event_mask()\fR function is used to find the unit mask descriptor based on its name or numerical value passed in \fBstr\fR for the event specified in \fBidx\fR. 
The numeric value must be an exact match of an existing unit mask value, i.e., all bits must match. Some events do not have unit masks, in which case this function returns an error. .SH RETURN All functions return whether or not the call was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code. .SH ERRORS .B PFMLIB_ERR_NOINIT the library has not been initialized properly. .TP .B PFMLIB_ERR_INVAL the event descriptor is invalid, or the pointer argument is NULL. .TP .B PFMLIB_ERR_NOTFOUND no matching event or unit mask was found. .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_impl_pmcs.30000600003276200002170000000575512247131122022156 0ustar ralphundrgrad.TH LIBPFM 3 "July, 2003" "" "Linux Programmer's Manual" .SH NAME pfm_get_impl_pmcs, pfm_get_impl_pmds, pfm_get_impl_counters, pfm_get_num_counters, pfm_get_num_pmcs, pfm_get_num_pmds, pfm_get_hw_counter_width \- return bitmask of implemented PMU registers or number of PMU registers .SH SYNOPSIS .nf .B #include .sp .BI "int pfm_get_impl_pmcs(pfmlib_regmask_t *" impl_pmcs ");" .BI "int pfm_get_impl_pmds(pfmlib_regmask_t *" impl_pmds ");" .BI "int pfm_get_impl_counters(pfmlib_regmask_t *" impl_counters ");" .BI "int pfm_get_num_counters(unsigned int *"num ");" .BI "int pfm_get_num_pmcs(unsigned int *"num ");" .BI "int pfm_get_num_pmds(unsigned int *"num ");" .BI "int pfm_get_num_counters(unsigned int *"num ");" .BI "int pfm_get_hw_counter_width(unsigned int *"width ");" .sp .SH DESCRIPTION The \fBpfm_get_impl_*()\fR functions can be used to figure out which PMU registers are implemented on the host CPU. All implemented registers may not necessarily be available to applications. Programs need to query the operating system kernel monitoring interface to figure out the list of available registers. .sp The \fBpfm_get_impl_*()\fR functions all return a bitmask of registers corresponding to the query. 
The bitmask pointed to by the argument is reset to zero by each function before it is filled in. The returned bitmask must be accessed using the set of functions provided by the library to ensure portability. See related man pages below. .sp The \fBpfm_get_num_*()\fR functions return the number of implemented PMC or PMD registers. Those numbers may be different from the actual number of registers available to applications. .sp The \fBpfm_get_impl_pmcs()\fR function returns in \fBimpl_pmcs\fR the bitmask of implemented PMCs. The \fBpfm_get_impl_pmds()\fR function returns in \fBimpl_pmds\fR the bitmask of implemented PMDs. The \fBpfm_get_impl_counters()\fR function returns in \fBimpl_counters\fR a bitmask of the PMD registers used as counters. Depending on the PMU mode, not all PMD registers are necessarily used as counters. .sp The \fBpfm_get_num_counters()\fR function returns in \fBnum\fR the number of PMDs used as counters. A counter is a PMD which is used to accumulate the number of occurrences of an event. The \fBpfm_get_num_pmcs()\fR function returns in \fBnum\fR the number of PMCs implemented by the host PMU. The \fBpfm_get_num_pmds()\fR function returns in \fBnum\fR the number of PMDs implemented by the host PMU. The \fBpfm_get_hw_counter_width()\fR function returns the width in bits of the counters in \fBwidth\fR. PMU implementations can have a different number of bits implemented. For instance, Itanium has 32-bit counters, while Itanium 2 has 47-bit counters. .SH RETURN All functions return whether or not the call was successful. A return value of \fBPFMLIB_SUCCESS\fR indicates success, otherwise the value is the error code. .SH ERRORS .B PFMLIB_ERR_NOINIT the library has not been initialized properly. 
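.SH EXAMPLE
The width returned by \fBpfm_get_hw_counter_width()\fR matters when raw counter values are compared, because a hardware counter wraps at 2^width. The sketch below shows one possible way to compute the largest raw value and a wrap-aware difference between two readings. The \fBex_\fR function names are hypothetical and not part of the library. The sketch assumes at most one wrap between the two readings, and whether an application sees raw or already widened values depends on the kernel interface in use.

```c
#include <stdint.h>

/* Largest value a counter of the given width (in bits) can hold. */
uint64_t ex_counter_mask(unsigned int width)
{
    return width >= 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/* Number of events between two raw readings of a counter that is
 * `width` bits wide, assuming it wrapped at most once in between. */
uint64_t ex_counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
    return (cur - prev) & ex_counter_mask(width);
}
```

With a 47-bit counter, a reading of 1 taken after a reading of 0x7ffffffffffe corresponds to 3 events, since the counter wrapped once in between.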
.SH SEE ALSO pfm_regmask_set(3), pfm_regmask_isset(3) .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_hw_counter_width.30000600003276200002170000000003512247131122023531 0ustar ralphundrgrad.so man3/pfm_get_impl_pmcs.3 papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_find_event_mask.30000600003276200002170000000003212247131122022447 0ustar ralphundrgrad.so man3/pfm_find_event.3 papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_nehalem.30000600003276200002170000001410212247131122021416 0ustar ralphundrgrad.TH LIBPFM 3 "January, 2009" "" "Linux Programmer's Manual" .SH NAME libpfm_nehalem - support for Intel Nehalem processor family .SH SYNOPSIS .nf .B #include .B #include .sp .SH DESCRIPTION The libpfm library provides full support for the Intel Nehalem processor family, such as Intel Core i7. The interface is defined in \fBpfmlib_intel_nhm.h\fR. It consists of a set of functions and structures describing the Intel Nehalem processor specific PMU features. The Intel Nehalem processor is a quad core, dual thread processor. It includes two types of PMU: core and uncore. The latter measures events at the socket level and is therefore disconnected from any of the four cores. The core PMU implements Intel architectural perfmon version 3 with four generic counters and three fixed counters. The uncore has eight generic counters and one fixed counter. Each Intel Nehalem core also implement a 16-deep branch trace buffer, called Last Branch Record (LBR), which can be used in combination with the core PMU. Intel Nehalem implements a newer version of the Precise Event-Based Sampling (PEBS) mechanism which has the ability to capture where cache misses occur. .sp When Intel Nehalem processor specific features are needed to support a measurement, their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. 
The Intel Nehalem processor specific input arguments are described in the \fBpfmlib_nhm_input_param_t\fR structure. No output parameters are currently defined. The input parameters are defined as follows: .sp .nf typedef struct { unsigned long cnt_mask; unsigned int flags; } pfmlib_nhm_counter_t; typedef struct { unsigned int lbr_used; unsigned int lbr_plm; unsigned int lbr_filter; } pfmlib_nhm_lbr_t; typedef struct { unsigned int pebs_used; unsigned int ld_lat_thres; } pfmlib_nhm_pebs_t; typedef struct { pfmlib_nhm_counter_t pfp_nhm_counters[PMU_NHM_NUM_COUNTERS]; pfmlib_nhm_pebs_t pfp_nhm_pebs; pfmlib_nhm_lbr_t pfm_nhm_lbr; uint64_t reserved[4]; } pfmlib_nhm_input_param_t; .fi .sp .sp The Intel Nehalem processor provides a few additional per-event features for counters: thresholding, inversion, edge detection, monitoring of both threads, occupancy. They can be set using the \fBpfp_nhm_counters\fR data structure for each event. The \fBflags\fR field can be initialized with the following values, depending on the event: .TP .B PFMLIB_NHM_SEL_INV Invert the result of the \fBcnt_mask\fR comparison when set. This flag is supported for core and uncore PMU events. .TP .B PFMLIB_NHM_SEL_EDGE Enables edge detection of events. This flag is supported for core and uncore PMU events. .TP .B PFMLIB_NHM_SEL_ANYTHR Enable measuring the event in either of the two processor threads assuming hyper-threading is enabled. By default, only the current thread is measured. This flag is restricted to core PMU events. .TP .B PFMLIB_NHM_SEL_OCC_RST When set, the queue occupancy counter associated with the event is cleared. This flag is only available to uncore PMU events. .LP The \fBcnt_mask\fR field is used to set the event threshold. The value of the counter is incremented for each cycle in which the number of occurrences of the event is greater than or equal to the value of the field. Thus, the event is modified to actually measure the number of qualifying cycles. 
When zero all occurrences are counted (this is the default). This field is supported for core and uncore PMU events. .sp .SH Support for Precise Event-Based Sampling (PEBS) The library can be used to set up the PMC registers associated with PEBS. In this case, the \fBpfmlib_nhm_pebs_t\fR structure must be used and the \fBpebs_used\fR field must be set to 1. .sp To enable the PEBS load latency filtering capability, it is necessary to program the \fBMEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD\fR event into one generic counter. The latency threshold must be passed to the library in the \fBld_lat_thres\fR field. It is expressed in core cycles and \fBmust\fR be greater than 3. Note that \fBpebs_used\fR must be set as well. .SH Support for Last Branch Record (LBR) The library can be used to set up LBR registers. On Intel Nehalem processors, the LBR is 16 entries deep and it is possible to filter branches, based on privilege level or type. To configure the LBR, the \fBpfmlib_nhm_lbr_t\fR structure must be used. .sp Like core PMU counters, LBR only distinguishes two privilege levels, 0 and the rest (1,2,3). When running Linux natively, the kernel is at privilege level 0, applications at level 3. It is possible to specify the privilege level of LBR using the \fBlbr_plm\fR field. Any attempt to pass \fBPFM_PLM1\fR or \fBPFM_PLM2\fR will be rejected. If \fBlbr_plm\fR is 0, then the global value in the \fBpfp_dfl_plm\fR field of \fBpfmlib_input_param_t\fR is used. .sp By default, LBR captures all branches. It is possible to filter out branches by passing a set of flags in \fBlbr_filter\fR. The flags are as follows: .TP .B PFMLIB_NHM_LBR_JCC When set, LBR does not capture conditional branches. Default: off. .TP .B PFM_NHM_LBR_NEAR_REL_CALL When set, LBR does not capture near calls. Default: off. .TP .B PFM_NHM_LBR_NEAR_IND_CALL When set, LBR does not capture indirect calls. Default: off. .TP .B PFM_NHM_LBR_NEAR_RET When set, LBR does not capture return branches. Default: off. 
.TP .B PFM_NHM_LBR_NEAR_IND_JMP When set, LBR does not capture indirect branches. Default: off. .TP .B PFM_NHM_LBR_NEAR_REL_JMP When set, LBR does not capture relative branches. Default: off. .TP .B PFM_NHM_LBR_FAR_BRANCH When set, LBR does not capture far branches. Default: off. .SH Support for uncore PMU By nature, the uncore PMU does not distinguish privilege levels, therefore it captures events at all privilege levels. To avoid any misinterpretation, the library enforces that uncore events be measured with both \fBPFM_PLM0\fR and \fBPFM_PLM3\fR set. Tools and operating system kernel interfaces may impose further restrictions on how the uncore PMU can be accessed. .SH SEE ALSO pfm_dispatch_events(3) and set of examples shipped with the library .SH AUTHOR Stephane Eranian .PP papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_p6.30000600003276200002170000000457512247131122020347 0ustar ralphundrgrad.TH LIBPFM 3 "September, 2005" "" "Linux Programmer's Manual" .SH NAME libpfm_i386_p6 - support for Intel P6 processor family .SH SYNOPSIS .nf .B #include .B #include .sp .SH DESCRIPTION The libpfm library provides full support for the P6 processor family, including the Pentium M processor. The interface is defined in \fBpfmlib_i386_p6.h\fR. It consists of a set of functions and structures which describe and allow access to the P6 processors specific PMU features. .sp When P6 processor specific features are needed to support a measurement, their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. The P6 processors specific input arguments are described in the \fBpfmlib_i386_p6_input_param_t\fR structure and the output parameters in \fBpfmlib_i386_p6_output_param_t\fR. 
They are defined as follows:
.sp
.nf
typedef struct {
        unsigned int    cnt_mask;
        unsigned int    flags;
} pfmlib_i386_p6_counter_t;

typedef struct {
        pfmlib_i386_p6_counter_t  pfp_i386_p6_counters[PMU_I386_P6_NUM_COUNTERS];
        uint64_t                  reserved[4];
} pfmlib_i386_p6_input_param_t;

typedef struct {
        uint64_t        reserved[8];
} pfmlib_i386_p6_output_param_t;
.fi
.sp
The P6 processor provides a few additional per-event features for counters: thresholding, inversion, edge detection. They can be set using the \fBpfp_i386_p6_counters\fR data structure for each event. The \fBflags\fR field can be initialized as follows:
.TP
.B PFMLIB_I386_P6_SEL_INV
Invert the result of the \fBcnt_mask\fR comparison when set.
.TP
.B PFMLIB_I386_P6_SEL_EDGE
Enable edge detection of events.
.LP
The \fBcnt_mask\fR field is used to set the event threshold. The value of the counter is incremented each time the number of occurrences per cycle of the event is greater than or equal to the value of the field. When zero, all occurrences are counted.
.sp
.SH Handling of Pentium M
The library provides full support for the Pentium M PMU. A Pentium M implements more events than a generic P6 processor. The library autodetects the host processor and can distinguish a generic P6 processor from a Pentium M. Thus no special call is needed.
.sp
.SH ERRORS
Refer to the description of the \fBpfm_dispatch_events()\fR function for errors.
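.SH EXAMPLE
The counting rule described for \fBcnt_mask\fR and the two flags can be illustrated with a small software model. This is only a sketch written in Python, the language of the bindings shipped with the library; it does not call libpfm. The names \fBcount_events\fR, \fBSEL_INV\fR and \fBSEL_EDGE\fR and the flag values are assumptions made for this example (the real constants live in \fBpfmlib_i386_p6.h\fR). Applying edge detection to the threshold condition, and ignoring the flags when \fBcnt_mask\fR is zero, are simplifications of this model.
.nf
```python
# Simplified software model of the per-event counting rule described above.
# It does not use libpfm; the flag values are placeholders for this sketch.
SEL_INV = 0x1   # stands in for PFMLIB_I386_P6_SEL_INV (value assumed)
SEL_EDGE = 0x2  # stands in for PFMLIB_I386_P6_SEL_EDGE (value assumed)


def count_events(occurrences_per_cycle, cnt_mask=0, flags=0):
    """Return the modelled counter value after the given cycles.

    occurrences_per_cycle: number of event occurrences in each cycle.
    cnt_mask: threshold; zero means every occurrence is counted.
    flags: combination of SEL_INV and SEL_EDGE.
    """
    if cnt_mask == 0:
        # No threshold: all occurrences are counted (flags are ignored
        # here, which is a simplification of this model).
        return sum(occurrences_per_cycle)

    total = 0
    previous = False
    for occurrences in occurrences_per_cycle:
        # Threshold comparison: occurrences per cycle >= cnt_mask.
        condition = occurrences >= cnt_mask
        if flags & SEL_INV:
            # Inversion flips the result of the comparison.
            condition = not condition
        if flags & SEL_EDGE:
            # Edge detection: count only false -> true transitions.
            if condition and not previous:
                total += 1
        elif condition:
            total += 1
        previous = condition
    return total


if __name__ == "__main__":
    trace = [0, 2, 3, 1, 0, 4]
    print(count_events(trace))
    print(count_events(trace, cnt_mask=2))
    print(count_events(trace, cnt_mask=2, flags=SEL_INV))
    print(count_events(trace, cnt_mask=2, flags=SEL_EDGE))
```
.fi
For the trace above, the model counts 10 with no threshold, 3 with a threshold of 2, 3 when the comparison is inverted, and 2 when only rising edges of the threshold condition are counted.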
.SH SEE ALSO
pfm_dispatch_events(3) and the set of examples shipped with the library
.SH AUTHOR
Stephane Eranian
.PP
papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_cycle_event.30000600003276200002170000000407312247131122022463 0ustar ralphundrgrad
.TH LIBPFM 3 "September, 2006" "" "Linux Programmer's Manual"
.SH NAME
pfm_get_cycle_event, pfm_get_inst_retired_event - get basic event descriptors
.SH SYNOPSIS
.nf
.B #include
.sp
.BI "int pfm_get_cycle_event(pfmlib_event_t *"ev ");"
.BI "int pfm_get_inst_retired_event(pfmlib_event_t *"ev ");"
.sp
.SH DESCRIPTION
In order to build very simple generic examples that work across all PMU models, the library provides a way to retrieve information about two basic events that are present in most PMU models: cycles and instructions retired. The first event, cycles, counts the number of elapsed cycles. The second event, instructions retired, counts the number of instructions that have executed and retired from the processor pipeline. Depending on the PMU model, there may be variations in the exact definition of those events. The library provides this information on a best-effort basis. Users must refer to the PMU model-specific documentation to validate the event definition.
.sp
The \fBpfm_get_cycle_event()\fR function returns in \fBev\fR the event and optional unit mask descriptors for the event that counts elapsed cycles. Depending on the PMU model, there may be unit mask(s) necessary to count cycles. Applications must check the value returned in \fBev->num_masks\fR.
.sp
The \fBpfm_get_inst_retired_event()\fR function returns in \fBev\fR the event and optional unit mask descriptors for the event that counts the number of retired instructions. Depending on the PMU model, there may be unit mask(s) necessary to count retired instructions. Applications must check the value returned in \fBev->num_masks\fR.
.SH RETURN
All functions return whether or not the call was successful.
A return value of \fBPFMLIB_SUCCESS\fR indicates success; otherwise the value is the error code.
.SH ERRORS
.TP
.B PFMLIB_ERR_NOINIT
the library has not been initialized properly.
.TP
.B PFMLIB_ERR_INVAL
the \fBev\fR parameter is NULL.
.TP
.B PFMLIB_ERR_NOTSUPP
the host PMU does not define an event to count cycles or instructions retired.
.SH AUTHOR
Stephane Eranian
.PP
papi-5.3.0/src/libpfm-3.y/docs/man3/pfm_get_impl_counters.30000600003276200002170000000003512247131122023040 0ustar ralphundrgrad
.so man3/pfm_get_impl_pmcs.3
papi-5.3.0/src/libpfm-3.y/docs/man3/libpfm_amd64.30000600003276200002170000001306512247131122020727 0ustar ralphundrgrad
.TH LIBPFM 3 "April, 2008" "" "Linux Programmer's Manual"
.SH NAME
libpfm_amd64 - support for AMD64 processors
.SH SYNOPSIS
.nf
.B #include
.B #include
.sp
.SH DESCRIPTION
The libpfm library provides full support for the AMD64 processor families 0Fh and 10h (K8, Barcelona, Phenom) when running in either 32-bit or 64-bit mode. The interface is defined in \fBpfmlib_amd64.h\fR. It consists of a set of functions and structures which describe and allow access to the AMD64-specific PMU features. Note that it only supports AMD processors.
.sp
When AMD64 processor-specific features are needed to support a measurement, their descriptions must be passed as model-specific input arguments to the \fBpfm_dispatch_events()\fR function. The AMD64 processor-specific input arguments are described in the \fBpfmlib_amd64_input_param_t\fR structure and the output parameters in \fBpfmlib_amd64_output_param_t\fR.
They are defined as follows:
.sp
.nf
typedef struct {
        uint32_t        cnt_mask;
        uint32_t        flags;
} pfmlib_amd64_counter_t;

typedef struct {
        unsigned int    maxcnt;
        unsigned int    options;
} ibs_param_t;

typedef struct {
        pfmlib_amd64_counter_t  pfp_amd64_counters[PMU_AMD64_MAX_COUNTERS];
        uint32_t                flags;
        uint32_t                reserved1;
        ibs_param_t             ibsfetch;
        ibs_param_t             ibsop;
        uint64_t                reserved2;
} pfmlib_amd64_input_param_t;

typedef struct {
        uint32_t        ibsfetch_base;
        uint32_t        ibsop_base;
        uint64_t        reserved[7];
} pfmlib_amd64_output_param_t;
.fi
.LP
The \fBflags\fR field of \fBpfmlib_amd64_input_param_t\fR describes which features of the PMU to use. The following use flags exist:
.TP
.B PFMLIB_AMD64_USE_IBSFETCH
Profile IBS fetch performance (see below under \fBINSTRUCTION BASED SAMPLING\fR)
.TP
.B PFMLIB_AMD64_USE_IBSOP
Profile IBS execution performance (see below under \fBINSTRUCTION BASED SAMPLING\fR)
.LP
Multiple features can be selected. Note that no use flags are needed for \fBADDITIONAL PER-EVENT FEATURES\fR.
.LP
Various typedefs for MSR encoding and decoding are available. See \fBpfmlib_amd64.h\fR for details.
.SS ADDITIONAL PER-EVENT FEATURES
AMD64 processors provide a few additional per-event features for counters: thresholding, inversion, edge detection, virtualization. They can be set using the \fBpfp_amd64_counters\fR data structure for each event. The \fBflags\fR field of \fBpfmlib_amd64_counter_t\fR can be initialized as follows:
.TP
.B PFMLIB_AMD64_SEL_INV
Invert the result of the \fBcnt_mask\fR comparison when set.
.TP
.B PFMLIB_AMD64_SEL_EDGE
Enable edge detection of events.
.TP
.B PFMLIB_AMD64_SEL_GUEST
On AMD64 Family 10h processors only. The event is only measured when the processor is in guest mode.
.TP
.B PFMLIB_AMD64_SEL_HOST
On AMD64 Family 10h processors only. The event is only measured when the processor is in host mode.
.LP
The \fBcnt_mask\fR field is used to set the event threshold.
The value of the counter is incremented each time the number of occurrences per cycle of the event is greater than or equal to the value of the field. When zero, all occurrences are counted.
.SS INSTRUCTION BASED SAMPLING (IBS)
The libpfm_amd64 interface provides access to the model-specific Instruction Based Sampling (IBS) feature. IBS was introduced with family 10h.
.LP
IBS setup uses the structures \fBpfmlib_amd64_input_param_t\fR and \fBpfmlib_amd64_output_param_t\fR with their members \fBflags\fR, \fBibsfetch\fR, \fBibsop\fR, \fBibsfetch_base\fR, \fBibsop_base\fR. The input arguments \fBibsop\fR and \fBibsfetch\fR can be set in inp_mod (type \fBpfmlib_amd64_input_param_t\fR). The corresponding \fBflags\fR must be set to enable a feature.
.LP
Both IBS execution profiling and IBS fetch profiling require a maximum count value of the periodic counter (\fBmaxcnt\fR) as a parameter. This is a 20-bit value; bits 3:0 are always set to zero. Additionally, there is an option (\fBoptions\fR) to enable randomization (\fBIBS_OPTIONS_RANDEN\fR) for IBS fetch profiling.
.LP
The IBS registers IbsFetchCtl (0xC0011030) and IbsOpCtl (0xC0011033) are available as PMC and PMD in Perfmon. The function \fBpfm_dispatch_events()\fR initializes these registers according to the input parameters in \fBpfmlib_amd64_input_param_t\fR.
.LP
Also, \fBpfm_dispatch_events()\fR passes back the index in pfp_pmds[] of the IbsOpCtl and IbsFetchCtl registers. For this there are the entries \fBibsfetch_base\fR and \fBibsop_base\fR in \fBpfmlib_amd64_output_param_t\fR. The index may vary depending on other PMU settings, especially counter settings. If using the PMU with only one IBS feature and no counters, the index of the base register is 0.
.LP
Example code:
.LP
.nf
/* initialize IBS */
inp_mod.ibsop.maxcnt = 0xFFFF0;
inp_mod.flags |= PFMLIB_AMD64_USE_IBSOP;
ret = pfm_dispatch_events(NULL, &inp_mod, &outp, &outp_mod);
if (ret != PFMLIB_SUCCESS) { ...
}

/* setup PMU */
/* PMC_IBSOPCTL */
pc[0].reg_num = outp.pfp_pmcs[0].reg_num;
pc[0].reg_value = outp.pfp_pmcs[0].reg_value;
/* PMD_IBSOPCTL */
pd[0].reg_num = outp.pfp_pmds[0].reg_num;
pd[0].reg_value = 0;

/* setup sampling */
pd[0].reg_flags = PFM_REGFL_OVFL_NOTIFY;
/* add range check here */
pd[0].reg_smpl_pmds[0] =
        ((1UL << PMD_IBSOP_NUM) - 1) << outp.pfp_pmds[0].reg_num;

/* write pc and pd to PMU */
...
.fi
.SH ERRORS
Refer to the description of the \fBpfm_dispatch_events()\fR function for errors.
.SH SEE ALSO
pfm_dispatch_events(3) and the set of examples shipped with the library
.SH AUTHORS
.nf
Stephane Eranian
Robert Richter
.fi
.PP
papi-5.3.0/src/libpfm-3.y/Makefile0000600003276200002170000000432112247131122016244 0ustar ralphundrgrad
#
# Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P.
# Contributed by Stephane Eranian
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
# of the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
# INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
# PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
# CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
# OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# # # Look in config.mk for options # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi) include config.mk DIRS=lib include docs EXAMPLES_DIRS = examples_v2.x ifneq ($(CONFIG_PFMLIB_OLD_PFMV2),y) EXAMPLES_DIRS += examples_v3.x endif ifeq ($(ARCH),ia64) DIRS +=examples_ia64_v2.0 endif ifeq ($(SYS),Linux) DIRS +=libpfms endif DIRS += $(EXAMPLES_DIRS) all: @echo Compiling for \'$(ARCH)\' target @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done lib: $(MAKE) -C lib clean: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done distclean: clean depend: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done tar: clean a=`basename $$PWD`; cd ..; tar zcf $$a.tar.gz $$a; echo generated ../$$a.tar.gz; tarcvs: clean a=`basename $$PWD`; cd ..; tar --exclude=CVS -zcf $$a.tar.gz $$a; echo generated ../$$a.tar.gz; install: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done install_examples: @set -e ; for d in $(EXAMPLES_DIRS) ; do $(MAKE) -C $$d $@ ; done .PHONY: tar tarcvs lib # DO NOT DELETE papi-5.3.0/src/libpfm-3.y/README0000600003276200002170000000670112247131122015470 0ustar ralphundrgrad ------------------------------------------------------ libpfm-3.10: a helper library to program the Performance Monitoring Unit (PMU) ------------------------------------------------------ Copyright (c) 2001-2007 Hewlett-Packard Development Company, L.P. Contributed by Stephane Eranian This package provides a library, called libpfm, which can be used to develop monitoring tools which use the Performance Monitoring Unit (PMU) of several modern processors. 
This version of libpfm supports: - For Intel IA-64: Itanium (Merced), Itanium 2 (McKinley, Madison, Deerfield), Itanium 2 9000/9100 (Montecito, Montvale) and Generic - For AMD X86: AMD64 (K8, family 10h) - For Intel X86: Intel P6 (Pentium II, Pentium Pro, Pentium III, Pentium M) Intel Yonah (Core Duo/Core Solo), Intel Netburst (Pentium 4, Xeon) Intel Core (Merom, Penryn, Dunnington) Core 2 and Quad Intel Atom Intel Nehalem (Nehalem, Westmere) Intel architectural perfmon v1, v2, v3 - For MIPS: 5K, 20K, 25KF, 34K, 5KC, 74K, R10000, R12000, RM7000, RM9000, SB1, VR5432, VR5500, SiCortex ICA9A/ICE9B - For Cray: XT3, XT4, XT5, XT5h, X2 - For IBM: IBM Cell processor POWER: PPC970, PPC970MP, POWER4+, POWER5, POWER5+, POWER6, POWER7 - For Sun: Sparc: Ultra12, Ultra3, Ultra3i, Ultra3Plus, Ultra4Plus, Sparc: Niagara1, Niagara2 The core library is generic and does not depend on the perfmon interface. It is possible to use it on other operating systems. WHAT'S THERE ------------- - the library source code including support for all processors listed above - a set of examples showing how the library can be used with the perfmon2 and perfmon3 kernel interface. - a set of older examples for IA-64 only using the legacy perfmon2 interface (v2.0). - a set of library header files and the perfmon2 and perfmon3 kernel interface headers - libpfms: a simple library to help setup SMP system-wide monitoring sessions. It comes with a simple example. This library is not part of libpfm. - man pages for all the library entry points - Python bindings for libpfm and the perfmon interface (experimental). INSTALLATION ------------ - edit config.mk to : - update some of the configuration variables - make your compiler options - type make - type make install - To compile and install the Python bindings, you need to go to the python sub-directory and type make. Python is not systematically built - to compile the library for another ABI (e.g. 
32-bit x86 on a 64-bit x86) system, you can pass the ABI flag to the compiler as follows (assuming you have the multilib version of gcc): $ make OPTION="-m32 -O2" REQUIREMENTS: ------------- - to run the programs in the examples subdir, you MUST be using a linux kernel with perfmon3. Perfmon3 is available as a branch of the perfmon kernel GIT tree on kernel.org. - to run the programs in the examples_v2x subdir, you MUST be using a linux kernel with perfmon2. Perfmon2 is available as the main branch of the perfmon kernel GIT tree on kernel.org. - On IA-64, the examples in old_interface_ia64_examples work with any 2.6.x kernels. - to compile the Python bindings, you need to have SWIG and the python development packages installed DOCUMENTATION ------------- - man pages for all entry points - More information can be found on library web site: http://perfmon2.sf.net papi-5.3.0/src/libpfm-3.y/rules.mk0000600003276200002170000000310112247131123016263 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # # This file is part of libpfm, a performance monitoring support library for # applications on Linux/ia64. # .SUFFIXES: .c .S .o .lo .cpp .PHONY: all clean distclean depend install install_examples .S.o: $(CC) $(CFLAGS) -c $*.S .c.o: $(CC) $(CFLAGS) -c $*.c .cpp.o: $(CXX) $(CFLAGS) -c $*.cpp .c.lo: $(CC) -fPIC -DPIC $(CFLAGS) -c $*.c -o $*.lo .S.lo: $(CC) -fPIC -DPIC $(CFLAGS) -c $*.S -o $*.lo papi-5.3.0/src/libpfm-3.y/python/0000700003276200002170000000000012247131123016124 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/python/Makefile0000600003276200002170000000242312247131123017567 0ustar ralphundrgrad# # Copyright (c) 2008 Google, Inc. # Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
# all: ./setup.py build install: ./setup.py install clean: $(RM) src/perfmon_int_wrap.c src/perfmon_int.py src/*.pyc $(RM) -r build papi-5.3.0/src/libpfm-3.y/python/README0000600003276200002170000000037112247131123017007 0ustar ralphundrgradRequirements: To use the python bindings, you need the following packages: 1. swig (http://www.swig.org) 2. python-dev (http://www.python.org) 3. pycpuid (http://code.google.com/p/pycpuid) linux.sched is python package that comes with pycpuid. papi-5.3.0/src/libpfm-3.y/python/sys.py0000700003276200002170000000437312247131123017326 0ustar ralphundrgrad#!/usr/bin/env python # # Copyright (c) 2008 Google, Inc. # Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), # to deal in the Software without restriction, including without limitation # the rights to use, copy, modify, merge, publish, distribute, sublicense, # and/or sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included # in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL # THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR # OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, # ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR # OTHER DEALINGS IN THE SOFTWARE. # # System wide monitoring example. 
# Copied from syst.c
#
# Run as: ./sys.py -c cpulist -e eventlist

import sys
import os
from optparse import OptionParser
import time
from perfmon import *

if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option("-e", "--events", help="Events to use",
                      action="store", dest="events")
    parser.add_option("-c", "--cpulist", help="CPUs to monitor",
                      action="store", dest="cpulist")
    parser.set_defaults(cpu=0)
    (options, args) = parser.parse_args()

    cpus = options.cpulist.split(',')
    cpus = [ int(c) for c in cpus ]

    try:
        s = SystemWideSession(cpus)

        if options.events:
            events = options.events.split(",")
        else:
            # String exceptions are not valid; raise a real exception object.
            raise ValueError("You need to specify events to monitor")

        s.dispatch_events(events)
        s.load()

        # Measuring loop
        for i in range(1, 10):
            s.start()
            time.sleep(1)
            s.stop()
            # Print the counts
            for cpu in xrange(len(cpus)):
                for i in xrange(s.npmds):
                    print "CPU%d.PMD%d\t%lu" % (cpu, s.pmds[cpu][i].reg_num,
                                                s.pmds[cpu][i].reg_value)
    finally:
        s.cleanup()
papi-5.3.0/src/libpfm-3.y/python/self.py0000700003276200002170000000401212247131123017427 0ustar ralphundrgrad
#!/usr/bin/env python
#
# Copyright (c) 2008 Google, Inc.
# Contributed by Arun Sharma
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
# ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
# Self monitoring example. Copied from self.c

import os
from optparse import OptionParser
import random
import errno
from perfmon import *

if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option("-e", "--events", help="Events to use",
                      action="store", dest="events")
    (options, args) = parser.parse_args()

    s = PerThreadSession(int(os.getpid()))

    if options.events:
        events = options.events.split(",")
    else:
        # String exceptions are not valid; raise a real exception object.
        raise ValueError("You need to specify events to monitor")

    s.dispatch_events(events)
    s.load()
    s.start()

    # code to be measured
    #
    # note that this is not identical to what examples/self.c does
    # thus counts will be different in the end
    for i in range(1, 10000000):
        random.random()

    s.stop()

    # read the counts
    for i in xrange(s.npmds):
        print """PMD%d\t%lu""" % (s.pmds[0][i].reg_num, s.pmds[0][i].reg_value)
papi-5.3.0/src/libpfm-3.y/python/src/0000700003276200002170000000000012247131123016713 5ustar ralphundrgrad
papi-5.3.0/src/libpfm-3.y/python/src/perfmon_int.i0000600003276200002170000001276512247131123021422 0ustar ralphundrgrad
/*
 * Copyright (c) 2008 Google, Inc.
 * Contributed by Arun Sharma
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR * OTHER DEALINGS IN THE SOFTWARE. * * Python Bindings for perfmon. */ %module perfmon_int %{ #include #include static PyObject *libpfm_err; %} %include "carrays.i" %include "cstring.i" %include /* Some typemaps for corner cases SWIG can't handle */ /* Convert from Python --> C */ %typemap(memberin) pfmlib_event_t[ANY] { int i; for (i = 0; i < $1_dim0; i++) { $1[i] = $input[i]; } } %typemap(out) pfmlib_event_t[ANY] { int len, i; len = $1_dim0; $result = PyList_New(len); for (i = 0; i < len; i++) { PyObject *o = SWIG_NewPointerObj(SWIG_as_voidptr(&$1[i]), SWIGTYPE_p_pfmlib_event_t, 0 | 0 ); PyList_SetItem($result, i, o); } } /* Convert from Python --> C */ %typemap(memberin) pfmlib_reg_t[ANY] { int i; for (i = 0; i < $1_dim0; i++) { $1[i] = $input[i]; } } %typemap(out) pfmlib_reg_t[ANY] { int len, i; len = $1_dim0; $result = PyList_New(len); for (i = 0; i < len; i++) { PyObject *o = SWIG_NewPointerObj(SWIG_as_voidptr(&$1[i]), SWIGTYPE_p_pfmlib_reg_t, 0 | 0 ); PyList_SetItem($result, i, o); } } /* Convert libpfm errors into exceptions */ %typemap(out) os_err_t { if (result == -1) { PyErr_SetFromErrno(PyExc_OSError); SWIG_fail; } resultobj = SWIG_From_int((int)(result)); }; %typemap(out) pfm_err_t { if (result != PFMLIB_SUCCESS) { PyObject *obj = Py_BuildValue("(i,s)", result, pfm_strerror(result)); PyErr_SetObject(libpfm_err, obj); SWIG_fail; } else { PyErr_Clear(); } resultobj = SWIG_From_int((int)(result)); } /* Convert libpfm errors into exceptions */ %typemap(out) os_err_t { if (result == -1) { PyErr_SetFromErrno(PyExc_OSError); 
    SWIG_fail;
  }
  resultobj = SWIG_From_int((int)(result));
};
%typemap(out) pfm_err_t {
  if (result != PFMLIB_SUCCESS) {
    PyObject *obj = Py_BuildValue("(i,s)", result, pfm_strerror(result));
    PyErr_SetObject(libpfm_err, obj);
    SWIG_fail;
  } else {
    PyErr_Clear();
  }
  resultobj = SWIG_From_int((int)(result));
}

%cstring_output_maxsize(char *name, size_t maxlen)
%cstring_output_maxsize(char *name, int maxlen)

%extend pfmlib_regmask_t {
  unsigned int weight() {
    unsigned int w = 0;
    pfm_regmask_weight($self, &w);
    return w;
  }
}

/* Kernel interface */
%include
%array_class(pfarg_pmc_t, pmc)
%array_class(pfarg_pmd_t, pmd)

/* Library interface */
%include

%extend pfarg_ctx_t {
  void zero() {
    /* sizeof(*self): clear the whole structure, not just pointer-size bytes */
    memset(self, 0, sizeof(*self));
  }
}

%extend pfarg_load_t {
  void zero() {
    /* sizeof(*self): clear the whole structure, not just pointer-size bytes */
    memset(self, 0, sizeof(*self));
  }
}

%init %{
  libpfm_err = PyErr_NewException("perfmon.libpfmError", NULL, NULL);
  PyDict_SetItemString(d, "libpfmError", libpfm_err);
%}

%inline %{
/* Helper functions to avoid pointer classes */
int pfm_py_get_pmu_type(void)
{
  int tmp = -1;
  pfm_get_pmu_type(&tmp);
  return tmp;
}

unsigned int pfm_py_get_hw_counter_width(void)
{
  unsigned int tmp = 0;
  pfm_get_hw_counter_width(&tmp);
  return tmp;
}

unsigned int pfm_py_get_num_events(void)
{
  unsigned int tmp = 0;
  pfm_get_num_events(&tmp);
  return tmp;
}

int pfm_py_get_event_code(int idx)
{
  int tmp = 0;
  pfm_get_event_code(idx, &tmp);
  return tmp;
}

unsigned int pfm_py_get_num_event_masks(int idx)
{
  unsigned int tmp = 0;
  pfm_get_num_event_masks(idx, &tmp);
  return tmp;
}

unsigned int pfm_py_get_event_mask_code(int idx, int i)
{
  unsigned int tmp = 0;
  pfm_get_event_mask_code(idx, i, &tmp);
  return tmp;
}

#define PFMON_MAX_EVTNAME_LEN 128

PyObject *pfm_py_get_event_name(int idx)
{
  char name[PFMON_MAX_EVTNAME_LEN];
  pfm_get_event_name(idx, name, PFMON_MAX_EVTNAME_LEN);
  return PyString_FromString(name);
}

PyObject *pfm_py_get_event_mask_name(int idx, int i)
{
  char name[PFMON_MAX_EVTNAME_LEN];
  pfm_get_event_mask_name(idx, i, name, PFMON_MAX_EVTNAME_LEN);
  return
PyString_FromString(name); } PyObject *pfm_py_get_event_description(int idx) { char *desc; PyObject *ret; pfm_get_event_description(idx, &desc); ret = PyString_FromString(desc); free(desc); return ret; } PyObject *pfm_py_get_event_mask_description(int idx, int i) { char *desc; PyObject *ret; pfm_get_event_mask_description(idx, i, &desc); ret = PyString_FromString(desc); free(desc); return ret; } %} papi-5.3.0/src/libpfm-3.y/python/src/session.py0000600003276200002170000001611412247131123020755 0ustar ralphundrgrad#!/usr/bin/env python # # Copyright (c) 2008 Google, Inc. # Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), # to deal in the Software without restriction, including without limitation # the rights to use, copy, modify, merge, publish, distribute, sublicense, # and/or sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included # in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL # THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR # OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, # ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR # OTHER DEALINGS IN THE SOFTWARE. # from perfmon import * from linux import sched import os import sys from threading import Thread # Shouldn't be necessary for python version >= 2.5 from Queue import Queue # http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/425445 def once(func): "A decorator that runs a function only once." 
def decorated(*args, **kwargs): try: return decorated._once_result except AttributeError: decorated._once_result = func(*args, **kwargs) return decorated._once_result return decorated @once def pfm_initialize_once(): # Initialize once opts = pfmlib_options_t() opts.pfm_verbose = 1 pfm_set_options(opts) pfm_initialize() # Common base class class Session: def __init__(self, n): self.system = System() pfm_initialize_once() # Setup context self.ctxts = [] self.fds = [] self.inps = [] self.outps = [] self.pmcs = [] self.pmds = [] for i in xrange(n): ctx = pfarg_ctx_t() ctx.zero() ctx.ctx_flags = self.ctx_flags fd = pfm_create_context(ctx, None, None, 0) self.ctxts.append(ctx) self.fds.append(fd) def __del__(self): if self.__dict__.has_key("fds"): for fd in self.fds: os.close(fd) def dispatch_event_one(self, events, which): # Select and dispatch events inp = pfmlib_input_param_t() for i in xrange(0, len(events)): pfm_find_full_event(events[i], inp.pfp_events[i]) inp.pfp_dfl_plm = self.default_pl inp.pfp_flags = self.pfp_flags outp = pfmlib_output_param_t() cnt = len(events) inp.pfp_event_count = cnt pfm_dispatch_events(inp, None, outp, None) # pfp_pm_count may be > cnt cnt = outp.pfp_pmc_count pmcs = pmc(outp.pfp_pmc_count) pmds = pmd(outp.pfp_pmd_count) for i in xrange(outp.pfp_pmc_count): npmc = pfarg_pmc_t() npmc.reg_num = outp.pfp_pmcs[i].reg_num npmc.reg_value = outp.pfp_pmcs[i].reg_value pmcs[i] = npmc self.npmds = outp.pfp_pmd_count for i in xrange(outp.pfp_pmd_count): npmd = pfarg_pmd_t() npmd.reg_num = outp.pfp_pmds[i].reg_num pmds[i] = npmd # Program PMCs and PMDs fd = self.fds[which] pfm_write_pmcs(fd, pmcs, outp.pfp_pmc_count) pfm_write_pmds(fd, pmds, outp.pfp_pmd_count) # Save all the state in various vectors self.inps.append(inp) self.outps.append(outp) self.pmcs.append(pmcs) self.pmds.append(pmds) def dispatch_events(self, events): for i in xrange(len(self.fds)): self.dispatch_event_one(events, i) def load_one(self, i): fd = self.fds[i] load = 
pfarg_load_t() load.zero() load.load_pid = self.targets[i] try: pfm_load_context(fd, load) except OSError, err: import errno if (err.errno == errno.EBUSY): err.strerror = "Another conflicting perfmon session?" raise err def load(self): for i in xrange(len(self.fds)): self.load_one(i) def start_one(self, i): pfm_start(self.fds[i], None) def start(self): for i in xrange(len(self.fds)): self.start_one(i) def stop_one(self, i): fd = self.fds[i] pmds = self.pmds[i] pfm_stop(fd) pfm_read_pmds(fd, pmds, self.npmds) def stop(self): for i in xrange(len(self.fds)): self.stop_one(i) class PerfmonThread(Thread): def __init__(self, session, i, cpu): Thread.__init__(self) self.cpu = cpu self.session = session self.index = i self.done = 0 self.started = 0 def run(self): queue = self.session.queues[self.index] exceptions = self.session.exceptions[self.index] cpu_set = sched.cpu_set_t() cpu_set.set(self.cpu) sched.setaffinity(0, cpu_set) while not self.done: # wait for a command from the master method = queue.get() try: method(self.session, self.index) except: exceptions.put(sys.exc_info()) queue.task_done() break queue.task_done() def run_in_other_thread(func): "A decorator that runs a function in another thread (second argument)" def decorated(*args, **kwargs): self = args[0] i = args[1] # Tell thread i to call func() self.queues[i].put(func) self.queues[i].join() if not self.exceptions[i].empty(): exc = self.exceptions[i].get() # Let the main thread know we had an exception self.exceptions[i].put(exc) print "CPU: %d, exception: %s" % (i, exc) raise exc[1] return decorated class SystemWideSession(Session): def __init__(self, cpulist): self.default_pl = PFM_PLM3 | PFM_PLM0 self.targets = cpulist self.ctx_flags = PFM_FL_SYSTEM_WIDE self.pfp_flags = PFMLIB_PFP_SYSTEMWIDE self.threads = [] self.queues = [] self.exceptions = [] n = len(cpulist) for i in xrange(n): t = PerfmonThread(self, i, cpulist[i]) self.threads.append(t) self.queues.append(Queue(0)) 
self.exceptions.append(Queue(0)) t.start() Session.__init__(self, n) def __del__(self): self.cleanup() Session.__del__(self) def cleanup(self): for t in self.threads: t.done = 1 # join only threads with no exceptions if self.exceptions[t.index].empty(): if t.started: self.stop_one(t.index) else: self.wakeup(t.index) t.join() self.threads = [] @run_in_other_thread def load_one(self, i): Session.load_one(self, i) @run_in_other_thread def start_one(self, i): Session.start_one(self, i) self.threads[i].started = 1 @run_in_other_thread def stop_one(self, i): Session.stop_one(self, i) self.threads[i].started = 0 @run_in_other_thread def wakeup(self, i): "Do nothing. Just wakeup the other thread" pass class PerThreadSession(Session): def __init__(self, pid): self.targets = [pid] self.default_pl = PFM_PLM3 self.ctx_flags = 0 self.pfp_flags = 0 Session.__init__(self, 1) def __del__(self): Session.__del__(self) papi-5.3.0/src/libpfm-3.y/python/src/pmu.py0000600003276200002170000000663512247131123020102 0ustar ralphundrgrad#!/usr/bin/env python # # Copyright (c) 2008 Google, Inc. # Contributed by Arun Sharma # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), # to deal in the Software without restriction, including without limitation # the rights to use, copy, modify, merge, publish, distribute, sublicense, # and/or sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following conditions: # # The above copyright notice and this permission notice shall be included # in all copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL # THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR # OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, # ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR # OTHER DEALINGS IN THE SOFTWARE. # import os from perfmon import * def public_members(self): s = "{ " for k, v in self.__dict__.iteritems(): if not k[0] == '_': s += "%s : %s, " % (k, v) s += " }" return s class System: def __init__(self): self.ncpus = os.sysconf('SC_NPROCESSORS_ONLN') self.pmu = PMU() def __repr__(self): return public_members(self) class Event: def __init__(self): pass def __repr__(self): return '\n' + public_members(self) class EventMask: def __init__(self): pass def __repr__(self): return '\n\t' + public_members(self) class PMU: def __init__(self): pfm_initialize() self.type = pfm_py_get_pmu_type() self.name = pfm_get_pmu_name(PFMON_MAX_EVTNAME_LEN)[1] self.width = pfm_py_get_hw_counter_width() # What does the PMU support? self.__implemented_pmcs = pfmlib_regmask_t() self.__implemented_pmds = pfmlib_regmask_t() self.__implemented_counters = pfmlib_regmask_t() pfm_get_impl_pmcs(self.__implemented_pmcs) pfm_get_impl_pmds(self.__implemented_pmds) pfm_get_impl_counters(self.__implemented_counters) self.implemented_pmcs = self.__implemented_pmcs.weight() self.implemented_pmds = self.__implemented_pmds.weight() self.implemented_counters = self.__implemented_counters.weight() self.__events = None def __parse_events(self): nevents = pfm_py_get_num_events() self.__events = [] for idx in range(0, nevents): e = Event() e.name = pfm_py_get_event_name(idx) e.code = pfm_py_get_event_code(idx) e.__counters = pfmlib_regmask_t() pfm_get_event_counters(idx, e.__counters) # Now the event masks e.masks = [] nmasks = pfm_py_get_num_event_masks(idx) for mask_idx in range(0, nmasks): em = EventMask() em.name = pfm_py_get_event_mask_name(idx, mask_idx) em.code = pfm_py_get_event_mask_code(idx, mask_idx) em.desc = 
pfm_py_get_event_mask_description(idx, mask_idx) e.masks.append(em) self.__events.append(e) def events(self): if not self.__events: self.__parse_events() return self.__events def __repr__(self): return public_members(self) if __name__ == '__main__': from perfmon import * s = System() print s print s.pmu.events() papi-5.3.0/src/libpfm-3.y/python/src/__init__.py0000600003276200002170000000010212247131123021017 0ustar ralphundrgradfrom perfmon_int import * from pmu import * from session import * papi-5.3.0/src/libpfm-3.y/python/setup.py0000700003276200002170000000124012247131123017636 0ustar ralphundrgrad#!/usr/bin/env python from distutils.core import setup, Extension from distutils.command.install_data import install_data setup(name='perfmon', version='0.1', author='Arun Sharma', author_email='arun.sharma@google.com', description='libpfm wrapper', packages=['perfmon'], package_dir={ 'perfmon' : 'src' }, py_modules=['perfmon.perfmon_int'], ext_modules=[Extension('perfmon._perfmon_int', sources = ['src/perfmon_int.i'], libraries = ['pfm'], library_dirs = ['../lib'], include_dirs = ['../include'], swig_opts=['-I../include'])]) papi-5.3.0/src/libpfm-3.y/TODO0000600003276200002170000000043312247131122015274 0ustar ralphundrgradTODO list: ---------- - add Linux/ia64 perfmon support to GNU libc, this would avoid having the perfmon.h perfmon_default_smpl.h headers here. - add library interface to help setup system-wide mode SMP on Linux/ia64 - add support for cumulative calls to pfm_dispatch_events() papi-5.3.0/src/libpfm-3.y/libpfms/0000700003276200002170000000000012247131123016237 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/libpfms/syst_smp.c0000600003276200002170000001575612247131123020304 0ustar ralphundrgrad/* * syst_smp.c - system-wide monitoring for SMP machine using libpfms helper * library * * Copyright (c) 2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } static uint32_t popcount(uint64_t c) { uint32_t count = 0; for(; c; c>>=1) { if (c & 0x1) count++; } return count; } int main(int argc, char **argv) { pfarg_ctx_t ctx; pfarg_pmc_t pc[NUM_PMCS]; pfarg_pmd_t *pd; pfmlib_input_param_t inp; pfmlib_output_param_t outp; uint64_t cpu_list; void *desc; unsigned int num_counters; uint32_t i, j, l, k, ncpus, npmds; size_t len; int ret; char *name; if (pfm_initialize() != PFMLIB_SUCCESS) fatal_error("cannot initialize libpfm\n"); if (pfms_initialize()) fatal_error("cannot initialize libpfms\n"); memset(&ctx, 0, sizeof(ctx)); memset(pc, 0, sizeof(pc)); ncpus = (uint32_t)sysconf(_SC_NPROCESSORS_ONLN); if (ncpus == -1) fatal_error("cannot retrieve number of online processors\n"); if (argc > 1) { cpu_list = strtoul(argv[1],NULL,0); if (popcount(cpu_list) > ncpus) fatal_error("too many processors specified\n"); } else { cpu_list = ((1< num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } /* * how many counters we use */ inp.pfp_event_count = i; /* * indicate we are using the monitors for a system-wide session. * This may impact the way the library sets up the PMC values. */ inp.pfp_flags = PFMLIB_PFP_SYSTEMWIDE; /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("cannot configure events: %s\n", pfm_strerror(ret)); npmds = ncpus * inp.pfp_event_count; printf("ncpus=%u npmds=%u\n", ncpus, npmds); pd = calloc(npmds, sizeof(pfarg_pmd_t)); if (pd == NULL) fatal_error("cannot allocate pd array\n"); for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } /* * We use inp.pfp_event_count PMD registers for our events per-CPU. * We need to setup the PMDs we use. They are determined based on the * PMC registers used. 
The following loop prepares the pd[] array * for pfm_write_pmds(). With libpfms, on PMD write we need to pass * only pfp_event_count PMD registers. But on PMD read, we need * to pass pfp_event_count PMD registers per-CPU because libpfms * does not aggregate counts. To prepare for PMD read, we therefore * propagate the PMD setup beyond just the first pfp_event_count * elements of pd[]. */ for(l=0, k= 0; l < ncpus; l++) { for (i=0; i < outp.pfp_pmd_count; i++, k++) pd[k].reg_num = outp.pfp_pmds[i].reg_num; } /* * create a context on all CPUs we asked for * * libpfms only works for system-wide, so we set the flag in * the master context. The context argument is not modified by * the call. * * desc is an opaque descriptor used to identify the session. */ ctx.ctx_flags = PFM_FL_SYSTEM_WIDE; ret = pfms_create(&cpu_list, 1, &ctx, NULL, &desc); if (ret == -1) fatal_error("create error %d\n", ret); /* * program the PMC registers on all CPUs of interest */ ret = pfms_write_pmcs(desc, pc, outp.pfp_pmc_count); if (ret == -1) fatal_error("write_pmcs error %d\n", ret); /* * program the PMD registers on all CPUs of interest */ ret = pfms_write_pmds(desc, pd, outp.pfp_pmd_count); if (ret == -1) fatal_error("write_pmds error %d\n", ret); /* * load context on all CPUs of interest */ ret = pfms_load(desc); if (ret == -1) fatal_error("load error %d\n", ret); printf("monitoring for 10s on all CPUs\n"); /* * start monitoring on all CPUs of interest */ ret = pfms_start(desc); if (ret == -1) fatal_error("start error %d\n", ret); /* * wait while activity is monitored for 10s */ sleep(10); /* * stop monitoring on all CPUs of interest */ ret = pfms_stop(desc); if (ret == -1) fatal_error("stop error %d\n", ret); /* * read the PMD registers on all CPUs of interest. 
* The pd[] array must be organized such that to * read 2 PMDs on each CPU you need: * - 2 * number of CPUs of interest * - the first 2 elements of pd[] read on CPU0 * - the next 2 elements of pd[] read on CPU1 * - and so on */ ret = pfms_read_pmds(desc, pd, npmds); if (ret == -1) fatal_error("read_pmds error %d\n", ret); /* * print per-CPU results */ for(j=0, k= 0; j < ncpus; j++) { for (i=0; i < inp.pfp_event_count; i++, k++) { pfm_get_full_event_name(&inp.pfp_events[i], name, len); printf("CPU%-3d PMD%u %20"PRIu64" %s\n", j, pd[k].reg_num, pd[k].reg_value, name); } } /* * destroy context on all CPUs of interest. * After this call desc is invalid */ ret = pfms_close(desc); if (ret == -1) fatal_error("close error %d\n", ret); free(name); return 0; } papi-5.3.0/src/libpfm-3.y/libpfms/Makefile0000600003276200002170000000355112247131123017705 0ustar ralphundrgrad# # Copyright (c) 2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk DIRS=lib CFLAGS+= -pthread -D_GNU_SOURCE -I./include LIBS += -L$(TOPDIR)/libpfms/lib -lpfms $(PFMLIB) -lm TARGETS=syst_smp all: $(TARGETS) syst_smp: ./lib/libpfms.a syst_smp.o $(CC) $(CFLAGS) $(LDFLAGS) -o $@ syst_smp.o $(LIBS) -lpthread clean: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done $(RM) -f *.o $(TARGETS) *~ distclean: clean .FORCE: lib/libpfms.a lib/libpfms.a: @set -e ; $(MAKE) -C lib all install depend: $(TARGETS) install depend: ifeq ($(CONFIG_PFMLIB_ARCH_SICORTEX),y) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done endif papi-5.3.0/src/libpfm-3.y/libpfms/include/0000700003276200002170000000000012247131123017662 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/libpfms/include/libpfms.h0000600003276200002170000000363012247131123021473 0ustar ralphundrgrad/* * libpfms.h - header file for libpfms - a helper library for perfmon SMP monitoring * * Copyright (c) 2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef __LIBPFMS_H__ #define __LIBPFMS_H__ #ifdef __cplusplus extern "C" { #endif typedef int (*pfms_ovfl_t)(pfarg_msg_t *msg); int pfms_initialize(void); int pfms_create(uint64_t *cpu_list, size_t n, pfarg_ctx_t *ctx, pfms_ovfl_t *ovfl, void **desc); int pfms_write_pmcs(void *desc, pfarg_pmc_t *pmcs, uint32_t n); int pfms_write_pmds(void *desc, pfarg_pmd_t *pmds, uint32_t n); int pfms_read_pmds(void *desc, pfarg_pmd_t *pmds, uint32_t n); int pfms_start(void *desc); int pfms_stop(void *desc); int pfms_close(void *desc); int pfms_unload(void *desc); int pfms_load(void *desc); #ifdef __cplusplus /* extern C */ } #endif #endif /* __LIBPFMS_H__ */ papi-5.3.0/src/libpfm-3.y/libpfms/lib/0000700003276200002170000000000012247131123017005 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/libpfms/lib/libpfms.c0000600003276200002170000004324612247131123020620 0ustar ralphundrgrad#include #include #include #include #include #include #include #include #include #include #include #include #include "libpfms.h" //#define dprint(format, arg...) fprintf(stderr, "%s.%d: " format , __FUNCTION__ , __LINE__, ## arg) #define dprint(format, arg...) 
typedef enum { CMD_NONE, CMD_CTX, CMD_LOAD, CMD_UNLOAD, CMD_WPMCS, CMD_WPMDS, CMD_RPMDS, CMD_STOP, CMD_START, CMD_CLOSE } pfms_cmd_t; typedef struct _barrier { pthread_mutex_t mutex; pthread_cond_t cond; uint32_t counter; uint32_t max; uint64_t generation; /* avoid race condition on wake-up */ } barrier_t; typedef struct { uint32_t cpu; uint32_t fd; void *smpl_vaddr; size_t smpl_buf_size; } pfms_cpu_t; typedef struct _pfms_thread { uint32_t cpu; pfms_cmd_t cmd; void *data; uint32_t ndata; sem_t cmd_sem; int ret; pthread_t tid; barrier_t *barrier; } pfms_thread_t; typedef struct { barrier_t barrier; uint32_t ncpus; } pfms_session_t; static uint32_t ncpus; static pfms_thread_t *tds; static pthread_mutex_t tds_lock = PTHREAD_MUTEX_INITIALIZER; static int barrier_init(barrier_t *b, uint32_t count) { int r; /* pthread_*_init() return 0 on success or an error number, never -1 */ r = pthread_mutex_init(&b->mutex, NULL); if (r) return -1; r = pthread_cond_init(&b->cond, NULL); if (r) return -1; b->max = b->counter = count; b->generation = 0; return 0; } static void cleanup_barrier(void *arg) { barrier_t *b = (barrier_t *)arg; int r; r = pthread_mutex_unlock(&b->mutex); dprint("free barrier mutex r=%d\n", r); (void) r; } static int barrier_wait(barrier_t *b) { uint64_t generation; int oldstate; pthread_cleanup_push(cleanup_barrier, b); pthread_mutex_lock(&b->mutex); pthread_testcancel(); if (--b->counter == 0) { /* reset barrier */ b->counter = b->max; /* * bump generation number, this avoids a thread getting stuck in the * wake up loop below in case a thread just out of the barrier goes * back in right away before all the threads from the previous "round" * have "escaped". 
*/ b->generation++; pthread_cond_broadcast(&b->cond); } else { generation = b->generation; pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate); while (b->counter != b->max && generation == b->generation) { pthread_cond_wait(&b->cond, &b->mutex); } pthread_setcancelstate(oldstate, NULL); } pthread_mutex_unlock(&b->mutex); pthread_cleanup_pop(0); return 0; } /* * placeholder for pthread_setaffinity_np(). This stuff is ugly * and I could not figure out a way to get it compiled while also preserving * the pthread_*cancel(). There are issues with LinuxThreads and NPTL. I * decided to quit on this and implement my own affinity call until this * settles. */ static int pin_cpu(uint32_t cpu) { uint64_t *mask; size_t size; pid_t pid; int ret; pid = syscall(__NR_gettid); size = ncpus * sizeof(uint64_t); mask = calloc(1, size); if (mask == NULL) { dprint("CPU%u: cannot allocate bitvector\n", cpu); return -1; } mask[cpu>>6] = 1ULL << (cpu & 63); ret = syscall(__NR_sched_setaffinity, pid, size, mask); free(mask); return ret; } static void pfms_thread_mainloop(void *arg) { long k = (long )arg; uint32_t mycpu = (uint32_t)k; pfarg_ctx_t myctx, *ctx; pfarg_load_t load_args; int fd = -1; pfms_thread_t *td; sem_t *cmd_sem; int ret = 0; memset(&load_args, 0, sizeof(load_args)); load_args.load_pid = mycpu; td = tds+mycpu; ret = pin_cpu(mycpu); dprint("CPU%u wthread created and pinned ret=%d\n", mycpu, ret); cmd_sem = &tds[mycpu].cmd_sem; for(;;) { dprint("CPU%u waiting for cmd\n", mycpu); sem_wait(cmd_sem); switch(td->cmd) { case CMD_NONE: ret = 0; break; case CMD_CTX: /* * copy context to get private fd */ ctx = td->data; myctx = *ctx; fd = pfm_create_context(&myctx, NULL, NULL, 0); ret = fd < 0 ? 
-1 : 0; dprint("CPU%u CMD_CTX ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_LOAD: ret = pfm_load_context(fd, &load_args); dprint("CPU%u CMD_LOAD ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_UNLOAD: ret = pfm_unload_context(fd); dprint("CPU%u CMD_UNLOAD ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_START: ret = pfm_start(fd, NULL); dprint("CPU%u CMD_START ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_STOP: ret = pfm_stop(fd); dprint("CPU%u CMD_STOP ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_WPMCS: ret = pfm_write_pmcs(fd,(pfarg_pmc_t *)td->data, td->ndata); dprint("CPU%u CMD_WPMCS ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_WPMDS: ret = pfm_write_pmds(fd,(pfarg_pmd_t *)td->data, td->ndata); dprint("CPU%u CMD_WPMDS ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_RPMDS: ret = pfm_read_pmds(fd,(pfarg_pmd_t *)td->data, td->ndata); dprint("CPU%u CMD_RPMDS ret=%d errno=%d fd=%d\n", mycpu, ret, errno, fd); break; case CMD_CLOSE: dprint("CPU%u CMD_CLOSE fd=%d\n", mycpu, fd); ret = close(fd); fd = -1; break; default: break; } td->ret = ret; dprint("CPU%u td->ret=%d\n", mycpu, ret); barrier_wait(td->barrier); } } static int create_one_wthread(int cpu) { int ret; sem_init(&tds[cpu].cmd_sem, 0, 0); ret = pthread_create(&tds[cpu].tid, NULL, (void *(*)(void *))pfms_thread_mainloop, (void *)(long)cpu); return ret; } /* * must be called with tds_lock held */ static int create_wthreads(uint64_t *cpu_list, uint32_t n) { uint64_t v; uint32_t i,k, cpu; int ret = 0; /* word k of cpu_list covers CPUs k*64 .. k*64+63 */ for(k=0, cpu = 0; k < n; k++) { v = cpu_list[k]; for(i=0, cpu = k*64; v && i < 64; i++, v>>=1, cpu++) { if ((v & 0x1) && tds[cpu].tid == 0) { ret = create_one_wthread(cpu); if (ret) break; } } } if (ret) dprint("cannot create wthread on CPU%u\n", cpu); return ret; } int pfms_initialize(void) { printf("cpu_t=%zu thread=%zu session_t=%zu\n", sizeof(pfms_cpu_t), sizeof(pfms_thread_t), 
sizeof(pfms_session_t)); ncpus = (uint32_t)sysconf(_SC_NPROCESSORS_ONLN); if (ncpus == -1) { dprint("cannot retrieve number of online processors\n"); return -1; } dprint("configured for %u CPUs\n", ncpus); /* * XXX: assuming CPUs are contiguously indexed */ tds = calloc(ncpus, sizeof(*tds)); if (tds == NULL) { dprint("cannot allocate thread descriptors\n"); return -1; } return 0; } int pfms_create(uint64_t *cpu_list, size_t n, pfarg_ctx_t *ctx, pfms_ovfl_t *ovfl, void **desc) { uint64_t v; size_t k, i; uint32_t num, cpu; pfms_session_t *s; int ret; if (cpu_list == NULL || n == 0 || ctx == NULL || desc == NULL) { dprint("invalid parameters\n"); return -1; } if ((ctx->ctx_flags & PFM_FL_SYSTEM_WIDE) == 0) { dprint("only works for system wide\n"); return -1; } *desc = NULL; /* * XXX: assuming CPUs are contiguously indexed; * word k of cpu_list covers CPUs k*64 .. k*64+63 */ num = 0; for(k=0; k < n; k++) { v = cpu_list[k]; for(i=0, cpu = (uint32_t)(k*64); v && i < 64; i++, v>>=1, cpu++) { if (v & 0x1) { if (cpu >= ncpus) { dprint("unavailable CPU%u\n", cpu); return -1; } num++; } } } if (num == 0) return 0; s = calloc(1, sizeof(*s)); if (s == NULL) { dprint("cannot allocate %u contexts\n", num); return -1; } s->ncpus = num; printf("%u-way session\n", num); /* * +1 to account for main thread waiting */ ret = barrier_init(&s->barrier, num + 1); if (ret) { dprint("cannot init barrier\n"); goto error_free; } /* * lock thread descriptor table, no other create_session, close_session * can occur */ pthread_mutex_lock(&tds_lock); if (create_wthreads(cpu_list, n)) goto error_free_unlock; /* * check all needed threads are available */ for(k=0; k < n; k++) { v = cpu_list[k]; for(i=0, cpu = (uint32_t)(k*64); v && i < 64; i++, v>>=1, cpu++) { if (v & 0x1) { if (tds[cpu].barrier) { dprint("CPU%u already managing a session\n", cpu); goto error_free_unlock; } } } } /* * send create context order */ for(k=0; k < n; k++) { v = cpu_list[k]; for(i=0, cpu = (uint32_t)(k*64); v && i < 64; i++, v>>=1, cpu++) { if (v & 0x1) { tds[cpu].cmd = CMD_CTX; 
tds[cpu].data = ctx; tds[cpu].barrier = &s->barrier; sem_post(&tds[cpu].cmd_sem); } } } barrier_wait(&s->barrier); ret = 0; /* * check for errors */ for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { ret = tds[k].ret; if (ret) break; } } /* * undo if error found */ if (k < ncpus) { for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { if (tds[k].ret == 0) { tds[k].cmd = CMD_CLOSE; sem_post(&tds[k].cmd_sem); } /* mark as free */ tds[k].barrier = NULL; } } } pthread_mutex_unlock(&tds_lock); if (ret == 0) *desc = s; return ret ? -1 : 0; error_free_unlock: pthread_mutex_unlock(&tds_lock); error_free: free(s); return -1; } int pfms_load(void *desc) { uint32_t k; pfms_session_t *s; int ret; if (desc == NULL) { dprint("invalid parameters\n"); return -1; } s = (pfms_session_t *)desc; if (s->ncpus == 0) { dprint("invalid session content 0 CPUS\n"); return -1; } /* * send load context order */ for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { tds[k].cmd = CMD_LOAD; sem_post(&tds[k].cmd_sem); } } barrier_wait(&s->barrier); ret = 0; /* * check for errors */ for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { ret = tds[k].ret; if (ret) { dprint("failure on CPU%u\n", k); break; } } } /* * if error, unload all others */ if (k < ncpus) { for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { if (tds[k].ret == 0) { tds[k].cmd = CMD_UNLOAD; sem_post(&tds[k].cmd_sem); } } } } return ret ? 
-1 : 0; } static int __pfms_do_simple_cmd(pfms_cmd_t cmd, void *desc, void *data, uint32_t n) { size_t k; pfms_session_t *s; int ret; if (desc == NULL) { dprint("invalid parameters\n"); return -1; } s = (pfms_session_t *)desc; if (s->ncpus == 0) { dprint("invalid session content 0 CPUS\n"); return -1; } /* * send the command to every CPU in the session */ for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { tds[k].cmd = cmd; tds[k].data = data; tds[k].ndata = n; sem_post(&tds[k].cmd_sem); } } barrier_wait(&s->barrier); ret = 0; /* * check for errors */ for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { ret = tds[k].ret; if (ret) { dprint("failure on CPU%zu\n", k); break; } } } /* * simple commands cannot be undone */ return ret ? -1 : 0; } int pfms_unload(void *desc) { return __pfms_do_simple_cmd(CMD_UNLOAD, desc, NULL, 0); } int pfms_start(void *desc) { return __pfms_do_simple_cmd(CMD_START, desc, NULL, 0); } int pfms_stop(void *desc) { return __pfms_do_simple_cmd(CMD_STOP, desc, NULL, 0); } int pfms_write_pmcs(void *desc, pfarg_pmc_t *pmcs, uint32_t n) { return __pfms_do_simple_cmd(CMD_WPMCS, desc, pmcs, n); } int pfms_write_pmds(void *desc, pfarg_pmd_t *pmds, uint32_t n) { return __pfms_do_simple_cmd(CMD_WPMDS, desc, pmds, n); } int pfms_close(void *desc) { size_t k; pfms_session_t *s; int ret; if (desc == NULL) { dprint("invalid parameters\n"); return -1; } s = (pfms_session_t *)desc; if (s->ncpus == 0) { dprint("invalid session content 0 CPUS\n"); return -1; } for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { tds[k].cmd = CMD_CLOSE; sem_post(&tds[k].cmd_sem); } } barrier_wait(&s->barrier); ret = 0; pthread_mutex_lock(&tds_lock); /* * check for errors */ for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { if (tds[k].ret) { dprint("failure on CPU%zu\n", k); } ret |= tds[k].ret; tds[k].barrier = NULL; } } pthread_mutex_unlock(&tds_lock); free(s); /* * XXX: we cannot undo close */ return ret ? 
-1 : 0; } int pfms_read_pmds(void *desc, pfarg_pmd_t *pmds, uint32_t n) { pfms_session_t *s; uint32_t k, pmds_per_cpu; int ret; if (desc == NULL) { dprint("invalid parameters\n"); return -1; } s = (pfms_session_t *)desc; if (s->ncpus == 0) { dprint("invalid session content 0 CPUS\n"); return -1; } if (n % s->ncpus) { dprint("invalid number of pfarg_pmd_t provided, must be multiple of %u\n", s->ncpus); return -1; } pmds_per_cpu = n / s->ncpus; dprint("n=%u ncpus=%u per_cpu=%u\n", n, s->ncpus, pmds_per_cpu); for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { tds[k].cmd = CMD_RPMDS; tds[k].data = pmds; tds[k].ndata= pmds_per_cpu; sem_post(&tds[k].cmd_sem); pmds += pmds_per_cpu; } } barrier_wait(&s->barrier); ret = 0; /* * check for errors */ for(k=0; k < ncpus; k++) { if (tds[k].barrier == &s->barrier) { ret = tds[k].ret; if (ret) { dprint("failure on CPU%u\n", k); break; } } } /* * cannot undo pfm_read_pmds */ return ret ? -1 : 0; } #if 0 /* * beginning of test program */ #include #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } static uint32_t popcount(uint64_t c) { uint32_t count = 0; for(; c; c>>=1) { if (c & 0x1) count++; } return count; } int main(int argc, char **argv) { pfarg_ctx_t ctx; pfarg_pmc_t pc[NUM_PMCS]; pfarg_pmd_t *pd; pfmlib_input_param_t inp; pfmlib_output_param_t outp; uint64_t cpu_list; void *desc; unsigned int num_counters; uint32_t i, j, k, l, ncpus, npmds; size_t len; int ret; char *name; if (pfm_initialize() != PFMLIB_SUCCESS) fatal_error("cannot initialize libpfm\n"); if (pfms_initialize()) fatal_error("cannot initialize libpfms\n"); pfm_get_num_counters(&num_counters); pfm_get_max_event_name_len(&len); name = malloc(len+1); if (name == NULL) fatal_error("cannot allocate memory for event name\n"); memset(&ctx, 0, sizeof(ctx)); memset(pc, 0, sizeof(pc)); memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); cpu_list = argc > 1 ? strtoul(argv[1], NULL, 0) : 0x3; ncpus = popcount(cpu_list); if (pfm_get_cycle_event(&inp.pfp_events[0].event) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1].event) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; inp.pfp_dfl_plm = PFM_PLM3|PFM_PLM0; if (i > num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } /* * how many counters we use */ inp.pfp_event_count = i; /* * indicate we are using the monitors for a system-wide session. * This may impact the way the library sets up the PMC values. 
*/ inp.pfp_flags = PFMLIB_PFP_SYSTEMWIDE; /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("cannot configure events: %s\n", pfm_strerror(ret)); npmds = ncpus * inp.pfp_event_count; dprint("ncpus=%u npmds=%u\n", ncpus, npmds); pd = calloc(npmds, sizeof(pfarg_pmd_t)); if (pd == NULL) fatal_error("cannot allocate pd array\n"); for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } for(l=0, k = 0; l < ncpus; l++) { for (i=0, j=0; i < inp.pfp_event_count; i++, k++) { pd[k].reg_num = outp.pfp_pmcs[j].reg_pmd_num; for(; j < outp.pfp_pmc_count; j++) if (outp.pfp_pmcs[j].reg_evt_idx != i) break; } } /* * create a context on all CPUs we asked for * * libpfms only works for system-wide, so we set the flag in * the master context. the context argument is not modified by * call. * * desc is an opaque descriptor used to identify session. */ ctx.ctx_flags = PFM_FL_SYSTEM_WIDE; ret = pfms_create(&cpu_list, 1, &ctx, NULL, &desc); if (ret == -1) fatal_error("create error %d\n", ret); /* * program the PMC registers on all CPUs of interest */ ret = pfms_write_pmcs(desc, pc, outp.pfp_pmc_count); if (ret == -1) fatal_error("write_pmcs error %d\n", ret); /* * program the PMD registers on all CPUs of interest */ ret = pfms_write_pmds(desc, pd, inp.pfp_event_count); if (ret == -1) fatal_error("write_pmds error %d\n", ret); /* * load context on all CPUs of interest */ ret = pfms_load(desc); if (ret == -1) fatal_error("load error %d\n", ret); /* * start monitoring on all CPUs of interest */ ret = pfms_start(desc); if (ret == -1) fatal_error("start error %d\n", ret); /* * simulate some work */ sleep(10); /* * stop monitoring on all CPUs of interest */ ret = pfms_stop(desc); if (ret == -1) fatal_error("stop error %d\n", ret); /* * read the PMD registers on all CPUs of interest. 
* The pd[] array must be organized such that to * read 2 PMDs on each CPU you need: * - 2 * number of CPUs of interest * - the first 2 elements of pd[] read on the 1st CPU * - the next 2 elements of pd[] read on the 2nd CPU * - and so on */ ret = pfms_read_pmds(desc, pd, npmds); if (ret == -1) fatal_error("read_pmds error %d\n", ret); /* * print per-CPU results */ for(j=0, k= 0; j < ncpus; j++) { for (i=0; i < inp.pfp_event_count; i++, k++) { pfm_get_full_event_name(&inp.pfp_events[i], name, len); printf("CPU%-3d PMD%u %20"PRIu64" %s\n", j, pd[k].reg_num, pd[k].reg_value, name); } } /* * destroy context on all CPUs of interest. * After this call desc is invalid */ ret = pfms_close(desc); if (ret == -1) fatal_error("close error %d\n", ret); free(name); return 0; } #endif papi-5.3.0/src/libpfm-3.y/libpfms/lib/Makefile0000600003276200002170000000515612247131123020456 0ustar ralphundrgrad# # Copyright (c) 2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/../.. include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk CFLAGS+= -pthread -D_GNU_SOURCE LDFLAGS+=-static PFMSINCDIR=../include # # Library version # VERSION=0 REVISION=1 AGE=0 SRCS=libpfms.c HEADERS=../include/libpfms.h ALIBPFM=libpfms.a TARGETS=$(ALIBPFM) ifneq ($(CONFIG_PFMLIB_ARCH_CRAYX2),y) SLIBPFM=libpfms.so.$(VERSION).$(REVISION).$(AGE) VLIBPFM=libpfms.so.$(VERSION) endif OBJS=$(SRCS:.c=.o) SOBJS=$(OBJS:.o=.lo) # # assume that if libpfm is built static, libpfms should # also be static, i.e., the platform likely does not support # shared libraries. # ifeq ($(CONFIG_PFMLIB_SHARED),y) TARGETS += $(SLIBPFM) endif ifeq ($(SYS),Linux) SLDFLAGS=-shared -Wl,-soname -Wl,libpfms.so.$(VERSION) endif CFLAGS+=-I$(PFMSINCDIR) all: $(TARGETS) $(OBJS) $(SOBJS): $(HEADERS) $(TOPDIR)/config.mk $(TOPDIR)/rules.mk Makefile libpfms.a: $(OBJS) $(RM) $@ $(AR) cru $@ $(OBJS) $(SLIBPFM): $(SOBJS) $(CC) $(CFLAGS) $(SLDFLAGS) -o $@ $(SOBJS) $(LN) -sf $@ libpfms.so.$(VERSION) clean: $(RM) -f *.o *.lo *.a *.so* *~ distclean: clean install: $(TARGETS) install: -mkdir -p $(DESTDIR)$(LIBDIR) $(INSTALL) -m 644 $(ALIBPFM) $(DESTDIR)$(LIBDIR) $(INSTALL) $(SLIBPFM) $(DESTDIR)$(LIBDIR) cd $(DESTDIR)$(LIBDIR); $(LN) $(SLIBPFM) $(VLIBPFM) cd $(DESTDIR)$(LIBDIR); $(LN) $(SLIBPFM) libpfms.so -mkdir -p $(DESTDIR)$(INCDIR)/perfmon $(INSTALL) -m 644 $(HEADERS) $(DESTDIR)$(INCDIR)/perfmon papi-5.3.0/src/libpfm-3.y/examples_v3.x/0000700003276200002170000000000012247131122017276 5ustar ralphundrgradpapi-5.3.0/src/libpfm-3.y/examples_v3.x/task_attach.c0000600003276200002170000002054012247131122021733 0ustar ralphundrgrad/* * task_attach.c - example of how to attach to another 
task for monitoring * * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS #define MAX_EVT_NAME_LEN 128 static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } int parent(pid_t pid) { pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfarg_pmr_t pc[NUM_PMCS]; pfarg_pmr_t pd[NUM_PMDS]; pfarg_sinfo_t sif; pfarg_msg_t msg; unsigned int i, num_counters; int status, ret; int ctx_fd; char name[MAX_EVT_NAME_LEN]; memset(pc, 0, sizeof(pc)); memset(pd, 0, sizeof(pd)); memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); memset(&sif,0, sizeof(sif)); pfm_get_num_counters(&num_counters); if (pfm_get_cycle_event(&inp.pfp_events[0]) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1]) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; /* * set the privilege mode: * PFM_PLM3 : user level * PFM_PLM0 : kernel level */ inp.pfp_dfl_plm = PFM_PLM3; if (i > num_counters) { i = num_counters; printf("too many events provided (max=%u events), using first %u event(s)\n", num_counters, i); } /* * how many counters we use */ inp.pfp_event_count = i; /* * now create a session. we will later attach it to the target task. */ ctx_fd = pfm_create(0, &sif); if (ctx_fd == -1) { if (errno == ENOSYS) { fatal_error("Your kernel does not have performance monitoring support!\n"); } fatal_error("cannot create session %s\n", strerror(errno)); } /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. Of course, it is possible that no valid assignment * exists if certain PMU registers are not available. 
*/ detect_unavail_pmu_regs(&sif, &inp.pfp_unavail_pmcs, NULL); /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) { fatal_error("cannot configure events: %s\n", pfm_strerror(ret)); } /* * Now prepare the argument to initialize the PMDs and PMCS. * We must use pfp_pmc_count to determine the number of PMCs to initialize. * We must use pfp_event_count to determine the number of PMDs to initialize. * Some events cause extra PMCs to be used, so pfp_pmc_count may be >= pfp_event_count. * * This step is new compared to libpfm-2.x. It is necessary because the library no * longer knows about the kernel data structures. */ for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } for(i=0; i < outp.pfp_pmd_count; i++) pd[i].reg_num = outp.pfp_pmds[i].reg_num; /* * Now program the registers * * We don't use the same variable to indicate the number of elements passed to * the kernel because, as we said earlier, pc may contain more elements than * the number of events we specified, i.e., contains more than counting monitors. 
*/ if (pfm_write(ctx_fd, 0, PFM_RW_PMC, pc, outp.pfp_pmc_count * sizeof(*pc)) == -1) fatal_error("pfm_write error errno %d\n",errno); /* * To be read, each PMD must be either written or declared * as being part of a sample (reg_smpl_pmds) */ if (pfm_write(ctx_fd, 0, PFM_RW_PMD, pd, outp.pfp_pmd_count * sizeof(*pd)) == -1) fatal_error("pfm_write(PMD) error errno %d\n",errno); ret = ptrace(PTRACE_ATTACH, pid, NULL, 0); if (ret == -1) { fatal_error("cannot attach to %d: %s\n", pid, strerror(errno)); } /* * wait for the child to be actually stopped */ waitpid(pid, &status, WUNTRACED); /* * check if process exited early */ if (WIFEXITED(status)) fatal_error("command process %d exited too early with status %d\n", pid, WEXITSTATUS(status)); /* * the task is stopped at this point */ /* * now we attach the session to the target task */ if (pfm_attach(ctx_fd, 0, pid) == -1) fatal_error("pfm_attach error errno %d\n",errno); /* * activate monitoring. The task is still STOPPED at this point. Monitoring * will not take effect until the execution of the task is resumed. */ if (pfm_set_state(ctx_fd, 0, PFM_ST_START) == -1) fatal_error("pfm_set_state(start) error errno %d\n",errno); /* * now resume execution of the task, effectively activating * monitoring. */ ptrace(PTRACE_DETACH, pid, NULL, 0); /* * now the task is running */ /* * We cannot simply do a waitpid() because we may be attaching to a process * totally unrelated to our program. Instead we use a perfmon facility that * notifies us when the monitored task is exiting. * * When a task with a monitoring session attached to it exits, a PFM_MSG_END * is generated. It can be retrieved with a simple read() on the session's descriptor. * * Another reason why you might return from the read is if there was a counter * overflow, unlikely in this example. 
* * To measure only for short period of time, use select or poll with a timeout, * see task_attach_timeout.c * */ ret = read(ctx_fd, &msg, sizeof(msg)); if (ret == -1) fatal_error("cannot read from descriptor: %s\n", strerror(errno)); if (msg.type != PFM_MSG_END) fatal_error("unexpected msg type : %d\n", msg.type); /* * the task has exited, we can simply read the results */ /* * now simply read the results. */ if (pfm_read(ctx_fd, 0, PFM_RW_PMD, pd, inp.pfp_event_count * sizeof(*pd)) == -1) fatal_error("pfm_read error errno %d\n",errno); /* * print the results * * It is important to realize, that the first event we specified may not * be in PMD4. Not all events can be measured by any monitor. That's why * we need to use the pc[] array to figure out where event i was allocated. * */ for (i=0; i < inp.pfp_event_count; i++) { pfm_get_full_event_name(&inp.pfp_events[i], name, MAX_EVT_NAME_LEN); printf("PMD%-3u %20"PRIu64" %s\n", pd[i].reg_num, pd[i].reg_value, name); } /* * free the session */ close(ctx_fd); return 0; } int main(int argc, char **argv) { pfmlib_options_t pfmlib_options; pid_t pid; int ret; if (argc < 2) { fatal_error("usage: %s pid\n", argv[0]); } pid = atoi(argv[1]); /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfm_set_options(&pfmlib_options); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); return parent(pid); } papi-5.3.0/src/libpfm-3.y/examples_v3.x/rtop.c0000600003276200002170000006020412247131122020432 0ustar ralphundrgrad/* rtop.c - a simple PMU-based CPU utilization tool * * Copyright (c) 2004-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #ifndef _GNU_SOURCE #define _GNU_SOURCE /* for getline */ #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define SWITCH_TIMEOUT 1000000000 /* in nanoseconds */ #define RTOP_VERSION "0.1" #define RTOP_MAX_CPUS 1024 /* maximum number of CPU supported */ #define MAX_EVT_NAME_LEN 128 #define RTOP_NUM_PMCS 4 #define RTOP_NUM_PMDS 4 /* * max number of cpus (threads) supported */ #define RTOP_MAX_CPUS 1024 /* MUST BE power of 2 */ #define RTOP_CPUMASK_BITS (sizeof(unsigned long)<<3) #define RTOP_CPUMASK_COUNT (RTOP_MAX_CPUS/RTOP_CPUMASK_BITS) #define RTOP_CPUMASK_SET(m, g) ((m)[(g)/RTOP_CPUMASK_BITS] |= (1UL << ((g) % RTOP_CPUMASK_BITS))) #define RTOP_CPUMASK_CLEAR(m, g) ((m)[(g)/RTOP_CPUMASK_BITS] &= ~(1UL << ((g) % RTOP_CPUMASK_BITS))) #define RTOP_CPUMASK_ISSET(m, g) ((m)[(g)/RTOP_CPUMASK_BITS] & (1UL << ((g) % RTOP_CPUMASK_BITS))) typedef unsigned long rtop_cpumask_t[RTOP_CPUMASK_COUNT]; typedef struct { struct { int opt_verbose; int opt_delay; /* refresh delay in second */ int opt_delay_set; } program_opt_flags; rtop_cpumask_t cpu_mask; /* which CPUs to use in system wide mode */ long online_cpus; long selected_cpus; unsigned long cpu_mhz; char *outfile; } program_options_t; #define opt_verbose program_opt_flags.opt_verbose #define opt_delay program_opt_flags.opt_delay #define opt_delay_set program_opt_flags.opt_delay_set typedef struct { char *name; unsigned int plm; } eventdesc_t; typedef enum { THREAD_STARTED, THREAD_RUN, THREAD_DONE, THREAD_ERROR } thread_state_t; typedef struct { pthread_t tid; /* logical thread identification */ long cpuid; unsigned int id; thread_state_t state; int is_last; sem_t his_sem; sem_t my_sem; FILE *fp; uint64_t nsamples; int has_msg; } thread_desc_t; typedef struct _setdesc_t { pfarg_pmr_t pc[RTOP_NUM_PMDS]; pfarg_pmr_t pd[RTOP_NUM_PMCS]; 
pfmlib_input_param_t inp; pfmlib_output_param_t outp; uint16_t set_id; uint32_t set_flags; uint32_t set_timeout; /* actual timeout */ int (*handler)(int fd, FILE *fp, thread_desc_t *td, struct _setdesc_t *my_sdesc); void *data; eventdesc_t *evt_desc; } setdesc_t; typedef struct _barrier { pthread_mutex_t mutex; pthread_cond_t cond; unsigned long counter; unsigned long max; unsigned long generation; /* avoid race condition on wake-up */ } barrier_t; typedef enum { SESSION_INIT, SESSION_RUN, SESSION_STOP, SESSION_ABORTED } session_state_t; typedef struct { uint64_t prev_k_cycles; uint64_t prev_u_cycles; } set0_data_t; static barrier_t barrier; static session_state_t session_state; static program_options_t options; static thread_desc_t *thread_info; static struct termios saved_tty; static int time_to_quit; static int term_rows, term_cols; static eventdesc_t set0_evt[]={ { .name = "*", .plm = PFM_PLM0 }, { .name = "*", .plm = PFM_PLM3 }, { .name = NULL} }; static int handler_set0(int fd, FILE *fp, thread_desc_t *td, setdesc_t *my_sdesc); static setdesc_t setdesc_tab[]={ { .set_id = 0, .evt_desc = set0_evt, .handler = handler_set0 } }; #define RTOP_NUM_SDESC (sizeof(setdesc_tab)/sizeof(setdesc_t)) static int barrier_init(barrier_t *b, unsigned long count) { int r; /* pthread_*_init() return 0 on success or an error number, never -1 */ r = pthread_mutex_init(&b->mutex, NULL); if (r != 0) return -1; r = pthread_cond_init(&b->cond, NULL); if (r != 0) return -1; b->max = b->counter = count; b->generation = 0; return 0; } static void cleanup_barrier(void *arg) { int r; barrier_t *b = (barrier_t *)arg; r = pthread_mutex_unlock(&b->mutex); (void)r; } static int barrier_wait(barrier_t *b) { unsigned long generation; int oldstate; pthread_cleanup_push(cleanup_barrier, b); pthread_mutex_lock(&b->mutex); pthread_testcancel(); if (--b->counter == 0) { /* reset barrier */ b->counter = b->max; /* * bump generation number, this avoids threads getting stuck in the * wake up loop below in case a thread just out of the barrier goes * back in right away 
before all the threads from the previous "round" * have "escaped". */ b->generation++; pthread_cond_broadcast(&b->cond); } else { generation = b->generation; pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &oldstate); while (b->counter != b->max && generation == b->generation) { pthread_cond_wait(&b->cond, &b->mutex); } pthread_setcancelstate(oldstate, NULL); } pthread_mutex_unlock(&b->mutex); pthread_cleanup_pop(0); return 0; } static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void warning(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); } static void fatal_error(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } int gettid(void) { int tmp; tmp = syscall(__NR_gettid); return tmp; } #ifndef __NR_sched_setaffinity #error "you need to define __NR_sched_setaffinity" #endif /* * Hack to get this to work without libc support */ int pin_self_cpu(unsigned int cpu) { unsigned long my_mask; my_mask = 1UL << cpu; return syscall(__NR_sched_setaffinity, gettid(), sizeof(my_mask), &my_mask); } static void sigint_handler(int n) { time_to_quit = 1; } static unsigned long find_cpu_speed(void) { FILE *fp1; unsigned long f1 = 0, f2 = 0; char buffer[128], *p, *value; memset(buffer, 0, sizeof(buffer)); fp1 = fopen("/proc/cpuinfo", "r"); if (fp1 == NULL) return 0; for (;;) { buffer[0] = '\0'; p = fgets(buffer, 127, fp1); if (p == NULL) break; /* skip blank lines */ if (*p == '\n') continue; p = strchr(buffer, ':'); if (p == NULL) break; /* * p+2: +1 = space, +2= first character * strlen()-1 gets rid of \n */ *p = '\0'; value = p+2; value[strlen(value)-1] = '\0'; if (!strncasecmp("cpu MHz", buffer, 7)) { float fl; sscanf(value, "%f", &fl); f1 = lroundf(fl); break; } if (!strncasecmp("BogoMIPS", buffer, 8)) { float fl; sscanf(value, "%f", &fl); f2 = lroundf(fl); } } fclose(fp1); return f1 == 0 ? 
f2 : f1; } static void get_term_size(void) { int ret; struct winsize ws; ret = ioctl(1, TIOCGWINSZ, &ws); if (ret == -1) fatal_error("cannot determine screen size\n"); if (ws.ws_row > 10) { term_cols = ws.ws_col; term_rows = ws.ws_row; } else { term_cols = 80; term_rows = 24; } if (term_rows < options.selected_cpus) fatal_error("you need at least %ld rows on your terminal to display all CPUs\n", options.selected_cpus); } static void sigwinch_handler(int n) { get_term_size(); } static void setup_screen(void) { int ret; ret = tcgetattr(0, &saved_tty); if (ret == -1) fatal_error("cannot save tty settings\n"); get_term_size(); initscr(); nocbreak(); resizeterm(term_rows, term_cols); } static void close_screen(void) { int ret; endwin(); ret = tcsetattr(0, TCSAFLUSH, &saved_tty); if (ret == -1) warning("cannot restore tty settings\n"); } static void setup_signals(void) { struct sigaction act; sigset_t my_set; /* * SIGINT is a asynchronous signal * sent to the process (not a specific thread). POSIX states * that one and only one thread will execute the handler. This * could be any thread that does not have the signal blocked. */ /* * install SIGINT handler */ memset(&act,0,sizeof(act)); sigemptyset(&my_set); act.sa_handler = (__sighandler_t)sigint_handler; sigaction (SIGINT, &act, 0); /* * install SIGWINCH handler */ memset(&act,0,sizeof(act)); sigemptyset(&my_set); act.sa_handler = (__sighandler_t)sigwinch_handler; sigaction (SIGWINCH, &act, 0); } static void setup_worker_signals(void) { struct sigaction act; sigset_t my_set; /* * SIGINT is a asynchronous signal * sent to the process (not a specific thread). POSIX states * that one and only one thread will execute the handler. This * could be any thread that does not have the signal blocked. 
*/ /* * block SIGINT and SIGWINCH, forcing them to the master thread only */ memset(&act,0,sizeof(act)); sigemptyset(&my_set); sigaddset(&my_set, SIGINT); sigaddset(&my_set, SIGWINCH); pthread_sigmask(SIG_BLOCK, &my_set, NULL); } static struct option rtop_cmd_options[]={ { "help", 0, 0, 1 }, { "version", 0, 0, 2 }, { "delay", 1, 0, 3 }, /* takes an argument: handler reads optarg */ { "cpu-list", 1, 0, 4 }, { "outfile", 1, 0, 5 }, { "verbose", 0, &options.opt_verbose, 1 }, { 0, 0, 0, 0} }; int handler_set0(int fd, FILE *fp, thread_desc_t *td, setdesc_t *my_sdesc) { double k_cycles, u_cycles, i_cycles; set0_data_t *sd1; uint64_t itc_delta; long mycpu; mycpu = td->cpuid; if (my_sdesc->data == NULL) { my_sdesc->data = sd1 = calloc(1, sizeof(set0_data_t)); if (sd1 == NULL) return -1; } sd1 = my_sdesc->data; /* * now read the results */ if (pfm_read(fd, 0, PFM_RW_PMD, my_sdesc->pd, my_sdesc->inp.pfp_event_count * sizeof(pfarg_pmr_t)) == -1) { warning( "CPU%ld pfm_read error errno %d\n", mycpu, errno); return -1; } /* * expected maximum duration with monitoring active for this set * set_timeout is in nanoseconds, we need to divide mhz by 1000 * to get cycles. 
*/ itc_delta = (my_sdesc->set_timeout*(uint64_t)options.cpu_mhz)/1000; k_cycles = (double)(my_sdesc->pd[0].reg_value - sd1->prev_k_cycles)*100.0/ (double)itc_delta; u_cycles = (double)(my_sdesc->pd[1].reg_value - sd1->prev_u_cycles)*100.0/ (double)itc_delta; i_cycles = 100.0 - (k_cycles + u_cycles); /* * adjust for rounding errors */ if (i_cycles < 0.0) i_cycles = 0.0; if (i_cycles > 100.0) i_cycles = 100.0; if (k_cycles > 100.0) k_cycles = 100.0; if (u_cycles > 100.0) u_cycles = 100.0; printw("CPU%-2ld %6.2f%% usr %6.2f%% sys %6.2f%% idle\n", mycpu, u_cycles, k_cycles, i_cycles); sd1->prev_k_cycles = my_sdesc->pd[0].reg_value; sd1->prev_u_cycles = my_sdesc->pd[1].reg_value; if (fp) fprintf(fp, "%"PRIu64" %6.2f %6.2f %6.2f\n", td->nsamples, u_cycles, k_cycles, i_cycles); td->nsamples++; return 0; } static void do_measure_one_cpu(void *data) { thread_desc_t *arg= (thread_desc_t *)data; pfarg_set_desc_t setd; setdesc_t *my_sdesc, *my_sdesc_tab = NULL; long mycpu; sem_t *his_sem; unsigned int id; int fd, ret, j; int old_rows; char cpu_str[16]; char *fn; FILE *fp = NULL; setup_worker_signals(); mycpu = arg->cpuid; id = arg->id; his_sem = &arg->his_sem; old_rows = term_rows; if (options.outfile) { sprintf(cpu_str,".cpu%ld", mycpu); fn = malloc(strlen(options.outfile)+1+strlen(cpu_str)); if (fn == NULL) goto error; strcpy(fn, options.outfile); strcat(fn, cpu_str); fp = fopen(fn, "w"); if (fp == NULL) { warning("cannot open %s\n", fn); free(fn); goto error; } free(fn); fprintf(fp, "# Results for CPU%ld\n" "# sample delay %d seconds\n" "# Column1 : sample number\n" "# Column2 : %% user time\n" "# Column3 : %% system time\n" "# Column4 : %% idle\n" "# Column5 : kernel entry-exit\n", mycpu, options.opt_delay); } memset(&setd, 0, sizeof(setd)); ret = pin_self_cpu(mycpu); if (ret) { warning("CPU%ld cannot pin\n", mycpu); } my_sdesc_tab = malloc(sizeof(setdesc_t)*RTOP_NUM_SDESC); if (my_sdesc_tab == NULL) { warning("CPU%ld cannot allocate sdesc\n", mycpu); goto error; } 
memcpy(my_sdesc_tab, setdesc_tab, sizeof(setdesc_t)*RTOP_NUM_SDESC); fd = pfm_create(PFM_FL_SYSTEM_WIDE, NULL); if (fd == -1) { if (errno == ENOSYS) { fatal_error("Your kernel does not have performance monitoring support!\n"); } warning("CPU%ld cannot create session: %d\n", mycpu, errno); goto error; } /* * WARNING: on processors where the idle loop goes into some power-saving * state, the results of this program may be incorrect */ for(j=0; j < RTOP_NUM_SDESC; j++) { my_sdesc = my_sdesc_tab+j; setd.set_id = my_sdesc->set_id; setd.set_flags = my_sdesc->set_flags; setd.set_timeout = SWITCH_TIMEOUT; /* in nsecs */ /* * do not bother if we have only one set */ if (RTOP_NUM_SDESC > 1 && pfm_create_sets(fd, 0, &setd, 1) == -1) { warning("CPU%ld cannot create set%u: %d\n", mycpu, j, errno); goto error; } my_sdesc->set_timeout = setd.set_timeout; if (pfm_write(fd, 0, PFM_RW_PMC, my_sdesc->pc, my_sdesc->outp.pfp_pmc_count * sizeof(pfarg_pmr_t)) == -1) { warning("CPU%ld pfm_write error errno %d\n", mycpu, errno); goto error; } /* * To be read, each PMD must be either written or declared * as being part of a sample (reg_smpl_pmds) */ if (pfm_write(fd, 0, PFM_RW_PMD, my_sdesc->pd, my_sdesc->inp.pfp_event_count * sizeof(pfarg_pmr_t)) == -1) { warning("CPU%ld pfm_write(PMD) error errno %d\n", mycpu, errno); goto error; } } /* * in system-wide mode, this field must provide the CPU the caller wants * to monitor. The kernel checks and if calling from the wrong CPU, the * call fails. The affinity is not affected. 
*/ if (pfm_attach(fd, 0, mycpu) == -1) { warning("CPU%ld pfm_attach error errno %d\n", mycpu, errno); goto error; } thread_info[id].state = THREAD_RUN; barrier_wait(&barrier); /* * must wait until we are sure the master is out of its thread_create loop */ barrier_wait(&barrier); for(;session_state == SESSION_RUN;) { if (pfm_set_state(fd, 0, PFM_ST_START) == -1) { warning("CPU%ld pfm_set_state(start) error errno %d\n", mycpu, errno); goto error; } /* * wait for order from master */ sem_wait(his_sem); if (pfm_set_state(fd, 0, PFM_ST_STOP) == -1) { warning("CPU%ld pfm_set_state(stop) error %d\n", mycpu, errno); goto error; } if (old_rows != term_rows) { resizeterm(term_rows, term_cols); old_rows = term_rows; } for(j=0; j < RTOP_NUM_SDESC; j++) { move(id*RTOP_NUM_SDESC+j, 0); if (my_sdesc_tab[j].handler) (*my_sdesc_tab[j].handler)(fd, fp, arg, my_sdesc_tab+j); } if (session_state == SESSION_RUN) { sem_post(&arg->my_sem); barrier_wait(&barrier); } } if (fp) fclose(fp); close(fd); thread_info[id].state = THREAD_DONE; pthread_exit((void *)(0)); error: thread_info[id].state = THREAD_ERROR; barrier_wait(&barrier); if (fp) fclose(fp); if (my_sdesc_tab) free(my_sdesc_tab); pthread_exit((void *)(~0)); } static void mainloop(void) { long i, j, ncpus = 0; int ret; void *retval; struct pollfd fds; ncpus = options.selected_cpus; barrier_init(&barrier, ncpus+1); thread_info = malloc(sizeof(thread_desc_t)*ncpus); if (thread_info == NULL) { fatal_error("cannot allocate thread_desc for %ld CPUs\n", ncpus); } for(i=0, j = 0; ncpus; i++) { if (RTOP_CPUMASK_ISSET(options.cpu_mask, i) == 0) continue; thread_info[j].id = j; thread_info[j].cpuid = i; sem_init(&thread_info[j].his_sem, 0, 0); sem_init(&thread_info[j].my_sem, 0, 0); ret = pthread_create(&thread_info[j].tid, NULL, (void *(*)(void *))do_measure_one_cpu, (void *)(thread_info+j)); if (ret != 0) goto abort; ncpus--; j++; } /* set last marker */ thread_info[j-1].is_last = 1; ncpus = j; barrier_wait(&barrier); /* * check if some 
threads got problems */ for(i=0; i < ncpus ; i++) { if (thread_info[i].state == THREAD_ERROR) { printw("aborting\n"); refresh(); goto abort; } } session_state = SESSION_RUN; barrier_wait(&barrier); fds.fd = 0; fds.events = POLLIN; fds.revents = 0; for(;time_to_quit == 0;) { ret = poll(&fds, 1, options.opt_delay*1000); switch(ret) { case 0: for(i=0; i < ncpus ; i++) { /* give order to print */ sem_post(&thread_info[i].his_sem); /* wait for thread to be done */ sem_wait(&thread_info[i].my_sem); } /* give order to start measuring again */ refresh(); barrier_wait(&barrier); break; case -1: /* restart in case of signal */ if (errno == EINTR) continue; warning("polling error: %s\n", strerror(errno)); /* fall through */ default: time_to_quit = 1; } } session_state = SESSION_STOP; /* * get worker thread out of their mainloop */ for (i=0; i < ncpus; i++) sem_post(&thread_info[i].his_sem); join_all: for(i=0; i< ncpus; i++) { ret = pthread_join(thread_info[i].tid, &retval); if (ret !=0) fatal_error("cannot join thread %ld\n", i); } free(thread_info); return; abort: session_state = SESSION_ABORTED; for(i=0; i < ncpus; i++) { pthread_cancel(thread_info[i].tid); } goto join_all; } static void setup_measurement(void) { pfarg_sinfo_t sif; pfmlib_options_t pfmlib_options; eventdesc_t *evt; setdesc_t *sdesc; pfmlib_event_t trigger_event; unsigned int i, j; int ret; /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; pfmlib_options.pfm_verbose = 0; pfm_set_options(&pfmlib_options); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); if (pfm_get_cycle_event(&trigger_event) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event for trigger\n"); memset(&sif, 0, sizeof(sif)); get_sif(PFM_FL_SYSTEM_WIDE, &sif); for(i=0; i < RTOP_NUM_SDESC; i++) { sdesc = setdesc_tab+i; 
sdesc->inp.pfp_dfl_plm = PFM_PLM3|PFM_PLM0; /* * indicate we are using the monitors for a system-wide session. * This may impact the way the library sets up the PMC values. */ sdesc->inp.pfp_flags = PFMLIB_PFP_SYSTEMWIDE; evt = sdesc->evt_desc; for(j=0; evt[j].name ; j++) { if (*evt[j].name == '*') sdesc->inp.pfp_events[j] = trigger_event; else if (pfm_find_full_event(evt[j].name, &sdesc->inp.pfp_events[j]) != PFMLIB_SUCCESS) fatal_error("cannot find %s event\n", evt[j].name); sdesc->inp.pfp_events[j].plm = evt[j].plm; } /* * how many counters we use in this set (add the overflow trigger) */ sdesc->inp.pfp_event_count = j; /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. Of course, it is possible that no valid assignment * exists if certain PMU registers are not available. 
*/ detect_unavail_pmu_regs(&sif, &sdesc->inp.pfp_unavail_pmcs, NULL); /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&sdesc->inp, NULL, &sdesc->outp, NULL)) != PFMLIB_SUCCESS) fatal_error("cannot configure events: %s\n", pfm_strerror(ret)); for (j=0; j < sdesc->outp.pfp_pmc_count; j++) { sdesc->pc[j].reg_set = i; sdesc->pc[j].reg_num = sdesc->outp.pfp_pmcs[j].reg_num; sdesc->pc[j].reg_value = sdesc->outp.pfp_pmcs[j].reg_value; } for (j=0; j < sdesc->outp.pfp_pmd_count; j++) { sdesc->pd[j].reg_num = sdesc->outp.pfp_pmds[j].reg_num; sdesc->pd[j].reg_set = i; } } } void populate_cpumask(char *cpu_list) { char *p; unsigned long start_cpu, end_cpu = 0; unsigned long i, count = 0; options.online_cpus = sysconf(_SC_NPROCESSORS_ONLN); if (options.online_cpus == -1) fatal_error("cannot figure out the number of online processors\n"); if (cpu_list == NULL) { /* * The limit is mostly driven by the affinity support in NPTL and glibc __CPU_SETSIZE. * the kernel interface does not expose any limitation. 
*/ if (options.online_cpus >= RTOP_MAX_CPUS) fatal_error("rtop can only handle up to %u CPUs\n", RTOP_MAX_CPUS); for(i=0; i < options.online_cpus; i++) { RTOP_CPUMASK_SET(options.cpu_mask, i); } options.selected_cpus = options.online_cpus; return; } while(isdigit(*cpu_list)) { p = NULL; start_cpu = strtoul(cpu_list, &p, 0); /* auto-detect base */ if (start_cpu == ULONG_MAX || (*p != '\0' && *p != ',' && *p != '-')) goto invalid; if (p && *p == '-') { cpu_list = ++p; p = NULL; end_cpu = strtoul(cpu_list, &p, 0); /* auto-detect base */ if (end_cpu == ULONG_MAX || (*p != '\0' && *p != ',')) goto invalid; if (end_cpu < start_cpu) goto invalid_range; } else { end_cpu = start_cpu; } if (start_cpu >= RTOP_MAX_CPUS || end_cpu >= RTOP_MAX_CPUS) goto too_big; for (; start_cpu <= end_cpu; start_cpu++) { if (start_cpu >= options.online_cpus) goto not_online; /* XXX: assume contiguous range of CPUs */ if (RTOP_CPUMASK_ISSET(options.cpu_mask, start_cpu)) continue; RTOP_CPUMASK_SET(options.cpu_mask, start_cpu); count++; } if (*p) ++p; cpu_list = p; } options.selected_cpus = count; return; invalid: fatal_error("invalid cpu list argument: %s\n", cpu_list); /* no return */ not_online: fatal_error("cpu %lu is not online\n", start_cpu); /* no return */ invalid_range: fatal_error("cpu range %lu - %lu is invalid\n", start_cpu, end_cpu); /* no return */ too_big: fatal_error("rtop is limited to %u CPUs\n", RTOP_MAX_CPUS); /* no return */ } static void usage(void) { printf( "usage: rtop [options]:\n" "-h, --help\t\t\tdisplay this help and exit\n" "-v, --verbose\t\t\tverbose output\n" "-V, --version\t\t\tshow version and exit\n" "-d nsec, --delay=nsec\t\tnumber of seconds between refresh (default=1s)\n" "--cpu-list=cpu1,cpu2\t\tlist of CPUs to monitor(default=all)\n" ); } int main(int argc, char **argv) { int c; char *cpu_list = NULL; while ((c=getopt_long(argc, argv,"+vhVd:", rtop_cmd_options, 0)) != -1) { switch(c) { case 0: continue; /* fast path for options */ case 'v': options.opt_verbose 
= 1; break; case 1: case 'h': usage(); exit(0); case 2: case 'V': printf("rtop version " RTOP_VERSION " Date: " __DATE__ "\n" "Copyright (C) 2004 Hewlett-Packard Company\n"); exit(0); case 3: case 'd': if (options.opt_delay_set) fatal_error("cannot set delay twice\n"); options.opt_delay = atoi(optarg); if (options.opt_delay < 0) { fatal_error("invalid delay, must be >= 0\n"); } options.opt_delay_set = 1; break; case 4: if (cpu_list) fatal_error("cannot specify --cpu-list more than once\n"); if (*optarg == '\0') fatal_error("--cpu-list needs an argument\n"); cpu_list = optarg; break; case 5: if (options.outfile) fatal_error("cannot specify --outfile more than once\n"); if (*optarg == '\0') fatal_error("--outfile needs an argument\n"); options.outfile = optarg; break; default: fatal_error("unknown option\n"); } } /* * default refresh delay */ if (options.opt_delay_set == 0) options.opt_delay = 1; options.cpu_mhz = find_cpu_speed(); populate_cpumask(cpu_list); setup_measurement(); setup_signals(); setup_screen(); mainloop(); close_screen(); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/check_events.c0000600003276200002170000001016212247131122022105 0ustar ralphundrgrad/* * check_events.c - check if event assignment is possible * * Copyright (c) 2008 Google, Inc * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. 
* * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include #include #include #include #include #include #include #include #include #include #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS #define MAX_PMU_NAME_LEN 32 static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } /* * The goal of this program is to exercise the event assignment * code for a specific PMU model. This program is independent of * the kernel API. */ int main(int argc, char **argv) { char **p; unsigned int i; int ret; pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfmlib_options_t pfmlib_options; char model[MAX_PMU_NAME_LEN]; unsigned int num_counters; /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfmlib_options.pfm_verbose = 0; /* set to 1 for verbose */ pfm_set_options(&pfmlib_options); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); pfm_get_pmu_name(model, MAX_PMU_NAME_LEN); printf("PMU model: %s\n", model); pfm_get_num_counters(&num_counters); printf("%u counters available\n", num_counters); /* * prepare parameters to library. 
*/ memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); /* * be nice to user! */ if (argc > 1) { p = argv+1; for (i=0; *p ; i++, p++) { ret = pfm_find_full_event(*p, &inp.pfp_events[i]); if (ret != PFMLIB_SUCCESS) fatal_error("event %s: %s\n", *p, pfm_strerror(ret)); } } else { if (pfm_get_cycle_event(&inp.pfp_events[0]) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1]) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; } /* * set the default privilege mode for all counters: * PFM_PLM3 : user level only */ inp.pfp_dfl_plm = PFM_PLM3; if (i > num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } /* * how many counters we use */ inp.pfp_event_count = i; /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("cannot configure events: %s\n", pfm_strerror(ret)); for (i=0; i < outp.pfp_pmc_count; i++) printf("PMC%u=0x%llx\n", outp.pfp_pmcs[i].reg_num, outp.pfp_pmcs[i].reg_value); for (i=0; i < outp.pfp_pmd_count; i++) printf("PMD%u\n", outp.pfp_pmds[i].reg_num); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/self_pipe.c0000600003276200002170000002141012247131122021410 0ustar ralphundrgrad/* * self_pipe.c - dual process ping-pong example to stress PMU context switch of one process * * Copyright (c) 2008 Stephane Eranian * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above 
copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } /* * pin task to CPU */ #ifndef __NR_sched_setaffinity #error "you need to define __NR_sched_setaffinity" #endif #define MAX_CPUS 2048 #define NR_CPU_BITS (MAX_CPUS>>3) int pin_cpu(pid_t pid, unsigned int cpu) { uint64_t my_mask[NR_CPU_BITS]; if (cpu >= MAX_CPUS) fatal_error("this program supports only up to %d CPUs\n", MAX_CPUS); /* clear the mask first so that only the requested CPU bit is set */ memset(my_mask, 0, sizeof(my_mask)); my_mask[cpu>>6] = 1ULL << (cpu&63); return syscall(__NR_sched_setaffinity, pid, sizeof(my_mask), &my_mask); } static volatile int quit; void sig_handler(int n) { quit = 1; } static void do_child(int fr, int fw) { char c; ssize_t ret; for(;;) { ret = read(fr, &c, 1); if (ret < 0) break; ret = write(fw, "c", 1); if (ret < 0) break; } printf("child exited\n"); exit(0); } int main(int argc, char **argv) { char **p; unsigned int i; int ret, ctx_fd; pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfarg_pmr_t pd[NUM_PMDS]; pfarg_pmr_t pc[NUM_PMCS]; pfarg_sinfo_t sif; pfmlib_options_t pfmlib_options; unsigned int num_counters; int pr[2], pw[2]; int which_cpu; pid_t pid; size_t len; ssize_t nbytes; char *name; char c = '0'; /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfmlib_options.pfm_verbose = 1; /* set to 1 for verbose */ pfm_set_options(&pfmlib_options); srandom(getpid()); which_cpu = random() % sysconf(_SC_NPROCESSORS_ONLN); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); ret = pipe(pr); if (ret) fatal_error("cannot create read pipe: %s\n", strerror(errno)); ret = pipe(pw); if (ret) fatal_error("cannot create write pipe: %s\n", strerror(errno)); pfm_get_max_event_name_len(&len); name = malloc(len+1); if (!name) fatal_error("cannot allocate event name buffer\n"); /* * Pin to the randomly chosen CPU, inherited by child process. 
That will enforce * the ping-ponging and thus stress the PMU context switch * which is what we want */ ret = pin_cpu(getpid(), which_cpu); if (ret) fatal_error("cannot pin to CPU%d: %s\n", which_cpu, strerror(errno)); printf("Both processes pinned to CPU%d\n", which_cpu); pfm_get_num_counters(&num_counters); memset(pd, 0, sizeof(pd)); memset(pc, 0, sizeof(pc)); /* * prepare parameters to library. */ memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); memset(&sif,0, sizeof(sif)); /* * be nice to user! */ if (argc > 1) { p = argv+1; for (i=0; *p ; i++, p++) { ret = pfm_find_full_event(*p, &inp.pfp_events[i]); if (ret != PFMLIB_SUCCESS) fatal_error("event %s: %s\n", *p, pfm_strerror(ret)); } } else { if (pfm_get_cycle_event(&inp.pfp_events[0]) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1]) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; } /* * set the default privilege mode for all counters: * PFM_PLM3 : user level only */ inp.pfp_dfl_plm = PFM_PLM3; if (i > num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } /* * how many counters we use */ inp.pfp_event_count = i; /* * now create a new per-thread session. * This just creates a new session with some initial state, it is not * active nor attached to any process. */ ctx_fd = pfm_create(0, &sif); if (ctx_fd == -1) { if (errno == ENOSYS) fatal_error("Your kernel does not have performance monitoring support!\n"); fatal_error("cannot create session: %s\n", strerror(errno)); } /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. 
Of course, it is possible that no valid assignment may * be possible if certain PMU registers are not available. */ detect_unavail_pmu_regs(&sif, &inp.pfp_unavail_pmcs, NULL); /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("cannot configure events: %s\n", pfm_strerror(ret)); /* * Now prepare the argument to initialize the PMDs and PMCS. * We use pfp_pmc_count to determine the number of PMC to initialize. * We use pfp_pmd_count to determine the number of PMD to initialize. * Some events/features may cause extra PMCs to be used, leading to: * - pfp_pmc_count may be >= pfp_event_count * - pfp_pmd_count may be >= pfp_event_count */ for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } for (i=0; i < outp.pfp_pmd_count; i++) { pd[i].reg_num = outp.pfp_pmds[i].reg_num; } /* * Now program the registers */ if (pfm_write(ctx_fd, 0, PFM_RW_PMC, pc, outp.pfp_pmc_count * sizeof(*pc))) fatal_error("pfm_write error errno %d\n",errno); if (pfm_write(ctx_fd, 0, PFM_RW_PMD, pd, outp.pfp_pmd_count * sizeof(*pd))) fatal_error("pfm_write(PMD) error errno %d\n",errno); /* * now we attach the session to ourselves */ if (pfm_attach(ctx_fd, 0, getpid())) fatal_error("pfm_attach error errno %d\n",errno); /* * create second process which is not monitoring at the moment */ switch(pid=fork()) { case -1: fatal_error("cannot create child\n"); case 0: /* do not inherit session fd */ close(ctx_fd); /* pr[]: write master, read child */ /* pw[]: read master, write child */ close(pr[1]); close(pw[0]); do_child(pr[0], pw[1]); exit(1); } close(pr[0]); close(pw[1]); /* * Let's roll now */ if (pfm_set_state(ctx_fd, 0, PFM_ST_START)) fatal_error("pfm_set_state(start) error errno %d\n",errno); signal(SIGALRM, sig_handler); alarm(10); /* * ping pong loop */ while(!quit) { nbytes = write(pr[1], "c", 1); nbytes = read(pw[0], &c, 1); } 
(void)nbytes; if (pfm_set_state(ctx_fd, 0, PFM_ST_STOP)) fatal_error("pfm_set_state(stop) error errno %d\n",errno); /* * now read the results. We use pfp_event_count because * libpfm guarantees that counters for the events always * come first. */ if (pfm_read(ctx_fd, 0, PFM_RW_PMD, pd, inp.pfp_event_count * sizeof(*pd))) fatal_error( "pfm_read error errno %d\n",errno); /* * print the results */ for (i=0; i < inp.pfp_event_count; i++) { pfm_get_full_event_name(&inp.pfp_events[i], name, len+1); printf("PMD%-3u %20"PRIu64" %s\n", pd[i].reg_num, pd[i].reg_value, name); } free(name); /* * kill child process: kill() takes the pid first, then the signal */ kill(pid, SIGKILL); /* * close pipes */ close(pr[1]); close(pw[0]); /* * and destroy our session */ close(ctx_fd); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/showevtinfo.c0000600003276200002170000001012712247131122022020 0ustar ralphundrgrad/* * showevtinfo.c - show event information * * Copyright (c) 2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #include #include #include #include #include #include #include #include #include #include static void fatal_error(char *fmt,...) __attribute__((noreturn)); static size_t max_len; static void fatal_error(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } static void show_event_info(char *name, unsigned int idx) { pfmlib_regmask_t cnt, impl_cnt; char *desc; unsigned int n1, n2, i, c; int code, prev_code = 0, first = 1; int ret; pfm_get_event_counters(idx, &cnt); pfm_get_num_counters(&n2); pfm_get_impl_counters(&impl_cnt); n1 = n2; printf("#-----------------------------\n" "Name : %s\n", name); pfm_get_event_description(idx, &desc); printf("Desc : %s\n", desc); free(desc); printf("Code :"); for (i=0; n1; i++) { if (pfm_regmask_isset(&impl_cnt, i)) n1--; if (pfm_regmask_isset(&cnt, i)) { pfm_get_event_code_counter(idx,i,&code); if (first == 1 || code != prev_code) { printf(" 0x%x", code); first = 0; } prev_code = code; } } putchar('\n'); n1 = n2; printf("Counters : [ "); for (i=0; n1; i++) { if (pfm_regmask_isset(&impl_cnt, i)) n1--; if (pfm_regmask_isset(&cnt, i)) printf("%d ", i); } puts("]"); pfm_get_num_event_masks(idx, &n1); for (i = 0; i < n1; i++) { ret = pfm_get_event_mask_name(idx, i, name, max_len+1); if (ret != PFMLIB_SUCCESS) continue; pfm_get_event_mask_description(idx, i, &desc); pfm_get_event_mask_code(idx, i, &c); printf("Umask-%02u : 0x%02x : [%s] : %s\n", i, c, name, desc); free(desc); } } #define MAX_PMU_NAME_LEN 32 int main(int argc, char **argv) { unsigned int i, count, match; int ret; char *name; regex_t preg; char 
model[MAX_PMU_NAME_LEN]; if (pfm_initialize() != PFMLIB_SUCCESS) fatal_error("PMU model not supported by library\n"); pfm_get_max_event_name_len(&max_len); name = malloc(max_len+1); if (name == NULL) fatal_error("cannot allocate name buffer\n"); pfm_get_num_events(&count); if (argc == 1) *argv = ".*"; /* match everything */ else ++argv; pfm_get_pmu_name(model, MAX_PMU_NAME_LEN); printf("PMU model: %s\n", model); while(*argv) { if (regcomp(&preg, *argv, REG_ICASE|REG_NOSUB)) fatal_error("error in regular expression for event \"%s\"\n", *argv); match = 0; for(i=0; i < count; i++) { ret = pfm_get_event_name(i, name, max_len+1); /* skip unsupported events */ if (ret != PFMLIB_SUCCESS) continue; if (regexec(&preg, name, 0, NULL, 0) == 0) { show_event_info(name, i); match++; } } if (match == 0) fatal_error("event %s not found\n", *argv); argv++; } free(name); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/notify_self3.c0000600003276200002170000002051612247131122022054 0ustar ralphundrgrad/* * notify_self3.c - example of how you can use overflow notifications with no messages * * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define SMPL_PERIOD 1000000000ULL static volatile unsigned long notification_received; #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS static pfarg_pmr_t pdx[1]; static int ctx_fd; static char *event1_name; static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } static void sigio_handler(int n, struct siginfo *info, struct sigcontext *sc) { if (pfm_read(ctx_fd, 0, PFM_RW_PMD, pdx, sizeof(pdx))) fatal_error("pfm_read: %s", strerror(errno)); /* * we do not need to extract the overflow message, we know * where it is coming from. */ /* * increment our notification counter */ notification_received++; /* * XXX: risky to do printf() in signal handler! 
*/ if (event1_name) printf("Notification %02lu: %"PRIu64" %s\n", notification_received, pdx[0].reg_value, event1_name); else printf("Notification %02lu:\n", notification_received); /* * And resume monitoring */ if (pfm_set_state(ctx_fd, 0, PFM_ST_RESTART)) fatal_error("error pfm_set_state(restart): %d\n", errno); } /* * infinite loop waiting for notification to get out */ void busyloop(void) { /* * busy loop to burn CPU cycles */ for(;notification_received < 40;) ; } #define BPL (sizeof(uint64_t)<<3) #define LBPL 6 static inline void pfm_bv_set(uint64_t *bv, uint16_t rnum) { /* use a 64-bit constant: 1UL is only 32 bits wide on 32-bit targets */ bv[rnum>>LBPL] |= 1ULL << (rnum&(BPL-1)); } int main(int argc, char **argv) { int ret; pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfarg_pmr_t pc[NUM_PMCS]; pfarg_pmd_attr_t pd[NUM_PMDS]; pfarg_sinfo_t sif; pfmlib_options_t pfmlib_options; struct sigaction act; size_t len; unsigned int i, num_counters; /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfmlib_options.pfm_verbose = 1; /* set to 1 for verbose */ pfm_set_options(&pfmlib_options); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); /* * Install the signal handler (SIGIO) */ memset(&act, 0, sizeof(act)); act.sa_handler = (sig_t)sigio_handler; sigaction (SIGIO, &act, 0); memset(pc, 0, sizeof(pc)); memset(pd, 0, sizeof(pd)); memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); memset(&sif,0, sizeof(sif)); pfm_get_num_counters(&num_counters); if (pfm_get_cycle_event(&inp.pfp_events[0]) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1]) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; /* * set the default privilege mode for all counters: * PFM_PLM3 : user level only */ inp.pfp_dfl_plm = PFM_PLM3; if (i > 
num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } /* * how many counters we use */ inp.pfp_event_count = i; if (i > 1) { pfm_get_max_event_name_len(&len); event1_name = malloc(len+1); if (event1_name == NULL) fatal_error("cannot allocate event name\n"); pfm_get_full_event_name(&inp.pfp_events[1], event1_name, len+1); } /* * when we know we are self-monitoring and we have only one session, then * when we get an overflow we know where it is coming from. Therefore we can * save the call to the kernel to extract the notification message. By default, * a message is generated. The queue of messages has a limited size, therefore * it is important to clear the queue by reading the message on overflow. Failure * to do so may result in a queue full and you will lose notification messages. * * With the PFM_FL_OVFL_NO_MSG, no message will be queued, but you will still get * the signal. Similarly, the PFM_MSG_END will be generated. */ ctx_fd = pfm_create(PFM_FL_OVFL_NO_MSG, &sif); if (ctx_fd == -1) { if (errno == ENOSYS) { fatal_error("Your kernel does not have performance monitoring support!\n"); } fatal_error("cannot create session %s\n", strerror(errno)); } /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. Of course, it is possible that no valid assignment may * be possible if certain PMU registers are not available. 
*/ detect_unavail_pmu_regs(&sif, &inp.pfp_unavail_pmcs, NULL); /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("Cannot configure events: %s\n", pfm_strerror(ret)); /* * Now prepare the argument to initialize the PMDs and PMCS. */ for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } for (i=0; i < outp.pfp_pmd_count; i++) pd[i].reg_num = outp.pfp_pmds[i].reg_num; /* * We want to get notified when the counter used for our first * event overflows */ pd[0].reg_flags |= PFM_REGFL_OVFL_NOTIFY; if (inp.pfp_event_count > 1) { pfm_bv_set(pd[0].reg_reset_pmds, pd[1].reg_num); pdx[0].reg_num = pd[1].reg_num; } /* * we arm the first counter, such that it will overflow * after SMPL_PERIOD events have been observed */ pd[0].reg_value = - SMPL_PERIOD; pd[0].reg_long_reset = - SMPL_PERIOD; pd[0].reg_short_reset = - SMPL_PERIOD; /* * Now program the registers */ if (pfm_write(ctx_fd, 0, PFM_RW_PMC, pc, outp.pfp_pmc_count * sizeof(*pc))) fatal_error("pfm_write error errno %d\n",errno); if (pfm_write(ctx_fd, 0, PFM_RW_PMD_ATTR, pd, outp.pfp_pmd_count * sizeof(*pd))) fatal_error("pfm_write(PMD) error errno %d\n",errno); /* * we want to monitor ourself */ if (pfm_attach(ctx_fd, 0, getpid())) fatal_error("pfm_attach error errno %d\n",errno); /* * setup asynchronous notification on the file descriptor */ ret = fcntl(ctx_fd, F_SETFL, fcntl(ctx_fd, F_GETFL, 0) | O_ASYNC); if (ret == -1) fatal_error("cannot set ASYNC: %s\n", strerror(errno)); /* * get ownership of the descriptor */ ret = fcntl(ctx_fd, F_SETOWN, getpid()); if (ret == -1) fatal_error("cannot setown: %s\n", strerror(errno)); /* * Let's roll now */ if (pfm_set_state(ctx_fd, 0, PFM_ST_START)) fatal_error("pfm_set_state(start) error errno %d\n", errno); busyloop(); if (pfm_set_state(ctx_fd, 0, PFM_ST_STOP)) fatal_error("pfm_set_state(stop) error errno %d\n", 
errno); /* * destroy our session */ close(ctx_fd); if (event1_name) free(event1_name); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/notify_self_fork.c0000600003276200002170000002150712247131122023013 0ustar ralphundrgrad/* * notify_self_fork.c - example of how you can use overflow notifications across fork * * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * Modified by Phil Mucci to add the fork() * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define SMPL_PERIOD 1000000000ULL static volatile unsigned long notification_received; #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS static pfarg_pmr_t pdx[1]; static int ctx_fd; static char *event1_name; static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } static void warning(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); } static void sigio_handler(int n, struct siginfo *info, struct sigcontext *sc) { pfarg_msg_t msg; int fd = ctx_fd; int r; if (fd != ctx_fd) fatal_error("handler does not get valid file descriptor\n"); if (event1_name && pfm_read(fd, 0, PFM_RW_PMD, pdx, sizeof(*pdx)) == -1) fatal_error("pfm_read(PMD): %s", strerror(errno)); retry: r = read(fd, &msg, sizeof(msg)); if (r != sizeof(msg)) { if(r == -1 && errno == EINTR) { warning("read interrupted, retrying\n"); goto retry; } fatal_error("cannot read overflow message: %s\n", strerror(errno)); } if (msg.type != PFM_MSG_OVFL) fatal_error("unexpected msg type: %d\n",msg.type); /* * increment our notification counter */ notification_received++; /* * XXX: risky to do printf() in signal handler! */ if (event1_name) printf("Notification %lu: %"PRIu64" %s ip=0x%llx\n", notification_received, pdx[0].reg_value, event1_name, (unsigned long long)msg.pfm_ovfl_msg.msg_ovfl_ip); else printf("Notification %lu ip=0x%llx\n", notification_received, (unsigned long long)msg.pfm_ovfl_msg.msg_ovfl_ip); fflush(stdout); /* * And resume monitoring */ if (pfm_set_state(fd, 0, PFM_ST_RESTART) == -1) fatal_error("pfm_set_state(restart): %d\n", errno); } /* * infinite loop waiting for notification to get out */ void busyloop(void) { /* * busy loop to burn CPU cycles */ for(;notification_received < 3;) ; /* * forking causes the context to be shared with the child * When the child terminates, it closes its descriptor. * The parent's remains and notification keep on coming. 
*/ if (fork() == 0) { printf("child terminates\n"); fflush(stdout); exit(0); } printf("after fork\n"); fflush(stdout); for(;notification_received < 6;) ; } #define BPL (sizeof(uint64_t)<<3) #define LBPL 6 static inline void pfm_bv_set(uint64_t *bv, uint16_t rnum) { /* use a 64-bit constant: 1UL is only 32 bits wide on 32-bit targets */ bv[rnum>>LBPL] |= 1ULL << (rnum&(BPL-1)); } int main(int argc, char **argv) { pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfarg_pmr_t pc[NUM_PMCS]; pfarg_pmd_attr_t pd[NUM_PMDS]; pfarg_sinfo_t sif; pfmlib_options_t pfmlib_options; struct sigaction act; unsigned int i, num_counters; size_t len; int ret; /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfmlib_options.pfm_verbose = 1; /* set to 1 for verbose */ pfm_set_options(&pfmlib_options); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); /* * Install the signal handler (SIGIO) */ memset(&act, 0, sizeof(act)); act.sa_handler = (sig_t)sigio_handler; sigaction (SIGIO, &act, 0); memset(pc, 0, sizeof(pc)); memset(pd, 0, sizeof(pd)); memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); memset(&sif,0, sizeof(sif)); pfm_get_num_counters(&num_counters); if (pfm_get_cycle_event(&inp.pfp_events[0]) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1]) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; /* * set the default privilege mode for all counters: * PFM_PLM3 : user level only */ inp.pfp_dfl_plm = PFM_PLM3; if (i > num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } inp.pfp_event_count = i; /* * how many counters we use */ if (i > 1) { pfm_get_max_event_name_len(&len); event1_name = malloc(len+1); if (event1_name == NULL) fatal_error("cannot allocate 
event name\n"); pfm_get_full_event_name(&inp.pfp_events[1], event1_name, len+1); } /* * now create the context for self monitoring/per-task */ ctx_fd = pfm_create(0, &sif); if (ctx_fd == -1) { if (errno == ENOSYS) { fatal_error("Your kernel does not have performance monitoring support!\n"); } fatal_error("cannot create session %s\n", strerror(errno)); } /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. Of course, it is possible that no valid assignment may * be possible if certain PMU registers are not available. */ detect_unavail_pmu_regs(&sif, &inp.pfp_unavail_pmcs, NULL); /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("Cannot configure events: %s\n", pfm_strerror(ret)); /* * Now prepare the argument to initialize the PMDs and PMCS. 
*/ for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } for (i=0; i < outp.pfp_pmd_count; i++) pd[i].reg_num = outp.pfp_pmds[i].reg_num; /* * We want to get notified when the counter used for our first * event overflows */ pd[0].reg_flags |= PFM_REGFL_OVFL_NOTIFY; /* * nothing to sample when only one counter */ if (inp.pfp_event_count > 1) { pfm_bv_set(pd[0].reg_reset_pmds, pd[1].reg_num); pdx[0].reg_num = pd[1].reg_num; } /* * we arm the first counter, such that it will overflow * after SMPL_PERIOD events have been observed */ pd[0].reg_value = - SMPL_PERIOD; pd[0].reg_long_reset = - SMPL_PERIOD; pd[0].reg_short_reset = - SMPL_PERIOD; /* * Now program the registers */ if (pfm_write(ctx_fd, 0, PFM_RW_PMC, pc, outp.pfp_pmc_count * sizeof(*pc))) fatal_error("pfm_write error errno %d\n",errno); if (pfm_write(ctx_fd, 0, PFM_RW_PMD_ATTR, pd, outp.pfp_pmd_count * sizeof(*pd))) fatal_error("pfm_write(PMD) error errno %d\n",errno); /* * we want to monitor ourself */ if (pfm_attach(ctx_fd, 0, getpid())) fatal_error("pfm_attach error errno %d\n",errno); /* * setup asynchronous notification on the file descriptor */ ret = fcntl(ctx_fd, F_SETFL, fcntl(ctx_fd, F_GETFL, 0) | O_ASYNC); if (ret == -1) fatal_error("cannot set ASYNC: %s\n", strerror(errno)); /* * get ownership of the descriptor */ ret = fcntl(ctx_fd, F_SETOWN, getpid()); if (ret == -1) fatal_error("cannot setown: %s\n", strerror(errno)); /* * Let's roll now */ if (pfm_set_state(ctx_fd, 0, PFM_ST_START)) fatal_error("pfm_set_state(start) error errno %d\n",errno); busyloop(); if (pfm_set_state(ctx_fd, 0, PFM_ST_STOP)) fatal_error("pfm_set_state(stop) error errno %d\n",errno); /* * free our context */ close(ctx_fd); if (event1_name) free(event1_name); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/self_smpl_multi.c0000600003276200002170000002120012247131122022635 0ustar ralphundrgrad/* * * self_smpl_multi.c - multi-thread self-sampling 
program * * Copyright (c) 2008 Mark W. Krentel * Contributed by Mark W. Krentel * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. * * Test perfmon overflow without PAPI. * * Create a new thread, launch perfmon overflow counters in both * threads, print the number of interrupts per thread and per second, * and look for anomalous interrupts. Look for mismatched thread * ids, bad message type, or failed pfm_restart(). 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define PROGRAM_TIME 8 #define THRESHOLD 20000000 int program_time = PROGRAM_TIME; int threshold = THRESHOLD; int signum = SIGIO; #define MAX_FD 20 struct over_args { pfmlib_event_t ev; pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfarg_pmr_t pc[PFMLIB_MAX_PMCS]; pfarg_pmd_attr_t pd[PFMLIB_MAX_PMDS]; int fd; pid_t tid; pthread_t self; }; struct over_args *fd2ov[MAX_FD]; long count[MAX_FD]; long total[MAX_FD]; long iter[MAX_FD]; long mismatch[MAX_FD]; long bad_msg[MAX_FD]; long bad_restart[MAX_FD]; long ser_no = 0; pid_t gettid(void) { #ifdef SYS_gettid return (pid_t)syscall(SYS_gettid); #elif defined(__NR_gettid) return (pid_t)syscall(__NR_gettid); #else #error "Unable to implement gettid." #endif } void user_callback(int fd) { count[fd]++; total[fd]++; ser_no++; } void do_cycles(void) { struct timeval start, last, now; double x, sum; int fd; for (fd = 0; fd < MAX_FD; fd++) { if (fd2ov[fd] != NULL && pthread_equal(fd2ov[fd]->self, pthread_self())) { break; } } gettimeofday(&start, NULL); last = start; count[fd] = 0; total[fd] = 0; iter[fd] = 0; do { sum = 1.0; for (x = 1.0; x < 250000.0; x += 1.0) sum += x; if (sum < 0.0) printf("==>> SUM IS NEGATIVE !! 
<<==\n"); iter[fd]++; gettimeofday(&now, NULL); if (now.tv_sec > last.tv_sec) { printf("%ld: fd = %d, count = %4ld, iter = %4ld, rate = %ld/Kiter\n", now.tv_sec - start.tv_sec, fd, count[fd], iter[fd], (1000 * count[fd])/iter[fd]); count[fd] = 0; iter[fd] = 0; last = now; } } while (now.tv_sec < start.tv_sec + program_time); } #define DPRINT(str) \ printf("(%s) ser = %ld, fd = %d, tid = %d, self = %p\n", \ str, ser_no, fd, tid, (void *)self) void sigio_handler(int sig, siginfo_t *info, void *context) { int fd; pid_t tid; pthread_t self; struct over_args *ov; pfarg_msg_t msg; /* * file descriptor is the only reliable source of information * to identify session from which the notification originated * * Depending on scheduling, the signal may not be processed by the * thread which posted it, i.e., the thread which had the notification * * POSIX asynchronous signals cannot be targeted to specific threads */ fd = info->si_fd; self = pthread_self(); tid = gettid(); /* * validate fd before using it as an index into fd2ov[] */ if (fd < 0 || fd >= MAX_FD) errx(1, "bad info.si_fd: %d", fd); ov = fd2ov[fd]; /* * current thread id may not always match the id associated with * the file descriptor */ if (tid != ov->tid || !pthread_equal(self, ov->self)) { mismatch[fd]++; DPRINT("bad thread"); } pfm_read(fd, 0, PFM_RW_PMD, &ov->pd[1], sizeof(pfarg_pmd_attr_t)); /* * extract notification message */ if (read(fd, &msg, sizeof(msg)) != sizeof(msg)) errx(1, "read from sigio fd failed"); /* * cannot be PFM_END_MSG starting with perfmon v2.8 */ if (msg.type == PFM_MSG_END) { DPRINT("pfm_msg_end"); } else if (msg.type != PFM_MSG_OVFL) { bad_msg[fd]++; DPRINT("bad msg type"); } user_callback(fd); /* * when session is not that of the current thread, pfm_restart() does * not guarantee that upon return monitoring will be resumed. There * may be a delay due to scheduling. 
*/ if (pfm_set_state(fd, 0, PFM_ST_RESTART) != 0) { bad_restart[fd]++; DPRINT("bad restart"); } } void overflow_start(struct over_args *ov, char *name) { int i, fd, flags; pfarg_sinfo_t sif; memset(ov, 0, sizeof(struct over_args)); memset(&sif, 0, sizeof(sif)); if (pfm_get_cycle_event(&ov->ev) != PFMLIB_SUCCESS) errx(1, "pfm_get_cycle_event failed"); ov->inp.pfp_event_count = 1; ov->inp.pfp_dfl_plm = PFM_PLM3; ov->inp.pfp_flags = 0; ov->inp.pfp_events[0] = ov->ev; fd = pfm_create(0, &sif); if (fd < 0) errx(1, "pfm_create_session failed"); fd2ov[fd] = ov; ov->fd = fd; ov->tid = gettid(); ov->self = pthread_self(); detect_unavail_pmu_regs(&sif, &ov->inp.pfp_unavail_pmcs, NULL); if (pfm_dispatch_events(&ov->inp, NULL, &ov->outp, NULL) != PFMLIB_SUCCESS) errx(1, "pfm_dispatch_events failed"); for (i = 0; i < ov->outp.pfp_pmc_count; i++) { ov->pc[i].reg_num = ov->outp.pfp_pmcs[i].reg_num; ov->pc[i].reg_value = ov->outp.pfp_pmcs[i].reg_value; } for (i = 0; i < ov->outp.pfp_pmd_count; i++) { ov->pd[i].reg_num = ov->outp.pfp_pmds[i].reg_num; } ov->pd[0].reg_flags |= PFM_REGFL_OVFL_NOTIFY; ov->pd[0].reg_value = - threshold; ov->pd[0].reg_long_reset = - threshold; ov->pd[0].reg_short_reset = - threshold; if (pfm_write(fd, 0, PFM_RW_PMC, ov->pc, ov->outp.pfp_pmc_count * sizeof(pfarg_pmr_t))) errx(1, "pfm_write failed"); if (pfm_write(fd, 0, PFM_RW_PMD_ATTR, ov->pd, ov->outp.pfp_pmd_count * sizeof(pfarg_pmd_attr_t))) errx(1, "pfm_write(PMD) failed"); if (pfm_attach(fd, 0, gettid()) != 0) errx(1, "pfm_attach failed"); flags = fcntl(fd, F_GETFL, 0); if (fcntl(fd, F_SETFL, flags | O_ASYNC) < 0) errx(1, "fcntl SETFL failed"); if (fcntl(fd, F_SETOWN, gettid()) < 0) errx(1, "fcntl SETOWN failed"); if (fcntl(fd, F_SETSIG, signum) < 0) errx(1, "fcntl SETSIG failed"); if (pfm_set_state(fd, 0, PFM_ST_START)) errx(1, "pfm_set_state(start) failed"); printf("launch %s: fd: %d, tid: %d, self: %p\n", name, fd, ov->tid, (void *)ov->self); } void overflow_stop(struct over_args *ov) { if 
(pfm_set_state(ov->fd, 0, PFM_ST_STOP)) errx(1, "pfm_set_state(stop) failed"); } void * my_thread(void *v) { struct over_args ov; overflow_start(&ov, "side"); do_cycles(); overflow_stop(&ov); return (NULL); } /* * Program args: program_time, threshold, signum. */ int main(int argc, char **argv) { pthread_t thr; struct over_args ov; struct sigaction sa; sigset_t set; int i; if (argc < 2 || sscanf(argv[1], "%d", &program_time) < 1) program_time = PROGRAM_TIME; if (argc < 3 || sscanf(argv[2], "%d", &threshold) < 1) threshold = THRESHOLD; if (argc < 4 || sscanf(argv[3], "%d", &signum) < 1) signum = SIGIO; printf("program_time = %d, threshold = %d, signum = %d\n", program_time, threshold, signum); for (i = 0; i < MAX_FD; i++) { fd2ov[i] = NULL; mismatch[i] = 0; bad_msg[i] = 0; bad_restart[i] = 0; } memset(&sa, 0, sizeof(sa)); sigemptyset(&set); sa.sa_sigaction = sigio_handler; sa.sa_mask = set; sa.sa_flags = SA_SIGINFO; if (sigaction(signum, &sa, NULL) != 0) errx(1, "sigaction failed"); if (pfm_initialize() != PFMLIB_SUCCESS) errx(1, "pfm_initialize failed"); printf("\n"); if (pthread_create(&thr, NULL, my_thread, NULL) != 0) errx(1, "pthread_create failed"); overflow_start(&ov, "main"); do_cycles(); overflow_stop(&ov); printf("\n"); for (i = 0; i < MAX_FD; i++) { if (fd2ov[i] != NULL) { printf("total[%d] = %ld, mismatch[%d] = %ld, " "bad_msg[%d] = %ld, bad_restart[%d] = %ld\n", i, total[i], i, mismatch[i], i, bad_msg[i], i, bad_restart[i]); } } return (0); } papi-5.3.0/src/libpfm-3.y/examples_v3.x/whichpmu.c0000600003276200002170000001125712247131122021276 0ustar ralphundrgrad/* * whichpmu.c - example of how to figure out the host PMU model detected by libpfm * also shows how to detect which PMU registers are available to * applications * * Copyright (c) 2002-2007 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #include #include #include #include #include "detect_pmcs.h" #define MAX_PMU_NAME_LEN 32 int main(void) { pfmlib_regmask_t impl_pmds, impl_pmcs; pfmlib_regmask_t avail_pmcs, avail_pmds; pfmlib_regmask_t impl_counters, una_pmcs, una_pmds; pfarg_sinfo_t sif; unsigned int n, num_pmds, num_pmcs, num_counters, num_events; unsigned int width = 0; unsigned int i; int ret; char model[MAX_PMU_NAME_LEN]; /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) { printf("Can't initialize library\n"); return 1; } /* * Now simply print the CPU model detected by pfmlib * * When the CPU model is not directly supported AND the generic support * is compiled into the library, the detected model will be "Generic", which * means that only the architected features will be supported. 
* * This call can be used to tune applications based on the detected host * CPU model. This is useful because some features are CPU model specific, * such as address range restriction which is an Itanium feature. * */ pfm_get_pmu_name(model, MAX_PMU_NAME_LEN); pfm_get_hw_counter_width(&width); pfm_get_impl_pmds(&impl_pmds); pfm_get_impl_pmcs(&impl_pmcs); pfm_get_impl_counters(&impl_counters); pfm_get_num_events(&num_events); pfm_get_num_pmds(&num_pmds); pfm_get_num_pmcs(&num_pmcs); pfm_get_num_counters(&num_counters); detect_unavail_pmu_regs(&sif, &una_pmcs, &una_pmds); pfm_regmask_andnot(&avail_pmcs, &impl_pmcs, &una_pmcs); pfm_regmask_andnot(&avail_pmds, &impl_pmds, &una_pmds); printf("PMU model detected by pfmlib: %s\n", model); printf("number of implemented PMD registers : %u\n", num_pmds); printf("implemented PMD registers : [ "); for (i=0, n = num_pmds; n; i++) { if (pfm_regmask_isset(&impl_pmds, i) == 0) continue; printf("%u ", i); n--; } pfm_regmask_weight(&avail_pmds, &n); printf("]\nnumber of available PMD registers : %u\n", n); printf("available PMD registers : [ "); for (i=0; n ; i++) { if (pfm_regmask_isset(&avail_pmds, i) == 0) continue; printf("%u ", i); n--; } printf("]\nnumber of implemented PMC registers : %u\n", num_pmcs); printf("implemented PMC registers : [ "); for (i=0, n = num_pmcs; n; i++) { if (pfm_regmask_isset(&impl_pmcs, i) == 0) continue; printf("%u ", i); n--; } pfm_regmask_weight(&avail_pmcs, &n); printf("]\nnumber of available PMC registers : %u\n", n); printf("available PMC registers : [ "); for (i=0; n; i++) { if (pfm_regmask_isset(&avail_pmcs, i) == 0) continue; printf("%u ", i); n--; } printf("]\nnumber of counters : %u\n", num_counters); printf("implemented counters : [ "); for (i=0; num_counters; i++) { if (pfm_regmask_isset(&impl_counters, i) == 0) continue; printf("%u ", i); num_counters--; } pfm_regmask_andnot(&avail_pmds, &impl_counters, &una_pmds); pfm_regmask_weight(&avail_pmds, &n); printf("]\nnumber of available 
counters : %u\n", n); printf("available counters : [ "); for (i=0; n ; i++) { if (pfm_regmask_isset(&avail_pmds, i) == 0) continue; printf("%u ", i); n--; } printf("]\nhardware counter width : %u\n", width); printf("number of events supported : %u\n", num_events); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/task_smpl_user.c0000600003276200002170000003153712247131122022510 0ustar ralphundrgrad/* * task_smpl_user.c - example of a task collecting a profile from user level * * Copyright (c) 2005-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
*/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define SAMPLING_PERIOD 100000 #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS typedef struct { int opt_no_show; int opt_block; int opt_sys; } options_t; static uint64_t collected_samples; static pfarg_pmd_attr_t pd[NUM_PMDS]; static unsigned int num_pmds; static options_t options; static volatile int terminate; static struct option the_options[]={ { "help", 0, 0, 1}, { "ovfl-block", 0, &options.opt_block, 1}, { "no-show", 0, &options.opt_no_show, 1}, { "system-wide", 0, &options.opt_sys, 1}, { 0, 0, 0, 0} }; static void fatal_error(char *fmt,...) __attribute__((noreturn)); #define BPL (sizeof(uint64_t )<<3) #define LBPL 6 static inline void pfm_bv_set(uint64_t *bv, uint16_t rnum) { bv[rnum>>LBPL] |= 1UL << (rnum&(BPL-1)); } static inline int pfm_bv_isset(uint64_t *bv, uint16_t rnum) { return bv[rnum>>LBPL] & (1UL <<(rnum&(BPL-1))) ? 1 : 0; } static inline void pfm_bv_copy(uint64_t *d, uint64_t *j, uint16_t n) { if (n <= BPL) *d = *j; else { memcpy(d, j, (n>>LBPL)*sizeof(uint64_t)); } } /* * pin task to CPU */ #ifndef __NR_sched_setaffinity #error "you need to define __NR_sched_setaffinity" #endif #define MAX_CPUS 2048 #define NR_CPU_BITS (MAX_CPUS>>3) int pin_cpu(pid_t pid, unsigned int cpu) { uint64_t my_mask[NR_CPU_BITS]; if (cpu >= MAX_CPUS) fatal_error("this program supports only up to %d CPUs\n", MAX_CPUS); /* clear the whole mask so that only the requested CPU bit is set */ memset(my_mask, 0, sizeof(my_mask)); my_mask[cpu>>6] = 1ULL << (cpu&63); return syscall(__NR_sched_setaffinity, pid, sizeof(my_mask), &my_mask); } static void warning(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); } static void fatal_error(char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } int child(char **arg) { if (options.opt_sys) { printf("child pinned on CPU0\n"); pin_cpu(getpid(), 0); } /* * force the task to stop before executing the first * user level instruction */ ptrace(PTRACE_TRACEME, 0, NULL, NULL); execvp(arg[0], arg); /* reached only if execvp() fails */ exit(1); } void show_task_rusage(const struct timeval *start, const struct timeval *end, const struct rusage *ru) { long secs, suseconds, end_usec; secs = end->tv_sec - start->tv_sec; end_usec = end->tv_usec; if (end_usec < start->tv_usec) { end_usec += 1000000; secs--; } suseconds = end_usec - start->tv_usec; printf ("real %ldh%02ldm%02ld.%03lds user %ldh%02ldm%02ld.%03lds sys %ldh%02ldm%02ld.%03lds\n", secs / 3600, (secs % 3600) / 60, secs % 60, suseconds / 1000, ru->ru_utime.tv_sec / 3600, (ru->ru_utime.tv_sec % 3600) / 60, ru->ru_utime.tv_sec% 60, (long)(ru->ru_utime.tv_usec / 1000), ru->ru_stime.tv_sec / 3600, (ru->ru_stime.tv_sec % 3600) / 60, ru->ru_stime.tv_sec% 60, (long)(ru->ru_stime.tv_usec / 1000) ); } static void process_sample(int fd, unsigned long ip, pid_t pid, pid_t tid, uint16_t cpu) { unsigned int j; if (pfm_read(fd, 0, PFM_RW_PMD_ATTR, pd, num_pmds * sizeof(*pd))) fatal_error("pfm_read(PMD) error errno %d\n",errno); if (options.opt_no_show) goto done; printf("entry %"PRIu64" PID:%d TID: %d CPU:%u LAST_VAL: %"PRIu64" IIP:0x%lx\n", collected_samples, pid, tid, cpu, - pd[0].reg_last_value, ip); for(j=1; j < num_pmds; j++) { printf("PMD%-2d = %"PRIu64"\n", pd[j].reg_num, pd[j].reg_value); } done: collected_samples++; } static void cld_handler(int n) { terminate = 1; } int mainloop(char **arg) { pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfarg_pmr_t pc[NUM_PMCS]; pfarg_sinfo_t sif; struct timeval start_time, end_time; struct rusage rusage; pfarg_msg_t msg; uint64_t ovfl_count = 0; uint32_t ctx_flags = 0; pid_t pid; int status, ret, fd; unsigned int i, num_counters; /* * initialize all locals */ 
memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); memset(pc, 0, sizeof(pc)); memset(&sif,0, sizeof(sif)); pfm_get_num_counters(&num_counters); /* * locate events */ if (pfm_get_cycle_event(&inp.pfp_events[0]) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1]) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; /* * set the privilege mode: * PFM_PLM3 : user level * PFM_PLM0 : kernel level */ inp.pfp_dfl_plm = PFM_PLM3; printf("measuring at plm=0x%x\n", inp.pfp_dfl_plm); if (i > num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } /* * how many counters we use */ inp.pfp_event_count = i; inp.pfp_flags = options.opt_sys ? PFMLIB_PFP_SYSTEMWIDE : 0; /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. Of course, it is possible that no valid assignment may * be possible if certain PMU registers are not available. */ get_sif(options.opt_sys? PFM_FL_SYSTEM_WIDE:0, &sif); detect_unavail_pmu_regs(&sif, &inp.pfp_unavail_pmcs, NULL); /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("cannot configure events: %s\n", pfm_strerror(ret)); /* * Now prepare the argument to initialize the PMDs and PMCS. * We use pfp_pmc_count to determine the number of PMC to initialize. * We use pfp_pmd_count to determine the number of PMD to initialize. 
* Some events/features may cause extra PMCs to be used, leading to: * - pfp_pmc_count may be >= pfp_event_count * - pfp_pmd_count may be >= pfp_event_count */ for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } for (i=0; i < outp.pfp_pmd_count; i++) { pd[i].reg_num = outp.pfp_pmds[i].reg_num; /* * we also want to reset the other PMDs on * every overflow. If we do not set * this, the non-overflowed counters * will be untouched. */ if (i) pfm_bv_set(pd[0].reg_reset_pmds, pd[i].reg_num); } /* * when our sampling counter overflows, we want to be notified. * The notification will come ONLY when the sampling buffer * becomes full. * * We also activate randomization of the sampling period. */ pd[0].reg_flags |= PFM_REGFL_OVFL_NOTIFY | PFM_REGFL_RANDOM; pd[0].reg_value = - SAMPLING_PERIOD; pd[0].reg_short_reset = - SAMPLING_PERIOD; pd[0].reg_long_reset = - SAMPLING_PERIOD; /* * setup randomization parameters, we allow a range of up to +256 here. 
*/ pd[0].reg_random_mask = 0xff; printf("programming %u PMCS and %u PMDS\n", outp.pfp_pmc_count, inp.pfp_event_count); /* * prepare session flags */ if (options.opt_sys) { if (options.opt_block) fatal_error("blocking mode not supported in system-wide\n"); printf("system-wide monitoring on CPU0\n"); pin_cpu(getpid(), 0); ctx_flags |= PFM_FL_SYSTEM_WIDE; } if (options.opt_block) ctx_flags |= PFM_FL_NOTIFY_BLOCK; /* * now create perfmon session */ fd = pfm_create(ctx_flags, NULL); if (fd == -1) { if (errno == ENOSYS) { fatal_error("Your kernel does not have performance monitoring support!\n"); } fatal_error("cannot create session %s\n", strerror(errno)); } /* * Now program the registers */ if (pfm_write(fd, 0, PFM_RW_PMC, pc, outp.pfp_pmc_count * sizeof(*pc))) fatal_error("pfm_write error errno %d\n",errno); /* * initialize the PMDs * To be read, each PMD must be either written or declared * as being part of a sample (reg_smpl_pmds) */ if (pfm_write(fd, 0, PFM_RW_PMD_ATTR, pd, outp.pfp_pmd_count * sizeof(*pd))) fatal_error("pfm_write(PMD) error errno %d\n",errno); num_pmds = outp.pfp_pmd_count; signal(SIGCHLD, SIG_IGN); /* * Create the child task */ if ((pid=fork()) == -1) fatal_error("Cannot fork process\n"); /* * In order to get the PFM_END_MSG message, it is important * to ensure that the child task does not inherit the file * descriptor of the session. By default, file descriptors * are inherited during exec(). We explicitly close it * here. We could have set it up through fcntl(FD_CLOEXEC) * to achieve the same thing. */ if (pid == 0) { close(fd); child(arg); } /* * wait for the child to exec */ waitpid(pid, &status, WUNTRACED); /* * process is stopped at this point */ if (WIFEXITED(status)) { warning("task %s [%d] exited already status %d\n", arg[0], pid, WEXITSTATUS(status)); goto terminate_session; } /* * attach to either pid or CPU0 */ if (pfm_attach(fd, 0, options.opt_sys ? 
0 : pid)) fatal_error("pfm_attach error errno %d\n",errno); /* * activate monitoring for stopped task. * (nothing will be measured at this point) */ if (pfm_set_state(fd, 0, PFM_ST_START)) fatal_error("pfm_set_state(start) error errno %d\n",errno); if (options.opt_sys) signal(SIGCHLD, cld_handler); /* * detach child. Side effect includes * activation of monitoring. */ ptrace(PTRACE_DETACH, pid, NULL, 0); gettimeofday(&start_time, NULL); /* * core loop */ while(terminate == 0) { /* * wait for overflow/end notification messages */ ret = read(fd, &msg, sizeof(msg)); if (ret == -1) { if (errno != EINTR) fatal_error("cannot read perfmon msg: %s\n", strerror(errno)); continue; } switch(msg.type) { case PFM_MSG_OVFL: /* one sample to process */ process_sample(fd, msg.pfm_ovfl_msg.msg_ovfl_ip, msg.pfm_ovfl_msg.msg_ovfl_pid, msg.pfm_ovfl_msg.msg_ovfl_tid, msg.pfm_ovfl_msg.msg_ovfl_cpu); ovfl_count++; if (pfm_set_state(fd, 0, PFM_ST_RESTART) == -1) { if (errno != EBUSY) fatal_error("pfm_set_state(restart) error errno %d\n",errno); } break; case PFM_MSG_END: /* monitored task terminated (not for system-wide) */ printf("task terminated\n"); terminate = 1; break; default: fatal_error("unknown message type %d\n", msg.type); } } terminate_session: /* * cleanup child */ wait4(pid, &status, 0, &rusage); gettimeofday(&end_time, NULL); /* * destroy perfmon session */ close(fd); printf("%"PRIu64" samples collected in %"PRIu64" buffer overflows\n", collected_samples, ovfl_count); show_task_rusage(&start_time, &end_time, &rusage); return 0; } static void usage(void) { printf("usage: task_smpl [-h] [--help] [--no-show] [--ovfl-block] cmd\n"); } int main(int argc, char **argv) { pfmlib_options_t pfmlib_options; int c; while ((c=getopt_long(argc, argv,"h", the_options, 0)) != -1) { switch(c) { case 0: continue; case 1: case 'h': usage(); exit(0); default: fatal_error(""); } } if (argv[optind] == NULL) { fatal_error("You must specify a command to execute\n"); } /* * pass options to library 
(optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfmlib_options.pfm_verbose = 0; /* set to 1 for verbose */ pfm_set_options(&pfmlib_options); /* * Initialize pfm library (required before we can use it) */ if (pfm_initialize() != PFMLIB_SUCCESS) { fatal_error("Can't initialize library\n"); } return mainloop(argv+optind); } papi-5.3.0/src/libpfm-3.y/examples_v3.x/Makefile0000600003276200002170000000555512247131122020752 0ustar ralphundrgrad# # Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. # Contributed by Stephane Eranian # # Permission is hereby granted, free of charge, to any person obtaining a copy # of this software and associated documentation files (the "Software"), to deal # in the Software without restriction, including without limitation the rights # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies # of the Software, and to permit persons to whom the Software is furnished to do so, # subject to the following conditions: # # The above copyright notice and this permission notice shall be included in all # copies or substantial portions of the Software. # # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF # CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE # OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. # TOPDIR := $(shell if [ "$$PWD" != "" ]; then echo $$PWD; else pwd; fi)/.. 
include $(TOPDIR)/config.mk include $(TOPDIR)/rules.mk DIRS= ifeq ($(ARCH),ia64) DIRS +=ia64 endif ifeq ($(ARCH),ia32) DIRS +=x86 endif ifeq ($(ARCH),x86_64) DIRS +=x86 endif ifeq ($(CONFIG_PFMLIB_ARCH_CRAYXT),y) CFLAGS += -DCONFIG_PFMLIB_ARCH_CRAYXT endif CFLAGS+= -I. -D_GNU_SOURCE LIBS += -lm ifeq ($(SYS),Linux) CFLAGS+= -pthread LIBS += -lrt endif TARGET_GEN=showevtinfo check_events ifeq ($(SYS),Linux) TARGET_LINUX +=self task task_attach task_attach_timeout syst \ notify_self notify_self2 notify_self3 \ multiplex multiplex2 set_notify whichpmu \ showreginfo task_smpl task_smpl_user \ pfmsetup self_smpl_multi self_pipe \ notify_self_fork XTRA += rtop endif all: $(TARGET_GEN) $(TARGET_LINUX) $(XTRA) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done # Many systems don't have ncurses installed rtop: rtop.o detect_pmcs.o $(PFMLIB) -$(CC) $(CFLAGS) $(LDFLAGS) -D_GNU_SOURCE -o $@ $^ $(LIBS) -lpthread -lncurses $(TARGET_LINUX): %:%.o detect_pmcs.o $(PFMLIB) $(CC) $(CFLAGS) -o $@ $(LDFLAGS) $^ $(LIBS) $(TARGET_GEN): %:%.o $(PFMLIB) $(CC) $(CFLAGS) -o $@ $(LDFLAGS) $^ $(LIBS) clean: @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done $(RM) -f *.o $(TARGET_LINUX) $(TARGET_GEN) $(XTRA) *~ distclean: clean install_examples: $(TARGET_LINUX) $(TARGET_GEN) install_examples: @echo installing: $(TARGET_LINUX) $(TARGET_GEN) -mkdir -p $(DESTDIR)$(EXAMPLESDIR) $(INSTALL) -m 755 $(TARGET_LINUX) $(TARGET_GEN) $(DESTDIR)$(EXAMPLESDIR) @set -e ; for d in $(DIRS) ; do $(MAKE) -C $$d $@ ; done # # examples are installed as part of the RPM install, typically in /usr/share/doc/libpfm-X.Y/ # papi-5.3.0/src/libpfm-3.y/examples_v3.x/notify_self2.c0000600003276200002170000002215312247131122022052 0ustar ralphundrgrad/* * notify_self2.c - example of how you can use overflow notifications with F_SETSIG * * Copyright (c) 2002-2006 Hewlett-Packard Development Company, L.P. 
* Contributed by Stephane Eranian * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies * of the Software, and to permit persons to whom the Software is furnished to do so, * subject to the following conditions: * * The above copyright notice and this permission notice shall be included in all * copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. * * This file is part of libpfm, a performance monitoring support library for * applications on Linux. */ #ifndef _GNU_SOURCE #define _GNU_SOURCE /* for getline */ #endif #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define SMPL_PERIOD 1000000000ULL static volatile unsigned long notification_received; #define NUM_PMCS PFMLIB_MAX_PMCS #define NUM_PMDS PFMLIB_MAX_PMDS static pfarg_pmr_t pdx[1]; static int ctx_fd; static char *event1_name; static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void fatal_error(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } static void warning(char *fmt, ...) 
{ va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); } static void sigio_handler(int n, struct siginfo *info, void *data) { pfarg_msg_t msg; int fd; int r =0; if (info == NULL) fatal_error("info is NULL\n"); fd = info->si_fd; if (fd != ctx_fd) fatal_error("handler does not get valid file descriptor\n"); if (event1_name && pfm_read(fd, 0, PFM_RW_PMD, pdx, sizeof(pdx))) fatal_error("pfm_read: %s", strerror(errno)); retry: r = read(fd, &msg, sizeof(msg)); if (r != sizeof(msg)) { if(r == -1 && errno == EINTR) { warning("read interrupted, retrying\n"); goto retry; } fatal_error("cannot read overflow message: %s\n", strerror(errno)); } if (msg.type != PFM_MSG_OVFL) fatal_error("unexpected msg type: %d\n",msg.type); /* * increment our notification counter */ notification_received++; /* * XXX: risky to do printf() in signal handler! */ if (event1_name) printf("Notification %lu: %"PRIu64" %s\n", notification_received, pdx[0].reg_value, event1_name); else printf("Notification %lu\n", notification_received); /* * And resume monitoring */ if (pfm_set_state(fd, 0, PFM_ST_RESTART)) fatal_error("pfm_set_state(restart): %d\n", errno); } /* * infinite loop waiting for notification to get out */ void busyloop(void) { /* * busy loop to burn CPU cycles */ for(;notification_received < 20;) ; } #define BPL (sizeof(uint64_t)<<3) #define LBPL 6 static inline void pfm_bv_set(uint64_t *bv, uint16_t rnum) { bv[rnum>>LBPL] |= 1UL << (rnum&(BPL-1)); } int main(int argc, char **argv) { pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfarg_pmr_t pc[NUM_PMCS]; pfarg_pmd_attr_t pd[NUM_PMDS]; pfarg_sinfo_t sif; pfmlib_options_t pfmlib_options; struct sigaction act; unsigned int i, num_counters; size_t len; int ret; /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfmlib_options.pfm_verbose = 1; /* set to 1 for verbose */ pfm_set_options(&pfmlib_options); /* * Initialize pfm 
library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); /* * Install the signal handler (SIGIO) * * SA_SIGINFO required on some platforms * to get siginfo passed to handler. */ memset(&act, 0, sizeof(act)); act.sa_handler = (sig_t)sigio_handler; act.sa_flags = SA_SIGINFO; sigaction (SIGIO, &act, 0); memset(pc, 0, sizeof(pc)); memset(pd, 0, sizeof(pd)); memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); memset(&sif,0, sizeof(sif)); pfm_get_num_counters(&num_counters); if (pfm_get_cycle_event(&inp.pfp_events[0]) != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); if (pfm_get_inst_retired_event(&inp.pfp_events[1]) != PFMLIB_SUCCESS) fatal_error("cannot find inst retired event\n"); i = 2; /* * set the default privilege mode for all counters: * PFM_PLM3 : user level only */ inp.pfp_dfl_plm = PFM_PLM3; if (i > num_counters) { i = num_counters; printf("too many events provided (max=%d events), using first %d event(s)\n", num_counters, i); } inp.pfp_event_count = i; /* * how many counters we use */ if (i > 1) { pfm_get_max_event_name_len(&len); event1_name = malloc(len+1); if (event1_name == NULL) fatal_error("cannot allocate event name\n"); pfm_get_full_event_name(&inp.pfp_events[1], event1_name, len+1); } /* * now create the session for self monitoring/per-task */ ctx_fd = pfm_create(0, &sif); if (ctx_fd == -1) { if (errno == ENOSYS) { fatal_error("Your kernel does not have performance monitoring support!\n"); } fatal_error("cannot create session %s\n", strerror(errno)); } /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. 
Of course, it may be that no valid assignment * exists if certain PMU registers are not available. */ detect_unavail_pmu_regs(&sif, &inp.pfp_unavail_pmcs, NULL); /* * let the library figure out the values for the PMCS */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("Cannot configure events: %s\n", pfm_strerror(ret)); /* * Now prepare the argument to initialize the PMDs and PMCS. */ for (i=0; i < outp.pfp_pmc_count; i++) { pc[i].reg_num = outp.pfp_pmcs[i].reg_num; pc[i].reg_value = outp.pfp_pmcs[i].reg_value; } for (i=0; i < outp.pfp_pmd_count; i++) pd[i].reg_num = outp.pfp_pmds[i].reg_num; /* * We want to get notified when the counter used for our first * event overflows */ pd[0].reg_flags |= PFM_REGFL_OVFL_NOTIFY; if (inp.pfp_event_count > 1) { pfm_bv_set(pd[0].reg_reset_pmds, pd[1].reg_num); pdx[0].reg_num = pd[1].reg_num; } /* * we arm the first counter, such that it will overflow * after SMPL_PERIOD events have been observed */ pd[0].reg_value = - SMPL_PERIOD; pd[0].reg_long_reset = - SMPL_PERIOD; pd[0].reg_short_reset = - SMPL_PERIOD; /* * Now program the registers */ if (pfm_write(ctx_fd, 0, PFM_RW_PMC, pc, outp.pfp_pmc_count * sizeof(*pc))) fatal_error("pfm_write error errno %d\n",errno); if (pfm_write(ctx_fd, 0, PFM_RW_PMD_ATTR, pd, outp.pfp_pmd_count * sizeof(*pd))) fatal_error("pfm_write(PMD) error errno %d\n",errno); /* * we want to monitor ourselves */ if (pfm_attach(ctx_fd, 0, getpid())) fatal_error("pfm_attach error errno %d\n",errno); /* * setup asynchronous notification on the file descriptor */ ret = fcntl(ctx_fd, F_SETFL, fcntl(ctx_fd, F_GETFL, 0) | O_ASYNC); if (ret == -1) fatal_error("cannot set ASYNC: %s\n", strerror(errno)); /* * get ownership of the descriptor */ ret = fcntl(ctx_fd, F_SETOWN, getpid()); if (ret == -1) fatal_error("cannot setown: %s\n", strerror(errno)); #ifndef _GNU_SOURCE #error "this program must be compiled with -D_GNU_SOURCE" #else /* * when you explicitly declare 
that you want a particular signal, * even if you use the default signal, the kernel will send more * information concerning the event to the signal handler. * * In particular, it will send the file descriptor from which the * event originates, which can be quite useful when monitoring * multiple tasks from a single thread. */ ret = fcntl(ctx_fd, F_SETSIG, SIGIO); if (ret == -1) fatal_error("cannot setsig: %s\n", strerror(errno)); #endif /* * Let's roll now */ if (pfm_set_state(ctx_fd, 0, PFM_ST_START)) fatal_error("pfm_set_state(start) error errno %d\n", errno); busyloop(); if (pfm_set_state(ctx_fd, 0, PFM_ST_STOP)) fatal_error("pfm_set_state(stop) error errno %d\n", errno); /* * destroy our context */ close(ctx_fd); if (event1_name) free(event1_name); return 0; } papi-5.3.0/src/libpfm-3.y/examples_v3.x/multiplex2.c0000600003276200002170000007522312247131122021562 0ustar ralphundrgrad/* * multiplex2.c - example of kernel-level time-based or overflow-based event multiplexing * * Copyright (c) 2004-2006 Hewlett-Packard Development Company, L.P. * Contributed by Stephane Eranian * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License as * published by the Free Software Foundation; either version 2 of the * License, or (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. 
* * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA * 02111-1307 USA */ #ifndef _GNU_SOURCE #define _GNU_SOURCE /* for getline */ #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "detect_pmcs.h" #define MAX_EVT_NAME_LEN 128 #define MULTIPLEX_VERSION "0.2" #define SMPL_FREQ_IN_HZ 100 #define NUM_PMCS 256 typedef struct { struct { int opt_plm; /* which privilege level to monitor (more than one possible) */ int opt_debug; /* print debug information */ int opt_verbose; /* verbose output */ int opt_us_format; /* print large numbers with comma for thousands */ int opt_ovfl_switch; /* overflow-based switching */ int opt_is_system; /* use system-wide */ int opt_excl_idle; /* exclude idle task */ int opt_excl_intr; /* exclude interrupts */ int opt_intr_only; /* interrupts only*/ int opt_no_cmd_out; /* redirect cmd output to /dev/null */ int opt_no_header; /* no header */ } program_opt_flags; unsigned long max_counters; /* maximum number of counter for the platform */ uint64_t smpl_freq_hz; uint64_t smpl_freq_ns; unsigned long session_timeout; uint64_t smpl_period; uint64_t clock_res; unsigned long cpu_mhz; pid_t attach_pid; int pin_cmd_cpu; int pin_cpu; } program_options_t; #define opt_plm program_opt_flags.opt_plm #define opt_debug program_opt_flags.opt_debug #define opt_verbose program_opt_flags.opt_verbose #define opt_us_format program_opt_flags.opt_us_format #define opt_ovfl_switch program_opt_flags.opt_ovfl_switch #define opt_is_system program_opt_flags.opt_is_system #define opt_excl_idle program_opt_flags.opt_excl_idle #define opt_excl_intr program_opt_flags.opt_excl_intr #define opt_intr_only program_opt_flags.opt_intr_only #define opt_no_cmd_out program_opt_flags.opt_no_cmd_out #define 
opt_no_header program_opt_flags.opt_no_header typedef struct _event_set_t { struct _event_set_t *next; char *event_str; unsigned int n_events; } event_set_t; typedef int pfm_ctxid_t; static program_options_t options; static pfarg_pmr_t *all_pmcs; static pfarg_pmd_attr_t *all_pmds; static pfarg_set_desc_t *all_sets; static event_set_t *all_events; static unsigned int num_pmds, num_pmcs, num_sets, total_events; static volatile int time_to_quit; static jmp_buf jbuf; static void fatal_error(char *fmt,...) __attribute__((noreturn)); static void vbprintf(char *fmt, ...) { va_list ap; if (options.opt_verbose == 0) return; va_start(ap, fmt); vprintf(fmt, ap); va_end(ap); } static void fatal_error(char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); exit(1); } /* * unreliable for CPU with variable clock speed */ static unsigned long get_cpu_speed(void) { FILE *fp1; unsigned long f1 = 0, f2 = 0; char buffer[128], *p, *value; memset(buffer, 0, sizeof(buffer)); fp1 = fopen("/proc/cpuinfo", "r"); if (fp1 == NULL) return 0; for (;;) { buffer[0] = '\0'; p = fgets(buffer, 127, fp1); if (p == NULL) break; /* skip blank lines */ if (*p == '\n') continue; p = strchr(buffer, ':'); if (p == NULL) break; /* * p+2: +1 = space, +2 = first character * strlen()-1 gets rid of \n */ *p = '\0'; value = p+2; value[strlen(value)-1] = '\0'; if (!strncasecmp("cpu MHz", buffer, 7)) { float fl; sscanf(value, "%f", &fl); f1 = lroundf(fl); break; } if (!strncasecmp("BogoMIPS", buffer, 8)) { float fl; sscanf(value, "%f", &fl); f2 = lroundf(fl); } } fclose(fp1); return f1 == 0 ? 
f2 : f1; } /* * pin task to CPU */ #ifndef __NR_sched_setaffinity #error "you need to define __NR_sched_setaffinity" #endif #define MAX_CPUS 2048 #define NR_CPU_BITS (MAX_CPUS>>3) int pin_cpu(pid_t pid, unsigned int cpu) { uint64_t my_mask[NR_CPU_BITS]; if (cpu >= MAX_CPUS) fatal_error("this program supports only up to %d CPUs\n", MAX_CPUS); /* clear the mask first so that only the requested CPU is set */ memset(my_mask, 0, sizeof(my_mask)); my_mask[cpu>>6] = 1ULL << (cpu&63); return syscall(__NR_sched_setaffinity, pid, sizeof(my_mask), &my_mask); } int child(char **arg) { ptrace(PTRACE_TRACEME, 0, NULL, NULL); if (options.pin_cmd_cpu != -1) { pin_cpu(getpid(), options.pin_cmd_cpu); vbprintf("command running on CPU core %d\n", options.pin_cmd_cpu); } if (options.opt_no_cmd_out) { close(1); close(2); } execvp(arg[0], arg); /* not reached */ exit(1); } static void dec2sep(char *str2, char *str, char sep) { int i, l, b, j, c=0; l = strlen(str2); if (l <= 3) { strcpy(str, str2); return; } b = l + l /3 - (l%3 == 0); /* l%3=correction to avoid extraneous comma at the end */ for(i=l, j=0; i >= 0; i--, j++) { if (j) c++; str[b-j] = str2[i]; if (c == 3 && i>0) { str[b-++j] = sep; c = 0; } } } static void print_results(int ctxid, uint64_t *eff_timeout) { unsigned int i, j, cnt, ovfl_event; uint64_t value, tot_runs = 0; uint64_t tot_dur = 0, c; pfarg_set_info_t *all_setinfos; event_set_t *e; char *p; char tmp1[32], tmp2[32], *str; char mtotal_str[32], *mtotal; char stotal_str[32], *stotal; int ret; all_setinfos = malloc(sizeof(pfarg_set_info_t)*num_sets); if (all_setinfos == NULL) fatal_error("cannot allocate all_setinfo\n"); memset(all_setinfos, 0, sizeof(pfarg_set_info_t)*num_sets); for(i=0; i < num_sets; i++) all_setinfos[i].set_id = i; /* * read all counters in one call * * There is a limitation on the size of the argument vector and * it may be necessary to split into multiple calls. 
That limit * is usually at page size (16KB) */ ret = pfm_read(ctxid, 0, PFM_RW_PMD_ATTR, all_pmds, num_pmds * sizeof(*all_pmds)); if (ret == -1) fatal_error("cannot read pmds: %s\n", strerror(errno)); /* * extract all set information * * There is a limitation on the size of the argument vector and * it may be necessary to split into multiple calls. That limit * is usually at page size (16KB) */ ret = pfm_getinfo_sets(ctxid, 0, all_setinfos, num_sets * sizeof(*all_setinfos)); if (ret == -1) fatal_error("cannot get set info: %s\n", strerror(errno)); /* * compute average number of runs * * the number of runs per set can be at most off by 1 between all sets */ for (i=0, cnt = 0; i < num_sets; i++) { if (all_setinfos[i].set_runs == 0) fatal_error("not enough runs to collect meaningful results: set%u did not run\n", i); tot_runs += all_setinfos[i].set_runs; tot_dur += all_setinfos[i].set_duration; } /* * print the results * * It is important to realize that the first event we specified may not * be in PMD4. Not all events can be measured by any monitor. That's why * we need to use the pc[] array to figure out where event i was allocated. * */ if (options.opt_no_header == 0) { printf("# %.2fHz period = %"PRIu64"nsecs\n# %"PRIu64" cycles @ %lu MHz\n", 1000000000.0 / options.smpl_freq_ns, options.smpl_freq_ns, options.smpl_period, options.cpu_mhz); if (options.opt_ovfl_switch == 0) printf("# using time-based multiplexing\n" "# %"PRIu64" nsecs effective switch timeout\n", *eff_timeout); else printf("# using overflow-based multiplexing\n"); if (options.opt_is_system) printf("# system-wide mode on CPU core %d\n",options.pin_cpu); printf("# %u sets\n", num_sets); printf("# %.2f average run per set\n", (double)tot_runs/num_sets); printf("# %.2f average ns per set\n", (double)tot_dur/num_sets); printf("# set measured total #runs scaled total event name\n"); printf("# ------------------------------------------------------------------\n"); } ovfl_event = options.opt_ovfl_switch ? 
1 : 0; for (i=0, e = all_events, cnt = 0; i < num_sets; i++, e = e->next) { str = e->event_str; for(j=0; j < e->n_events-ovfl_event; j++, cnt++) { value = all_pmds[cnt].reg_value; sprintf(tmp1, "%"PRIu64, value); if (options.opt_us_format) { dec2sep(tmp1, mtotal_str, ','); } else { strcpy(mtotal_str, tmp1); } mtotal = mtotal_str; /* * scaling * We use duration rather than number of runs to compute a more precise * scaled value. This avoids overcounting when the last set only partially * ran. * * We use double to avoid overflowing of the 64-bit count in case of very * large total duration */ c = llround(((double)value*tot_dur)/(double)all_setinfos[i].set_duration); sprintf(tmp2, "%"PRIu64, c); if (options.opt_us_format) { dec2sep(tmp2, stotal_str, ','); } else { strcpy(stotal_str, tmp2); } stotal = stotal_str; printf(" %03d %20s %8"PRIu64" %20s %s\n", i, mtotal, all_setinfos[i].set_runs, stotal, str); p = strchr(str, '\0'); if (p) str = p+1; } /* * skip first event */ if (options.opt_ovfl_switch) cnt++; } } static void sigintr_handler(int sig) { if (sig == SIGALRM) time_to_quit = 1; else time_to_quit = 2; longjmp(jbuf, 1); } static int measure_one_task(char **argv) { int ctxid; pfarg_sinfo_t sif; pfarg_set_desc_t *my_sets; pfarg_pmr_t *my_pmcs; pfarg_pmd_attr_t *my_pmds; uint64_t eff_timeout; pfarg_msg_t msg; pid_t pid; int status, ret; memset(&sif, 0, sizeof(sif)); my_pmcs = malloc(sizeof(pfarg_pmr_t)*num_pmcs); my_pmds = malloc(sizeof(pfarg_pmd_attr_t)*num_pmds); my_sets = malloc(sizeof(pfarg_set_desc_t)*num_sets); if (my_pmcs == NULL || my_pmds == NULL || my_sets == NULL) fatal_error("cannot allocate event tables\n"); /* * make private copies */ memcpy(my_pmcs, all_pmcs, sizeof(pfarg_pmr_t)*num_pmcs); memcpy(my_pmds, all_pmds, sizeof(pfarg_pmd_attr_t)*num_pmds); memcpy(my_sets, all_sets, sizeof(pfarg_set_desc_t)*num_sets); /* * create the context */ ctxid = pfm_create(0, &sif); if (ctxid == -1 ) { if (errno == ENOSYS) { fatal_error("Your kernel does not have 
performance monitoring support!\n"); } fatal_error("cannot create session %s\n", strerror(errno)); } /* * set close-on-exec to ensure we will be getting the PFM_END_MSG, i.e., * fd not visible to child. */ if (fcntl(ctxid, F_SETFD, FD_CLOEXEC)) fatal_error("cannot set CLOEXEC: %s\n", strerror(errno)); /* * create the event sets * * event set 0 always exists by default for backward compatibility * reasons. However to avoid special casing set0 for creation, a PFM_CREATE_EVTSETS * for set0 does not complain and behaves as a PFM_CHANGE_EVTSETS */ vbprintf("requested timeout %"PRIu64" nsecs\n", my_sets[0].set_timeout); if (pfm_create_sets(ctxid, 0, my_sets, num_sets * sizeof(*my_sets))) fatal_error("cannot create sets\n"); eff_timeout = my_sets[0].set_timeout; vbprintf("effective timeout %"PRIu64" nsecs\n", my_sets[0].set_timeout); /* * Now program all the registers in one call * * Note that there is a limitation on the size of the argument vector * that can be passed. It is usually set to a page size (16KB). */ if (pfm_write(ctxid, 0, PFM_RW_PMC, my_pmcs, num_pmcs * sizeof(*my_pmcs)) == -1) fatal_error("pfm_write error errno %d\n",errno); /* * initialize the PMD registers. 
* * Can use global pma because they are used read-only */ if (pfm_write(ctxid, 0, PFM_RW_PMD_ATTR, my_pmds, num_pmds * sizeof(*my_pmds)) == -1) fatal_error("pfm_write(PMD) error errno %d\n",errno); /* * now launch the child code */ if (options.attach_pid == 0) { if ((pid= fork()) == -1) fatal_error("Cannot fork process\n"); if (pid == 0) exit(child(argv)); } else { pid = options.attach_pid; ret = ptrace(PTRACE_ATTACH, pid, NULL, 0); if (ret) { fatal_error("cannot attach to task %d: %s\n",options.attach_pid, strerror(errno)); } } ret = waitpid(pid, &status, WUNTRACED); if (ret < 0 || WIFEXITED(status)) fatal_error("error command already terminated, exit code %d\n", WEXITSTATUS(status)); vbprintf("child created and stopped\n"); /* * now attach session */ if (pfm_attach(ctxid, 0, pid) == -1) fatal_error("pfm_attach error errno %d\n",errno); /* * start monitoring */ if (pfm_set_state(ctxid, 0, PFM_ST_START) == -1) fatal_error("pfm_set_state(start) error errno %d\n",errno); ptrace(PTRACE_DETACH, pid, NULL, 0); vbprintf("child restarted\n"); if (setjmp(jbuf) == 1) { if (time_to_quit == 1) { printf("timeout expired\n"); } if (time_to_quit == 2) printf("session interrupted\n"); goto finish_line; } signal(SIGALRM, sigintr_handler); signal(SIGINT, sigintr_handler); if (options.session_timeout) { printf("\n", options.session_timeout); alarm(options.session_timeout); } /* * mainloop */ ret = read(ctxid, &msg, sizeof(msg)); /* check ret < 0 explicitly: comparing a negative int against sizeof() converts it to a large unsigned value */ if (ret < 0 || (size_t)ret < sizeof(msg)) fatal_error("interrupted read\n"); switch(msg.type) { case PFM_MSG_OVFL: fatal_error("unexpected ovfl message\n"); break; case PFM_MSG_END: break; default: printf("unknown message type %d\n", msg.type); } finish_line: /* * cleanup after an alarm timeout */ if (time_to_quit) { /* stop monitored task */ ptrace(PTRACE_ATTACH, pid, NULL, 0); waitpid(pid, NULL, WUNTRACED); /* detach context */ pfm_attach(ctxid, 0, PFM_NO_TARGET); } if (options.attach_pid == 0) { kill(pid, SIGKILL); waitpid(pid, &status, 0); } else { ptrace(PTRACE_DETACH, 
pid, NULL, 0); } if (time_to_quit < 2) print_results(ctxid, &eff_timeout); close(ctxid); return 0; } static int measure_one_cpu(char **argv) { int ctxid, status; pfarg_pmr_t *my_pmcs; pfarg_pmd_attr_t *my_pmds; pfarg_set_desc_t *my_sets; pfarg_sinfo_t sif; pid_t pid = 0; int ret; memset(&sif, 0, sizeof(sif)); /* size the private copies by num_pmcs/num_pmds, the counts used by the memcpy() calls below */ my_pmcs = malloc(sizeof(pfarg_pmr_t)*num_pmcs); my_pmds = malloc(sizeof(pfarg_pmd_attr_t)*num_pmds); my_sets = malloc(sizeof(pfarg_set_desc_t)*num_sets); if (my_pmcs == NULL || my_pmds == NULL || my_sets == NULL) fatal_error("cannot allocate event tables\n"); /* * make private copies */ memcpy(my_pmcs, all_pmcs, sizeof(pfarg_pmr_t)*num_pmcs); memcpy(my_pmds, all_pmds, sizeof(pfarg_pmd_attr_t)*num_pmds); memcpy(my_sets, all_sets, sizeof(pfarg_set_desc_t)*num_sets); if (options.pin_cpu == -1) { options.pin_cpu = 0; printf("forcing monitoring onto CPU core 0\n"); pin_cpu(getpid(), 0); } /* * create session */ ctxid = pfm_create(PFM_FL_SYSTEM_WIDE, &sif); if (ctxid == -1) { if (errno == ENOSYS) { fatal_error("Your kernel does not have performance monitoring support!\n"); } fatal_error("cannot create session %s\n", strerror(errno)); } /* * set close-on-exec to ensure we will be getting the PFM_END_MSG, i.e., * fd not visible to child. */ if (fcntl(ctxid, F_SETFD, FD_CLOEXEC)) fatal_error("cannot set CLOEXEC: %s\n", strerror(errno)); /* * create the event sets * * event set 0 is always created by default for backward compatibility * reasons. However to avoid special casing set0 for creation, a PFM_CREATE_EVTSETS * for set0 does not complain and behaves as a PFM_CHANGE_EVTSETS */ if (pfm_create_sets(ctxid, 0, my_sets, num_sets * sizeof(*my_sets))) fatal_error("cannot create sets\n"); /* * Now program all the registers in one call * * Note that there is a limitation on the size of the argument vector * that can be passed. It is usually set to a page size (16KB). 
*/ if (pfm_write(ctxid, 0, PFM_RW_PMC, my_pmcs, num_pmcs * sizeof(*my_pmcs)) == -1) fatal_error("pfm_write error errno %d\n",errno); /* * initialize the PMD registers. * * We use all_Pmas because they are not modified, i.e., read-only */ if (pfm_write(ctxid, 0, PFM_RW_PMD_ATTR, my_pmds, num_pmds * sizeof(*my_pmds)) == -1) fatal_error("pfm_write(PMD) error errno %d\n",errno); /* * now launch the child code */ if (*argv) { if ((pid = fork()) == -1) fatal_error("Cannot fork process\n"); if (pid == 0) exit(child(argv)); } /* * wait for the child to exec or be stopped * We do this even in system-wide mode to ensure * that the task does not start until we are ready * to monitor. */ if (pid) { ret = waitpid(pid, &status, WUNTRACED); if (ret < 0 || WIFEXITED(status)) fatal_error("error command already terminated, exit code %d\n", WEXITSTATUS(status)); vbprintf("child created and stopped\n"); } /* * now attach the context */ if (pfm_attach(ctxid, 0, options.pin_cpu) == -1) fatal_error("pfm_attach error errno %d\n",errno); /* * start monitoring */ if (pfm_set_state(ctxid, 0, PFM_ST_START) == -1) fatal_error("pfm_set_state(start) error errno %d\n",errno); if (pid) ptrace(PTRACE_DETACH, pid, NULL, 0); if (pid == 0) { if (options.session_timeout == 0) { printf("\n"); getchar(); } else { printf("\n", options.session_timeout); sleep(options.session_timeout); } } else { ret = waitpid(pid, &status, 0); } print_results(ctxid, &my_sets[0].set_timeout); if (ctxid) close(ctxid); return 0; } int mainloop(char **argv) { event_set_t *e; pfarg_sinfo_t sif; pfmlib_input_param_t inp; pfmlib_output_param_t outp; pfmlib_regmask_t impl_counters, used_pmcs; pfmlib_event_t cycle_event; unsigned int i, j; char *p, *str; int ret; unsigned int max_counters, allowed_counters; pfm_get_num_counters(&max_counters); if (max_counters < 2 && options.opt_ovfl_switch) fatal_error("not enough counter to get overflow switching to work\n"); allowed_counters = max_counters; /* * account for overflow counter (cpu 
cycles) */ if (options.opt_ovfl_switch) allowed_counters--; memset(&used_pmcs, 0, sizeof(used_pmcs)); memset(&impl_counters, 0, sizeof(impl_counters)); pfm_get_impl_counters(&impl_counters); options.smpl_period = (options.cpu_mhz*1000000)/options.smpl_freq_hz; vbprintf("%"PRIu64"Hz period = %"PRIu64" cycles @ %luMhz\n", options.smpl_freq_hz, options.smpl_period, options.cpu_mhz); for (e = all_events; e; e = e->next) { for (p = str = e->event_str; p ; ) { p = strchr(str, ','); if (p) str = p +1; total_events++; } } /* * account for extra event per set (cycle event) */ if (options.opt_ovfl_switch) { total_events += num_sets; /* * look for our trigger event */ if (pfm_get_cycle_event(&cycle_event) != PFMLIB_SUCCESS) fatal_error("Cannot find cycle event\n"); } vbprintf("total_events=%u\n", total_events); /* * assumes number of pmds = number of events * cannot assume number of pmcs = num of events (e.g., P4 2 PMCS per event) */ all_pmcs = calloc(NUM_PMCS, sizeof(pfarg_pmr_t)); all_pmds = calloc(total_events, sizeof(pfarg_pmd_attr_t)); all_sets = calloc(num_sets, sizeof(pfarg_set_desc_t)); if (all_pmcs == NULL || all_pmds == NULL || all_sets == NULL) fatal_error("cannot allocate event tables\n"); /* * use the library to figure out assignments for all events of all sets */ for (i=0, e = all_events; i < num_sets; i++, e = e->next) { memset(&inp,0, sizeof(inp)); memset(&outp,0, sizeof(outp)); /* * build the pfp_unavail_pmcs bitmask by looking * at what perfmon has available. It is not always * the case that all PMU registers are actually available * to applications. For instance, on IA-32 platforms, some * registers may be reserved for the NMI watchdog timer. * * With this bitmap, the library knows which registers NOT to * use. Of course, it may be that no valid assignment * exists if certain PMU registers are not available. */ get_sif(options.opt_is_system? 
PFM_FL_SYSTEM_WIDE: 0, &sif); detect_unavail_pmu_regs(&sif, &inp.pfp_unavail_pmcs, NULL); str = e->event_str; for(j=0, p = str; p && j < allowed_counters; j++) { p = strchr(str, ','); if (p) *p = '\0'; ret = pfm_find_full_event(str, &inp.pfp_events[j]); if (ret != PFMLIB_SUCCESS) fatal_error("event %s for set %d event %d: %s\n", str, i, j, pfm_strerror(ret)); if (p) str = p + 1; } if (p) { fatal_error("error in set %d: cannot have more than %d event(s) per set %s\n", i, allowed_counters, options.opt_ovfl_switch ? "(overflow switch mode)": "(hardware limit)"); } /* * add the cycle event as the last event when we switch on overflow */ if (options.opt_ovfl_switch) { inp.pfp_events[j] = cycle_event; inp.pfp_event_count = j+1; inp.pfp_dfl_plm = options.opt_plm; e->n_events = j+1; } else { e->n_events = j; inp.pfp_event_count = j; } inp.pfp_dfl_plm = options.opt_plm; if (options.opt_is_system) inp.pfp_flags = PFMLIB_PFP_SYSTEMWIDE; vbprintf("PMU programming for set %d\n", i); /* * let the library do the hard work */ if ((ret=pfm_dispatch_events(&inp, NULL, &outp, NULL)) != PFMLIB_SUCCESS) fatal_error("cannot configure events for set %d: %s\n", i, pfm_strerror(ret)); /* * propagate from libpfm to kernel data structures */ for (j=0; j < outp.pfp_pmc_count; j++, num_pmcs++) { all_pmcs[num_pmcs].reg_num = outp.pfp_pmcs[j].reg_num; all_pmcs[num_pmcs].reg_value = outp.pfp_pmcs[j].reg_value; all_pmcs[num_pmcs].reg_set = i; } for (j=0; j < outp.pfp_pmd_count; j++, num_pmds++) { all_pmds[num_pmds].reg_num = outp.pfp_pmds[j].reg_num; all_pmds[num_pmds].reg_set = i; } /* * setup event set properties */ all_sets[i].set_id = i; if (options.opt_ovfl_switch) { all_sets[i].set_flags = PFM_SETFL_OVFL_SWITCH; /* * last counter contains our sampling counter * * the first overflow of our trigger counter does * trigger a switch. */ all_pmds[num_pmds-1].reg_ovfl_swcnt = 1; /* * We do this even in system-wide mode to ensure * that the task does not start until we are ready * to monitor. 
* setup the sampling period */ all_pmds[num_pmds-1].reg_value = - options.smpl_period; all_pmds[num_pmds-1].reg_short_reset = - options.smpl_period; all_pmds[num_pmds-1].reg_long_reset = - options.smpl_period; } else { /* * setup the switch timeout (in nanoseconds) * Note that the actual timeout may be bigger than requested * due to timer tick granularity. It is always advised to * check the set_timeout value upon return from set creation. * The structure will by then contain the actual timeout. */ all_sets[i].set_flags = PFM_SETFL_TIME_SWITCH; all_sets[i].set_timeout = options.smpl_freq_ns; } #ifdef __ia64__ if (options.opt_excl_intr && options.opt_is_system) all_sets[i].set_flags |= PFM_ITA_SETFL_EXCL_INTR; if (options.opt_intr_only && options.opt_is_system) all_sets[i].set_flags |= PFM_ITA_SETFL_INTR_ONLY; #endif } if (options.opt_is_system) return measure_one_cpu(argv); return measure_one_task(argv); } static struct option multiplex_options[]={ { "help", 0, 0, 1}, { "freq", 1, 0, 2 }, { "kernel-level", 0, 0, 3 }, { "user-level", 0, 0, 4 }, { "version", 0, 0, 5 }, { "set", 1, 0, 6 }, { "session-timeout", 1, 0, 7 }, { "attach-task", 1, 0, 8 }, { "pin-cmd", 1, 0, 9 }, { "cpu", 1, 0, 10 }, { "verbose", 0, &options.opt_verbose, 1 }, { "debug", 0, &options.opt_debug, 1 }, { "us-counter-format", 0, &options.opt_us_format, 1}, { "ovfl-switch", 0, &options.opt_ovfl_switch, 1}, { "system-wide", 0, &options.opt_is_system, 1}, #ifdef __ia64__ { "excl-intr", 0, &options.opt_excl_intr, 1}, { "intr-only", 0, &options.opt_intr_only, 1}, #endif { "no-cmd-output", 0, &options.opt_no_cmd_out, 1}, { "no-header", 0, &options.opt_no_header, 1}, { 0, 0, 0, 0} }; static void generate_default_sets(void) { event_set_t *es, *tail = NULL; pfmlib_event_t events[2]; size_t len; char *name; unsigned int i; int ret; ret = pfm_get_cycle_event(&events[0]); if (ret != PFMLIB_SUCCESS) fatal_error("cannot find cycle event\n"); ret = pfm_get_inst_retired_event(&events[1]); if (ret != 
PFMLIB_SUCCESS) fatal_error("cannot find instruction retired event\n"); pfm_get_max_event_name_len(&len); for (i=0; i < 2; i++) { name = malloc(len+1); if (name == NULL) fatal_error("cannot allocate space for event name\n"); pfm_get_full_event_name(events+i, name, len+1); es = (event_set_t *)malloc(sizeof(event_set_t)); if (es == NULL) fatal_error("cannot allocate new event set\n"); memset(es, 0, sizeof(*es)); es->event_str = name; es->next = NULL; es->n_events = 0; if (all_events == NULL) all_events = es; else tail->next = es; tail = es; } num_sets = i; } static void print_usage(char **argv) { printf("usage: %s [OPTIONS]... COMMAND\n", argv[0]); printf( "-h, --help\t\t\t\tdisplay this help and exit\n" "-V, --version\t\t\t\toutput version information and exit\n" "-u, --user-level\t\t\tmonitor at the user level for all events\n" "-k, --kernel-level\t\t\tmonitor at the kernel level for all events\n" "-c, --us-counter-format\t\t\tprint large counts with comma for thousands\n" "-p pid, --attach-task pid\t\tattach to a running task\n" "--set=ev1[,ev2,ev3,ev4,...]\t\tdescribe one set\n" "--freq=number\t\t\t\tset the set-switching frequency in Hz\n" "--cpu=cpu\t\t\t\tCPU to use for system-wide [default: CPU 0]\n" "--ovfl-switch\t\t\t\tuse overflow based multiplexing (default: time-based)\n" "--verbose\t\t\t\tprint more information during execution\n" "--system-wide\t\t\t\tuse system-wide (only one CPU at a time)\n" "--excl-idle\t\t\t\texclude idle task(system-wide only)\n" "--excl-intr\t\t\t\texclude interrupt triggered execution(system-wide only)\n" "--intr-only\t\t\t\tinclude only interrupt triggered execution(system-wide only)\n" "--session-timeout=sec\t\t\tsession timeout in seconds (system-wide only)\n" "--no-cmd-output\t\t\t\toutput of executed command redirected to /dev/null\n" "--pin-cmd=cpu\t\t\t\tpin executed command onto a specific cpu\n" ); } int main(int argc, char **argv) { char *endptr = NULL; pfmlib_options_t pfmlib_options; event_set_t *tail = NULL, 
*es; unsigned long long_val; struct timespec ts; uint64_t f_ns, d, f_final; int c, ret; options.pin_cmd_cpu = options.pin_cpu = -1; while ((c=getopt_long(argc, argv,"+vhkuVct:p:", multiplex_options, 0)) != -1) { switch(c) { case 0: continue; /* fast path for options */ case 'h': case 1: print_usage(argv); exit(0); case 'v': options.opt_verbose = 1; break; case 'c': options.opt_us_format = 1; break; case 2: if (options.smpl_freq_hz) fatal_error("sampling frequency set twice\n"); options.smpl_freq_hz = strtoull(optarg, &endptr, 10); if (*endptr != '\0') fatal_error("invalid frequency: %s\n", optarg); break; case 3: case 'k': options.opt_plm |= PFM_PLM0; break; case 4: case 'u': options.opt_plm |= PFM_PLM3; break; case 'V': case 5: printf("multiplex version " MULTIPLEX_VERSION " Date: " __DATE__ "\n" "Copyright (C) 2004 Hewlett-Packard Company\n"); exit(0); case 6: es = (event_set_t *)malloc(sizeof(event_set_t)); if (es == NULL) fatal_error("cannot allocate new event set\n"); es->event_str = optarg; es->next = NULL; es->n_events = 0; if (all_events == NULL) all_events = es; else tail->next = es; tail = es; num_sets++; break; case 't': case 7: if (options.session_timeout) fatal_error("too many timeouts\n"); if (*optarg == '\0') fatal_error("--session-timeout needs an argument\n"); long_val = strtoul(optarg,&endptr, 10); if (*endptr != '\0') fatal_error("invalid number of seconds for timeout: %s\n", optarg); if (long_val >= UINT_MAX) fatal_error("timeout is too big, must be < %u\n", UINT_MAX); options.session_timeout = (unsigned int)long_val; break; case 'p': case 8: if (options.attach_pid) fatal_error("process to attach specified twice\n"); options.attach_pid = (pid_t)atoi(optarg); break; case 9: if (options.pin_cmd_cpu != -1) fatal_error("cannot pin command twice\n"); options.pin_cmd_cpu = atoi(optarg); break; case 10: if (options.pin_cpu != -1) fatal_error("cannot pin to more than one cpu\n"); options.pin_cpu = atoi(optarg); break; default: fatal_error(""); /* just 
quit silently now */ } } if (optind == argc && options.opt_is_system == 0 && options.attach_pid == 0) fatal_error("you need to specify a command to measure\n"); /* * pass options to library (optional) */ memset(&pfmlib_options, 0, sizeof(pfmlib_options)); pfmlib_options.pfm_debug = 0; /* set to 1 for debug */ pfmlib_options.pfm_verbose = options.opt_verbose; /* set to 1 for verbose */ pfm_set_options(&pfmlib_options); /* * Initialize pfm library (required before we can use it) */ ret = pfm_initialize(); if (ret != PFMLIB_SUCCESS) fatal_error("Cannot initialize library: %s\n", pfm_strerror(ret)); if ((options.cpu_mhz = get_cpu_speed()) == 0) fatal_error("can't get CPU speed\n"); /* * extract kernel clock resolution */ clock_getres(CLOCK_MONOTONIC, &ts); options.clock_res = ts.tv_sec * 1000000000 + ts.tv_nsec; /* * adjust frequency to be a multiple of clock resolution * otherwise kernel will fail pfm_create_sets() */ /* * f_ns = run period in ns (1s/hz) * default switch period is clock resolution */ if (options.smpl_freq_hz == 0) f_ns = options.clock_res; else f_ns = 1000000000 / options.smpl_freq_hz; /* round up period in nanoseconds */ d = (f_ns+options.clock_res-1) / options.clock_res; /* final period (multiple of clock_res) */ f_final = d * options.clock_res; if (options.opt_ovfl_switch) printf("clock_res=%"PRIu64"ns(%.2fHz) ask period=%"PRIu64"ns(%.2fHz) get period=%"PRIu64"ns(%.2fHz)\n", options.clock_res, 1000000000.0 / options.clock_res, f_ns, 1000000000.0 / f_ns, f_final, 1000000000.0 / f_final); if (f_ns != f_final) printf("Not getting the expected frequency due to kernel/hw limitation\n"); /* adjust period */ options.smpl_freq_ns = f_final; /* not used */ options.smpl_freq_hz = 1000000000 / f_final; if (options.opt_plm == 0) options.opt_plm = PFM_PLM3; if (num_sets == 0) generate_default_sets(); return mainloop(argv+optind); } papi-5.3.0/src/libpfm-3.y/examples_v3.x/pfmsetup.c0000600003276200002170000013332712247131122021320 0ustar ralphundrgrad/* * (C) 
Copyright IBM Corp. 2006 * Contributed by Kevin Corry * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), * to deal in the Software without restriction, including without limitation * the rights to use, copy, modify, merge, publish, distribute, sublicense, * and/or sell copies of the Software, and to permit persons to whom the * Software is furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included * in all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER * DEALINGS IN THE SOFTWARE. * * * pfmsetup * * Very simple command-line tool to drive the perfmon2 kernel API. Inspired * by the dmsetup tool from device-mapper. * * Compile with: * gcc -Wall -o pfmsetup pfmsetup.c -lpfm * * Run with: * pfmsetup * * Available commands for the command_file: * * create_context [options] * Create a new context for accessing the performance counters. Each new * context automatically gets one event-set with an ID of 0. * - options: --system * --no-overflow-msg * --block-on-notify * --sampler * - : specify an integer that you want to associate with * the new context for use in other commands. * * load_context * Attach the specified context and event-set to the specified program. * - : ID that you specified when creating the context. * - : ID that you specified when creating an event-set * within the given context. All contexts automatically * have an event-set with ID of 0. 
* - : ID that you specified when starting a program * with the run_program command, or the number of * the CPU to attach to for system-wide mode. * * unload_context * Detach the specified context from the program that it's currently * attached to. * - : ID that you specified when creating the context. * * close_context * Clean up the specified context. After this call, the context_id will no * longer be valid. * - : ID that you specified when creating the context. * * write_pmc < >+ * Write one or more control register values. * - : ID that you specified when creating the context. * - : ID that you specified when creating an event-set * within the given context. All contexts automatically * have an event-set with ID of 0. * - : ID of the desired control register. See the register * mappings in the Perfmon kernel code to determine which * PMC represents the control register you're interested in. * - : Value to write into the specified PMC. You need to know * the exact numeric value - no translations are done from * event names or masks. Multiple PMC id/value pairs can * be given in one write_pmc command. * * write_pmd < >+ * Write one or more data register values. * - : ID that you specified when creating the context. * - : ID that you specified when creating an event-set * within the given context. All contexts automatically * have an event-set with ID of 0. * - : ID of the desired data register. See the register * mappings in the Perfmon kernel code to determine which * PMD represents the data register you're interested in. * - : Value to write into the specified PMD. Multiple PMD * id/value pairs can be given in one write_pmd command. * * read_pmd + * Read one or more data register values. * - : ID that you specified when creating the context. * - : ID that you specified when creating an event-set * within the given context. All contexts automatically * have an event-set with ID of 0. * - : ID of the desired data register. 
See the register * mappings in the Perfmon kernel code to determine which * PMD represents the data register you're interested in. * Multiple PMD IDs can be given in one read_pmd command. * * start_counting * Start counting using the specified context and event-set. * - : ID that you specified when creating the context. * - : ID that you specified when creating an event-set * within the given context. All contexts automatically * have an event-set with ID of 0. * * stop_counting * Stop counting on the specified context. * - : ID that you specified when creating the context. * * restart_counting * Restart counting on the specified context. * - : ID that you specified when creating the context. * * create_eventset [options] * Create a new event-set for an existing context. * - options: --next-set * --timeout * --switch-on-overflow * --exclude-idle * - : ID that you specified when creating the context. * - : specify an integer that you want to associate with * the new event-set for use in other commands. * * delete_eventset * Delete an existing event-set from an existing context. * - : ID that you specified when creating the context. * - : ID that you specified when creating the event-set. * * getinfo_eventset * Display information about an event-set. * - : ID that you specified when creating the context. * - : ID that you specified when creating the event-set. * * run_program * First step in starting a program to monitor. In order to allow time to * set up the counters to monitor the program, this command only forks a * child process. It then suspends itself using ptrace. You must call the * resume_program command to wake up the new child process and exec the * desired program. * - : Specify an integer that you want to associate with * the program for use in other commands. * - : Specify the program and its arguments * exactly as you would on the command * line. 
* * resume_program * When a program is 'run', a child process is forked, but the child is * ptrace'd before exec'ing the specified program. This gives you time to * do any necessary setup to monitor the program. This resume_program * command wakes up the child process and finishes exec'ing the desired * program. If a context has been loaded and started for this program, * then the counters will have actually started following this command. * - : ID that you specified when starting the program. * * wait_on_program * Wait for a program to complete and exit. After this call, the program_id * will no longer be valid. * - : ID that you specified when starting the program. * * sleep #include #include #include #include #include #include #include #include #include #include #include #define FALSE 0 #define TRUE 1 #define WHITESPACE " \t\n" #define MAX_TOKENS 32 #define PFMSETUP_NAME "pfmsetup" #define USAGE(f, x...) printf(PFMSETUP_NAME ": USAGE: " f "\n" , ## x) #define LOG_ERROR(f, x...) printf(PFMSETUP_NAME ": Error: %s: " f "\n", __FUNCTION__ , ## x) #define LOG_INFO(f, x...) printf(PFMSETUP_NAME ": " f "\n" , ## x) typedef int (*command_fn)(int argc, char **argv); struct command { const char *full_name; const char *short_name; const char *help; command_fn fn; int min_args; }; struct context { int id; int fd; int cpu; uint32_t ctx_flags; pfm_dfl_smpl_arg_t smpl_arg; struct event_set *event_sets; struct context *next; }; struct event_set { int id; struct event_set *next; }; struct program { int id; pid_t pid; struct program *next; }; /* Global list of all contexts that have been created. List is ordered by * context id. Each context contains a list of event-sets belonging to that * context, which is ordered by event-set id. */ static struct context *contexts = NULL; /* Global list of all programs that have been started. * List is ordered by program id. */ static struct program *programs = NULL; /* * Routines to manipulate the context, event-set, and program lists. 
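 *
 * Illustrative sketch (not part of pfmsetup): the routines below keep each
 * list sorted by walking a pointer-to-pointer, so inserting at the head, in
 * the middle, or at the tail needs no special case. The names struct node,
 * insert_sorted and remove_node are hypothetical stand-ins for the
 * context/event-set/program variants.

```c
#include <stddef.h>

// Hypothetical stand-in for struct context / event_set / program.
struct node {
    int id;
    struct node *next;
};

// Insert n so the list stays sorted by ascending id. `link` always points at
// the pointer that must be rewritten: the head pointer itself or some node's
// `next` field.
static void insert_sorted(struct node **head, struct node *n)
{
    struct node **link = head;

    while (*link && (*link)->id < n->id)
        link = &(*link)->next;

    n->next = *link;
    *link = n;
}

// Unlink n from the list if it is present; the caller still owns n.
static void remove_node(struct node **head, struct node *n)
{
    struct node **link;

    for (link = head; *link; link = &(*link)->next) {
        if (*link == n) {
            *link = n->next;
            n->next = NULL;
            break;
        }
    }
}
```

 * Because lookups scan linearly, this only suits the handful of IDs a
 * command file typically creates.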
*/ static struct context *find_context(int ctx_id) { struct context *ctx; for (ctx = contexts; ctx; ctx = ctx->next) { if (ctx->id == ctx_id) { break; } } return ctx; } static void insert_context(struct context *ctx) { struct context **next_ctx; for (next_ctx = &contexts; *next_ctx && (*next_ctx)->id < ctx->id; next_ctx = &((*next_ctx)->next)) { ; } ctx->next = *next_ctx; *next_ctx = ctx; } static void remove_context(struct context *ctx) { struct context **next_ctx; for (next_ctx = &contexts; *next_ctx; next_ctx = &((*next_ctx)->next)) { if (*next_ctx == ctx) { *next_ctx = ctx->next; break; } } } static struct event_set *find_event_set(struct context *ctx, int event_set_id) { struct event_set *evt; for (evt = ctx->event_sets; evt; evt = evt->next) { if (evt->id == event_set_id) { break; } } return evt; } static void insert_event_set(struct context *ctx, struct event_set *evt) { struct event_set **next_evt; for (next_evt = &ctx->event_sets; *next_evt && (*next_evt)->id < evt->id; next_evt = &((*next_evt)->next)) { ; } evt->next = *next_evt; *next_evt = evt; } static struct program *find_program(int program_id) { struct program *prog; for (prog = programs; prog; prog = prog->next) { if (prog->id == program_id) { break; } } return prog; } static void insert_program(struct program *prog) { struct program **next_prog; for (next_prog = &programs; *next_prog && (*next_prog)->id < prog->id; next_prog = &((*next_prog)->next)) { ; } prog->next = *next_prog; *next_prog = prog; } static void remove_program(struct program *prog) { struct program **next_prog; for (next_prog = &programs; *next_prog; next_prog = &((*next_prog)->next)) { if (*next_prog == prog) { *next_prog = prog->next; break; } } } /** * set_affinity * * When loading or unloading a system-wide context, we must pin the pfmsetup * process to that CPU before making the system call. Also, get the current * affinity and return it to the caller so we can change it back later. 
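 *
 * Illustrative sketch (not part of pfmsetup): one way to package the
 * save/pin/restore sequence so the original mask is restored on every path,
 * including when the pinned operation fails. pin_to_cpu, restore_affinity
 * and run_pinned are hypothetical names; only the sched_getaffinity and
 * sched_setaffinity calls and the CPU_* macros come from glibc.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

// Save the caller's affinity mask in *saved, then pin the caller to `cpu`.
// Returns 0 on success or an errno value on failure.
static int pin_to_cpu(int cpu, cpu_set_t *saved)
{
    cpu_set_t target;

    if (sched_getaffinity(0, sizeof(*saved), saved) != 0)
        return errno;

    CPU_ZERO(&target);
    CPU_SET(cpu, &target);
    if (sched_setaffinity(0, sizeof(target), &target) != 0)
        return errno;

    return 0;
}

// Put back a mask previously saved by pin_to_cpu().
static int restore_affinity(const cpu_set_t *saved)
{
    return sched_setaffinity(0, sizeof(*saved), saved) != 0 ? errno : 0;
}

// Run fn(arg) while pinned to `cpu`; the saved mask is restored whether or
// not fn succeeds, and fn's return value is passed through.
static int run_pinned(int cpu, int (*fn)(void *), void *arg)
{
    cpu_set_t saved;
    int rc = pin_to_cpu(cpu, &saved);

    if (rc)
        return rc;

    rc = fn(arg);
    restore_affinity(&saved);
    return rc;
}
```

 * A caller would wrap the per-CPU system call in fn, so an early error
 * return cannot leave the tool pinned to one CPU.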
**/ static int set_affinity(int cpu, cpu_set_t *old_cpu_set) { cpu_set_t new_cpu_set; int rc; rc = sched_getaffinity(0, sizeof(*old_cpu_set), old_cpu_set); if (rc) { rc = errno; LOG_ERROR("Can't get current process affinity mask: %d\n", rc); return rc; } CPU_ZERO(&new_cpu_set); CPU_SET(cpu, &new_cpu_set); rc = sched_setaffinity(0, sizeof(new_cpu_set), &new_cpu_set); if (rc) { rc = errno; LOG_ERROR("Can't set process affinity to CPU %d: %d\n", cpu, rc); return rc; } return 0; } /** * revert_affinity * * Reset the process affinity to the specified mask. **/ static void revert_affinity(cpu_set_t *old_cpu_set) { int rc; rc = sched_setaffinity(0, sizeof(*old_cpu_set), old_cpu_set); if (rc) { /* Not a fatal error if we can't reset the affinity. */ LOG_INFO("Can't revert process affinity to original value.\n"); } } /** * create_context * * Arguments: [options] * Options: --system * --no-overflow-msg * --block-on-notify * --sampler * * Call the pfm_create_context system-call to create a new perfmon context. * Add a new entry to the global 'contexts' list. 
**/ static int create_context(int argc, char **argv) { pfm_dfl_smpl_arg_t smpl_arg; struct context *new_ctx = NULL; char *sampler_name = NULL; void *smpl_p; int no_overflow_msg = FALSE; int block_on_notify = FALSE; int system_wide = FALSE; int c, ctx_id = 0; int rc; uint32_t ctx_flags; size_t sz; struct option long_opts[] = { {"sampler", required_argument, NULL, 1}, {"system", no_argument, NULL, 2}, {"no-overflow-msg", no_argument, NULL, 3}, {"block-on-notify", no_argument, NULL, 4}, {NULL, 0, NULL, 0} }; ctx_flags = 0; /* smpl_arg is copied into the context later, so never leave it uninitialized. */ memset(&smpl_arg, 0, sizeof(smpl_arg)); opterr = 0; optind = 0; while ((c = getopt_long_only(argc, argv, "", long_opts, NULL)) != EOF) { switch (c) { case 1: sampler_name = optarg; break; case 2: system_wide = TRUE; break; case 3: no_overflow_msg = TRUE; break; case 4: block_on_notify = TRUE; break; default: LOG_ERROR("invalid option: %c", optopt); rc = EINVAL; goto error; } } if (argc < optind + 1) { USAGE("create_context [options] "); rc = EINVAL; goto error; } ctx_id = strtoul(argv[optind], NULL, 0); if (ctx_id <= 0) { LOG_ERROR("Invalid context ID (%s). Must be a positive " "integer.", argv[optind]); rc = EINVAL; goto error; } /* Make sure we don't already have a context with this ID. Return * directly: the error path frees new_ctx, and the existing context * must stay intact on the global list. */ if (find_context(ctx_id)) { LOG_ERROR("Context with ID %d already exists.", ctx_id); return EINVAL; } if (sampler_name) { smpl_arg.buf_size = getpagesize(); smpl_p = &smpl_arg; sz = sizeof(smpl_arg); } else { smpl_p = NULL; sz = 0; } ctx_flags = (system_wide ? PFM_FL_SYSTEM_WIDE : 0) | (no_overflow_msg ? PFM_FL_OVFL_NO_MSG : 0) | (block_on_notify ? PFM_FL_NOTIFY_BLOCK : 0); if (sampler_name) ctx_flags |= PFM_FL_SMPL_FMT; rc = pfm_create(ctx_flags, NULL, sampler_name, smpl_p, sz); if (rc == -1) { rc = errno; LOG_ERROR("pfm_create system call returned " "an error: %d.", rc); goto error; } /* Allocate and initialize a new context structure and add it to the * global list. Every new context automatically gets one event_set * with an event ID of 0. 
*/ new_ctx = calloc(1, sizeof(*new_ctx)); if (!new_ctx) { LOG_ERROR("Can't allocate structure for new context %d.", ctx_id); close(rc); /* rc still holds the new context's file descriptor */ rc = ENOMEM; goto error; } /* Record the descriptor right away so the error path closes the right fd. */ new_ctx->fd = rc; new_ctx->event_sets = calloc(1, sizeof(*(new_ctx->event_sets))); if (!new_ctx->event_sets) { LOG_ERROR("Can't allocate event-set structure for new " "context %d.", ctx_id); rc = ENOMEM; goto error; } new_ctx->id = ctx_id; new_ctx->cpu = -1; new_ctx->ctx_flags = ctx_flags; new_ctx->smpl_arg = smpl_arg; insert_context(new_ctx); LOG_INFO("Created context %d with file-descriptor %d.", new_ctx->id, new_ctx->fd); return 0; error: if (new_ctx) { close(new_ctx->fd); free(new_ctx->event_sets); free(new_ctx); } return rc; } /** * load_context * * Arguments: * * Call the pfm_attach system-call to load a perfmon context into the * system's performance monitoring unit. **/ static int load_context(int argc, char **argv) { struct context *ctx; struct event_set *evt; struct program *prog; cpu_set_t old_cpu_set; int ctx_id, event_set_id, program_id; int system_wide, rc; int load_pid = 0; ctx_id = strtoul(argv[1], NULL, 0); event_set_id = strtoul(argv[2], NULL, 0); program_id = strtoul(argv[3], NULL, 0); if (ctx_id <= 0 || event_set_id < 0 || program_id < 0) { LOG_ERROR("context ID must be a positive integer; event-set ID " "and program/CPU ID must be non-negative."); return EINVAL; } /* Find the context, event_set, and program in the global lists. */ ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } evt = find_event_set(ctx, event_set_id); if (!evt) { LOG_ERROR("Can't find event-set with ID %d in context %d.", event_set_id, ctx_id); return EINVAL; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide) { if (ctx->cpu >= 0) { LOG_ERROR("Trying to load context %d which is already " "loaded on CPU %d.\n", ctx_id, ctx->cpu); return EBUSY; } rc = set_affinity(program_id, &old_cpu_set); if (rc) { return rc; } /* Specify the CPU as the PID. 
*/ load_pid = program_id; } else { prog = find_program(program_id); if (!prog) { LOG_ERROR("Can't find program with ID %d.", program_id); return EINVAL; } load_pid = prog->pid; } rc = pfm_attach(ctx->fd, 0, load_pid); if (rc) { rc = errno; LOG_ERROR("pfm_attach system call returned " "an error: %d.", rc); return rc; } if (system_wide) { /* Keep track of which CPU this context is loaded on. */ ctx->cpu = program_id; revert_affinity(&old_cpu_set); } LOG_INFO("Loaded context %d, event-set %d onto %s %d.", ctx_id, event_set_id, system_wide ? "cpu" : "program", program_id); return 0; } /** * unload_context * * Arguments: * * Call the pfm_unload_context system-call to unload a perfmon context from * the system's performance monitoring unit. **/ static int unload_context(int argc, char **argv) { struct context *ctx; cpu_set_t old_cpu_set; int system_wide; int ctx_id; int rc; ctx_id = strtoul(argv[1], NULL, 0); if (ctx_id <= 0) { LOG_ERROR("context ID must be a positive integer."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide) { if (ctx->cpu < 0) { /* This context isn't loaded on any CPU. */ LOG_ERROR("Trying to unload context %d that isn't " "loaded.\n", ctx_id); return EINVAL; } rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { return rc; } } rc = pfm_attach(ctx->fd, 0, PFM_NO_TARGET); if (rc) { rc = errno; LOG_ERROR("pfm_attach(detach) system call returned " "an error: %d.", rc); return rc; } if (system_wide) { ctx->cpu = -1; revert_affinity(&old_cpu_set); } LOG_INFO("Unloaded context %d.", ctx_id); return 0; } /** * close_context * * Arguments: * * Close the context's file descriptor, remove it from the global list, and * free the context data structures. 
**/ static int close_context(int argc, char **argv) { struct context *ctx; struct event_set *evt, *next_evt; int ctx_id; ctx_id = strtoul(argv[1], NULL, 0); if (ctx_id <= 0) { LOG_ERROR("context ID must be a positive integer."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } /* There's no perfmon system-call to delete a context. We simply call * close on the file handle. */ close(ctx->fd); remove_context(ctx); for (evt = ctx->event_sets; evt; evt = next_evt) { next_evt = evt->next; free(evt); } free(ctx); LOG_INFO("Closed and freed context %d.", ctx_id); return 0; } /** * write_pmc * * Arguments: < >+ * * Write values to one or more control registers. **/ static int write_pmc(int argc, char **argv) { struct context *ctx; struct event_set *evt; pfarg_pmr_t *pmc_args = NULL; cpu_set_t old_cpu_set; int ctx_id, event_set_id; int pmc_id, num_pmcs; unsigned long long pmc_value; int system_wide, i, rc; ctx_id = strtoul(argv[1], NULL, 0); event_set_id = strtoul(argv[2], NULL, 0); if (ctx_id <= 0 || event_set_id < 0) { LOG_ERROR("context ID and event-set ID must be " "positive integers."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } evt = find_event_set(ctx, event_set_id); if (!evt) { LOG_ERROR("Can't find event-set with ID %d in context %d.", event_set_id, ctx_id); return EINVAL; } /* Allocate an array of PMC structures. 
*/ num_pmcs = (argc - 3) / 2; pmc_args = calloc(num_pmcs, sizeof(*pmc_args)); if (!pmc_args) { LOG_ERROR("Can't allocate PMC argument array."); return ENOMEM; } for (i = 0; i < num_pmcs; i++) { pmc_id = strtoul(argv[3 + i*2], NULL, 0); pmc_value = strtoull(argv[4 + i*2], NULL, 0); if (pmc_id < 0) { LOG_ERROR("PMC ID must be a positive integer."); rc = EINVAL; goto out; } pmc_args[i].reg_num = pmc_id; pmc_args[i].reg_set = evt->id; pmc_args[i].reg_value = pmc_value; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { goto out; } } rc = pfm_write(ctx->fd, 0, PFM_RW_PMC, pmc_args, num_pmcs * sizeof(*pmc_args)); if (rc) { rc = errno; LOG_ERROR("pfm_write system call returned " "an error: %d.", rc); goto out; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } out: free(pmc_args); return rc; } /** * write_pmd * * Arguments: < >+ * * FIXME: Add options for other fields in pfarg_pmd_t. **/ static int write_pmd(int argc, char **argv) { struct context *ctx; struct event_set *evt; pfarg_pmr_t *pmd_args = NULL; cpu_set_t old_cpu_set; int ctx_id, event_set_id; int pmd_id, num_pmds; unsigned long long pmd_value; int system_wide, i, rc; ctx_id = strtoul(argv[1], NULL, 0); event_set_id = strtoul(argv[2], NULL, 0); if (ctx_id <= 0 || event_set_id < 0) { LOG_ERROR("context ID and event-set ID must be " "positive integers."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } evt = find_event_set(ctx, event_set_id); if (!evt) { LOG_ERROR("Can't find event-set with ID %d in context %d.", event_set_id, ctx_id); return EINVAL; } /* Allocate an array of PMD structures. 
*/ num_pmds = (argc - 3) / 2; pmd_args = calloc(num_pmds, sizeof(*pmd_args)); if (!pmd_args) { LOG_ERROR("Can't allocate PMD argument array."); return ENOMEM; } for (i = 0; i < num_pmds; i++) { pmd_id = strtoul(argv[3 + i*2], NULL, 0); pmd_value = strtoull(argv[4 + i*2], NULL, 0); if (pmd_id < 0) { LOG_ERROR("PMD ID must be a positive integer."); rc = EINVAL; goto out; } pmd_args[i].reg_num = pmd_id; pmd_args[i].reg_set = evt->id; pmd_args[i].reg_value = pmd_value; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { goto out; } } rc = pfm_write(ctx->fd, 0, PFM_RW_PMD, pmd_args, num_pmds * sizeof(*pmd_args)); if (rc) { rc = errno; LOG_ERROR("pfm_write system call returned " "an error: %d.", rc); goto out; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } out: free(pmd_args); return rc; } /** * read_pmd * * Arguments: + * * FIXME: Add options for other fields in pfarg_pmd_t. **/ static int read_pmd(int argc, char **argv) { struct context *ctx; struct event_set *evt; pfarg_pmr_t *pmd_args = NULL; cpu_set_t old_cpu_set; int ctx_id, event_set_id; int pmd_id, num_pmds; int system_wide, i, rc; ctx_id = strtoul(argv[1], NULL, 0); event_set_id = strtoul(argv[2], NULL, 0); if (ctx_id <= 0 || event_set_id < 0) { LOG_ERROR("context ID and event-set ID must be " "positive integers."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } evt = find_event_set(ctx, event_set_id); if (!evt) { LOG_ERROR("Can't find event-set with ID %d in context %d.", event_set_id, ctx_id); return EINVAL; } /* Allocate an array of PMD structures. 
*/ num_pmds = argc - 3; pmd_args = calloc(num_pmds, sizeof(*pmd_args)); if (!pmd_args) { LOG_ERROR("Can't allocate PMD argument array."); return ENOMEM; } for (i = 0; i < num_pmds; i++) { pmd_id = strtoul(argv[3 + i], NULL, 0); if (pmd_id < 0) { LOG_ERROR("PMD ID must be a positive integer."); rc = EINVAL; goto out; } pmd_args[i].reg_num = pmd_id; pmd_args[i].reg_set = evt->id; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { goto out; } } rc = pfm_read(ctx->fd, 0, PFM_RW_PMD, pmd_args, num_pmds * sizeof(*pmd_args)); if (rc) { rc = errno; LOG_ERROR("pfm_read system call returned " "an error: %d.", rc); goto out; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } out: free(pmd_args); return rc; } /** * start_counting * * Arguments: * * Call the pfm_start system-call to start counting for a perfmon context * that was previously stopped. **/ static int start_counting(int argc, char **argv) { struct context *ctx; struct event_set *evt; cpu_set_t old_cpu_set; int ctx_id, event_set_id; int system_wide, rc; ctx_id = strtoul(argv[1], NULL, 0); event_set_id = strtoul(argv[2], NULL, 0); if (ctx_id <= 0 || event_set_id < 0) { LOG_ERROR("context ID and event-set ID must be " "positive integers."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } evt = find_event_set(ctx, event_set_id); if (!evt) { LOG_ERROR("Can't find event-set with ID %d in context %d.", event_set_id, ctx_id); return EINVAL; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { return rc; } } rc = pfm_set_state(ctx->fd, 0, PFM_ST_START); if (rc) { rc = errno; LOG_ERROR("pfm_set_state system call returned an error: %d.", rc); return rc; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } LOG_INFO("Started counting 
for context %d, event-set %d.", ctx_id, event_set_id); return 0; } /** * stop_counting * * Arguments: * * Call the pfm_stop system-call to stop counting for a perfmon context that * was previously loaded. **/ static int stop_counting(int argc, char **argv) { struct context *ctx; cpu_set_t old_cpu_set; int system_wide; int ctx_id; int rc; ctx_id = strtoul(argv[1], NULL, 0); if (ctx_id <= 0) { LOG_ERROR("context ID must be a positive integer."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { return rc; } } rc = pfm_set_state(ctx->fd, 0, PFM_ST_STOP); if (rc) { rc = errno; LOG_ERROR("pfm_set_state(stop) system call returned an error: %d.", rc); return rc; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } LOG_INFO("Stopped counting for context %d.", ctx_id); return 0; } /** * restart_counting * * Arguments: * * Call the pfm_restart system-call to clear the data counters and start * counting from zero for a perfmon context that was previously loaded. 
**/ static int restart_counting(int argc, char **argv) { struct context *ctx; cpu_set_t old_cpu_set; int system_wide; int ctx_id; int rc; ctx_id = strtoul(argv[1], NULL, 0); if (ctx_id <= 0) { LOG_ERROR("context ID must be a positive integer."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { return rc; } } rc = pfm_set_state(ctx->fd, 0, PFM_ST_RESTART); if (rc) { rc = errno; LOG_ERROR("pfm_set_state(restart) system call returned an error: %d.", rc); return rc; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } LOG_INFO("Restarted counting for context %d.", ctx_id); return 0; } /** * create_eventset * * Arguments: [options] * Options: --timeout * --switch-on-overflow * --exclude-idle **/ static int create_eventset(int argc, char **argv) { pfarg_set_desc_t set_arg; struct context *ctx; struct event_set *evt; cpu_set_t old_cpu_set; int ctx_id, event_set_id; unsigned long timeout = 0; int switch_on_overflow = FALSE; int switch_on_timeout = FALSE; int exclude_idle = FALSE; int new_set = FALSE; int system_wide, c, rc; struct option long_opts[] = { {"next-set", required_argument, NULL, 1}, {"timeout", required_argument, NULL, 2}, {"switch-on-overflow", no_argument, NULL, 3}, {"exclude-idle", no_argument, NULL, 4}, {NULL, 0, NULL, 0} }; memset(&set_arg, 0, sizeof(set_arg)); opterr = 0; optind = 0; while ((c = getopt_long_only(argc, argv, "", long_opts, NULL)) != EOF) { /* The case labels must match the values in long_opts. --next-set (1) * has no handler here, so it falls through to the default branch. */ switch (c) { case 2: timeout = strtoul(optarg, NULL, 0); if (!timeout) { LOG_ERROR("timeout must be a " "non-zero integer."); return EINVAL; } switch_on_timeout = TRUE; break; case 3: switch_on_overflow = TRUE; break; case 4: exclude_idle = TRUE; break; default: LOG_ERROR("invalid option: %c", optopt); return EINVAL; } } (void)exclude_idle; if (argc < optind + 2) { 
USAGE("create_eventset [options] "); return EINVAL; } ctx_id = strtoul(argv[optind], NULL, 0); event_set_id = strtoul(argv[optind+1], NULL, 0); if (ctx_id <= 0 || event_set_id < 0) { LOG_ERROR("context ID must be a positive integer and " "event-set ID must be non-negative."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } if (switch_on_timeout && switch_on_overflow) { LOG_ERROR("Cannot switch set %d (context %d) on both " "timeout and overflow.", event_set_id, ctx_id); return EINVAL; } evt = find_event_set(ctx, event_set_id); if (!evt) { evt = calloc(1, sizeof(*evt)); if (!evt) { LOG_ERROR("Can't allocate structure for new event-set " "%d in context %d.", event_set_id, ctx_id); return ENOMEM; } evt->id = event_set_id; new_set = TRUE; } set_arg.set_id = event_set_id; set_arg.set_timeout = timeout; /* in nanoseconds */ set_arg.set_flags = (switch_on_overflow ? PFM_SETFL_OVFL_SWITCH : 0) | (switch_on_timeout ? PFM_SETFL_TIME_SWITCH : 0); system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { /* Only free a set allocated above; an existing set is still on the list. */ if (new_set) { free(evt); } return rc; } } rc = pfm_create_sets(ctx->fd, 0, &set_arg, 1); if (rc) { rc = errno; LOG_ERROR("pfm_create_sets system call returned " "an error: %d.", rc); if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } if (new_set) { free(evt); } return rc; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } if (new_set) { insert_event_set(ctx, evt); } LOG_INFO("%s event-set %d in context %d.", new_set ? 
"Created" : "Modified", event_set_id, ctx_id); if (switch_on_timeout) { LOG_INFO(" Actual timeout set to %llu ns.", (unsigned long long)set_arg.set_timeout); } return 0; } /** * delete_eventset * * Arguments: **/ static int delete_eventset(int argc, char **argv) { LOG_ERROR("pfm_delete_evtsets not supported in v3.x"); return EINVAL; } /** * getinfo_eventset * * Arguments: **/ static int getinfo_eventset(int argc, char **argv) { pfarg_set_info_t set_arg; struct context *ctx; struct event_set *evt; cpu_set_t old_cpu_set; int ctx_id, event_set_id; int system_wide, rc; memset(&set_arg, 0, sizeof(set_arg)); ctx_id = strtoul(argv[1], NULL, 0); event_set_id = strtoul(argv[2], NULL, 0); if (ctx_id <= 0 || event_set_id < 0) { LOG_ERROR("context ID and event-set ID must be " "positive integers."); return EINVAL; } ctx = find_context(ctx_id); if (!ctx) { LOG_ERROR("Can't find context with ID %d.", ctx_id); return EINVAL; } evt = find_event_set(ctx, event_set_id); if (!evt) { LOG_ERROR("Can't find event-set with ID %d in context %d.", event_set_id, ctx_id); return EINVAL; } set_arg.set_id = evt->id; system_wide = ctx->ctx_flags & PFM_FL_SYSTEM_WIDE; if (system_wide && ctx->cpu >= 0) { rc = set_affinity(ctx->cpu, &old_cpu_set); if (rc) { return rc; } } rc = pfm_getinfo_sets(ctx->fd, 0, &set_arg, 1); if (rc) { rc = errno; LOG_ERROR("pfm_getinfo_evtsets system call returned " "an error: %d.", rc); return rc; } if (system_wide && ctx->cpu >= 0) { revert_affinity(&old_cpu_set); } LOG_INFO("Got info for event-set %d in context %d.", event_set_id, ctx_id); LOG_INFO(" Runs: %llu", (unsigned long long)set_arg.set_runs); LOG_INFO(" Timeout: %"PRIu64, set_arg.set_timeout); return 0; } /** * run_program * * Arguments: * * Start the specified program. After fork'ing but before exec'ing, ptrace * the child so it will remain suspended until a corresponding resume_program * command. We do this so we can load a context for the program before it * actually starts running. 
This logic is taken from the task.c example in * the libpfm source code tree. **/ static int run_program(int argc, char **argv) { struct program *prog; int program_id; pid_t pid; int rc; program_id = strtoul(argv[1], NULL, 0); if (program_id <= 0) { LOG_ERROR("program ID must be a positive integer."); return EINVAL; } /* Make sure we haven't already started a program with this ID. */ prog = find_program(program_id); if (prog) { LOG_ERROR("Program with ID %d already exists.", program_id); return EINVAL; } prog = calloc(1, sizeof(*prog)); if (!prog) { LOG_ERROR("Can't allocate new program structure to run '%s'.", argv[2]); return ENOMEM; } prog->id = program_id; pid = fork(); if (pid == -1) { /* Error fork'ing. */ LOG_ERROR("Unable to fork child process."); free(prog); return EINVAL; } else if (!pid) { /* Child */ /* This will cause the program to stop before executing the * first user level instruction. We can only load a context * if the program is in the STOPPED state. This child * process will sit here until we've processed a resume_program * command. */ rc = ptrace(PTRACE_TRACEME, 0, NULL, NULL); if (rc) { rc = errno; LOG_ERROR("Error ptrace'ing '%s': %d", argv[2], rc); exit(rc); } execvp(argv[2], argv + 2); rc = errno; LOG_ERROR("Error exec'ing '%s': %d", argv[2], rc); exit(rc); } /* Parent */ prog->pid = pid; insert_program(prog); /* Wait for the child to exec. */ waitpid(pid, &rc, WUNTRACED); /* Check if process exited early. */ if (WIFEXITED(rc)) { LOG_ERROR("Program '%s' exited too early with status " "%d", argv[2], WEXITSTATUS(rc)); /* The child is gone, so release its ID instead of leaving a stale entry. */ remove_program(prog); free(prog); return WEXITSTATUS(rc); } LOG_INFO("Started program %d: '%s'.", program_id, argv[2]); return 0; } /** * resume_program * * Arguments: * * A program started with run_program must be 'resumed' before it actually * begins running. This allows us to load a context to the process and * start the counters before the program executes any code. 
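 *
 * Illustrative sketch (not part of pfmsetup): the run/resume handshake in
 * isolation. The child asks to be traced and stops when it execs; the
 * parent later detaches, which lets the program start running.
 * spawn_stopped and resume_and_wait are hypothetical names, and the sketch
 * assumes a Linux system where ptrace is permitted.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that stops at its exec, before running any program code.
// Returns the child's pid, or -1 if the child could not be parked.
static pid_t spawn_stopped(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        // Child: request tracing, then exec. The kernel stops the child
        // with SIGTRAP once the exec has completed.
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);
    }

    // Parent: wait for the exec stop. A child that exited instead never
    // reached the stopped state.
    if (waitpid(pid, &status, WUNTRACED) == -1 || !WIFSTOPPED(status))
        return -1;

    return pid;
}

// Detach so the stopped child starts running, then wait for it to finish.
// Returns the child's exit status, or -1 on error or abnormal exit.
static int resume_and_wait(pid_t pid)
{
    int status;

    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

 * Any per-process setup, such as attaching a context, belongs between the
 * two calls while the child is still stopped.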
**/ static int resume_program(int argc, char **argv) { struct program *prog; int program_id; int rc; program_id = strtoul(argv[1], NULL, 0); if (program_id <= 0) { LOG_ERROR("program ID must be a positive integer."); return EINVAL; } prog = find_program(program_id); if (!prog) { LOG_ERROR("Can't find program with ID %d.", program_id); return EINVAL; } /* Call ptrace to resume execution of the process. If a context has * been loaded and the counters started, this is where monitoring * is effectively activated. */ rc = ptrace(PTRACE_DETACH, prog->pid, NULL, 0); if (rc) { rc = errno; LOG_ERROR("Error detaching program %d.\n", prog->id); return rc; } LOG_INFO("Resumed program %d.", program_id); return 0; } /** * wait_on_program * * Arguments: * * Wait for the specified program to complete and exit. **/ static int wait_on_program(int argc, char **argv) { struct program *prog; int program_id; int rc; program_id = strtoul(argv[1], NULL, 0); if (program_id <= 0) { LOG_ERROR("program ID must be a positive integer."); return EINVAL; } prog = find_program(program_id); if (!prog) { LOG_ERROR("Can't find program with ID %d.", program_id); return EINVAL; } waitpid(prog->pid, &rc, 0); /* The program has exited, but if there was a context loaded on that * process, it will still have the latest counts available to read. */ remove_program(prog); free(prog); LOG_INFO("Waited for program %d to complete.", program_id); return 0; } /** * _sleep * * Arguments: